How to fix the "Indexed, though blocked by robots.txt" error in GSC

Rebecca

Google Search Console (GSC) shows the "Indexed, though blocked by robots.txt" status when Google has indexed URLs that it is not allowed to crawl.

Most of the time, the cause is a simple crawl block in your robots.txt file. But a few other conditions can also trigger the message, so let's work through the following troubleshooting process to diagnose and fix the problem as efficiently as possible.

The first step is to ask yourself whether you want Google to index the URL.


If you don't want the URL indexed…

Add a robots noindex meta tag to the page and make sure crawling is allowed (assuming the page is canonical), so that Google can see the tag.

If you block a page from being crawled, Google may still index it, because crawling and indexing are two different things. Unless Google can crawl a page, it won't see the noindex meta tag, so it may index the page anyway because it has links pointing to it.
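
For reference, a standard robots noindex meta tag goes in the page's <head> and looks like this:

<meta name="robots" content="noindex">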

If the URL is canonicalized to another page, don't add a robots noindex meta tag. Just make sure the proper canonicalization signals are in place, including a canonical tag on the canonical page, and allow crawling so those signals pass through and consolidate correctly.
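
A canonical tag also goes in the <head>; the URL below is a placeholder for the page you want signals consolidated to:

<link rel="canonical" href="https://example.com/canonical-page/">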

Read more: NET::ERR_CERT_DATE_INVALID error in Chrome – How to fix

If you want the URL indexed…

You need to find out why Google cannot crawl the URL and remove the block.

The most likely cause is a crawl block in the robots.txt file, but there are a few other scenarios where you may see the blocked message. Let's go through them in the order you should probably check them:

  • Check for a crawl block in robots.txt
  • Check for intermittent blocks
  • Check for a user-agent block
  • Check for an IP block

Check for a crawl block in robots.txt

The easiest way to check for the problem is to use the robots.txt tester in GSC, which will flag the blocking rule.

If you know what you are looking for, or if you don't have access to GSC, you can go to domain.com/robots.txt to find the file. You're probably looking for a disallow directive like:

Disallow: /

The directive may apply to a specific user-agent, or it may block everyone. If your site is new or recently launched, you may want to look for:

User-agent: *
Disallow: /

Can’t find the problem?

It's possible that someone already fixed the robots.txt block and resolved the issue before you got to it. That's the best-case scenario. However, if the problem appears to be resolved but reappears shortly afterwards, you may have an intermittent block.

How to fix it

You'll want to remove the disallow directive causing the block. How you do this depends on the technology you are using.

WordPress

If the problem affects your entire site, the most likely cause is that a WordPress setting to discourage indexing is enabled. This mistake is common on new websites and after site migrations. Follow these steps to check:

  1. Click on “Settings”
  2. Click on "Reading"
  3. Make sure the "Discourage search engines from indexing this site" option under "Search engine visibility" is unchecked.

WordPress with Yoast SEO

If you are using the Yoast SEO plugin, you can edit the robots.txt file directly to remove the blocking directive.

  1. Click on "Yoast SEO"
  2. Click on “Tools”
  3. Click on “File Editor”

WordPress with Rank Math

Like Yoast, Rank Math allows you to directly edit the robots.txt file.

  1. Click on "Rank Math"
  2. Click on “General Settings”
  3. Click on “Edit robots.txt.”

FTP or hosting

If you have FTP access to the site, you can edit the robots.txt file directly to remove the problematic disallow directive. Your hosting provider may also offer a file manager that gives you direct access to the robots.txt file.

Read more: How to fix the "Sorry, you are not allowed to access this page" error in WordPress

Check for intermittent blocks

Intermittent problems can be more difficult to diagnose, because the conditions causing the block may not always be present.

I recommend checking the history of your robots.txt file. For example, in the GSC robots.txt tester, if you click the dropdown menu you will see older versions of the file that you can open to check what they contained.

The Wayback Machine at archive.org also keeps a history of robots.txt files for the sites it crawls. You can click on any date it has data for and see what the file contained on that particular day.

Or you can use the beta Changes report, which makes it easy to see content changes between two versions of the file.
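
If you prefer the command line, you can also pull an archived copy of the file straight from the Wayback Machine. The timestamp and example.com below are placeholders; an incomplete timestamp redirects to the closest snapshot available:

curl -sL "https://web.archive.org/web/20230101/https://example.com/robots.txt"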

How to fix it

The process for fixing intermittent blocks depends on what is causing the problem. One possible cause is a cache shared between the test and live environments. When the cache from the test environment is active, the robots.txt file may include a blocking directive; when the cache from the live environment is active, the site may be crawlable. In this case, you would want to split the cache, or perhaps exclude .txt files from the cache in the test environment.
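
How you exclude robots.txt from a shared cache depends entirely on your stack, so treat the following as a rough sketch only. Assuming an nginx proxy cache sits in front of the environment, and backend_upstream is a placeholder for your own upstream, a location block like this would bypass the cache for robots.txt:

location = /robots.txt {
    # Skip the proxy cache (configured elsewhere) for this one file
    proxy_cache_bypass 1;
    proxy_no_cache 1;
    proxy_pass http://backend_upstream;
}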

Check for user-agent blocks

User-agent blocks happen when a site blocks a specific user agent, such as Googlebot or AhrefsBot. In other words, the site detects a specific bot and blocks requests carrying its user-agent string.

If you can view the page fine in your regular browser but it is blocked after switching user agents, the specific user-agent string you entered is being blocked.

You can set a specific user agent with Chrome DevTools, or use a browser extension that switches user agents.

Alternatively, you can check for user-agent blocks with cURL. Here's how to do it on Windows:

  1. Press Windows + R to open the “Run” window.
  2. Enter "cmd" and then click "OK."
  3. Enter a cURL command like this:
    curl -A "user-agent-name-here" -Lv [URL]
    For example:
    curl -A "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)" -Lv https://ahrefs.com

Read more: Fix "The uploaded file exceeds the upload_max_filesize directive in php.ini" in WordPress

How to fix it

Unfortunately, this is another case where the fix depends on where you find the block. Many different systems can block a bot, including .htaccess, server settings, firewalls, CDNs, or even something your hosting provider controls that you cannot see. Your best bet may be to contact your hosting provider or CDN and ask them where the block is coming from and how you can fix it.

For example, here are two different ways to block a user agent in .htaccess that you may need to look for.

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteRule .* - [F,L]
Or:

BrowserMatchNoCase "Googlebot" bots
Order Allow,Deny
Allow from ALL
Deny from env=bots

Check for IP blocks

If you have confirmed that robots.txt is not blocking the page and you have ruled out user-agent blocks, it is probably an IP block.

How to fix it

IP blocks are difficult to track down. As with user-agent blocks, your best bet is to contact your hosting provider or CDN and ask them where the block is coming from and how you can fix it.

Here’s an example of something you can search for in .htaccess:

deny from 123.123.123.123

Read more: DNS_PROBE_FINISHED_BAD_CONFIG error – How to fix

In most cases, the "Indexed, though blocked by robots.txt" warning is caused by a block in the robots.txt file. Hopefully, this guide has helped you find and fix the issue if that was not the case for you.

Index Coverage: How to Fix Search Console Errors

Messages about Index Coverage issues are quite common; however, there is not much information on the subject online. As a result, webmasters and website owners often do not know how to solve the problem and stop Google Search Console from displaying the Coverage error.

"Index Coverage" refers to a type of alert about changes in the indexing coverage of a website's pages and possible errors on those pages.

When Google cannot crawl or index a page for some reason, Search Console generates an alert so you can fix it.

This lack of documentation makes it difficult for webmasters and website owners to do their job, as they often do not know how to solve the problem or even what its main causes are.

Usually, Google Search Console sends you an email message such as "A new Index Coverage issue has been detected on http://mysite.com.br".

This message usually does not indicate a serious Index Coverage problem, so in most cases the solution is quite simple. To help you, we have put together a guide with 7 ways to solve Index Coverage errors. Check it out!

What is Index Coverage?

Index Coverage is the Google Search Console report that provides a detailed view of all the pages on a site that Google has indexed or tried to index. The report also records any errors Googlebot found while crawling a page.

This report allows you to identify Index Coverage problems, with a variety of causes, that may be preventing Google from indexing pages on your site.

In short, to make it easier to identify possible errors and problems, Google Search Console offers the Index Coverage report, which lists the reasons why pages could not be indexed.

Read more: ERR_CONNECTION_TIMED_OUT error in Google Chrome – how to fix it

What is an Index Coverage error?

Index Coverage errors are common and occur when Google finds a URL but, when trying to index it, cannot because of an error, a problem on the site, or even a user action.

After a while, Google tends to remove such a page from its search index, but before doing so it tries to index the page several more times to confirm that there was no configuration error or site problem, such as malware.

This report exists so you can correct errors that might otherwise cause your page to be incorrectly removed from search results.

The main Index Coverage errors

Index Coverage errors can have several different causes, so they must be properly evaluated in order to find the best solution.

To make this easier, the Google Search Console error message gives you a clue about what might be causing the problem: both the email and the message displayed in Google Search Console indicate why the page cannot be indexed.

See below for the most common errors and how to resolve them.

1. Indexed, though blocked by robots.txt

This is not an error but a warning. Even so, you need to understand why the page was indexed even though it is blocked. In this case, the following message is likely appearing in the search results: "No information is available for this page."

If this happens, first check your site’s robots.txt file directives.

Use Google Search Console's robots.txt testing tool to check whether the page is being blocked by your website's robots.txt file.

Then, to resolve the Index Coverage issue, make sure you have not blocked your WordPress site from being indexed under "Settings > Reading". Confirm that the "Discourage search engines from indexing this site" option is unchecked; otherwise your entire site will be blocked.

It is also worth checking whether any plugin modifies the robots.txt file; SEO plugins often include settings that change it.
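
For reference, a robots.txt file that does not block crawling can be as minimal as the following; an empty Disallow value allows everything, and any path you genuinely want to keep out of crawling would go after the colon:

User-agent: *
Disallow: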

2. Submitted URL blocked by robots.txt

This message covers the same situation as the warning above. The difference here is that the page was not indexed by Google.

To solve this problem, just follow the same steps suggested for the previous error.

3. Server Error (5xx)

The "Server Error (5xx)" message appears when the server returned an error status code when Google tried to access the page.

The reasons vary, but it is usually caused by a temporary server failure that resolves itself after some time.

Even so, you should make sure there is nothing wrong with your hosting or website. Check whether the error happened for any of the reasons below:

  • Server failure – Make sure your hosting server isn't getting overloaded too often;
  • Script errors – Some scripts may be causing errors on the website;
  • Blocking by firewall – Check with your hosting provider whether Googlebot is being blocked (a quick check is shown after this list).
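
For a quick check of what Googlebot might be seeing, you can request the page with cURL using one of Googlebot's published user-agent strings; the URL below is a placeholder:

curl -I -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" https://example.com/page

If the first line of the response shows a 5xx status for this user agent while a normal browser request works fine, a firewall or bot-protection rule is a likely suspect.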

4. Submitted URL marked as “noindex”

The page has a "noindex" meta tag that tells Google it should not be indexed. In this case, the solution is to check the page settings for an active "noindex" (or "do not index") option.

SEO plugins often have an option to disable indexing of a page or post. Go to the post or page edit screen, look for a "noindex" option that is enabled, and change it so the page can be indexed.

For those who use the Yoast SEO plugin, follow the steps below (a quick way to confirm the change afterwards is shown after the list):

  • Access the list of pages/posts in the WordPress admin panel
  • Click on “Edit” displayed right below the page/post that is showing this error in the Console
  • Scroll down to “Yoast SEO”
  • Click on “Advanced”
  • Under “Allow search engines to show Post in search results?” check the option “Yes (current default for Posts)”
  • Click the “Update” button to save the changes to the post/page
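
After updating, you can confirm the change by fetching the page and searching its HTML for a robots meta tag; the URL below is a placeholder (on Windows cmd, pipe to findstr instead of grep):

curl -s https://example.com/page | grep -i 'name="robots"'

If the output still contains noindex, the change has not taken effect yet, or a cache is serving the old HTML.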

5. Submitted URL not found (404)

When the "Submitted URL not found (404)" message appears, it means the page was not found and therefore cannot be indexed by Google.

You need to work out why the page no longer exists. If you changed the page's address (URL), simply set up a 301 redirect from the old URL to the new one.
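
On an Apache server, one simple way to do this is a redirect rule in .htaccess; /old-page/ and the target URL below are placeholders:

Redirect 301 /old-page/ https://example.com/new-page/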

Common causes of 404 errors include deleted pages, URLs changed without a redirect, and mistyped links.

6. Submitted URL appears to be a Soft 404

This is similar to the previous problem. The difference is that "Submitted URL not found (404)" actually returned the 404 status code, whereas here the page behaves like an error page without returning it.

In this case, the server shows an "error" page, but the response does not return the 404 status code, which browsers and caching systems need in order to identify it as an error page.

This is a misconfiguration, as error pages should return the correct error status. To resolve the Index Coverage warning, follow the same steps suggested for the previous error.

The missing 404 status code could be a server or application configuration issue. This does not by itself interfere with Index Coverage, but it is important to keep the setting correct so browsers and caching systems don't store an error page in place of the legitimate page.
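
To confirm what status code an error page actually returns, you can send a HEAD request with cURL and check the first line of the response; the URL below is a placeholder:

curl -I https://example.com/some-missing-page

A correctly configured error page returns something like "HTTP/1.1 404 Not Found", while a soft 404 typically returns "HTTP/1.1 200 OK" even though the body says the page does not exist.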

7. Blocked due to another 4xx issue

This is a generic error that doesn't fit into any of the above categories. The recommendation is to review the tips above and see which of them apply to your situation.

When the Google Search Console Index Coverage report returns this error, there are few clues about the cause, so you need to do some basic checks, such as confirming that the page (URL) is online and fully functional:

  • Open the URL Inspection tool in Google Search Console
  • In the "Inspect URL" field (top bar), enter the site URL shown in the Index Coverage report
  • Check whether Google reports any error or difficulty accessing the URL in question

Read more: 4 ways to fix the error “Failed to load resource: net::err_blocked_by_client”

Always pay attention and avoid future problems!

Index Coverage warnings, generated in Google Search Console (formerly Webmaster Tools), are Google's way of flagging possible errors that must be resolved. Besides harming your pages' indexing, these issues can also hurt your site's optimization and browsing experience, especially on mobile devices.

The Index Coverage report is essential at times like this, as it shows which pages are being indexed by Google. For this reason, pay close attention whenever you receive Index Coverage warnings or error messages.

If an Index Coverage issue is not resolved, the page will tend to be removed from Google's index after some time, so watch the alerts and resolve issues as soon as possible!

Remember that after Google removes a page from the search index, it may take some time for it to reappear in the results.

Check your Search Console regularly so you can take preventive action and keep errors from lingering.

 

 

For website maintenance services, contact us.
