
Why You Shouldn’t Use 403s or 404s to Restrict Googlebot’s Crawl Rate


If you’re using 403 or 404 errors to restrict Googlebot’s crawling rate, it is essential that you read this entire blog post.

This article will explain why using 404 or 403 errors to limit Googlebot’s rate is not recommended. Additionally, it will provide tips on the best practices for managing Googlebot’s crawl rate on your website.

Before we dive in, let’s first understand what 4xx errors are and how they are sometimes used to rate-limit Googlebot.

What are 4xx client errors?

4xx errors, also known as client-side errors, are returned when the server cannot or will not fulfill a request because of a problem with the request itself.

The most common 4xx error is 404, the ‘Not Found’ error, which means the requested page does not exist (or no longer exists) on the site and has not been redirected elsewhere. There are, however, many other 4xx errors you may encounter.

The full list of 4xx status codes is:

  • 400 Bad Request
  • 401 Unauthorized
  • 402 Payment Required
  • 403 Forbidden
  • 404 Not Found
  • 405 Method Not Allowed
  • 406 Not Acceptable
  • 407 Proxy Authentication Required
  • 408 Request Timeout
  • 409 Conflict
  • 410 Gone
  • 411 Length Required
  • 412 Precondition Failed
  • 413 Payload Too Large
  • 414 URI Too Long
  • 415 Unsupported Media Type
  • 416 Range Not Satisfiable
  • 417 Expectation Failed
  • 418 I’m a teapot
  • 421 Misdirected Request
  • 422 Unprocessable Entity
  • 423 Locked
  • 424 Failed Dependency
  • 426 Upgrade Required
  • 428 Precondition Required
  • 429 Too Many Requests
  • 431 Request Header Fields Too Large
  • 451 Unavailable For Legal Reasons

How are 4xx errors used to limit the crawl rate?

When a website owner or content delivery network (CDN) notices that Googlebot is crawling their site too quickly and causing issues, they might consider using 403s or 404s to slow down the crawl rate.

To do this, they might configure their web server to return a 403 or 404 error code in response to Googlebot’s requests, in the hope that the errors will make Googlebot slow down and reduce its crawling rate.

In short, the expectation is that Googlebot will back off when it keeps receiving 403 or 404 responses from the server.
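To make the anti-pattern concrete, here is a minimal sketch of what such a setup looks like, written against Python’s standard-library HTTP server; the handler name and port are placeholders, and real sites usually do this with a CDN or web-server rule rather than application code. It is shown only to illustrate what the rest of this article advises against.

```python
# Illustration only: the anti-pattern described above, NOT a recommendation.
# Minimal sketch using Python's standard library; handler name and port are
# hypothetical placeholders.
from http.server import BaseHTTPRequestHandler, HTTPServer


class RateLimitByErrorHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        user_agent = self.headers.get("User-Agent", "")
        if "Googlebot" in user_agent:
            # Anti-pattern: 403 tells Google the content is forbidden,
            # not that the server is busy, and risks de-indexing.
            self.send_error(403, "Forbidden")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.end_headers()
        self.wfile.write(b"<h1>Regular page</h1>")


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), RateLimitByErrorHandler).serve_forever()
```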

Why are 4xx errors not good for rate-limiting Googlebot?

With the exception of 429 (Too Many Requests), all 4xx errors indicate that something was wrong with the client’s request; they say nothing about the server being overloaded.

In simple terms, 4xx codes signal client-side problems, not server-side ones, so apart from 429 they do not tell Googlebot to slow down at all.

If you use 403 or 404 codes to limit Googlebot’s crawl rate, you risk having your web content removed from Google search or having Googlebot crawl content you don’t want to be indexed.

The situation gets even worse if you serve your robots.txt file with a 4xx error. In that case, Googlebot treats the file as if it did not exist and ignores its rules when crawling.
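If you suspect this is happening on your site, a quick check is to look at the status code your robots.txt actually returns. Below is a small sketch using only Python’s standard library; www.example.com is a placeholder for your own domain.

```python
# Check the HTTP status code returned for robots.txt.
# A 4xx response here means Googlebot may ignore the file's rules entirely.
import urllib.error
import urllib.request

URL = "https://www.example.com/robots.txt"  # placeholder domain

try:
    with urllib.request.urlopen(URL) as response:
        print(f"{URL} returned {response.status}")  # expect 200
except urllib.error.HTTPError as err:
    print(f"{URL} returned {err.code} - fix this before worrying about crawl rate")
```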

Rate-limiting Googlebot also reduces how often Google can crawl your pages and refresh its index, so changes to your content may take longer to appear in search results.

For example, if you’re running an online store and use 4xx errors to limit Googlebot’s crawl rate, it may not be able to update its index with the latest content in a timely manner.

This can result in outdated product information, such as incorrect pricing and descriptions, being displayed in search results.

Best way to rate limit Googlebot

The most effective way to manage Googlebot’s crawl rate is through Google Search Console, where you can limit the rate at which Googlebot crawls your website.

Follow the steps below to reduce the crawl rate:

  1. Log in to your Google Search Console account and select the website you want to manage.
  2. Click on ‘Crawl stats’ under the ‘Settings’ menu.
  3. Select the ‘Limit Google’s maximum crawl rate’ option.
  4. Choose your preferred crawl rate using the slider.
  5. To save the changes, click the “Save” button.

By following these steps, you can effectively manage Googlebot’s crawl rate and ensure that your website performs optimally.

By default, every verified site owner can see the Crawl Stats report, but the crawl rate settings page is only available for some properties.

You will find the Crawl Stats report under ‘Settings’ in the left-hand navigation menu of Google Search Console. It looks something like the screenshot below.

Crawl Stats in Google Webmaster Account

If you want to access the Crawl Rate Settings for your website, you can submit a special request to reduce the crawl rate. However, it’s important to note that you cannot request an increase in the crawl rate.

FAQs on Rate Limiting Googlebot

Q. What are 4xx client errors?

A. A 4xx error indicates a client-side problem with the request for a given URL, such as a page that no longer exists. It does not indicate a problem with the server.

Q. What is limiting crawling?

A. Limiting crawling means asking a search engine crawler to reduce how often it requests your webpages so that it does not overload your server.

Q. How are 4xx errors used to limit the crawl rate?

A. Some site owners serve 4xx errors in response to Googlebot’s requests, expecting the errors to slow down its crawl rate.

Q. Why are 4xx errors not good for rate-limiting Googlebot?

A. Using 4xx errors may result in Google dropping content you want indexed from its results, or crawling and indexing content you don’t want indexed.

Q. What is the best way to rate limit Googlebot?

A. The best way to rate-limit Googlebot is through Google Search Console. Log in to your account and adjust the crawl rate under the ‘Settings’ menu in the left-hand navigation.

Q. What is the crawl size limit of Googlebot?

A. Googlebot crawls only the first 15 MB of an HTML file or other supported text-based file. For more information, see Google’s documentation.

Over to you

If Googlebot is consuming your bandwidth and you’re planning to limit its crawl rate, please do not use 4xx error codes to do it.

Instead, use specific HTTP status codes like 500, 503, or 429 to indicate to Googlebot that it needs to slow down its crawl rate.
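As a rough illustration of this approach, the sketch below answers with 503 plus a Retry-After header whenever the server decides it is overloaded; it uses Python’s standard-library HTTP server, and server_is_overloaded() is a hypothetical placeholder for whatever load metric you actually monitor.

```python
# A sketch of the recommended signal: temporarily answer 503 with Retry-After
# when the server is overloaded, instead of 403/404.
from http.server import BaseHTTPRequestHandler, HTTPServer


def server_is_overloaded() -> bool:
    # Hypothetical placeholder: plug in your real load check
    # (CPU usage, request queue depth, upstream health, etc.).
    return False


class BackoffHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if server_is_overloaded():
            self.send_response(503)
            self.send_header("Retry-After", "3600")  # ask crawlers to retry in an hour
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.end_headers()
        self.wfile.write(b"<h1>Regular page</h1>")


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), BackoffHandler).serve_forever()
```

Keep in mind that serving 503 for an extended period can itself lead to pages being dropped from the index, so treat it strictly as a temporary measure.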

For beginners, limiting the crawl rate through Google Search Console is the best option. It is simple to use and requires no programming or technical knowledge.

Finally, if you have any questions or need help implementing these methods, feel free to contact me through the contact page or leave a message in the comment box below.

News Source: Don’t use 403s or 404s to rate-limit Googlebot
