Third-party domain robots.txt blocking JavaScript and CSS - Limited indexing and Google Search Console warnings

Submitted by dryer on Wed, 07/29/2015 - 19:11

Google has sent warnings about subpar search indexing results if your robots.txt blocks CSS or JS. GoogleBot is very capable of executing JavaScript nowadays, so the reason is clear: it needs those files to render the page the way a visitor sees it.

If you've got dynamic content generated by JavaScript that is blocked, GoogleBot will respect the robots.txt block and will not be able to render that content:

  1. GoogleBot can't see your content
  2. Content is not indexed
  3. People cannot find your content because it's not indexed.

Simple to fix, right? Just remove the protection from your JS and CSS files. Maybe not so simple. Popular dynamic site generating tools like WordPress and Drupal have their own directory structures and style concatenation, so allowing indexing per file type can be tricky: the robots.txt standard has limited support for wildcards, etc.
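One way to check a rule change before deploying it is to run the robots.txt text through a parser. The following is a minimal sketch using Python's standard urllib.robotparser; the two rule sets, the example.com URLs and the helper name googlebot_can_fetch are illustrative assumptions, not rules taken from any real site. It sticks to plain directory prefixes because this parser does not handle wildcards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set that hides a whole WordPress-style directory,
# including the scripts and stylesheets inside it.
BLOCKING = """\
User-agent: *
Disallow: /wp-includes/
"""

# Hypothetical fix: open up the JS and CSS subdirectories by prefix,
# keep the rest of the directory blocked.
RELAXED = """\
User-agent: *
Allow: /wp-includes/js/
Allow: /wp-includes/css/
Disallow: /wp-includes/
"""


def googlebot_can_fetch(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text lets Googlebot fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


if __name__ == "__main__":
    script = "https://example.com/wp-includes/js/jquery/jquery.js"
    print(googlebot_can_fetch(BLOCKING, script))
    print(googlebot_can_fetch(RELAXED, script))
```

The Allow lines are placed before the Disallow line on purpose: Python's parser applies the first matching rule, while Google picks the most specific one, and with this ordering both readings agree.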

In addition to your own server, your site is probably loading resources from third-party servers as well. These are out of your control, and their own robots.txt can block crawlers from fetching those resources. A practical example:

  1. Your site uses the popular Yandex Maps API to provide driving directions to your store
  2. You load the map JavaScript widget from Yandex servers (as you need to do)
  3. The Yandex server blocks GoogleBot from fetching the Yandex code: https://api-maps.yandex.ru/robots.txt
  4. Driving directions are not indexed, leading to content-poor indexing compared to the JavaScript-enriched version
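The same kind of check can be run over every external resource a page loads. Below is a rough sketch, again with Python's urllib.robotparser. The helper blocked_resources, the resource paths and the robots.txt contents are all made up for illustration; in particular, the "Disallow: /" text is an assumed stand-in and not a copy of what the Yandex server actually serves.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def blocked_resources(resource_urls, robots_by_host, user_agent="Googlebot"):
    """Return the URLs that the host's robots.txt forbids for user_agent.

    robots_by_host maps a hostname to the text of its robots.txt.
    Hosts missing from the mapping are treated as fetchable.
    """
    blocked = []
    for url in resource_urls:
        host = urlsplit(url).netloc
        robots_txt = robots_by_host.get(host)
        if robots_txt is None:
            continue  # nothing known about this host, assume no block
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        if not parser.can_fetch(user_agent, url):
            blocked.append(url)
    return blocked


if __name__ == "__main__":
    # Hypothetical resource list for a page; the paths are placeholders.
    resources = [
        "https://api-maps.yandex.ru/example-widget.js",
        "https://example.com/js/site.js",
    ]
    # Assumed robots.txt contents, for illustration only.
    robots = {
        "api-maps.yandex.ru": "User-agent: *\nDisallow: /\n",
        "example.com": "User-agent: *\nDisallow: /admin/\n",
    }
    for url in blocked_resources(resources, robots):
        print("blocked for Googlebot:", url)
```

To check live servers instead of canned text, RobotFileParser can also download the file itself through its set_url() and read() methods.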