The importance of robots.txt
June 20, 2020
Since I started this blog, it had never been indexed by any of the big search engines. I wondered why and even created a sitemap.xml, but no luck.
From time to time, some of the articles appeared in the search results, but only when somebody posted a link on social media, GitHub, or some other website. Google Search Console was silent all this time, showing only unhelpful messages: it cannot retrieve pages, or a page is crawled but cannot be indexed.
That went on for more than a year, until I saw in a Lighthouse report that
robots.txt was returning Error 500.
So I created the most permissive
robots.txt possible. Since there are no secret pages on this website and no admin interfaces (yet?) that should be hidden from crawlers, I can afford that.
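A fully open robots.txt can be as small as this (the domain here is a placeholder, not the blog's actual address):

```text
# Allow every crawler to access everything
User-agent: *
Disallow:

# Point crawlers at the sitemap (placeholder domain)
Sitemap: https://example.com/sitemap.xml
```

An empty `Disallow:` line means nothing is blocked, and the `Sitemap:` line tells crawlers where to find the generated sitemap.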
sitemap.xml with Sapper
With the sitemap, it was not just a matter of creating a text file. I didn't want to redeploy the website manually each time the sitemap changed (usually when a new article is published). And I was pretty sure Sapper could generate not just HTML but XML files too.
The solution was in one of Rich Harris's replies to a Sapper issue, where he suggests looking at how the RSS feeds in the HackerNews example work.
My sitemap.xml follows almost the same logic.
A few words about it:
- it invokes @polka/send to form the response;
- it uses the same Firestore request/transform methods as the blog itself.
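The route can be sketched roughly like this. It is only a sketch under assumptions: the file name, domain, and post shape (`{ slug, updated }`) are mine, and the real route fetches posts from Firestore and replies via @polka/send rather than the raw response object.

```javascript
// Sketch of a Sapper server route such as src/routes/sitemap.xml.js.
// The domain and post shape are assumptions, not the blog's actual code.
const BASE_URL = 'https://example.com'; // placeholder domain

// Build the sitemap XML body from a list of posts.
function buildSitemap(posts) {
  const urls = posts
    .map((post) =>
      [
        '  <url>',
        `    <loc>${BASE_URL}/blog/${post.slug}</loc>`,
        `    <lastmod>${post.updated}</lastmod>`,
        '  </url>',
      ].join('\n')
    )
    .join('\n');
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    urls,
    '</urlset>',
  ].join('\n');
}

// In Sapper this would be `export function get(req, res)`; kept as a
// plain function here so the sketch stays self-contained.
function get(req, res) {
  // The real route would load these from Firestore; hard-coded here.
  const posts = [{ slug: 'the-importance-of-robots-txt', updated: '2020-06-20' }];
  res.writeHead(200, { 'Content-Type': 'application/xml' });
  res.end(buildSitemap(posts));
}
```

Because the file lives under `src/routes/` and ends in `.xml.js`, Sapper serves it at `/sitemap.xml`, so the sitemap is regenerated on every request and never goes stale between deployments.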
As a result: no more Error 500 when accessing robots.txt. Google accepts the sitemap.xml and reports that it will index the blog soon.