[Python-Dev] Googlebot and the mail.python.org python-dev archive

Nick Coghlan ncoghlan at gmail.com
Sat Feb 28 12:53:10 CET 2009


Antoine Pitrou wrote:
> Georg Brandl <g.brandl <at> gmx.net> writes:
>> Guido van Rossum schrieb:
>>> I think the better syntax would be to add site:mail.python.org to the
>>> query, but you're right, that doesn't seem to find recent messages.
>>> Maybe the absence of a robots.txt file on mail.python.org could be a
>>> partial explanation?
>> Doesn't the absence of a robots.txt mean "you may index everything"?
> 
> It does.
> However, pages such as:
>     http://mail.python.org/pipermail/python-dev/
> (and, it seems, all other pipermail-generated archive pages)
> have the following HTML tag in them:
>     <META NAME="robots" CONTENT="noindex,follow">
> which explicitly instructs Web spiders *not* to index contents nor follow links.

That's not quite true - that meta tag says not to index the current
page, but *do* follow the links to other pages. The archive page and the
monthly summary pages both carry that same tag.

Once you get down to the individual post level, then it switches around
- the meta tags on those pages say to index the page and NOT to follow
links.

Those settings actually make a certain amount of sense - they should
encourage the actual messages to turn up in search results rather than
the index pages pointing to those messages.

The top-level list of mailing lists and the description pages for each
list don't have the meta tag at all, so they should all be both indexed
and the links followed.
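
To make the three cases above concrete, here is a rough Python sketch of
how a crawler might read the robots meta tag. It is only an
illustration: the names RobotsMetaParser and robots_policy are made up
for this example, the "index,nofollow" string is an assumed spelling of
what the individual post pages carry, and real crawlers may well apply
more rules than this.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so the
        # upper-case <META NAME=... CONTENT=...> form is handled too.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() == "robots":
            content = attr_map.get("content", "")
            self.directives.update(
                part.strip().lower()
                for part in content.split(",")
                if part.strip()
            )


def robots_policy(html):
    """Return (may_index, may_follow); both default to True."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    found = parser.directives
    may_index = not ({"noindex", "none"} & found)
    may_follow = not ({"nofollow", "none"} & found)
    return may_index, may_follow


if __name__ == "__main__":
    # Archive and monthly summary pages (tag quoted earlier in the thread).
    print(robots_policy('<META NAME="robots" CONTENT="noindex,follow">'))
    # Individual post pages (assumed spelling of the tag).
    print(robots_policy('<META NAME="robots" CONTENT="index,nofollow">'))
    # Pages with no robots meta tag at all.
    print(robots_policy("<html><head><title>x</title></head></html>"))
```

With no robots meta tag present, the sketch falls back to "index and
follow", which matches the usual default described above.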

However, I checked the Wayback Machine and it hasn't archived anything
from mail.python.org since late 2007, suggesting there may be something
about the current setup that well-behaved web crawlers don't like.

Is pydotorg-www still the place for website questions?* If so, I should
probably take this over there...

Cheers,
Nick.

* I ask because that list doesn't appear to have seen any traffic since
May last year...

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

