[Mailman-Developers] Indexing pipermail archives
Nigel Metheringham
Nigel.Metheringham@VData.co.uk
Thu, 22 Jun 2000 13:51:20 +0100
I've just been looking at the problem of indexing/searching a large
Mailman/pipermail archive.
htdig will do the job, but the (htdig) indexes get bloated by the
pipermail index pages (which are *very* rarely what you want when
searching for something).
Indexing can be controlled by meta tags [see
http://info.webcrawler.com/mak/projects/robots/meta-user.html ]. The
pipermail HTML generation is hard coded into the archiver code.
As a short term fix, would people be happy with me adding the following
meta tags to the pipermail HTML generation:-
On top level (i.e. list of weeks/months etc.) and by-date index pages:-
<meta name="robots" content="noindex,follow">
[i.e. do not index the page, but follow links down to the articles]
On thread/subject/author indexes:-
<meta name="robots" content="noindex,nofollow">
[skip page and linked pages - nofollow is superfluous since the
indexing robot should realise that the pages are already included,
but it doesn't hurt much]
On article pages:-
<meta name="robots" content="index,nofollow">
[you may disagree with the nofollow, but I think there is no general
requirement for the indexer to follow links off the list]
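The scheme above could be sketched as a simple lookup from page type to
tag. This is only an illustration: the page-type names and the
robots_meta helper are hypothetical and do not correspond to anything
in the existing archiver code.

```python
# Hypothetical page-type names; the pipermail archiver does not define these.
ROBOTS_POLICY = {
    "toplevel": "noindex,follow",    # list of weeks/months etc.
    "date": "noindex,follow",        # by-date index
    "thread": "noindex,nofollow",
    "subject": "noindex,nofollow",
    "author": "noindex,nofollow",
    "article": "index,nofollow",
}


def robots_meta(page_type):
    """Return the robots meta tag proposed for the given page type."""
    content = ROBOTS_POLICY[page_type]
    return '<meta name="robots" content="%s">' % content


if __name__ == "__main__":
    for page_type in ("toplevel", "thread", "article"):
        print(page_type, robots_meta(page_type))
```

Keeping the policy in one table would make it easy to change a single
entry (for example the nofollow on article pages) without touching the
rest of the HTML generation.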
It's a hack for now, but will make htdig and other indexing robots
behave better.
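For anyone wanting to check what a generated page tells a robot, here
is a rough sketch of reading the tag back out of the HTML. It is not
how htdig itself is implemented; the RobotsMetaParser class and the
robots_directives helper are made-up names, and only the four
directives used above are handled. A page without the tag is treated as
index,follow.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect index/follow permissions from a robots meta tag."""

    def __init__(self):
        super().__init__()
        # With no robots meta tag, assume the page may be indexed and followed.
        self.index = True
        self.follow = True

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() != "robots":
            return
        for directive in (attrs.get("content") or "").lower().split(","):
            directive = directive.strip()
            if directive == "noindex":
                self.index = False
            elif directive == "index":
                self.index = True
            elif directive == "nofollow":
                self.follow = False
            elif directive == "follow":
                self.follow = True


def robots_directives(html):
    """Return (index, follow) booleans for a page's HTML source."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.index, parser.follow


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
    print(robots_directives(page))
```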
Comments?
Nigel.
--
[ - Opinions expressed are personal and may not be shared by VData - ]
[ Nigel Metheringham Nigel.Metheringham@VData.co.uk ]
[ Phone: +44 1423 850000 Fax +44 1423 858866 ]