[Mailman-Users] archive 404 strangeness

Mark Sapiro mark at msapiro.net
Wed Apr 2 18:24:26 CEST 2008


Jim Popovitch wrote:
>
>Lately I've noticed lots of 404 log errors for archive pages where the
>first letter of the month is not capitalized (e.g. 2007-september,
>instead of the correct 2007-September).  These first started
>appearing in the logs around December 2007, and continue today.  The
>IPs are all over the place, contain no referrer, and the user agent
>varies but does include some search engines like Y!Slurp.   I've
>checked every file in the archive for references to the lower case
>month, but didn't find any.
>
>In the case of the Y!Slurp hits, whois shows the IPs as belonging to
>OC3 Networks rather than Yahoo!.
>
>So... is this an archive harvesting miscreant?


I don't know if it's a miscreant or not, but I sometimes see this with
private archives in the form of "Private archive file not found:"
messages in the error log.

When I first saw this, I was concerned and, like you, I looked for bad
links in the archives, looked in the apache logs for the IPs, looked at
other web activity from those IPs, and looked for those IPs in the
listname.mbox file.
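
If you want to pull the affected requests out of the apache log in one
pass, something like the following rough sketch may help. It assumes the
standard "combined" log format and archive URLs containing a directory
such as /2007-september/; the function name and patterns are my own
illustration, not anything Mailman provides, so adjust them to your
LogFormat and URL layout.

```python
import re

# Assumption: Apache "combined" log format. Adjust if your LogFormat differs.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?:GET|HEAD) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)

# Archive directories look like 2007-September. This pattern is
# case-sensitive, so it only matches a lower-case first letter.
MONTHS = ("january|february|march|april|may|june|july|august|"
          "september|october|november|december")
MISCASED_RE = re.compile(r'/\d{4}-(?:%s)(?:/|\.|$)' % MONTHS)


def find_miscased_404s(lines):
    """Return (ip, path) pairs for 404s whose month name is lower case."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line)
        if (m and m.group("status") == "404"
                and MISCASED_RE.search(m.group("path"))):
            hits.append((m.group("ip"), m.group("path")))
    return hits
```

Feeding it an open log file, e.g. find_miscased_404s(open(path)) with
path pointing at your access log, gives a list of IPs that can then be
checked against other web activity and the listname.mbox file.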

Note that in the case of a private archive, the user has to be logged
in to generate that error log message.

I ultimately satisfied myself that these were cases of people manually
typing URLs of archive posts or manually altering a URL to go to a
different month's index and mistyping the case.
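
One way to check the "mistyped case" theory is to see whether each bad
URL differs from a real archive URL only in the case of the month name.
A minimal sketch, assuming monthly archive directories named like
2007-September; both function names and the known_paths collection are
hypothetical, made up for this example:

```python
import re

# Matches a path component such as 2007-september or 2007-SEPTEMBER.
MONTH_DIR_RE = re.compile(r'(?<=/)(\d{4})-([A-Za-z]+)(?=/|\.|$)')


def canonical_archive_path(path):
    """Capitalize the month name in a monthly-archive path component."""
    return MONTH_DIR_RE.sub(
        lambda m: "%s-%s" % (m.group(1), m.group(2).capitalize()), path)


def differs_only_by_month_case(requested, known_paths):
    """True if fixing the month's case turns `requested` into a known path."""
    fixed = canonical_archive_path(requested)
    return fixed != requested and fixed in known_paths
```

If nearly all of the 404s pass this test against paths that really
exist in the archive, that is consistent with people typing or editing
URLs by hand, though it does not rule out other causes.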

One thing you could try is googling the specific URL with something like

allinanchor:".../2006-september/..."

to see if there are any hits with the 'bad' URLs.

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


