[Catalog-sig] User-agents / download hit

Tarek Ziadé ziade.tarek at gmail.com
Tue Aug 24 01:43:04 CEST 2010


2010/8/23 "Martin v. Löwis" <martin at v.loewis.de>:
>> Proposals: let's remove z3c.pypimirror and pep381client from the download stats.
>
> This isn't really implementable as formulated: for many of the files, I
> just don't know what user agent has downloaded them.

How come? I thought all calls were made through Apache via the same root.

[..]
> Also, what about other automatic downloaders, such as Googlebot, wget,
> or buildout?

I would count buildout and wget calls. For instance, I manually download
files using wget, so it's a legitimate hit. But yes, the definition of
what should be counted as a hit is quite fuzzy.

The only way to know what hits are from mirrors or bots without relying
on the UA would be to detect a client that acts as a bot and discard
its hits.

This can be done by grouping calls issued from the same IP and flagging
the IPs that scan the whole index in a short time. But that's some work :)
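
A minimal sketch of that heuristic, under stated assumptions: the hits are
already parsed into (timestamp in seconds, IP, path) tuples, and "scanning
the whole index" is approximated as "requested at least min_distinct
different paths within window seconds". The function names, the tuple
layout and both thresholds are hypothetical and are not taken from the
PyPI code.

```python
from collections import defaultdict


def find_crawler_ips(hits, window=3600, min_distinct=1000):
    """Return the IPs that fetched >= min_distinct distinct paths
    within any span of `window` seconds.

    hits: iterable of (timestamp_seconds, ip, path) tuples (assumed layout).
    """
    by_ip = defaultdict(list)
    for ts, ip, path in hits:
        by_ip[ip].append((ts, path))

    crawlers = set()
    for ip, entries in by_ip.items():
        entries.sort()
        in_window = defaultdict(int)  # path -> hits currently in the window
        start = 0
        for ts, path in entries:
            in_window[path] += 1
            # Drop hits that fell out of the sliding window.
            while entries[start][0] < ts - window:
                old_path = entries[start][1]
                in_window[old_path] -= 1
                if in_window[old_path] == 0:
                    del in_window[old_path]
                start += 1
            if len(in_window) >= min_distinct:
                crawlers.add(ip)
                break
    return crawlers


def count_downloads(hits, crawlers):
    """Count hits per path, discarding those issued by flagged IPs."""
    totals = defaultdict(int)
    for _ts, ip, path in hits:
        if ip not in crawlers:
            totals[path] += 1
    return dict(totals)
```

Both thresholds would need tuning against real logs; a large shared proxy
could look like a crawler, and a slow mirror could stay under the limit.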

> I plan to display each download counter broken down by UA, so that users
> could form their own opinion on how many downloads the file has really
> seen. Implementing this would take some time, though (as would
> implementing anything else, for that matter).

That would be the best/simplest option.
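
As a rough illustration of such a per-UA breakdown, assuming each hit is
available as a (path, User-Agent string) pair: the helper names and the
input layout below are hypothetical, and the UA is reduced to its first
product token only.

```python
from collections import Counter, defaultdict


def ua_family(user_agent):
    """Reduce a User-Agent string to its first product name,
    e.g. "Wget/1.12 (linux-gnu)" becomes "Wget"."""
    ua = (user_agent or "").strip()
    if not ua:
        return "unknown"
    return ua.split()[0].split("/")[0]


def downloads_by_ua(hits):
    """Build a mapping of path -> Counter of UA family -> hit count.

    hits: iterable of (path, user_agent) pairs (assumed layout).
    """
    breakdown = defaultdict(Counter)
    for path, user_agent in hits:
        breakdown[path][ua_family(user_agent)] += 1
    return breakdown
```

Taking only the first token is crude: a bot that announces itself as
"Mozilla/5.0 (compatible; ...)" would be filed under "Mozilla", so a real
breakdown would need a smarter classification.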

Regards
Tarek

