space-efficient top-N algorithm

David Garamond lists at zara.6.isreserved.com
Sun Feb 9 13:32:13 EST 2003


William Park wrote:
> What's wrong with something like
> 
>     awk '/Jan 1, 2003/,/Feb 1, 2003/ {print $1}' log \
> 	| sort | uniq -c | sort -rn | head -50

The toolbox philosophy in action :-)

> where I am assuming that URL is the first field in your log and URL
> doesn't contain any spaces.

There's nothing wrong with something like that. It's just that I'll also 
need to parse the URLs to get hostnames, do forward and reverse lookups 
to get both IP addresses and hostnames, produce graphs, and do several 
other things. I don't even want to think about how many pipe characters 
all of that would need.
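
To illustrate the first of those steps, here is a rough sketch in present-day Python of counting by hostname instead of by raw URL. It keeps the same assumptions as the pipeline above (the URL is the first field and contains no spaces). The function name top_hosts and the sample lines are made up for illustration, and the date-range filtering is left out. Note that it still keeps one counter per distinct hostname in memory, so it is not a space-efficient answer by itself.

```python
from collections import Counter
from urllib.parse import urlsplit


def top_hosts(lines, n=50):
    """Return the n most frequent hostnames found in the first field of each line.

    Hypothetical helper: assumes the URL is the first whitespace-separated
    field and contains no spaces.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # .hostname is None when the field has no "//host" part
        host = urlsplit(fields[0]).hostname
        if host:
            counts[host] += 1
    # most_common(n) returns (hostname, count) pairs, highest count first
    return counts.most_common(n)


if __name__ == "__main__":
    sample = [
        "http://www.example.com/a 200",
        "http://www.example.com/b 200",
        "http://example.org/ 404",
    ]
    print(top_hosts(sample, n=2))
```

From there, lookups or graphing could be added as further functions operating on the returned list, rather than as more pipeline stages.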

-- 
dave





