space-efficient top-N algorithm
David Garamond
lists at zara.6.isreserved.com
Sun Feb 9 13:32:13 EST 2003
William Park wrote:
> What's wrong with something like
>
> awk '/Jan 1, 2003/,/Feb 1, 2003/ {print $1}' log \
>   | sort | uniq -c | sort -rn | head -50
The toolbox philosophy in action :-)
> where I am assuming that URL is the first field in your log and URL
> doesn't contain any spaces.
There's nothing wrong with something like that. It's just that I'll also
need to parse the URLs to get hostnames, do forward and reverse DNS
lookups to get both IP addresses and hostnames, produce graphs, and do
several other things. I don't even want to think about how many pipe
characters would be needed for all of that.
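
As a rough illustration of keeping those steps in one script, here is a
minimal sketch in present-day Python 3. It assumes, as in the quoted
pipeline, that the URL is the first whitespace-separated field of each log
line. The function name top_hosts is made up for this example, and the date
filtering and DNS lookups are left out.

```python
from collections import Counter
from urllib.parse import urlsplit


def top_hosts(lines, n=50):
    """Return the n most frequent hostnames as (hostname, count) pairs.

    Assumes the URL is the first whitespace-separated field of each line
    (an assumption carried over from the quoted awk pipeline).
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # urlsplit() only finds a hostname when the URL has a scheme and
        # "//" prefix; relative URLs yield None and are skipped here.
        host = urlsplit(fields[0]).hostname
        if host:
            counts[host] += 1
    # most_common(n) picks the n largest counts without a full sort.
    return counts.most_common(n)


if __name__ == "__main__":
    import sys

    # Usage: python top_hosts.py < log
    for host, count in top_hosts(sys.stdin):
        print(count, host)
```

Note that this is not space-efficient in the sense of the thread's subject:
the Counter still holds one entry per distinct hostname. It only shows that
parsing and counting fit in the same place where lookups and graphing could
be added later.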
--
dave
More information about the Python-list
mailing list