space-efficient top-N algorithm

Peter Hansen peter at engcorp.com
Sun Feb 9 18:24:46 EST 2003


David Garamond wrote:
> 
> William Park wrote:
> > What's wrong with something like
> >
> >     awk '/Jan 1, 2003/,/Feb 1, 2003/ {print $1}' log \
> >       | sort | uniq -c | sort -rn | head -50
> 
> The toolbox philosophy in action :-)
> 
> > where I am assuming that URL is the first field in your log and URL
> > doesn't contain any spaces.
> 
> There's nothing wrong with something like that. It's just that I'll also
> need to parse the URL to get hostnames, do some reverse lookups/lookups
> to get both IPs and hostnames, produce graphs, and do several other
> things. I don't even want to think about how many pipe characters will
> be needed for these.

No more than one more, if you *then* pipe the output into a nice
little Python script that does the "easy" stuff...  
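Here is a minimal sketch of that split. It assumes the pipeline stops at `uniq -c`, so each input line is a count followed by an absolute URL. The helper names (`parse_uniq_c`, `hostname_counts`, `top_n`, `resolve`) are hypothetical. The script sums counts per hostname with `collections.Counter` and takes the N largest with `Counter.most_common(n)`. Its memory grows with the number of distinct hostnames, not with the size of the log.

```python
import socket
from collections import Counter
from urllib.parse import urlsplit


def parse_uniq_c(lines):
    """Yield (count, url) pairs from `uniq -c` output such as '     42 http://...'."""
    for line in lines:
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        count, url = fields
        yield int(count), url.strip()


def hostname_counts(pairs):
    """Sum request counts per hostname taken from each URL."""
    totals = Counter()
    for count, url in pairs:
        host = urlsplit(url).hostname or "(none)"  # relative URLs have no host
        totals[host] += count
    return totals


def top_n(totals, n):
    """Return the n (hostname, count) pairs with the largest counts."""
    return totals.most_common(n)


def resolve(host):
    """Best-effort forward lookup: return an IPv4 address, or None if it fails."""
    try:
        return socket.gethostbyname(host)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None
```

For example, `top_n(hostname_counts(parse_uniq_c(sys.stdin)), 50)` at the end of `awk ... | sort | uniq -c | python script.py` would print the 50 busiest hosts. Calling `resolve` on just those 50 keeps the number of DNS lookups small.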

-Peter
