space-efficient top-N algorithm

William Park opengeometry at yahoo.ca
Wed Feb 12 00:45:04 EST 2003


David Garamond <lists at zara.6.isreserved.com> wrote:
> William Park wrote:
>> What's wrong with something like
>> 
>>     awk '/Jan 1, 2003/,/Feb 1, 2003/ {print $1}' log \
>>       | sort | uniq -c | sort -rn | head -50
> 
> The toolbox philosophy in action :-)
> 
>> where I am assuming that URL is the first field in your log and URL
>> doesn't contain any spaces.
> 
> There's nothing wrong with something like that. It's just that I'll also 
> need to parse the URL to get hostnames, do some reverse lookups/lookups 
> to get both IP and hostnames, produce graphs, and do several other 
> things. I don't even want to think about how many pipe characters will 
> be needed for these.

Then, spit out all three fields (hostname, IP, URL) in one go.
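For comparison, here is a minimal Python sketch of that idea. It makes one pass over the log, emits (hostname, IP, URL) tuples, and counts them, instead of chaining more pipes. It assumes, like the one-liner, that the URL is the first whitespace-separated field. The date range uses plain substring tests rather than awk's regular expressions, and the function names (`lines_between`, `records`, `top_n`, `default_resolve`) are hypothetical, for illustration only. Memory grows with the number of distinct records, not with the number of log lines.

```python
import socket
from collections import Counter
from urllib.parse import urlsplit


def lines_between(lines, start, end):
    """Mimic awk's '/start/,/end/' range: begin at a line containing `start`,
    stop after the next line containing `end` (inclusive), and allow the
    range to begin again later. Substring tests stand in for awk regexes."""
    active = False
    for line in lines:
        if not active and start in line:
            active = True
        if active:
            yield line
            if end in line:
                active = False


def default_resolve(hostname):
    """Forward lookup (name -> IP); returns None if the name does not resolve."""
    try:
        return socket.gethostbyname(hostname)
    except (OSError, UnicodeError):
        return None


def records(lines, resolve=default_resolve):
    """Yield (hostname, ip, url) per log line. Assumes the URL is the first
    whitespace-separated field; lookups are cached so each host resolves once."""
    cache = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        url = fields[0]
        host = urlsplit(url).hostname or ""
        if host not in cache:
            cache[host] = resolve(host) if host else None
        yield host, cache[host], url


def top_n(recs, n=50):
    """Return the n most common records with their counts, in the spirit of
    'sort | uniq -c | sort -rn | head -n'."""
    return Counter(recs).most_common(n)
```

Feeding an open log file through `lines_between(f, "Jan 1, 2003", "Feb 1, 2003")`, then `records`, then `top_n` gives the same top-50 report as the one-liner, with hostname and IP attached. Passing a different `resolve` function (for example, one that does reverse lookups) changes the lookup without touching the rest.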

-- 
William Park, Open Geometry Consulting, <opengeometry at yahoo.ca>
Linux solution for data management and processing. 