space-efficient top-N algorithm
William Park
opengeometry at yahoo.ca
Wed Feb 12 00:45:04 EST 2003
David Garamond <lists at zara.6.isreserved.com> wrote:
> William Park wrote:
>> What's wrong with something like
>>
>> awk '/Jan 1, 2003/,/Feb 1, 2003/ {print $1}' log \
>>     | sort | uniq -c | sort -rn | head -50
>
> The toolbox philosophy in action :-)
>
>> where I am assuming that URL is the first field in your log and URL
>> doesn't contain any spaces.
>
> There's nothing wrong with something like that. It's just that I'll also
> need to parse the URL to get hostnames, do some reverse lookups/lookups
> to get both IP and hostnames, produce graphs, and do several other
> things. I don't even want to think about how many pipe characters will
> be needed for these.
Then, spit out all three fields (hostname, IP, URL) in one go.
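Here is a minimal Python sketch of that suggestion. It reads the log once, counts URLs inside the date range, keeps only the top N, and emits hostname, IP, and URL together. As in the awk version, it assumes the URL is the first whitespace-separated field. Several details are illustrative choices and do not come from the thread: the function name `top_urls`, its parameters, plain-substring matching of the date markers, and a forward `socket.gethostbyname` lookup in place of a reverse lookup. Memory grows with the number of distinct URLs rather than the number of log lines, and only the N winning hostnames are ever resolved.

```python
import heapq
import socket
from collections import Counter
from urllib.parse import urlsplit


def top_urls(lines, start_marker, end_marker, n=50, resolve=socket.gethostbyname):
    """Return up to n (count, url, hostname, ip) tuples, most frequent first.

    Mimics awk's /start/,/end/ range: counting begins at a line containing
    start_marker and stops after the next line containing end_marker.
    Markers are matched as plain substrings, not regexes (an assumption).
    """
    counts = Counter()
    in_range = False
    for line in lines:
        if not in_range and start_marker in line:
            in_range = True
        if in_range:
            fields = line.split()
            if fields:
                counts[fields[0]] += 1  # assumes the URL is the first field
            if end_marker in line:
                in_range = False  # the end line itself is counted, as in awk

    # Select the n largest counts without sorting every distinct URL.
    best = heapq.nlargest(n, counts.items(), key=lambda item: item[1])

    results = []
    ip_cache = {}  # resolve each hostname at most once
    for url, count in best:
        host = urlsplit(url).hostname  # None if the URL has no scheme
        if host is not None and host not in ip_cache:
            try:
                ip_cache[host] = resolve(host)
            except OSError:  # socket.gaierror is a subclass of OSError
                ip_cache[host] = None
        results.append((count, url, host, ip_cache.get(host)))
    return results
```

Usage with the same log file and date range as the pipeline:

    with open("log") as f:
        for count, url, host, ip in top_urls(f, "Jan 1, 2003", "Feb 1, 2003"):
            print(count, host, ip, url, sep="\t")

The `resolve` parameter is there so lookups can be swapped out, for example a reverse lookup or a stub for testing, without touching the counting logic.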
--
William Park, Open Geometry Consulting, <opengeometry at yahoo.ca>
Linux solution for data management and processing.