Apache log munging

Joe Python jopython at gmail.com
Wed Oct 8 15:51:11 EDT 2008


I am currently using the following technique to get the info above:

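from collections import defaultdict

# log: an iterable of dicts with 'host' and 'file' keys, one per request
all = defaultdict(int)       # (host, file) -> request count
hosts = defaultdict(int)     # host -> total request count
filename = defaultdict(int)  # every file seen (used as a set)

for r in log:
    all[r['host'],r['file']] += 1
    hosts[r['host']] += 1
    filename[r['file']] = 1


# Hosts in descending order of total requests
for host in sorted(hosts,key=hosts.get, reverse=True):
    for file in filename:
        print host, all[host,file]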
    print hosts[host]

I was looking for a better option than building three collections,
in order to improve performance.
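
For illustration, here is a minimal sketch of one possible alternative, not a tested or benchmarked solution. It fills a single nested collection in one pass and derives the per-host totals from it afterwards. It assumes Python 3 and assumes `log` yields dicts with 'host' and 'file' keys, as in the loop above. The function name `top_hosts` and the sample records are hypothetical.

```python
from collections import Counter, defaultdict
import heapq


def top_hosts(log, n=100):
    """Return [(host, total, per_file_counts), ...] for the n busiest hosts."""
    # One nested collection: host -> Counter mapping file -> request count.
    per_host = defaultdict(Counter)
    for r in log:
        per_host[r['host']][r['file']] += 1

    # A host's total is the sum of its per-file counts.
    totals = {host: sum(files.values()) for host, files in per_host.items()}

    # nlargest avoids fully sorting every host when only the top n are needed.
    top = heapq.nlargest(n, totals, key=totals.get)
    return [(host, totals[host], per_host[host]) for host in top]


# Hypothetical sample records, standing in for the parsed log.
sample_log = [
    {'host': 'a.example', 'file': '/index.html'},
    {'host': 'b.example', 'file': '/index.html'},
    {'host': 'a.example', 'file': '/about.html'},
    {'host': 'a.example', 'file': '/index.html'},
]

for host, total, files in top_hosts(sample_log, n=2):
    for name, hits in files.most_common():
        print(host, name, hits)
    print(host, total)
```

Unlike the loops above, this only reports files that a given host actually requested, so it never touches (host, file) pairs with a zero count. Whether it is faster in practice would need measuring against a real log.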

- Jo

On Wed, Oct 8, 2008 at 2:15 PM, Joe Riopel <goon12 at gmail.com> wrote:

> On Wed, Oct 8, 2008 at 1:55 PM, Joe Python <jopython at gmail.com> wrote:
> > I want to find the top '100' hosts (sorted in descending order of total
> > requests) like follows:
> > Is there a fast way to do this without scanning the log file many times?
>
> As you encounter a new "host" add it to a dict (or another type of
> collection), and if encountered again, use that "host" as the key to
> retrieve the dict entry and increment its request count. You should
> only have to read the file once.
>