better way to do this in python

Mag Gam magawake at gmail.com
Sun Apr 3 08:06:32 EDT 2011


Thanks for the responses.


Basically, I have a large file with this format,

Date INFO username command srcipaddress filename


I would like to gather statistics on:
the total number of usernames, and who they are
the commands run by each username
the filenames touched by each username
the unique source IP addresses
the unique filenames
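
A minimal sketch of how these counts might be collected in one pass, building on the collections.Counter suggestion quoted below. It assumes each line has exactly six whitespace-separated fields in the order shown above and that the date is a single token; the function name summarize and the sample lines are hypothetical, made up only for illustration.

```python
from collections import Counter, defaultdict


def summarize(lines):
    """Collect per-user and global statistics from log lines.

    Assumed field order (from the format above):
    date, level, username, command, source IP, filename.
    """
    users = Counter()                        # username -> number of entries
    commands_by_user = defaultdict(Counter)  # username -> Counter of commands
    files_by_user = defaultdict(Counter)     # username -> Counter of filenames
    source_ips = set()                       # unique source IP addresses
    filenames = set()                        # unique filenames

    for line in lines:
        # maxsplit=5 keeps a filename containing spaces in one piece
        fields = line.strip().split(None, 5)
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        _date, _level, user, command, src_ip, filename = fields
        users[user] += 1
        commands_by_user[user][command] += 1
        files_by_user[user][filename] += 1
        source_ips.add(src_ip)
        filenames.add(filename)

    return {
        "users": users,
        "commands_by_user": commands_by_user,
        "files_by_user": files_by_user,
        "source_ips": source_ips,
        "filenames": filenames,
    }


# Hypothetical sample data, not taken from a real log
sample = [
    "2011-04-01 INFO alice get 10.0.0.1 report.txt",
    "2011-04-01 INFO bob put 10.0.0.2 data.csv",
    "2011-04-02 INFO alice get 10.0.0.1 data.csv",
]
stats = summarize(sample)
print(len(stats["users"]), sorted(stats["users"]))
print(dict(stats["commands_by_user"]["alice"]))
```

For a real file, the same function could be given the open file object (for example, "with open(path) as f: summarize(f)"), so the file is read line by line rather than loaded whole.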

Then I would like to bucket the findings by day (date).
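
One possible way to do the per-day bucketing is a dictionary of Counters keyed by the date field. This is only a sketch: it assumes the date is the first whitespace-separated token on each line (for example 2011-04-01), and the name bucket_by_day and the sample lines are hypothetical.

```python
from collections import Counter, defaultdict


def bucket_by_day(lines):
    """Return a mapping of day -> Counter of usernames seen on that day."""
    per_day = defaultdict(Counter)
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        day, user = fields[0], fields[2]  # assumed positions of date and username
        per_day[day][user] += 1
    return per_day


# Hypothetical sample data, not taken from a real log
sample_log = [
    "2011-04-01 INFO alice get 10.0.0.1 report.txt",
    "2011-04-01 INFO bob put 10.0.0.2 data.csv",
    "2011-04-02 INFO alice get 10.0.0.1 data.csv",
]
daily = bucket_by_day(sample_log)
for day in sorted(daily):
    print(day, dict(daily[day]))
```

If the real date field also carries a time, the day would have to be extracted from it first (for instance with datetime.strptime) before being used as the key.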

Overall, I would like to build a log file analyzer.



On Sat, Apr 2, 2011 at 10:59 PM, Dan Stromberg <drsalists at gmail.com> wrote:
>
> On Sat, Apr 2, 2011 at 5:24 PM, Chris Angelico <rosuav at gmail.com> wrote:
>>
>> On Sun, Apr 3, 2011 at 9:58 AM, Mag Gam <magawake at gmail.com> wrote:
>> > I suppose I can do something like this.
>> > (pseudocode)
>> >
>> > d = {}
>> > try:
>> >     d[key] += 1
>> > except KeyError:
>> >     d[key] = 1
>> >
>> >
>> > I was wondering if there is a Pythonic way of doing this? I plan on
>> > doing this many times for various files. Would the Python collections
>> > module be sufficient?
>>
>> I think you want collections.Counter. From the docs: "Counter objects
>> have a dictionary interface except that they return a zero count for
>> missing items instead of raising a KeyError".
>>
>> ChrisA
>
> I realize you (Mag) asked for a Python solution, but since you mention
> awk... you can also do this with "sort < input | uniq -c" - one line of
> "code".  GNU sort doesn't use as nice an algorithm as a hashing-based
> solution (like you'd probably use with Python), but for a sort, GNU sort's
> quite good.
>
>
>


