parsing a file for analysis

Rita rmorgan466 at gmail.com
Sat Feb 26 10:58:08 EST 2011


Thanks Andrea. I was thinking that too, but I was wondering if there were any
other clever ways of doing this.
I also thought I could build a filesystem structure based on __time.
So, for January 01, 2011, I would create /tmp/data/20110101/data. This way
I have a fast index of the data, and the next time I read through this file I
can skip all of Jan 01, 2011.
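A minimal sketch of that per-day layout, assuming whitespace-separated fields with __time in the third column and a hypothetical ISO-8601 timestamp format ("2011-01-01T10:58:08"). `day_key` and `partition_by_day` are illustrative names, not an existing library. Skipping a day still costs a scan of its lines in the big file; the gain comes later, when a query only needs to open the day directories it cares about.

```python
import os
from datetime import datetime


def day_key(timestamp):
    """Turn a timestamp field into a YYYYMMDD directory name.

    Hypothetical format: ISO-8601 such as "2011-01-01T10:58:08".
    Change the strptime pattern to match the real log.
    """
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S").strftime("%Y%m%d")


def partition_by_day(src_path, base_dir, skip=frozenset()):
    """Split INFO lines of src_path into base_dir/YYYYMMDD/data files.

    Days listed in `skip` (already partitioned on an earlier run) are
    ignored. Returns the set of day keys written on this run.
    """
    handles = {}  # day key -> open output file, kept open for the whole pass
    try:
        with open(src_path) as src:
            for row in src:
                if "INFO" not in row:
                    continue
                fields = row.split()
                day = day_key(fields[2])  # fields[2] is __time in the original loop
                if day in skip:
                    continue
                out = handles.get(day)
                if out is None:
                    day_dir = os.path.join(base_dir, day)
                    os.makedirs(day_dir, exist_ok=True)
                    # "w" is safe here: each day's file is opened once per run
                    out = open(os.path.join(day_dir, "data"), "w")
                    handles[day] = out
                out.write(row)
    finally:
        for out in handles.values():
            out.close()
    return set(handles)
```

Keeping one handle per day open avoids reopening a file for every line; with many months of data, check the process's open-file limit.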




On Sat, Feb 26, 2011 at 10:29 AM, Andrea Crotti
<andrea.crotti.0 at gmail.com> wrote:

>
> Il giorno 26/feb/2011, alle ore 06.45, Rita ha scritto:
>
> > I have a large text file (4 GB) which I am parsing.
> >
> > I am reading the file to collect stats on certain items.
> >
> > My approach has been simple:
> >
> > with open(path) as f:
> >   for row in f:
> >     if "INFO" in row:
> >       line = row.split()
> >       user = line[0]
> >       host = line[1]
> >       __time = line[2]
> >       ...
> >
> > I was wondering if there is a framework or a better algorithm to read
> > such a large file and collect its stats according to content. Also, are
> > there any libraries, data structures or functions which can be helpful? I
> > was told about the 'collections' module. Here are some stats I am trying
> > to get:
> >
> > * Number of unique users
> > * Breakdown of each user's visits by time, t0 to t1
> > * Which user came from which host
> > * Which time had the most users
> >
> > (There are about 15 different things I want to query)
> >
> > I understand most of these are redundant, but it would be nice to have a
> > framework or even an object-oriented way of doing this instead of loading it
> > into a database.
> >
> >
> > Any thoughts or ideas?
>
> Not an expert, but it might be good to push the data into a database;
> then you can tune the DBMS and write
> smart queries to get all the statistics you want from it.
>
> Loading might take a while (maybe splitting with a regexp is faster), but
> it's done only once, and then you work with DB tools.
>
>
>
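Without a database, the 'collections' module mentioned above covers most of these stats in a single pass: `Counter` for tallies and `defaultdict(set)` for "which X belongs to which Y". The sketch below wraps them in a class to get the object-oriented shape Rita asked for. `LogStats` and `collect` are hypothetical names, and the layout (user, host, ISO-8601 timestamp as the first three fields) is an assumption. Visits are bucketed by hour, which is one possible reading of "t0 to t1".

```python
from collections import Counter, defaultdict
from datetime import datetime


class LogStats:
    """Accumulate several statistics in a single pass over the log."""

    def __init__(self):
        self.visits = Counter()                     # user -> total visits
        self.hosts_by_user = defaultdict(set)       # user -> hosts seen
        self.visits_by_hour = defaultdict(Counter)  # user -> Counter(hour -> visits)
        self.users_by_hour = defaultdict(set)       # hour -> distinct users

    def add(self, user, host, when):
        """Record one visit; `when` is a datetime, bucketed to the hour."""
        hour = when.replace(minute=0, second=0, microsecond=0)
        self.visits[user] += 1
        self.hosts_by_user[user].add(host)
        self.visits_by_hour[user][hour] += 1
        self.users_by_hour[hour].add(user)

    def unique_users(self):
        return len(self.visits)

    def visits_between(self, user, t0, t1):
        """Visit counts for `user` in hour buckets starting in [t0, t1)."""
        counts = self.visits_by_hour.get(user, Counter())
        return {h: n for h, n in sorted(counts.items()) if t0 <= h < t1}

    def busiest_hour(self):
        """Hour bucket with the most distinct users, or None if empty."""
        if not self.users_by_hour:
            return None
        return max(self.users_by_hour, key=lambda h: len(self.users_by_hour[h]))


def collect(lines):
    """Feed every INFO line into a LogStats; `lines` can be an open file."""
    stats = LogStats()
    for row in lines:
        if "INFO" not in row:
            continue
        # Hypothetical layout: user, host, ISO-8601 timestamp come first.
        user, host, stamp = row.split()[:3]
        stats.add(user, host, datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S"))
    return stats
```

Usage would be along the lines of `with open(path) as f: stats = collect(f)`. Each extra query is one more container updated in `add` plus a method to read it. Memory grows with the number of distinct users and hours rather than with the 4 GB file size, but everything still lives in RAM.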

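Andrea's load-once-then-query suggestion can be sketched with the standard library's `sqlite3` module. This is only one possible DBMS, not necessarily the one Andrea had in mind. The table name, column names, queries and timestamp format are all hypothetical. The file is streamed into `executemany` through a generator, so it is never held in memory at once.

```python
import sqlite3

# Hypothetical table layout: one row per INFO line.
SCHEMA = "CREATE TABLE IF NOT EXISTS visits (user TEXT, host TEXT, ts TEXT)"

QUERIES = {
    "unique_users": "SELECT COUNT(DISTINCT user) FROM visits",
    "hosts_for_user": "SELECT DISTINCT host FROM visits WHERE user = ?",
    # ISO-8601 text sorts chronologically, so string comparison works
    "visits_between": "SELECT COUNT(*) FROM visits WHERE user = ? AND ts >= ? AND ts < ?",
    # the first 13 characters of "2011-01-01T10:58:08" are "2011-01-01T10"
    "busiest_hour": (
        "SELECT substr(ts, 1, 13) AS hour, COUNT(DISTINCT user) AS n "
        "FROM visits GROUP BY substr(ts, 1, 13) ORDER BY n DESC LIMIT 1"
    ),
}


def load(lines, db_path=":memory:"):
    """Load INFO lines into SQLite once; later queries reuse the database."""
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    # Generator: rows are produced one at a time as the file is read.
    rows = (row.split()[:3] for row in lines if "INFO" in row)
    with conn:  # one transaction for the whole load, committed on success
        conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
        conn.execute("CREATE INDEX IF NOT EXISTS visits_user_ts ON visits (user, ts)")
    return conn
```

For the real 4 GB file, a file path instead of ":memory:" keeps the database on disk, so the slow load happens only once and later sessions just reconnect and run queries.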

-- 
--- Get your facts first, then you can distort them as you please.--


More information about the Python-list mailing list