Python object overhead?

Matt Garman matthew.garman at gmail.com
Mon Mar 26 11:10:03 EDT 2007


On 3/23/07, Bjoern Schliessmann
<usenet-mail-0306.20.chr0n0ss at spamgourmet.com> wrote:
> "one blank line" == "EOF"? That's strange. Intended?

In my case, I know my input data doesn't have any blank lines.
However, I'm glad you (and others) clarified the issue, because I
wasn't aware of the better methods for checking for EOF.

> > Example 2: read lines into objects:
> > # begin readobjects.py
> > import sys, time
> > class FileRecord:
> >     def __init__(self, line):
> >         self.line = line
>
> What's this class intended to do?

Store a line :)  I just wanted to post two runnable examples.  So the
above class's real intention is just to be a (contrived) example.

In the program I actually wrote, my class structure was a bit more
interesting.  After storing the input line, I'd then call split("|")
(to tokenize the line).  Each token would then be assigned to a
member variable.  Some of the member variables were converted to ints
or floats as well.
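
As a rough illustration only (this is not the actual program), a record
class of the kind described above might look like the sketch below.  The
class name, the field names, and the field order are all hypothetical;
only the split("|") step and the int/float conversions come from the
description.

```python
class Record:
    """Hypothetical record; field names and order are assumptions."""

    def __init__(self, line):
        self.line = line  # keep the raw input line, as described above
        tokens = line.rstrip("\n").split("|")  # tokenize on the pipe character
        self.record_type = tokens[0]     # left as a string
        self.record_id = int(tokens[1])  # some fields become ints
        self.price = float(tokens[2])    # ...and some become floats
        self.description = tokens[3]


rec = Record("A|42|19.95|widget\n")
```

Note that each instance here holds the raw line plus one object per
token, so every input line turns into several Python objects.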

My input data had three record types; all had a few common attributes.
So I created a parent class and three child classes.
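
A minimal sketch of such a hierarchy, assuming (hypothetically) that the
first token identifies the record type and the second is a shared id.
The class names, type codes, and fields are invented for illustration
and are not taken from the real vendor format.

```python
class BaseRecord:
    """Parent class holding the attributes common to all record types."""

    def __init__(self, tokens):
        self.record_type = tokens[0]
        self.record_id = int(tokens[1])


class HeaderRecord(BaseRecord):
    def __init__(self, tokens):
        super().__init__(tokens)
        self.source = tokens[2]


class DetailRecord(BaseRecord):
    def __init__(self, tokens):
        super().__init__(tokens)
        self.amount = float(tokens[2])


class TrailerRecord(BaseRecord):
    def __init__(self, tokens):
        super().__init__(tokens)
        self.count = int(tokens[2])


# Hypothetical mapping from type code to child class.
RECORD_CLASSES = {"H": HeaderRecord, "D": DetailRecord, "T": TrailerRecord}


def parse_line(line):
    """Split one pipe-delimited line and build the matching record."""
    tokens = line.rstrip("\n").split("|")
    return RECORD_CLASSES[tokens[0]](tokens)


detail = parse_line("D|7|3.5\n")
```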

Also, many folks have suggested operating on only one line at a time
(i.e. not storing the whole data set).  Unfortunately, I'm constantly
"looking" forward and backward in the record set while I process the
data (i.e., to process any particular record, I sometimes need to know
the whole contents of the file).  (This is purchased proprietary
vendor data that needs to be converted into our own internal format.)
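
To illustrate why the whole file ends up in memory: once every record
is in a list, looking backward or forward from any position is just an
index operation.  This is only a sketch; the helper names and the
sample data are made up, and the real lookups may be more involved
than immediate neighbors.

```python
import io


def load_records(stream):
    """Read every line into a list so any record can be reached by index."""
    return [line.rstrip("\n") for line in stream]


def neighbors(records, i):
    """Return the records just before and just after position i (or None)."""
    prev_rec = records[i - 1] if i > 0 else None
    next_rec = records[i + 1] if i + 1 < len(records) else None
    return prev_rec, next_rec


# io.StringIO stands in for the real data file here.
records = load_records(io.StringIO("a|1\nb|2\nc|3\n"))
before, after = neighbors(records, 1)
```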

Finally, for what it's worth: the total run-time memory requirement
of my program is roughly 20x the data file size.  A 200MB file
literally requires 4GB of RAM to process effectively.  Note that, in
addition to the class structure I defined above, I also create two
caches of all the data (two dicts with different keys from the
collection of objects).  This is necessary to ensure the program runs
in a semi-reasonable amount of time.
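
A small sketch of the two-cache idea, under assumed keys (the actual
keys are not stated here): one dict keyed by a unique id, and one keyed
by a non-unique field that maps to a list of records.  The Record class
and its attributes are hypothetical.

```python
class Record:
    """Hypothetical record with two attributes worth indexing."""

    def __init__(self, record_id, symbol):
        self.record_id = record_id
        self.symbol = symbol


records = [Record(1, "AAA"), Record(2, "BBB"), Record(3, "AAA")]

# Cache 1: unique key -> record.
by_id = {rec.record_id: rec for rec in records}

# Cache 2: non-unique key -> list of records sharing that key.
by_symbol = {}
for rec in records:
    by_symbol.setdefault(rec.symbol, []).append(rec)

# Both caches hold references to the same objects, not copies.
same_object = by_id[3] is by_symbol["AAA"][1]
```

The dicts store references, so the records themselves are not
duplicated; the extra memory comes from the dict tables, the keys, and
any per-key lists.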

Thanks to all for your input and suggestions.  I received many more
responses than I expected!

Matt


