memory usage

John Machin sjmachin at lexicon.net
Wed May 7 17:16:09 EDT 2003


Nagy Gabor <linux42 at freemail.c3.hu> wrote in message news:<mailman.1052227330.15773.python-list at python.org>...
> I wrote a simple datafile parser, and it is quite memory hungry, and I
> don't know if this is what I should expect, or is there a bug in my code.
[snip]
> def ParseFields():
>   fields = []
>   for ...:
>     Data = StringIO.read( length)

Data probably carries trailing spaces or leading zeroes here; if so,
every field is storing padding, which adds to your memory problem.

>     tmp = TD()
>     tmp.Tag = T(name = 'name')
>     tmp.Data = Data
>     fields.append(tmp)

So for each field, you create a TD instance. One of the attributes of
this TD instance is a *NEW* T instance. This is *TWO* *NEW* class
instances per field. I repeat, PER FIELD.

You don't say what, if anything, you are doing with the Flag, Class,
and Name attributes of the T-instance. Given that you have field.Data,
what is field.Tag.Value? Does *every* instance of a field need a
field.Tag.Name? You should exploit the (presumed) homogeneity of your
data by factoring out the data description to a higher level e.g. one
per column per table, instead of one per field. E.g. all the
descriptive info for the 23rd column/field in table/record-type "XYZ"
can be found in field_description["XYZ"][22] i.e. field_description is
a dictionary of lists. You would also build an inverted index to get
the field number (e.g. 22) from the field name (e.g.
"salary_of_data_modeller").

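A minimal sketch of that layout, assuming made-up column names and a
made-up two-column "XYZ" record type (field_index and the keys inside
each description are illustrative names, not anything from your code):

```python
# Hypothetical layout: one description per column per record type,
# shared by every row, instead of one T instance per field.
field_description = {
    "XYZ": [
        {"name": "employee_id", "flag": 0, "class": "int"},
        {"name": "salary_of_data_modeller", "flag": 0, "class": "str"},
    ],
}

# Inverted index: record type -> {field name -> column number}.
field_index = {
    rtype: {desc["name"]: i for i, desc in enumerate(descs)}
    for rtype, descs in field_description.items()
}

# A row is then just a list of (stripped) strings, nothing more.
row = ["00042", "51000"]
col = field_index["XYZ"]["salary_of_data_modeller"]
salary = row[col]
```

With this arrangement the per-field cost is one string; everything
descriptive lives once per column in field_description.
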
You should also consider combining the T and TD classes -- it is not
apparent from your code if the current separation achieves anything
positive; negatives include waste of memory and CPU, plus visual and
mental clutter.

Of course you could avoid the effort involved in a design strategy
rethink and get some big-enough tactical wins by (1) using __slots__
[mentioned by others, but they didn't remind you that this works only
with new-style classes; __slots__ is silently ignored in a classic
class] (2) ensuring trailing spaces etc. are stripped off your field
contents (3) ensuring that each field has a reference to an
appropriate existing T instance, instead of creating a new T instance
each time.

Looking at the other dimension of your MxN problem, why do you think
you need to keep each row/record in memory?
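
If one pass over the data is enough, a generator lets you handle one
record at a time. The sketch below assumes fixed-length fields read
from a file-like object; iter_records and the sample layout are
invented for illustration.

```python
import io


def iter_records(stream, field_lengths):
    """Yield one record (a list of stripped field strings) at a time."""
    record_size = sum(field_lengths)
    while True:
        chunk = stream.read(record_size)
        if len(chunk) < record_size:
            break                  # end of data (or a truncated record)
        record, pos = [], 0
        for length in field_lengths:
            record.append(chunk[pos:pos + length].rstrip())
            pos += length
        yield record


# Hypothetical data: a 10-character name followed by a 4-digit amount.
data = "Smith".ljust(10) + "0042" + "Jones".ljust(10) + "0007"

total = 0
for name, amount in iter_records(io.StringIO(data), [10, 4]):
    total += int(amount)           # only the current record is alive
```

Memory use then depends on the size of one record rather than on the
number of records.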

HTH,
John



