Dictionaries as records

Quinn Dunkan quinn at hurl.ugcs.caltech.edu
Tue Dec 18 23:35:54 EST 2001


On Tue, 18 Dec 2001 18:24:44 -0600, Skip Montanaro <skip at pobox.com> wrote:
>
>    Bill> I have a file with 200K records and 16 fields.  This file is
>    Bill> parsed and each row is put into a dictionary and the dictionary is
>    Bill> added to a list.  The raw file is only about 50mb.
>
>    Bill> I was shocked to see that my memory use jumped to 500MB!  When I
>    Bill> delete the list the memory is returned to the system, so I know
>    Bill> that the memory is being used in the dictionaries.
>
>    ...
>
>    Bill> Can someone who has faced this issue and found a workaround please
>    Bill> fill me in. I know one can use a list of lists or a list of
>    Bill> tuples, but had rather stick to the dictionaries because of some
>    Bill> library issues.
>
>You might want to consider storing it in an on-disk mapping object.  Check
>out the anydbm, bsddb, gdbm modules and the like.  Aside from not chewing up
>gobs of RAM, script startup should be faster as well, because you won't have
>to parse the file and initialize the dict.

You could keep the records as a list of tuples inside a class whose accessor
methods wrap up each record however you want:

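class DB:
    def __init__(self, records):
        # records is a list of tuples: [(foo, bar, ...), (baz, faz, ...), ...]
        self.records = records
        # map each record's first field (assumed to be the name) to its tuple
        self.index_by_name = {}
        for rec in records:
            self.index_by_name[rec[0]] = rec
    def name(self, n):
        name, age, spin = self.index_by_name[n]
        return {'name': name, 'age': age, 'spin': spin}
        # or return Foo_object(name, age, spin)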

I've used this for largish (but not large enough to bother with a 'real' DB)
DBs.
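
To get a rough feel for why tuples help here, the sketch below compares the
shallow size of one 16-field record stored as a tuple and as a dict. The field
names and values are made up, sys.getsizeof only counts the container itself
(not the strings inside it), and the exact numbers depend on the interpreter
version and platform, so treat it as an illustration rather than a measurement
of Bill's data.

```python
import sys

# Hypothetical field names and values; only the container overhead matters here.
FIELDS = ['f%d' % i for i in range(16)]
values = ['v%d' % i for i in range(16)]

as_tuple = tuple(values)
as_dict = dict(zip(FIELDS, values))

# Shallow sizes in bytes: the container only, not the objects it refers to.
tuple_size = sys.getsizeof(as_tuple)
dict_size = sys.getsizeof(as_dict)
print(tuple_size, dict_size)
```

Multiplied by 200K records, the per-container difference is where most of the
saving would come from.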

This won't work if your 'library issues' want to mutate the record in place.
Well, it would work, but could be clumsy:

ent = db.name('fred')
library.issue(ent)
db.update(ent)
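
A minimal sketch of what that round trip might look like. The update method,
the three field names and the sample rows are assumptions for illustration, and
issue below is just a stand-in for whatever the library does to the dict.

```python
class DB:
    FIELDS = ('name', 'age', 'spin')  # assumed field order

    def __init__(self, records):
        # Index the tuples by their first field.
        self.index_by_name = dict((rec[0], rec) for rec in records)

    def name(self, n):
        # Hand out a throwaway dict copy of the stored tuple.
        return dict(zip(self.FIELDS, self.index_by_name[n]))

    def update(self, ent):
        # Hypothetical write-back: rebuild the tuple and swap it in at once.
        # Assumes the library did not change ent['name'].
        self.index_by_name[ent['name']] = tuple(ent[f] for f in self.FIELDS)


def issue(ent):
    # Stand-in for library.issue: mutates the dict in place.
    ent['age'] += 1


db = DB([('fred', 30, 'up'), ('wilma', 28, 'down')])  # made-up sample rows
ent = db.name('fred')
issue(ent)
before = db.index_by_name['fred']   # stored tuple is still untouched here
db.update(ent)
after = db.index_by_name['fred']    # now reflects the library's change
```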

... but maybe that's good because then you don't have to worry about
library.issue exploding halfway through and leaving the record broken.  If
you want to be all cute you could have DB wrap records in a dict-like object
that keeps a (weak, cuz of cycles) reference to the DB and whose __setitem__
mutates the DB.
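
One way the cute version could look, as a sketch only. The Record class, the
field names and the sample row are all made up here, and it only covers
__getitem__ and __setitem__, so a library that wants keys() or a real dict
would need more.

```python
import weakref

FIELDS = ('name', 'age', 'spin')  # assumed field order


class Record:
    """Dict-like view of one stored tuple; writes go straight to the DB."""

    def __init__(self, db, key):
        # Weak reference, as suggested above; it matters if the DB ever
        # keeps its wrappers around, which would otherwise form a cycle.
        self._db = weakref.ref(db)
        self._key = key

    def _owner(self):
        db = self._db()
        if db is None:
            raise ReferenceError('the DB this record came from is gone')
        return db

    def __getitem__(self, field):
        return self._owner().index_by_name[self._key][FIELDS.index(field)]

    def __setitem__(self, field, value):
        # Tuples are immutable, so rebuild the row and swap it in.
        # Note: assigning to 'name' would not re-key the index.
        db = self._owner()
        row = list(db.index_by_name[self._key])
        row[FIELDS.index(field)] = value
        db.index_by_name[self._key] = tuple(row)


class DB:
    def __init__(self, records):
        self.index_by_name = dict((rec[0], rec) for rec in records)

    def name(self, n):
        return Record(self, n)


db = DB([('fred', 30, 'up')])  # made-up sample row
ent = db.name('fred')
ent['age'] = 31                # lands in the DB immediately
stored = db.index_by_name['fred']
```

The tradeoff is the one mentioned above: every write hits the DB right away,
so a library call that dies partway through leaves a half-updated record.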


You could then turn around and marshal the whole thing to disk, but if you can
control the format of the data input file, you might be better off with Skip's
suggestion.
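
For the marshal route, something along these lines should do. The file name
and sample rows are made up. marshal only handles built-in types (which a list
of tuples of strings and numbers is), and its format is not guaranteed to stay
the same between Python versions, so this suits a cache you can rebuild from
the raw file rather than long-term storage.

```python
import marshal
import os
import tempfile

records = [('fred', 30, 'up'), ('wilma', 28, 'down')]  # made-up sample rows

# Hypothetical location; a temp directory keeps the example self-contained.
path = os.path.join(tempfile.mkdtemp(), 'records.marshal')

# Dump the whole list in one go.
with open(path, 'wb') as f:
    marshal.dump(records, f)

# Later runs can load it back without re-parsing the raw file.
with open(path, 'rb') as f:
    loaded = marshal.load(f)
```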


