optimizing memory utilization

Wes Crucius wcrucius at sandc.com
Tue Sep 14 17:55:56 EDT 2004


"Frithiof Andreas Jensen" <frithiof.jensen at die_spammer_die.ericsson.com> wrote in message news:<ci6f73$921$1 at newstree.wise.edt.ericsson.se>...
> "Anon" <anon at ymous.com> wrote in message
> news:pan.2004.09.14.04.38.02.647096 at ymous.com...
> 
> > Any data structures suggestions for this application?  BTW, the later
> > accesses to this database would not benefit in any way from being
> > presorted,
> 
> You keep saying it yourself: Use a Database ;-).
> 
> Databases "knows" about stuff in memory, as well as any searching and
> sorting you might dream up later.
> 
> Python supports several, the simplest is probably PySQLite:
> http://sourceforge.net/projects/pysqlite/

Thanks for the specific suggestion, but it's an approach I'd hoped to
avoid.  I'd rather get 2Gig of RAM if I was confident I could make it
work that way...  The problem with a file-based database (versus an
in-memory data-structure-based database as I was meaning) is
performance.

Maybe someone has a better suggestion if I give little more info:  I
want to iterate through about 10,000 strings (each ~256 characters or
less) looking for occurances (within those 10,000 strings) of any one
of about 500,000 smaller (~30-50 characters average) strings.  Then I
generate an XML doc that records which of the 500,000 strings were
found within each of the 10,000 strings.

I guess I am hoping to optimize memory usage in order to avoid
thrashing my drives to death (due to using a file based database).  I
can't believe that python requires memory 200% "overhead" to store my
data as lists of lists.  I gotta believe I've chosen the wrong
data-type or have mis-applied it somehow??

BTW, my illustration wasn't my input data, it was literally the output
of "print Albums" (with only a couple albums worth of dummy data
loaded).  Albums is a list containing a list for each album, which in
turn contains another list for each track on the album, which in-turn
contains a couple of items of data about the given track.

Bottom line question here:  Is a 'list' the most efficient data type
to use for storing "arrays" of "arrays" of "arrays" and strings",
given that I'm happy to iterate and don't need any sort of lookup or
search functionality??


Thanks again,
Wes



More information about the Python-list mailing list