Tremendous slowdown due to garbage collection

Aaron Watters aaron.watters at gmail.com
Tue Apr 15 12:35:27 EDT 2008


On Apr 14, 11:18 pm, Carl Banks <pavlovevide... at gmail.com> wrote:

> However, that is for the OP to decide.  The reason I don't like the
> sort of question I posed is it's presumptuous--maybe the OP already
> considered and rejected this, and has taken steps to ensure the in
> memory data structure won't be swapped--but a database solution should
> at least be considered here.

Yes, you are right, especially if the index structure will be needed
many times over a long period.  Even then, though, these days you can
go pretty far by loading everything into core (streaming it from
disk) and dumping everything back out when you are done, if needed
(ahem, using the preferred way to do this from Python for
speed and safety: marshal ;) ).
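
A minimal sketch of the load-everything, dump-everything approach with
marshal follows. The index contents, the file name, and the helper
names dump_index and load_index are hypothetical, made up for
illustration. Pausing the cyclic garbage collector during the load is
an assumption that ties back to the thread's subject; it is not
something marshal requires.

```python
import gc
import marshal
import os
import tempfile


def dump_index(index, path):
    """Write a structure made of built-in types to disk in one pass."""
    with open(path, "wb") as f:
        marshal.dump(index, f)


def load_index(path):
    """Read the whole structure back into memory in one pass."""
    # Assumption: pausing the cyclic collector while many container
    # objects are being created avoids repeated collection passes.
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        with open(path, "rb") as f:
            return marshal.load(f)
    finally:
        # Restore the collector to whatever state it was in before.
        if was_enabled:
            gc.enable()


# Hypothetical index: word -> list of document ids.
index = {"joys": [1, 4, 9], "python": [2, 4]}
path = os.path.join(tempfile.mkdtemp(), "index.marshal")

dump_index(index, path)
restored = load_index(path)
```

Note that marshal only handles built-in types, its file format is not
guaranteed to stay the same between Python versions, and it should not
be used to load data from untrusted sources.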

Even with B-trees, if you jump around in the tree the performance can
be awful.  This is why Nucular, for example, is designed to stream
results sequentially from disk whenever possible.  The one place where
it doesn't do this very well (proximity searches) shows the most
problems with performance (under bad circumstances like searching
for two common words in proximity).
   -- Aaron Watters
===
http://www.xfeedme.com/nucular/pydistro.py/go?FREETEXT=joys


