Storing pairs of (int, int) in a database: which db to choose?

Stormbringer andreif at mail.dntis.ro
Tue Dec 23 15:56:46 EST 2003


Paul Rubin <http://phr.cx@NOSPAM.invalid> wrote in message news:<7xad5jh8q2.fsf at ruckus.brouhaha.com>...
> John Hunter <jdhunter at ace.bsd.uchicago.edu> writes:
> >     Stormbringer> in my opinion (especially considering the range of
> >     Stormbringer> those integers - one is in the range 1..100000 and
> >     Stormbringer> the other in the range 1..500000).
> > 
> > What about using a binary file of unsigned ints which you load into a
> > python dictionary and do everything in memory?  There would be no
> > extra overhead in the file and it would be very fast, if you are able
> > to hold the 100,000 ints in memory.
> 
> No it's much worse than that.  The 100,000 ints are index numbers
> for individual words.  The 500,000 ints are articles and there can
> be thousands of words in each article.  So you may need to store
> billions of ints, not just 100,000 of them.

Yes, I agree. The messages aren't very large (under 5K each, although
I've seen one or two of 500K) but there are lots of words.
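
To put rough numbers on this, here is a minimal sketch of the flat-file
idea John describes, together with the size estimate Paul is pointing
at. It is only an illustration in present-day Python 3, not code from
this thread: the names write_pairs, load_index and raw_size_bytes are
made up, and the figure of 2,000 words per article is an assumption
standing in for "thousands of words".

```python
import struct
from collections import defaultdict

# Two little-endian unsigned 32-bit ints: (word_id, article_id), 8 bytes per pair.
PAIR = struct.Struct("<II")

def write_pairs(path, pairs):
    """Write (word_id, article_id) pairs to a flat binary file."""
    with open(path, "wb") as f:
        for word_id, article_id in pairs:
            f.write(PAIR.pack(word_id, article_id))

def load_index(path):
    """Read the whole file into memory as {word_id: [article_id, ...]}."""
    index = defaultdict(list)
    with open(path, "rb") as f:
        data = f.read()
    for word_id, article_id in PAIR.iter_unpack(data):
        index[word_id].append(article_id)
    return index

def raw_size_bytes(n_articles, words_per_article):
    """File size if every (word, article) occurrence is stored as one pair."""
    return n_articles * words_per_article * PAIR.size

# Assumed workload: 500,000 articles with 2,000 distinct words each.
print(raw_size_bytes(500_000, 2_000))  # 8000000000
```

Under that assumption the raw file alone is about 8 GB, before any
dictionary overhead, which is why loading everything into memory does
not look practical here.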

It turns out there is a much better solution than anything I could have
written: Lupy, a full-text indexer and search engine written in Python.

Andrei



