dict vs kjBuckets vs ???

Gordon McMillan gmcm at hypernet.com
Sat Jun 12 00:23:53 EDT 1999


Mark R. writes:
> >> I plan to write a program that would store lots (in the range of 10M or even
> >> more) of relatively small objects (a few hundred bytes at most), so what
> >> do you think I should use?
[Tim]
> >Let's do a little math <wink>:  10M * 100 = ?, a lower bound on what you're
> >contemplating.  Do you have gigabytes of RAM?
[Brian, er, Mark]
> I'm opening a boutique.
[Tim]
> >...Memory-based data structures aren't
> >going to work for the size of thing you have in mind.  If you can make it
> >fly at all, you'll likely require a powerful database, so of those choices
> >Metakit is the only approach that's not dead on arrival.
[Mark]
> A few additional pieces of information: items stored would be natural
> language text fragments (several sentences at most, several words
> typically)
> + binary descriptions; the primary operation would be lots of searching. 
> Is there anything else that would be better for this kind of
> program? Object database?

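To put a number on Tim's lower bound (the per-object overhead below is 
a guess, but the order of magnitude is the point):

    # back-of-the-envelope only; the overhead figure is assumed
    items    = 10 * 1000 * 1000     # 10M objects
    payload  = 100                  # ~100 bytes of text/binary each
    overhead = 30                   # guessed per-object bookkeeping
    total = items * (payload + overhead)
    print "roughly %d MB before any indexes" % (total / (1024 * 1024))
    # -> on the order of 1200 MB
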
Searching 10M items is going to strain just about anything. I get 
wonderful response times out of MetaKit, but I've never tried 
anything approaching that size. The state of the art in searching 
large amounts of data belongs to the big boys of the SQL database 
world. But even there you'll have an index 6 or 10 levels deep, and 
unless you have huge amounts of RAM, only a few of those levels will 
be in memory. So the disk will grind and grind for each search. I'd 
try MetaKit first, since it's a whole lot simpler and lighter weight.
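
If you do try it, the shape of an Mk4py session is roughly the 
following -- I'm typing the calls from memory, so check them against 
the MetaKit docs, and the file name and view layout are made up for 
the example:

    # sketch from memory -- verify against the Mk4py documentation
    import metakit

    # open (or create) a datafile; second argument 1 means read/write
    db = metakit.storage("fragments.mk", 1)

    # one view: a string property for the text, a binary one for the rest
    vw = db.getas("fragments[text:S,descr:B]")

    vw.append(text="a few words of natural language", descr="...")
    db.commit()

    # exact-match search; returns a subview of the matching rows
    hits = vw.select(text="a few words of natural language")
    for row in hits:
        print row.text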

The best solution would be to exploit any interrelationships in your 
items to cut down the set that actually has to be searched. If you 
could reduce it to 100K items, you could burn a whole lot of CPU per 
query and still come out ahead.
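
By way of illustration only -- the tokenizing is deliberately naive 
and all the names are made up -- a crude in-memory inverted index is 
one way to make that cut before you ever touch the disk:

    import string

    index = {}     # word -> dict of item ids (the dict is used as a set)

    def add_item(item_id, text):
        # naive tokenizer: lowercase, split on whitespace
        for word in string.split(string.lower(text)):
            if not index.has_key(word):
                index[word] = {}
            index[word][item_id] = 1

    def candidates(query):
        # ids of items containing every query word -- with luck a small
        # fraction of the 10M, cheap enough to fetch and scan in full
        result = None
        for word in string.split(string.lower(query)):
            hits = index.get(word, {})
            if result is None:
                result = hits.copy()
            else:
                for i in result.keys():
                    if not hits.has_key(i):
                        del result[i]
        if result is None:
            return []
        return result.keys()

Whether the index itself fits in RAM depends on your vocabulary, of 
course; if it doesn't, the word-to-ids table can live in MetaKit or a 
dbm file just as well.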

and-good-luck-with-the-boutique-Brian-ly y'rs

- Gordon



