Flat DB seeking speed

Alex Martelli aleax at mac.com
Sun Apr 22 00:54:28 EDT 2007


Jia Lu <Roka100 at gmail.com> wrote:

> Hello all
> 
>  I see there are lots of flat db or db-like modules in the standard
> python modules.
>  What about the keywords seeking speed of them ?
> 
>  (I want to put about 10000 articles with 10000 IDs, and I can do
> searching keywords with them)
> 
>  The db-like modules are :
>  dbm, gdbm, dbhash,anydbm,
>  pickle(cPickle), shelve, marshal

Your question is somewhat hard to parse (I sympathize, since English
isn't my mother-tongue, either) -- "keywords seeking speed" is very hard
to understand in this context.

Marshal and pickle/cPickle aren't "db-like" at all -- they're just
sequences of serialized objects; marshal is low-level (you can only
serialize objects of some fundamental built-in types), pickle (and the
faster cPickle) higher-level (you can serialize objects of many
different types), but in either case there is no "seeking", "keywords"
or otherwise, just sequential reloading of the objects you serialized.
Handy when that's what you need -- persist a bunch of objects to a disk
file, rebuild them in memory later -- but it doesn't appear to have much
to do with anything in your question and I'm really perplexed as to why
you think it might (what materials did you use to study these subjects,
that gave you such a horrendously wrong impression?!).
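
To illustrate the "sequential reloading" point, here is a minimal sketch. It assumes a current Python 3, where cPickle has been folded into pickle. The article records and the in-memory buffer standing in for a disk file are made up for the example.

```python
import io
import pickle

# An in-memory buffer stands in for a disk file (assumption for the example).
buf = io.BytesIO()

# Hypothetical article records, invented purely for illustration.
articles = [
    {"id": 1, "text": "flat files are simple"},
    {"id": 2, "text": "dbm maps keys to values"},
]

# Serialize the objects one after another.
for article in articles:
    pickle.dump(article, buf)

# Reloading is strictly sequential: there is no way to ask for
# "the article with id 2" without reading what precedes it.
buf.seek(0)
reloaded = []
while True:
    try:
        reloaded.append(pickle.load(buf))
    except EOFError:
        break
```

Finding one particular article here means reloading the objects in order until you reach it, which is why pickle on its own is not a lookup structure.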

All forms of dbm files (including the implementation known as gdbm, as
well as the lookalike one built on top of bsddb), as well as bsddb
(which you don't even mention), represent on disk a map from "key"
strings to "value" strings.  shelve is a modest extension to this
concept: it maps from "key" strings to arbitrary picklable objects
(using cPickle to map such objects to and from strings of bytes).
anydbm is a very thin layer on top of the various dbm implementations
(it uses whichdb to make an informed guess as to what kind of dbm module
was used to make a given dbm-like file, and also uses the best dbm
implementation you have installed to build a new file of some dbm type).
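
A minimal sketch of the difference between a dbm-style file and a shelf, assuming a current Python 3, where anydbm is named dbm and whichdb is available as dbm.whichdb. The file names, keys, and record contents are hypothetical.

```python
import dbm
import os
import shelve
import shutil
import tempfile

tmpdir = tempfile.mkdtemp()

# dbm: both keys and values are strings of bytes.
dbm_path = os.path.join(tmpdir, "articles_dbm")  # hypothetical file name
db = dbm.open(dbm_path, "c")  # "c": create the file if it does not exist
db[b"1001"] = b"text of article 1001"
db.close()

db = dbm.open(dbm_path, "r")
raw = db[b"1001"]  # direct lookup by key, no sequential scan
db.close()

# whichdb makes an informed guess as to which dbm flavour wrote the file.
kind = dbm.whichdb(dbm_path)

# shelve: string keys, but values may be arbitrary picklable objects.
shelf_path = os.path.join(tmpdir, "articles_shelf")  # hypothetical file name
with shelve.open(shelf_path) as shelf:
    shelf["1001"] = {"title": "Flat DB", "tags": ["dbm", "shelve"]}

with shelve.open(shelf_path) as shelf:
    record = shelf["1001"]

shutil.rmtree(tmpdir, ignore_errors=True)
```

Which dbm implementation actually gets used, and therefore the value of kind, depends on what is installed on your machine.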

If by "keywords seeking" you mean performing full-text search, none of
these modules are at all suitable.  Perhaps you might want to try
pylucene, <http://pylucene.osafoundation.org/>, or other such
third-party modules, for the purpose -- if that IS indeed your purpose.
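
To see why a key-to-value map is a poor fit for full-text search, consider this rough sketch. A plain dict stands in for a dbm file or a shelf (an assumption made to keep the example self-contained), and the article IDs, texts, and the scan_for_keyword helper are all hypothetical.

```python
# A plain dict stands in for a dbm/shelve mapping of ID -> article text.
store = {
    "1001": "Flat files and dbm modules in Python",
    "1002": "Full-text search with an external index",
    "1003": "Pickle and marshal serialize objects",
}


def scan_for_keyword(mapping, keyword):
    """Return the sorted keys whose text contains the keyword.

    The mapping offers lookup by key only, so a keyword query has to
    read every value: the cost grows with the total amount of text.
    """
    needle = keyword.lower()
    return sorted(
        key for key, text in mapping.items() if needle in text.lower()
    )


hits = scan_for_keyword(store, "dbm")
```

Lookup by ID is fast, but every keyword query visits all the articles. A dedicated full-text engine avoids this by maintaining its own index from words to documents.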


Alex


