cPickle on speed and size

Christian Tismer tismer at appliedbiometrics.com
Fri Oct 29 14:05:03 EDT 1999


Thomas Weholt wrote:
> 
> Hi,
> 
> Just wondering :
> 
> I want to store objects in a simple key/value-based database. These
> objects are pickled, maybe compressed using zlib and then stored. If I
> use cPickle instead of pickle, would the speed and size of the database
> be ok when the amount of data got bigger? Is cPickle aimed at small
> "databases"? I'm not talking about complex objects, and waiting 2-3 seconds
> for a search is just fine, but I need to store millions of objects
> (actually objects containing info on files on my CD-ROMs).
> 
> Should I go for Berkeley DB or any of the other available DBMSs, or is
> cPickle sufficient for personal use?

It depends on how you use cPickle.
cPickle is very fast at loading and storing data. But if you intend
to load several megabytes into main memory every time, just to look
something up in one huge dictionary, then you will have created a
slow solution, and some kind of database would be better.

You could of course use the persistent module from Bobo or Zope,
which is fast and is based upon cPickle.

If you want to write your own, and if you can make sure that
your master index is not huge, then you can use this approach:

Create an index file which is a pickled dictionary mapping each key
to a (file position, length) pair.
Create a data file which is a sequence of pickled data snippets.
For every entry, you append some pickled data to the data file,
this way:

import cPickle

# keys is the in-memory index dict, dfile is the open data file
dfile.seek(0, 2)    # find the end
pos = dfile.tell()  # here we are

data = cPickle.dumps(yourthingtoputintothedatabase)

dfile.write(data)
keys["whatisyourkey"] = (pos, len(data))

# and later on, don't forget to save your dict as a pickle.
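
To show how writing, saving the index and reading back fit together,
here is a minimal sketch of the index-plus-data-file approach. It
assumes a current Python, where the plain pickle module stands in for
cPickle (in Python 3 the C implementation is used automatically). The
class name PickleStore, its put/get/close methods and the file names
are made up for illustration; this is one way to arrange it, not a
finished database.

```python
import os
import pickle
import tempfile


class PickleStore:
    """Append-only data file plus a pickled {key: (pos, length)} index."""

    def __init__(self, data_path, index_path):
        self.index_path = index_path
        # "a+b" creates the file if needed; all writes go to the end.
        self.dfile = open(data_path, "a+b")
        if os.path.exists(index_path):
            with open(index_path, "rb") as f:
                self.keys = pickle.load(f)
        else:
            self.keys = {}

    def put(self, key, obj):
        self.dfile.seek(0, os.SEEK_END)    # find the end
        pos = self.dfile.tell()            # here we are
        data = pickle.dumps(obj)
        self.dfile.write(data)
        self.keys[key] = (pos, len(data))

    def get(self, key):
        pos, length = self.keys[key]
        self.dfile.seek(pos)               # jump straight to the snippet
        return pickle.loads(self.dfile.read(length))

    def close(self):
        # Save the index dict as a pickle, then close the data file.
        with open(self.index_path, "wb") as f:
            pickle.dump(self.keys, f)
        self.dfile.close()


# Usage: write two entries, close, reopen, and read one back.
tmpdir = tempfile.mkdtemp()
data_path = os.path.join(tmpdir, "cds.dat")
index_path = os.path.join(tmpdir, "cds.idx")

store = PickleStore(data_path, index_path)
store.put("disc1/readme.txt", {"size": 1024, "cd": "disc1"})
store.put("disc2/setup.exe", {"size": 52000, "cd": "disc2"})
store.close()

reopened = PickleStore(data_path, index_path)
entry = reopened.get("disc2/setup.exe")
known_keys = sorted(reopened.keys)
reopened.close()
```

Only the index dict is loaded on startup; each lookup reads just the
bytes of one snippet, which is why the index has to stay small.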

BTW, if you do not intend to save structured data like class
instances, it might be even faster to save the data using marshal.
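
A short sketch of what that restriction means in practice, assuming a
current Python. The record contents and the FileInfo class are
hypothetical. marshal round-trips built-in types such as dicts, lists,
strings and numbers, but refuses class instances. Note also that the
marshal format is not guaranteed to stay the same between Python
versions, so it fits best when the same interpreter writes and reads
the data.

```python
import marshal

# Plain built-in types round-trip through marshal.
record = {"name": "readme.txt", "size": 1024, "tags": ["text", "doc"]}
blob = marshal.dumps(record)
restored = marshal.loads(blob)


class FileInfo:
    """Hypothetical class, used only to show what marshal rejects."""

    def __init__(self, name):
        self.name = name


# Class instances are not supported by marshal; dumps raises ValueError.
try:
    marshal.dumps(FileInfo("readme.txt"))
    marshal_accepts_instances = True
except ValueError:
    marshal_accepts_instances = False
```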

Another, even better approach: Use metaKit, which has a nice
wrapper module, MkWrap. You can simply stuff keys and values
into a database and get them back, as long as you use only native
objects like strings and numbers.
If you want to store structured data, cPickle it and put the
resulting string into the MK database. This is what we do for a
customer at the moment, in a real application.
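
The pattern itself (native values stored directly, structured values
pickled first) can be sketched without metaKit. The example below does
not use the metaKit or MkWrap API at all; it swaps in the standard
library's dbm module as a stand-in key/value store, and the keys and
values are invented for illustration.

```python
import dbm
import os
import pickle
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "catalog")

    # "c" opens the database for reading and writing, creating it if needed.
    with dbm.open(path, "c") as db:
        # A native value can be stored as bytes directly.
        db["label:disc1"] = b"Backup 1999"
        # Structured data is pickled first, then stored as bytes.
        db["disc1/readme.txt"] = pickle.dumps({"size": 1024, "cd": "disc1"})

    # Reopen read-only and fetch both kinds of value.
    with dbm.open(path, "r") as db:
        label = db["label:disc1"]
        entry = pickle.loads(db["disc1/readme.txt"])
```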

ciao - chris

-- 
Christian Tismer             :^)   <mailto:tismer at appliedbiometrics.com>
Applied Biometrics GmbH      :     Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101   :    *Starship* http://starship.python.net
10553 Berlin                 :     PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint       E182 71C7 1A9D 66E9 9D15  D3CC D4D7 93E2 1FAE F6DF
     we're tired of banana software - shipped green, ripens at home



