pickle->zlib->shelve

eric jones ej at ee.duke.edu
Thu Oct 21 14:19:04 EDT 1999


Hey Thomas,

I have run into similar issues storing large arrays of numerical data using
"shelve".  The files were in the 10 MB range, and reading them took several
seconds.  Using binary pickles instead of string pickles cut the file size
by a factor of 3 and the read time by a factor of 5-10.
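The size difference is easy to see directly.  A quick sketch using the pickle
module (the data here is made up for illustration; actual ratios depend on
what you store):

```python
import pickle

# Some repetitive numerical data, standing in for a large array.
data = [float(i) / 3.0 for i in range(1000)]

text_pickle = pickle.dumps(data, 0)    # protocol 0: the ASCII/string format
binary_pickle = pickle.dumps(data, 1)  # protocol 1: the binary format

# The binary form is substantially smaller for numeric data.
assert len(binary_pickle) < len(text_pickle)
```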

The module below is a binary-pickle replacement for shelve.  Really, the only
change to the class is the __setitem__() method, which uses a binary pickle.
The __getitem__() method will automatically recognize whether the pickle is
binary or ASCII, so you don't need to fool with it.  This is very nice
because the files are read-compatible with the standard shelve module.
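That read-compatibility follows from the fact that the unpickler detects the
protocol from the byte stream itself.  A small sketch of this (illustrative
data):

```python
import pickle

obj = {"name": "spam", "values": [1, 2, 3]}

# loads() recognizes the protocol from the pickle stream, so text (0)
# and binary (1) pickles read back through exactly the same call.
for proto in (0, 1):
    data = pickle.dumps(obj, proto)
    assert pickle.loads(data) == obj
```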

Also, note the use of dumbdbm instead of anydbm in __init__().  This isn't
necessary; it was just more convenient since I use a lot of different
machines and never know what modules are installed.

eric

from shelve import Shelf

try:
    import cPickle
    pickle = cPickle
except ImportError:
    import pickle

class DbfilenameShelf(Shelf):
    """Regular shelf that always uses dumbdbm for compatibility purposes."""
    def __init__(self, filename, flag='c'):
        import dumbdbm
        Shelf.__init__(self, dumbdbm.open(filename, flag))

class fast_DbfilenameShelf(DbfilenameShelf):
    """Binary-pickles objects before storing them."""
    def __setitem__(self, key, value):
        # The magic line: the second argument (1) selects a binary pickle.
        self.dict[key] = pickle.dumps(value, 1)

def open(filename, flag='c'):
    """Open a persistent dictionary for reading and writing.

    Argument is the filename for the dbm database.
    See the module's __doc__ string for an overview of the interface.
    """
    return fast_DbfilenameShelf(filename, flag)

def slow_open(filename, flag='c'):
    return DbfilenameShelf(filename, flag)
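For readers on current Python: the same idea can be written against today's
standard library, where dumbdbm lives at dbm.dumb and shelve.Shelf accepts a
protocol argument directly.  A sketch (FastShelf is a made-up name, and the
stored data is illustrative):

```python
import os
import shelve
import tempfile
from dbm import dumb  # dumbdbm's modern name


class FastShelf(shelve.Shelf):
    """Shelf over dbm.dumb that stores binary pickles, like the module above."""
    def __init__(self, filename, flag='c'):
        # protocol=1 makes Shelf.__setitem__ write binary pickles,
        # the modern spelling of pickle.dumps(value, 1).
        shelve.Shelf.__init__(self, dumb.open(filename, flag), protocol=1)


path = os.path.join(tempfile.mkdtemp(), "demo")

shelf = FastShelf(path)
shelf["matrix"] = [[1.0, 2.0], [3.0, 4.0]]
shelf.close()

# Reopening reads the binary pickles back transparently.
shelf = FastShelf(path)
assert shelf["matrix"] == [[1.0, 2.0], [3.0, 4.0]]
shelf.close()
```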

Thomas Weholt <thomas at bibsyst.no> wrote in message
news:380D994F.E6E775EA at bibsyst.no...
> Hi,
>
> I've got an idea (and I'm a newbie, so bear with me):
>
> I've got a database in which I want to store some customized objects.
> These objects are, in my opinion, not very big, but they seem to take up a
> lot of space anyhow when pushed into a database.  One solution I've used
> in a similar situation, in a different programming language, was to
> use plain strings with some sort of delimiter between each
> attribute.  That will of course not include any methods defined in any
> object.  Then the string is stored in a database in the "d[id] = str"
> fashion.
>
> These strings can also be of some length, and often contain lots of
> repetitive info.  Then it struck me that I could use zlib to compress the
> string before inserting it into the database, and decompress it when it was
> taken out.  The number of items to be processed couldn't be huge, due to
> the decompression time of course, but with few items this could possibly
> work and result in a smaller database.
>
> I was thinking maybe I could use pickle and compress the pickled object
> too, before storing it into the database and save some space. Pickled
> objects also seem to have lots of repetitive data.
>
> Now my question is :
>
> 1. has this been tested and found ineffective in relation to a) speed of
> decompression or b) size of objects not big enough to gain any
> significant size advantage by compression
> 2. if not tested, why? Maybe a subclass of pickle could implement
> compression and decompression of objects?
>
> Since these scripts are to be run on 400 MHz computers, and the number of
> items will not exceed 20-30, objects being relatively small, how will the
> speed be?  Any clues??
>
> Awaiting flames and harsh words of discouragement,
> Thomas Weholt
>

More information about the Python-list mailing list