[Spambayes] Optimization to DBDict (Was: read-only DBDict in hammie?)

Neale Pickett neale@woozle.org
Fri Nov 15 07:00:44 2002


I have sitting here on my hard drive some changes to DBDict which make
for much smaller databases by introducing an optimization for WordInfo
objects (getting rid of Administrative Pickle Bloat: the module and
class name that pickle writes out with every instance).  However, if I
submit this, everyone's hammie database will slowly be rewritten to the
new format, so I want to solicit feedback first.  Here are the two new
methods:

    def __getitem__(self, key):
        v = self.hash[key]
        if v[0] == 'W':
            val = pickle.loads(v[1:])
            # We could be sneaky, like pickle.Unpickler.load_inst,
            # but I think that's overly confusing.
            obj = classifier.WordInfo(0)
            obj.__setstate__(val)
            return obj
        else:
            return pickle.loads(v)

    def __setitem__(self, key, val):
        if isinstance(val, classifier.WordInfo):
            val = val.__getstate__()
            v = 'W' + pickle.dumps(val, 1)
        else:
            v = pickle.dumps(val, 1)
        self.hash[key] = v

Note that this assumes no ordinary pickle stored in a DBDict will ever
begin with "W".  If a "W" opcode is ever added to Python's pickler and
such a pickle ends up in a DBDict, __getitem__ will mistake it for a
WordInfo and you're in for trouble.  If someone knows of a better way to
do this, please step forward before I submit it and hammie starts to
rewrite everyone's database.
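
For anyone who wants to poke at the tagging scheme without a hammie
database handy, here is a minimal self-contained sketch in present-day
Python.  It makes several assumptions: the WordInfo class below is a
hypothetical stand-in for classifier.WordInfo (its three fields are
guessed for illustration), TaggedDict is a made-up name, a plain dict
stands in for the on-disk hash, and pickles are bytes, so the marker is
b'W' rather than 'W'.

```python
import pickle


class WordInfo:
    """Hypothetical stand-in for classifier.WordInfo; fields are assumed."""

    def __init__(self, atime, spamcount=0, hamcount=0):
        self.atime = atime
        self.spamcount = spamcount
        self.hamcount = hamcount

    def __getstate__(self):
        # Only the bare values, no class or module name.
        return (self.atime, self.spamcount, self.hamcount)

    def __setstate__(self, state):
        self.atime, self.spamcount, self.hamcount = state


class TaggedDict:
    """Sketch of the DBDict idea; a plain dict stands in for the database."""

    def __init__(self):
        self.hash = {}

    def __getitem__(self, key):
        v = self.hash[key]
        if v[:1] == b'W':
            # Tagged record: the rest is a pickled state tuple.
            obj = WordInfo(0)
            obj.__setstate__(pickle.loads(v[1:]))
            return obj
        # Untagged record: an ordinary pickle, old format included.
        return pickle.loads(v)

    def __setitem__(self, key, val):
        if isinstance(val, WordInfo):
            v = b'W' + pickle.dumps(val.__getstate__(), 1)
        else:
            v = pickle.dumps(val, 1)
        self.hash[key] = v
```

Storing a WordInfo and reading it back yields an equivalent WordInfo,
while anything else round-trips through plain pickle.  Old-format
records carry no marker, so they still load through the fallback
branch and only get rewritten the next time they are stored.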


So then, Tim Stone - Four Stones Expressions <tim@fourstonesExpressions.com> is all like:

> On a related note, should DBDict actually have its own module, rather
> than be part of hammie?

You know what's funny is that after I wrote DBDict I discovered Python's
shelve module, which does the same thing.  I should probably rewrite
DBDict to wrap the shelve class, but shelve is so minimal, maybe shelve
should be rewritten to incorporate the DBDict class <0.2 wink> (gah, I'm
doing the wink thing now).
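
For comparison, a minimal sketch of what shelve gives you out of the
box: a dict-like object that pickles values into a dbm file.  The
function name, file name, and keys here are made up for illustration,
and the with-statement form assumes a reasonably recent Python.

```python
import os
import shelve
import tempfile


def shelve_roundtrip(path):
    """Store two values in a shelf, reopen it, and read them back."""
    with shelve.open(path) as db:
        db['spam'] = (5, 3, 1)   # any picklable value works
        db['nham'] = 42
    with shelve.open(path) as db:
        return db['spam'], db['nham']


tmpdir = tempfile.mkdtemp()
result = shelve_roundtrip(os.path.join(tmpdir, 'hammie_demo'))
```

shelve pickles each value whole, so it would not apply the WordInfo
shortcut on its own; a wrapper would still need something like the
__getitem__/__setitem__ pair above.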



