[Spambayes] Persisting a pickled bayes database

Tim Peters tim.one@comcast.net
Sat Nov 9 18:35:43 2002


[Tim Stone]
> I can see the nice createbayes function in hammie, but I don't see any
> persistence function anywhere.  I do see several places where code
> to write a pickled bayes database is hard coded, and I understand the
> PersistentBayes thing.  I might be missing something...

Just experience with idiomatic Python persistence.  The persistence was all
in DBDict.__init__'s:

        self.hash = anydbm.open(dbname, 'c')

The tradition in Python is that "a persistent database" supplies an
interface much like a Python dict, but persists almost purely by magic.
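To make "by magic" concrete, here's a minimal sketch of a DBDict-style wrapper in Python 3, where `anydbm` has been renamed `dbm`. It's hypothetical, not hammie's actual DBDict: the class name `DBDictSketch` is made up, and pickling each value is an assumption about how you'd store non-string values in a dbm file. The only persistence code is the `dbm.open(dbname, 'c')` call in `__init__`; after that, every assignment is handed straight to the database. The standard library's `shelve` module packages up this same idea.

```python
import dbm
import os
import pickle
import tempfile
from collections.abc import MutableMapping

class DBDictSketch(MutableMapping):
    """Hypothetical DBDict-like wrapper; not hammie's actual class."""

    def __init__(self, dbname):
        # All the persistence is here: 'c' opens read/write, creating if needed.
        self.hash = dbm.open(dbname, 'c')

    def __getitem__(self, key):
        # dbm stores bytes, so values are pickled on the way in and out.
        return pickle.loads(self.hash[key])

    def __setitem__(self, key, value):
        self.hash[key] = pickle.dumps(value)

    def __delitem__(self, key):
        del self.hash[key]

    def __iter__(self):
        # dbm hands keys back as bytes; decode them to match the str keys used.
        return (k.decode('utf-8') for k in self.hash.keys())

    def __len__(self):
        return len(self.hash)

    def close(self):
        self.hash.close()

if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'words.db')
    db = DBDictSketch(path)
    db['spam'] = (3, 0.99)      # any picklable value works
    db.close()
    db = DBDictSketch(path)     # a later "session"
    print(db['spam'])           # (3, 0.99)
    db.close()
```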

For example, here's a brief Python session:

C:\Code\python\PCbuild>python
Python 2.3a0 (#29, Nov  8 2002, 10:51:55) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import anydbm
>>> d = anydbm.open('example.dat', 'n')
>>> d['an'] = 'example'
>>> # and quit Python at this point

Then in another session:

>>> import anydbm
>>> d = anydbm.open('example.dat')
>>> print d
<bsddb.bsddb object at 0x0064E158>
>>> print d.keys()
['an']
>>> print d['an']
example
>>>

Note that anydbm used bsddb as the underlying database mechanism on my box.
It may use some other database mechanism on some other box (it depends on
what it finds available).  I could have used bsddb directly instead, of
course, but then my code would require that bsddb be available.  anydbm uses
whatever it can scrounge up.
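In Python 3, `anydbm` became `dbm`, and `dbm.whichdb()` will tell you which backend got scrounged up for a given file. Here's a small sketch folding the two sessions above into one function, assuming Python 3.4 or later (for `with` support on dbm objects); the function name `roundtrip` is hypothetical. Note that Python 3's dbm stores keys and values as bytes, so `'example'` comes back as `b'example'`.

```python
import dbm
import os
import tempfile

def roundtrip(path):
    """Mimic the two sessions above: write, close, reopen, read back."""
    # First "session": flag 'n' always creates a new, empty database.
    with dbm.open(path, 'n') as db:
        db['an'] = 'example'          # str keys/values are stored as bytes
    # Second "session": the default flag 'r' opens it read-only.
    with dbm.open(path) as db:
        keys = list(db.keys())
        value = db['an']
    # whichdb() names the backend module dbm chose for this file.
    return keys, value, dbm.whichdb(path)

if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'example.dat')
    print(roundtrip(path))   # the backend name depends on what's installed
```

On one box the backend might be `dbm.gnu`, on another the pure-Python fallback `dbm.dumb`; the calling code doesn't change.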

Subclassing the builtin dict type can give a similar "by magic" facility;
e.g., here's temp.py:

"""
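import cPickle as pickle
import os

class PDict(dict):
    def __init__(self, fname):
        self.is_open = False    # so close()/__del__ are safe if loading fails
        self.fname = fname
        if os.path.exists(fname):
            f = file(fname, 'rb')
            try:
                guts = pickle.load(f)
            finally:
                f.close()
            self.update(guts)
        self.is_open = True

    def close(self):
        # Dump the contents once; pickle a plain dict, not the PDict itself.
        if self.is_open:
            f = file(self.fname, 'wb')
            try:
                pickle.dump(dict(self), f, 1)
            finally:
                f.close()
            self.is_open = False

    def __del__(self):
        # Last-ditch save; don't rely on this.
        self.close()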
"""

That just adds a few methods to a regular dict, arranging to dump its value
to a pickle when .close() is called or when it becomes unreachable.  It's
intended that .close() be called explicitly, though (by-magic shutdown
semantics are never something to bet your life on).

Then in one Python session:

>>> from temp import PDict
>>> d = PDict('example.pck')
>>> d['another'] = 'example'

and in another:

>>> from temp import PDict
>>> d = PDict('example.pck')
>>> d
{'another': 'example'}
>>>
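temp.py is Python 2 code (`file()` and `cPickle` are gone in Python 3). A rough Python 3 port might look like the following; it's a sketch, not a drop-in replacement. It pickles a plain `dict(self)` rather than the subclass instance, so the file carries only the data and not `fname`/`is_open`, and it adds `__enter__`/`__exit__` so a `with` block guarantees the explicit `.close()` recommended above instead of leaning on `__del__`.

```python
import os
import pickle
import tempfile

class PDict(dict):
    """Python 3 sketch of temp.py's PDict: a dict saved to a pickle on close()."""

    def __init__(self, fname):
        super().__init__()
        self.is_open = False          # keeps close()/__del__ safe if loading fails
        self.fname = fname
        if os.path.exists(fname):
            with open(fname, 'rb') as f:
                self.update(pickle.load(f))
        self.is_open = True

    def close(self):
        # Save once, as a plain dict, so only the data lands in the file.
        if self.is_open:
            with open(self.fname, 'wb') as f:
                pickle.dump(dict(self), f, pickle.HIGHEST_PROTOCOL)
            self.is_open = False

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        # Runs when the with-block ends, even if it ended with an exception.
        self.close()
        return False                  # don't suppress exceptions

    def __del__(self):
        # Last resort only: finalization at interpreter shutdown isn't guaranteed.
        self.close()

if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'example.pck')
    with PDict(path) as d:            # first "session"
        d['another'] = 'example'
    with PDict(path) as d:            # second "session"
        print(d)                      # {'another': 'example'}
```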

In your example helper class, you decided you don't necessarily want to
persist.  That may or may not be a useful ability, but "the usual" simple
Python database facilities don't give you a choice about that:  they commit
changes to disk *as* mutations occur.  In DB terms, they view each mutation
as a transaction.  The ZODB-based stuff Jeremy is doing is different in that
way:  changes to a ZODB db have to be explicitly committed.  That's what the

    get_transaction().commit()

lines in the pspam directory are doing.  ZODB is much more of "a real
database" than these other gimmicks, by which I mean it has an explicit and
pretty rich transactional model and API.
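To illustrate the difference between commit-as-you-go and explicit commits without needing ZODB installed, here's a toy sketch built on the PDict idea: mutations stay in memory until `commit()`, and `abort()` rolls back to the last committed state. The class name `TxDict` and its methods are hypothetical and this is not ZODB's API; a real transactional database also deals with concurrency and crash recovery, which this ignores apart from writing to a temporary file and renaming it over the old one.

```python
import os
import pickle
import tempfile

class TxDict(dict):
    """Hypothetical pickle-backed dict with explicit commit()/abort().

    Mutations only change memory; nothing reaches disk until commit().
    """

    def __init__(self, fname):
        super().__init__()
        self.fname = fname
        if os.path.exists(fname):
            with open(fname, 'rb') as f:
                self.update(pickle.load(f))
        self._committed = dict(self)      # last state known to be on disk

    def commit(self):
        # Write everything to a temp file, then rename it over the old pickle.
        tmp = self.fname + '.tmp'
        with open(tmp, 'wb') as f:
            pickle.dump(dict(self), f, pickle.HIGHEST_PROTOCOL)
        os.replace(tmp, self.fname)
        self._committed = dict(self)

    def abort(self):
        # Discard uncommitted changes. The snapshot is shallow, so mutable
        # values changed in place are NOT rolled back.
        self.clear()
        self.update(self._committed)

if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'tx.pck')
    db = TxDict(path)
    db['ham'] = 1
    db.commit()                 # now on disk
    db['spam'] = 2
    db.abort()                  # 'spam' never reaches disk
    print(TxDict(path))         # {'ham': 1}
```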
