[Spambayes] Failure for two users

Tim Peters tim.one at comcast.net
Sat Jul 12 22:40:42 EDT 2003


[G. Armour Van Horn]
> ...
> I thought both pickles and dumb.dbm were deprecated?

Definitely not pickles.  "pickling" is a fundamental data serialization
operation in Python, and even if you use a modern Berkeley database backend
in spambayes, the data will still get stored in pickles.  That's what
shelve.py *does* -- transparently pickles and unpickles Python objects so
that they look like plain strings to databases.
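
To make that concrete, here is a minimal sketch with a plain dict standing in
for the database. It is written for Python 3, where the backend sees bytes
rather than str; the key name "state" and the counts are made up for
illustration, not taken from spambayes.

```python
import pickle
import shelve

# A plain dict stands in for the database backend (an assumption for
# illustration; shelve.Shelf accepts any dict-like object).
backing = {}
shelf = shelve.Shelf(backing)

# Store a Python object; shelve pickles it before handing it to the backend.
shelf["state"] = (120, 85)   # hypothetical (nham, nspam) counts

# The backend only ever sees opaque bytes.
raw = backing[b"state"]
print(type(raw).__name__)    # bytes
print(pickle.loads(raw))     # (120, 85)

# Reading through the shelf unpickles transparently.
print(shelf["state"])        # (120, 85)
```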

dumbdbm is just stupid in this app -- dumbdbm consumes more memory than
using a plain dict without any database backend, is much slower than using a
plain dict, and is much more brittle.  Using a plain dict is still an excellent
choice (despite the misguided whining of the databaseheads around here
<wink>).
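
For what the plain-dict route can look like, here is a rough sketch: keep
everything in an ordinary dict and write the whole thing out as one pickle.
The function names, the file name, and the token counts are all hypothetical,
and this is not necessarily how spambayes itself does it.

```python
import os
import pickle
import tempfile

def save_counts(path, counts):
    """Write the whole dict to disk as a single pickle."""
    with open(path, "wb") as f:
        pickle.dump(counts, f, protocol=2)

def load_counts(path):
    """Read the whole dict back into memory."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Hypothetical token -> (spamcount, hamcount) table.
wordinfo = {"viagra": (42, 0), "python": (1, 37)}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "wordinfo.pck")   # made-up file name
    save_counts(path, wordinfo)
    restored = load_counts(path)

print(restored == wordinfo)  # True
```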

> ...
> File "C:\Program Files\spambayes\spambayes\storage.py", line 159, in
>      load
>   t = self.db[self.statekey]

Here storage.py is trying to load, from the database, the number of ham and
spam msgs it has been trained on.

> File "C:\Python22\lib\shelve.py", line 71, in __getitem__
>    return Unpickler(f).load()

Regardless of database backend, shelve.py will always create an Unpickler,
to change the string stored in the database into a tuple of integers here.

> EOFError

But, alas, the pickle string stored in your database is corrupt for some
(unknown) reason.  EOFError is one of many exceptions an Unpickler can
raise, and means that the pickle string the database gave it appears to end
prematurely.
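
A small sketch of that failure mode, assuming Python 3 and a made-up pair of
counts: chopping the final STOP opcode off an otherwise valid pickle is enough
to get EOFError. Other kinds of damage can raise different exceptions (for
example pickle.UnpicklingError), so this shows only one way a pickle can go
bad.

```python
import pickle

# A valid pickle of a hypothetical (nham, nspam) tuple.  Protocol 2 is
# chosen here so the pickle has no framing header.
good = pickle.dumps((120, 85), protocol=2)

# Simulate corruption: drop the final STOP opcode so the data ends early.
truncated = good[:-1]

def try_unpickle(raw):
    """Return the unpickled object, or the string 'EOFError' on early end."""
    try:
        return pickle.loads(raw)
    except EOFError:
        return "EOFError"

print(try_unpickle(good))        # (120, 85)
print(try_unpickle(truncated))   # EOFError
```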

> Exception exceptions.AttributeError: "'NoneType' object has no attribute
> 'error'
> in <bound method _Database.__del__ of <dumbdbm._Database instance at
> 0x008D6AC0 ignored

This is a shutdown race in the dumbdbm implementation.  Hmm!  It *looks*
like the spambayes database driving code never closes its DBDictClassifier
object explicitly, trusting __del__ methods to do any necessary cleanup.
Whether that's a bug is arguable, but it is at best a poor design decision.  A
dumbdbm database absolutely needs to be closed properly.  The dumbdbm code
needs to be made more robust against the shutdown race you showed above too.
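
Closing explicitly could look something like the sketch below. It is written
for Python 3, where dumbdbm is spelled dbm.dumb, and the helper names, file
name, and counts are hypothetical; it does not show the actual
DBDictClassifier code.

```python
import dbm.dumb   # this module was named dumbdbm in Python 2
import os
import shelve
import tempfile

def train_and_close(path):
    """Store some state, then close the database no matter what happens."""
    db = shelve.Shelf(dbm.dumb.open(path, "c"))
    try:
        db["state"] = (120, 85)   # hypothetical (nham, nspam) counts
    finally:
        db.close()   # explicit close instead of trusting __del__

def read_state(path):
    """Reopen the database and read the state back."""
    db = shelve.Shelf(dbm.dumb.open(path, "c"))
    try:
        return db["state"]
    finally:
        db.close()

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "hammie")   # made-up database name
    train_and_close(path)
    state = read_state(path)

print(state)  # (120, 85)
```

The try/finally means the close happens at a known point while the
interpreter is still fully alive, rather than during shutdown when module
globals may already have been torn down.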



