[spambayes-dev] Re: [Spambayes] fatal error?

Skip Montanaro skip at pobox.com
Wed Aug 27 10:45:07 EDT 2003


    Tim> [Skip]
    >> I suspect that the Outlook plugin simply makes it easier to find
    >> problems (more users, more worm mail, more concurrent threads,
    >> whatever).

    Tim> Is that relevant?  I've never seen a database corruption complaint
    Tim> from someone using the Outlook addin (did I miss one?), and I
    Tim> deliberately switched my 3 classifiers to Berkeley in order to try
    Tim> to provoke one.  No luck.  IIRC, Mark has never seen this either.

I guess I was mistaken.  Sorry about that.

    Tim> If so, the OP was running on Windows, but was almost certainly not
    Tim> using the Outlook addin:

    Tim>    Now I'm getting an error message in the email my
    Tim>    headers: X-Spambayes-Exception: bsddb._DBRunRecoveryError
    Tim>    ((-30982, 'DB_RUNRECOVERY: Fatal error, run database recovery --
    Tim>    fatal region error detected; run recovery')) in __getitem__() at
    Tim>    C:\PTYTHON23\lib\bsddb\__init__.py line 86: return self.db[key]

    Tim> The Outlook addin never inserts email headers, so I don't believe
    Tim> that fellow's problem had anything to do with the addin.

I have a bad habit of concluding that the user was running the Outlook
plugin whenever a posted traceback includes "C:\...".  This would then have
been an error in pop3proxy, I guess.

    >> I think the same (or a similar) problem would exist were two
    >> instances of hammiefilter running at the same time, both trying to
    >> update the file.  I'm just fortunate enough to have never encountered
    >> that problem.  Even using a pickle, you really ought to use some sort
    >> of lock protocol when reading or writing the pickle file if there's
    >> any chance of concurrent access by another process or thread.  That
    >> you only read it at the beginning and write it at the end only limits
    >> the opportunity for collision.
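
To make "some sort of lock protocol" concrete, here is a minimal sketch.  It
is written for present-day Python rather than the Python 2.3 of this thread,
and it is not what hammiefilter or any other Spambayes application actually
does.  The names lockfile() and update_db(), and the ".lock" and ".tmp"
suffixes, are hypothetical.  The idea is to hold an advisory lock file for the
whole read-modify-write cycle and to replace the pickle atomically, so a
second process never sees a half-written file.

```python
import os
import pickle
import time
from contextlib import contextmanager


@contextmanager
def lockfile(path, timeout=10.0, poll=0.05):
    """Hold an advisory lock on `path` by creating `path + '.lock'`."""
    lock_path = path + ".lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists,
            # so only one process at a time gets past this call.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError("could not lock %r" % path)
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def update_db(path, mutate):
    """Load the pickled dict, apply mutate(db), and write it back under the lock."""
    with lockfile(path):
        db = {}
        if os.path.exists(path):
            with open(path, "rb") as f:
                db = pickle.load(f)
        mutate(db)
        # Write to a temporary file first, then swap it into place.
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(db, f, pickle.HIGHEST_PROTOCOL)
        os.replace(tmp, path)
        return db
```

The lock is only advisory: it helps only if every reader and writer goes
through the same protocol, and a process that dies while holding it leaves a
stale lock file that has to be cleaned up by hand.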

    Tim> Python dicts are safe for multiple-reader single-writer access
    Tim> without explicit synchronization, and per-access locks are so
    Tim> bloody expensive that I don't want to change anything in the
    Tim> absence of proof that there's a problem that can't be wormed around
    Tim> more cheaply.  To date, I don't believe we've seen any report of
    Tim> corruption via the Outlook addin, which suggests it's doing
    Tim> something right <wink>.

    Skip> ... Startup time is dramatically different:

    Tim> Of course.

    [ times elided ]

    >> This is not to imply that my huge database is typical or that my
    >> usage of hammiefilter is either.

    Tim> I don't know about hammiefilter (haven't used it).  

My only reason for referring to hammiefilter is that its runtime is
dominated by startup and shutdown costs, since all it does is train on or
score a single message.  That makes the pickle/dict solution painfully slow.
Were it not for the presence of one-shot apps like hammiefilter, we could
probably just use a pickle for storage and be done with it.
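
For anyone who wants to measure the startup effect on their own data, here is
a rough sketch of a harness.  It is an illustration only: SQLite from the
standard library stands in for a keyed on-disk store such as Berkeley DB, the
"wordinfo" table and the token names are made up, and the code is present-day
Python.  It stores the same counts both ways and times a single lookup from a
cold start, which is roughly the access pattern of a one-shot app.

```python
import os
import pickle
import sqlite3
import tempfile
import time


def build_stores(dirname, n=20000):
    """Write the same made-up token counts to a pickle and to SQLite."""
    counts = {"token%d" % i: (i, i + 1) for i in range(n)}
    pck = os.path.join(dirname, "db.pck")
    with open(pck, "wb") as f:
        pickle.dump(counts, f, pickle.HIGHEST_PROTOCOL)
    sql = os.path.join(dirname, "db.sqlite")
    con = sqlite3.connect(sql)
    con.execute(
        "CREATE TABLE wordinfo (word TEXT PRIMARY KEY, spam INT, ham INT)")
    con.executemany("INSERT INTO wordinfo VALUES (?, ?, ?)",
                    ((w, s, h) for w, (s, h) in counts.items()))
    con.commit()
    con.close()
    return pck, sql


def lookup_pickle(pck, word):
    """Unpickle the whole dict just to read one entry."""
    with open(pck, "rb") as f:
        return pickle.load(f)[word]


def lookup_sqlite(sql, word):
    """Open the database and fetch only the requested row."""
    con = sqlite3.connect(sql)
    try:
        cur = con.execute(
            "SELECT spam, ham FROM wordinfo WHERE word = ?", (word,))
        return cur.fetchone()
    finally:
        con.close()


def timed(fn, *args):
    """Return (result, elapsed seconds) for one call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        pck, sql = build_stores(d)
        print("pickle:", timed(lookup_pickle, pck, "token5"))
        print("sqlite:", timed(lookup_sqlite, sql, "token5"))
```

The numbers will depend on database size, disk cache state and the store
used, so treat any single run as a hint rather than a benchmark.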

Skip


