[spambayes-dev] RE: [Spambayes] Amazing sloth

Tony Meyer tameyer at ihug.co.nz
Thu Apr 22 01:16:11 EDT 2004


> I should mention that it happened two more times for me after
> starting over from scratch, with very few msgs trained on 
> each time (certainly less than 50 total).

Yay, something to look forward to <wink>.  I've managed to get up to 11 ham
and 11 spam without any problems, though.

[...]
> You have scanpst.exe, but you may
> have to search your disk to find it.

Indeed I did.  It was in C:\Program Files\Common Files\System\Mapi\1033.  Of
course.

> Since I moved to a giant pickled dict, I don't care anymore
> <0.5 wink>.

I suppose I do (assuming it may happen to me again), since I don't really
want to switch to a pickled dict, because I open and close Outlook
reasonably often, and have other uses for my (much smaller) memory.

> An interesting experiment would be to open it
> directly from a non-SpamBayes Python program, and just time 
> lookups and inserts.

Lookups don't appear to be affected at all, but inserts definitely are.
I've run some really simple tests (just repeated insertions) comparing four
databases: a new one, one of around the same size (about 5500 keys), the
slow one, and another Berkeley db with the same data (made by exporting the
slow one to text and then creating a new db from that), in case it was just
some quirk of entry order or the file itself.
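
For anyone who wants to try the same sort of comparison, here is a rough
sketch of the kind of timing test I mean. It is not the script I actually
ran. bsddb isn't in current Python's standard library, so this assumes
dbm.dumb as a stand-in, and the names (build_db, time_inserts, time_lookups)
are made up for illustration. The timing functions only need something that
behaves like a dict, so a real Berkeley handle could be passed in instead.

```python
import dbm.dumb
import os
import tempfile
import time


def build_db(path, size):
    """Create a new database at `path` holding `size` filler keys."""
    # Assumption: dbm.dumb stands in for the Berkeley db in the original test.
    db = dbm.dumb.open(path, "n")
    for i in range(size):
        db[b"key" + str(i).encode("ascii")] = b"0"
    return db


def time_inserts(db, count, prefix=b"probe"):
    """Insert `count` new keys into an open db and return the seconds taken."""
    start = time.perf_counter()
    for i in range(count):
        db[prefix + str(i).encode("ascii")] = b"1"
    return time.perf_counter() - start


def time_lookups(db, keys):
    """Look up every key in `keys` and return the seconds taken."""
    start = time.perf_counter()
    for key in keys:
        db[key]
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # A brand new db and one of about the size mentioned above.
        for name, size in (("new", 0), ("sized", 5500)):
            db = build_db(os.path.join(tmp, name), size)
            inserts = time_inserts(db, 1000)
            lookups = time_lookups(
                db, [b"probe" + str(i).encode("ascii") for i in range(1000)]
            )
            print(name, "inserts:", inserts, "lookups:", lookups)
            db.close()
```

To compare against a suspect database, open it with whatever module created
it and hand the open object to time_inserts and time_lookups in the same way.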

There doesn't seem to be any difference between the two dbs with the same
data, but inserts into both are 3 to 4 times slower than into either the new
db or the similarly sized one.  This is with Python 2.3.3 and bsddb or
Python 2.2.3 and bsddb3.
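
The export-and-rebuild step can be sketched roughly like this. Again, this
is an illustration rather than what I actually ran: the function names and
the one-pair-per-line base64 text format are assumptions, and both functions
just expect an open dict-like database with bytes keys and values.

```python
import base64


def export_to_text(db, text_path):
    """Write every key/value pair in `db` to a text file, one pair per line."""
    with open(text_path, "w", encoding="ascii") as out:
        for key in db.keys():
            # base64 keeps arbitrary bytes safe on a single text line.
            key64 = base64.b64encode(key).decode("ascii")
            value64 = base64.b64encode(db[key]).decode("ascii")
            out.write(key64 + " " + value64 + "\n")


def rebuild_from_text(text_path, db):
    """Insert every pair from an exported text file into an open db."""
    with open(text_path, encoding="ascii") as src:
        for line in src:
            key64, value64 = line.rstrip("\n").split(" ", 1)
            db[base64.b64decode(key64)] = base64.b64decode(value64)
```

The rebuilt db gets its entries in whatever order the export wrote them, so
sorting the lines of the text file first would be one way to rule entry
order in or out.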

Playing around with creating dbs of the same size doesn't seem to be getting
me any closer to creating another database with this odd effect.  I realise
that you don't have time to look into this, but any chance you have further
suggestions about how I might investigate it?

> There was a disturbing Python bug report against bsddb that I
> closed as hopeless:
> 
>    http://www.python.org/sf/881522

I read this, and it does seem like it could be related, but I'm not sure how
to test that :)

=Tony Meyer



