[spambayes-bugs] [ spambayes-Feature Requests-859339 ] Add sqlite storage option

SourceForge.net noreply at sourceforge.net
Mon Dec 5 22:29:32 CET 2005


Feature Requests item #859339, was opened at 2003-12-13 17:45
Message generated for change (Comment added) made by anadelonbrin
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=859339&group_id=61702

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Priority: 5
Submitted By: Vladimir Ulogov (vulogov)
Assigned to: Nobody/Anonymous (nobody)
Summary: Add sqlite storage option

Initial Comment:
In addition to Berkeley DB, I'd like to be able to use SQLite
as well.

----------------------------------------------------------------------

>Comment By: Tony Meyer (anadelonbrin)
Date: 2005-12-06 10:29

Message:
Logged In: YES 
user_id=552329

If anyone is interested in working on this, two things to
consider would be:

  1. Looking at the way the DBClassifier works and copying
some ideas from there.  IIRC it caches non-hapax tokens and
so tries to minimise actually accessing the db.

  2. Using sqlite for the token database (hammie.db) and
something else for the messageinfo database.  The
messageinfo db gets written a lot more often (once per
message trained or classified).  pickle also does poorly
here at the moment, and dbm is the one that gave us all the
trouble, so I'm not sure what to suggest.
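
As a rough illustration of the caching idea in point 1, here is
a minimal sketch.  It is not the DBClassifier code: the class
name CachingTokenStore, the dict-like `backend`, and the
`backend_reads` counter are all hypothetical, and it assumes a
token counts as a hapax when its stored count is 1.

```python
class CachingTokenStore:
    """Hypothetical wrapper; `backend` is any dict-like word -> count map."""

    def __init__(self, backend):
        self.backend = backend
        self.cache = {}          # in-memory copy of non-hapax tokens only
        self.backend_reads = 0   # how often we had to go to the backend

    def get(self, word):
        # Serve repeated lookups of common tokens from memory.
        if word in self.cache:
            return self.cache[word]
        self.backend_reads += 1
        count = self.backend.get(word, 0)
        if count > 1:
            # Non-hapax: likely to be asked for again, so keep it.
            self.cache[word] = count
        return count


backend = {"free": 5, "zyzzyva": 1}
store = CachingTokenStore(backend)
store.get("free")
store.get("free")       # second lookup is served from the cache
store.get("zyzzyva")
store.get("zyzzyva")    # hapax is not cached, so the backend is read again
```

Skipping hapaxes keeps the cache small, on the assumption that
tokens seen only once are rarely looked up again.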

----------------------------------------------------------------------

Comment By: Kenny Pitt (kpitt)
Date: 2005-12-06 04:04

Message:
Logged In: YES 
user_id=859086

I actually had a mostly working SQLite storage class back 
when SQLite 2.x was current, but the performance was so 
abysmal that I didn't go any further with it.  I never got 
around to digging it out and testing it with the 3.x 
version of SQLite, which is supposed to have better 
performance.

SQLite generally performs pretty well for reads and for 
writes that are batched together in a transaction.  
Unfortunately, the current SpamBayes database access 
includes a fairly large number of writes (especially when 
training, although it also tracks statistics on every 
message received), and the writes are generally committed 
after each change rather than batched until the end of a 
large operation.  This mode of access is pretty much the 
worst-case scenario for SQLite performance.
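
To make the difference concrete, here is a minimal sketch of
the two commit patterns using Python's sqlite3 module.  The
`tokens` table and the write_counts() helper are hypothetical
and are not taken from the SpamBayes code.  With an on-disk
database each commit typically has to be synced to disk, which
is where the cost of committing after every change comes from;
the in-memory database below only demonstrates the control
flow.

```python
import sqlite3


def write_counts(conn, counts, batch=True):
    """Write (word, count) pairs, committing once or after every row."""
    for word, count in counts:
        conn.execute(
            "INSERT OR REPLACE INTO tokens (word, count) VALUES (?, ?)",
            (word, count),
        )
        if not batch:
            conn.commit()  # one transaction per write: the slow pattern
    if batch:
        conn.commit()      # one transaction for the whole operation


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tokens (word TEXT PRIMARY KEY, count INTEGER)")
write_counts(conn, [("free", 3), ("meeting", 1)], batch=True)
```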

If anyone is interested in doing any more work with this, 
I'll see if I can locate my old code and post it as a 
patch.

----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2005-12-05 22:11

Message:
Logged In: YES 
user_id=552329

Note also that there have been a few people on the mailing
list who have mentioned intending to do this.  You could
ask whether any of them did, and whether they'd be willing
to contribute the code, either to you or back to the project.

----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2004-07-16 13:40

Message:
Logged In: YES 
user_id=552329

This is a feature request, not a patch, so changing type.

----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2004-02-05 21:45

Message:
Logged In: YES 
user_id=552329

Note that you can use mysql or postgresql.  Is either of
those good enough?  If not, then you could maybe write your
own SQLiteClassifier class, based on the other SQL ones in
storage.py.
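
For a sense of what the storage side of such a class might
involve, here is a standalone sketch built on Python's sqlite3
module.  It does not subclass anything in storage.py and is not
the SpamBayes API: the class name SQLiteTokenStore, the `bayes`
table layout, and the method names are all assumptions made for
illustration.

```python
import sqlite3


class SQLiteTokenStore:
    """Hypothetical per-token spam/ham count store; not part of SpamBayes."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS bayes ("
            " word TEXT PRIMARY KEY,"
            " nspam INTEGER NOT NULL DEFAULT 0,"
            " nham INTEGER NOT NULL DEFAULT 0)"
        )
        self.conn.commit()

    def get_counts(self, word):
        # Unknown tokens are reported as never seen in spam or ham.
        row = self.conn.execute(
            "SELECT nspam, nham FROM bayes WHERE word = ?", (word,)
        ).fetchone()
        return row if row is not None else (0, 0)

    def set_counts(self, word, nspam, nham):
        # No commit here, so the caller can batch many updates together.
        self.conn.execute(
            "INSERT OR REPLACE INTO bayes (word, nspam, nham) VALUES (?, ?, ?)",
            (word, nspam, nham),
        )

    def commit(self):
        self.conn.commit()

    def close(self):
        self.conn.close()


store = SQLiteTokenStore()
store.set_counts("viagra", 3, 0)
store.commit()
```

Leaving the commit to the caller is a deliberate choice in this
sketch, so that a whole training run could be written as one
transaction.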

----------------------------------------------------------------------
