[Spambayes] Modified Classifier storage

Tony Meyer tameyer at ihug.co.nz
Thu Jan 5 10:50:45 CET 2006


> I'm looking to store a bunch of storage.Classifiers in a database
> indexed by user (in other words, so that each user gets his own
> classifier). It seems I could do this easily enough by modifying
> mySQLClassifier or similar (though I'd like to use SQLObject so I'm
> not tied to a specific database server), but before I go and mess
> with that, I was wondering if there's an easier way.

Are you after easy or efficient?  Assuming that tokens are likely to
be found in more than one user's database (this would certainly be
true with email; I have no idea whether it is true for whatever you
are doing), a more efficient design might be to have a single
'token' table and a per-user table that references it (with
'token reference', 'ham count', 'spam count' columns), so that each
token string is stored only once.  I believe someone did something
like this a long time back and posted here about it; searching the
list archives with Google might help find it.
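
As a rough illustration of the shared-token layout described above, here
is a minimal sketch using Python's standard sqlite3 module.  The table
names, column names, and the helpers record() and counts() are all
assumptions made up for this example; they are not part of SpamBayes,
and a single user_token table keyed by user is only one way to read
"a user's table".

```python
import sqlite3

# In-memory database, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE token (
    id   INTEGER PRIMARY KEY,
    text TEXT NOT NULL UNIQUE          -- each token string stored once
);
CREATE TABLE user_token (
    user_id    INTEGER NOT NULL,
    token_id   INTEGER NOT NULL REFERENCES token(id),
    ham_count  INTEGER NOT NULL DEFAULT 0,
    spam_count INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, token_id)
);
""")


def record(conn, user_id, word, is_spam):
    """Hypothetical helper: bump one user's ham or spam count for a token."""
    # Create the shared token row if no user has seen this token yet.
    conn.execute("INSERT OR IGNORE INTO token (text) VALUES (?)", (word,))
    (token_id,) = conn.execute(
        "SELECT id FROM token WHERE text = ?", (word,)
    ).fetchone()
    # Create this user's counter row (both counts start at 0) if missing.
    conn.execute(
        "INSERT OR IGNORE INTO user_token (user_id, token_id) VALUES (?, ?)",
        (user_id, token_id),
    )
    # The column name comes from a fixed pair, never from user input.
    column = "spam_count" if is_spam else "ham_count"
    conn.execute(
        f"UPDATE user_token SET {column} = {column} + 1 "
        "WHERE user_id = ? AND token_id = ?",
        (user_id, token_id),
    )


def counts(conn, user_id, word):
    """Hypothetical helper: return (ham_count, spam_count) for a user's token."""
    row = conn.execute(
        "SELECT ut.ham_count, ut.spam_count "
        "FROM user_token ut JOIN token t ON t.id = ut.token_id "
        "WHERE ut.user_id = ? AND t.text = ?",
        (user_id, word),
    ).fetchone()
    return row if row else (0, 0)
```

With this layout, two users who have both seen the same token share one
row in 'token' and each keep their own pair of counts.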

> For example, I could probably create a classifier.Classifier instead,
> and just pickle it to and from a database record (one per user), but
> without looking closer at the code I'm unclear if that's completely
> batty or not.

This is basically what the PickledClassifier class does (except that
it stores the pickle in a file rather than a database), so I can't see
anything particularly batty about it.  It would certainly be easy
and fast.  It does involve keeping the entire classifier in memory, so
whether it's feasible depends somewhat on how many users there will
be and how large their classifiers get (without knowing what will
be in them, we have no way of knowing that).
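
The pickle-per-user idea might look roughly like the sketch below, again
with the standard sqlite3 and pickle modules.  ToyClassifier is a
hypothetical stand-in so the example runs without SpamBayes installed;
in practice the object being pickled would be the classifier.Classifier
mentioned above.  The table layout and the open_store(), save(), and
load() helpers are likewise assumptions for illustration only.

```python
import pickle
import sqlite3


class ToyClassifier:
    """Hypothetical stand-in for a real classifier object."""

    def __init__(self):
        self.nham = 0
        self.nspam = 0
        self.wordinfo = {}  # token -> [ham_count, spam_count]

    def learn(self, tokens, is_spam):
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1
        # Count each distinct token once per message.
        for tok in set(tokens):
            pair = self.wordinfo.setdefault(tok, [0, 0])
            pair[1 if is_spam else 0] += 1


def open_store(path=":memory:"):
    """Open (or create) a database with one pickled classifier per user."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS classifier ("
        "user_id TEXT PRIMARY KEY, pickle BLOB NOT NULL)"
    )
    return conn


def save(conn, user_id, clf):
    """Serialise the whole classifier and overwrite the user's record."""
    blob = pickle.dumps(clf, protocol=pickle.HIGHEST_PROTOCOL)
    conn.execute(
        "INSERT OR REPLACE INTO classifier (user_id, pickle) VALUES (?, ?)",
        (user_id, blob),
    )
    conn.commit()


def load(conn, user_id):
    """Return the user's classifier, or a fresh one if none is stored."""
    row = conn.execute(
        "SELECT pickle FROM classifier WHERE user_id = ?", (user_id,)
    ).fetchone()
    return pickle.loads(row[0]) if row else ToyClassifier()
```

Each load() brings one user's entire classifier into memory and each
save() rewrites the whole record, which is where the memory and size
considerations above come in.  Note also that pickle should only be
used on data you wrote yourself.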

=Tony.Meyer




-- 
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.



