[spambayes-dev] correlated clues

Richie Hindle richie at entrian.com
Fri Jul 2 14:51:40 EDT 2004


[Kenny]
> the latest version of POPfile has switched from
> BerkeleyDB to SQLite for its default database because of the reliability
> problems with Berkeley.  Anyone have any experience with SQLite?  Would it
> be worth implementing a SpamBayes storage option for it to test it out?

I considered it recently for a project whose database requirements were
similar to SpamBayes': lots of small rows, lots of lookups required in
quick succession.  I learned that PySQLite can't use precompiled
parameterised queries.  That is, if you need to do this:

  select H, S from words where word='get';
  select H, S from words where word='your';
  select H, S from words where word='viagra';
  select H, S from words where word='here';

then it needs to parse and compile the SQL statement for each request.
SQLite itself supports precompiled parameterised queries, but PySQLite
doesn't wrap that API.  That made it too slow for this project.
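For comparison, Python's later standard-library sqlite3 module (a successor to PySQLite, not the version discussed here) accepts `?` placeholders and keeps a per-connection cache of prepared statements keyed by the SQL text. The sketch below assumes a hypothetical `words` table with `word`, `H` and `S` columns, inferred from the queries above. The counts are made up for illustration.

```python
import sqlite3

# cached_statements sets the size of the module's prepared-statement cache.
conn = sqlite3.connect(":memory:", cached_statements=128)

# Hypothetical schema and made-up counts, for illustration only.
conn.execute("create table words (word text primary key, H integer, S integer)")
conn.executemany(
    "insert into words (word, H, S) values (?, ?, ?)",
    [("get", 10, 40), ("your", 25, 30), ("viagra", 0, 95), ("here", 12, 20)],
)

def lookup(conn, word):
    # The SQL text is identical on every call and only the bound parameter
    # changes, so the cached prepared statement can be reused instead of
    # parsing and compiling the SQL again for each word.
    return conn.execute(
        "select H, S from words where word = ?", (word,)
    ).fetchone()  # None if the word has never been seen

counts = {w: lookup(conn, w) for w in ["get", "your", "viagra", "here"]}
print(counts)
# {'get': (10, 40), 'your': (25, 30), 'viagra': (0, 95), 'here': (12, 20)}
```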

Perhaps it wouldn't be too hard to change classifier.Classifier so that
the SQL could say:

  select H, S from words where word in ('get', 'your', 'viagra', 'here');
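A batched lookup along those lines might look like the following. This is a minimal sketch, not how classifier.Classifier works. The `lookup_many` helper, the `words` table and its `word`/`H`/`S` columns are hypothetical, the counts are made up, and it uses Python's later stdlib sqlite3 module.

```python
import sqlite3

def lookup_many(conn, words, chunk_size=500):
    """Fetch (H, S) counts for many tokens with one IN query per chunk.

    Each word still gets its own '?' placeholder, so values are bound as
    parameters rather than pasted into the SQL text.
    """
    words = list(dict.fromkeys(words))  # drop duplicate tokens, keep order
    found = {}
    for start in range(0, len(words), chunk_size):
        chunk = words[start:start + chunk_size]
        placeholders = ", ".join("?" * len(chunk))  # e.g. "?, ?, ?"
        sql = f"select word, H, S from words where word in ({placeholders})"
        for word, h, s in conn.execute(sql, chunk):
            found[word] = (h, s)
    # Tokens never seen before are absent from the table; report them as None.
    return {w: found.get(w) for w in words}

# Hypothetical table and made-up counts, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("create table words (word text primary key, H integer, S integer)")
conn.executemany(
    "insert into words (word, H, S) values (?, ?, ?)",
    [("get", 10, 40), ("your", 25, 30), ("viagra", 0, 95), ("here", 12, 20)],
)

print(lookup_many(conn, ["get", "viagra", "nosuchword", "get"]))
# {'get': (10, 40), 'viagra': (0, 95), 'nosuchword': None}
```

Chunking is there because SQLite caps the number of host parameters per statement (SQLITE_MAX_VARIABLE_NUMBER, 999 by default before SQLite 3.32.0). The chunk size of 500 is an arbitrary choice below that limit. One trade-off: each distinct chunk length produces different SQL text, so statement caching helps less than it does for the single-word query.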

POPFile's database requirements are presumably the same, and they must be
happy with the performance.  And finding an alternative to Berkeley DB
(other than pickle) would be a Good Thing.

Same question to any MetaKit/Mk4Py users out there...?

-- 
Richie Hindle
richie at entrian.com