[spambayes-dev] PGClassifier checked in

Skip Montanaro skip at pobox.com
Wed Aug 6 21:45:10 EDT 2003


The storage module gained two new classes:

    SQLClassifier - a base class for people wishing to store their hammie
    info in SQL databases

    PGClassifier - a concrete implementation using the psycopg module to
    access a PostgreSQL database

This code has a number of problems, not the least of which is that none of
the other modules and scripts in the system know about it yet.  For those of
you not subscribed to spambayes-checkins, here's the checkin message:

----------------------------------------------------------------------------
**** Danger, Will Robinson!  Do not use the PGClassifier class yet! ****

This is an initial stab at SQLClassifier and PGClassifier classes.  This
still needs a lot of work, to wit:

    * I've tried to break functionality into the two classes in such a way
      that adding other SQLClassifier subclasses should be reasonably easy,
      but I don't know much about writing portable SQL.  Python's DB API
      helps, to be sure, but isn't perfect.

    * Scoring messages is dreadfully slow.  I don't know if I'm commit()ing
      too frequently, creating too many cursors or if I have some other
      problem.  My past use of SQL has generally been of the "scads of
      SELECTs per INSERT" sort of thing, so I've never paid a lot of
      attention to commit().

    * I've encountered a couple bad cases.  With the word column defined as
      bytea (PostgreSQL's binary string type), both of these calls fail if c
      is a cursor object:

        c.execute("select * from bayes where word=%s", ('report.\\n";',))
        c.execute("select * from bayes where word=%s", ('reserved\x00',))

      If the word column is defined as the more traditional varchar(128),
      the first call succeeds but the second still fails.
----------------------------------------------------------------------------
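
On the commit()/cursor question raised in the checkin message, here is a
minimal sketch of one thing that could be tried: reuse a single cursor per
message, issue no commit() at all while scoring (which only reads), and
commit once per trained message.  This is not the code that was checked in.
It uses the standard sqlite3 module as a stand-in so it runs without a
PostgreSQL server.  The table name bayes and the word column come from the
message above; the nspam and nham columns and both function names are
assumptions made for illustration.  psycopg uses %s placeholders where
sqlite3 uses ?.

```python
import sqlite3


def train_words(conn, counts):
    """Add per-word counts using one cursor and a single commit."""
    cur = conn.cursor()
    for word, (nspam, nham) in counts.items():
        cur.execute(
            "update bayes set nspam = nspam + ?, nham = nham + ? where word = ?",
            (nspam, nham, word),
        )
        if cur.rowcount == 0:
            # No existing row for this word, so create one.
            cur.execute(
                "insert into bayes (word, nspam, nham) values (?, ?, ?)",
                (word, nspam, nham),
            )
    conn.commit()  # one commit per trained message, not one per word
    cur.close()


def lookup_words(conn, words):
    """Fetch counts for the words of one message; read-only, so no commit."""
    cur = conn.cursor()
    found = {}
    for word in words:
        cur.execute("select nspam, nham from bayes where word = ?", (word,))
        row = cur.fetchone()
        if row is not None:
            found[word] = row
    cur.close()
    return found


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table bayes (word text primary key, nspam integer, nham integer)"
)
train_words(conn, {"free": (3, 0), "meeting": (0, 2)})
train_words(conn, {"free": (1, 1)})
print(lookup_words(conn, ["free", "meeting", "unseen"]))
```

Whether this helps with the slowness described above is untested; timing
the SELECTs separately from the commit() calls would show where the time
actually goes.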

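For the two failing words, one possible workaround (a sketch only, not
verified against psycopg or PostgreSQL) is to escape each word to printable
ASCII before it reaches the database, so a text column never sees a NUL
byte.  The helper names encode_word and decode_word are hypothetical.  The
sketch assumes words are Python str objects and relies on the standard
unicode_escape codec.

```python
def encode_word(word):
    """Return a printable-ASCII form of word, with NUL written as \\x00."""
    return word.encode("unicode_escape").decode("ascii")


def decode_word(stored):
    """Invert encode_word, recovering the original word."""
    return stored.encode("ascii").decode("unicode_escape")


# The two words from the failing calls above.
for word in ('report.\\n";', 'reserved\x00'):
    stored = encode_word(word)
    assert "\x00" not in stored
    assert decode_word(stored) == word
```

The stored form still contains backslashes and quotes, so it should be
passed as a query parameter as in the calls above, never pasted into the
SQL string.  Escaped words are longer than the originals, so a
varchar(128) column holds fewer characters of the original word.
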
Skip


