[Spambayes-checkins] spambayes/spambayes Options.py, 1.103, 1.104 OptionsClass.py, 1.21, 1.22 hammie.py, 1.13, 1.14 hammiebulk.py, 1.13, 1.14 storage.py, 1.38, 1.39

Tony Meyer anadelonbrin at users.sourceforge.net
Thu Feb 5 03:13:31 EST 2004


Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv5122/spambayes

Modified Files:
	Options.py OptionsClass.py hammie.py hammiebulk.py storage.py 
Log Message:
** Incompatible change **

Let's call this 1.0a9, because 7+1 is a bad number, and that makes us match the Outlook plug-in.

In imapfilter, hint at a good imaplib debug level, and don't put the todo list in the docstring.

If the allowed values for an option are a tuple of values, then the type is correct if the value is one
of those values.  Also remove duplicate code from OptionsClass.py.
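
A minimal sketch of the tuple rule, written for Python 3 rather than the Python 2 of the actual source. The function name convert_value, its parameters, and the string fallback are hypothetical and are not the OptionsClass.py API; only the idea that a member of a tuple of allowed values is returned unchanged comes from this change.

```python
def convert_value(value, allowed_values):
    """Return value unchanged when it is a member of a tuple of allowed values.

    Hypothetical helper for illustration only.
    """
    if isinstance(allowed_values, tuple) and value in allowed_values:
        # Already one of the permitted values, so no conversion is needed.
        return value
    # Fallback for this sketch only: anything else is turned into a string.
    return str(value)


# The allowed values given to persistent_use_database in this commit.
ALLOWED = ("mysql", "pgsql", "dbm", "pickle", "True", "False", True, False)

kept_string = convert_value("pickle", ALLOWED)
kept_bool = convert_value(True, ALLOWED)
converted = convert_value(42, ALLOWED)
```

Note that Python treats 1 == True and 0 == False, so a membership test like this also accepts the integers 1 and 0 when the tuple contains the booleans.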

Get all the scripts to determine the database name and type via a utility function in storage.py,
so that everything is consistent across scripts (and it removes some code duplication, although the
options list is looped through twice).

The consistent style is to use "-p FILE" for pickle FILE, and "-d FILE" for dbm FILE.  The two
scripts that already used -p now use -P instead (imapfilter, notesfilter).
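
As a rough Python 3 illustration of the "-p FILE" / "-d FILE" convention (the real helper is the Python 2 function added to storage.py in this commit): the default_name and default_type parameters here are assumptions standing in for the lookup in the global options object, and STORAGE_SWITCHES is a hypothetical name.

```python
import getopt

# Command-line switch -> storage type name, following the convention above.
STORAGE_SWITCHES = {"-p": "pickle", "-d": "dbm"}


class MutuallyExclusiveError(Exception):
    def __str__(self):
        return "Only one type of database can be specified"


def database_type(opts, default_name="hammie.db", default_type="dbm"):
    """Return (file name, storage type) from getopt-style (switch, arg) pairs.

    default_name and default_type are stand-ins (an assumption of this
    sketch) for the values the real code reads from the options object.
    """
    name, db_type = None, None
    for opt, arg in opts:
        if opt in STORAGE_SWITCHES:
            if name is not None:
                # A second storage switch was given, e.g. both -p and -d.
                raise MutuallyExclusiveError()
            name, db_type = arg, STORAGE_SWITCHES[opt]
    if name is None:
        # No storage switch on the command line: fall back to the defaults.
        return default_name, default_type
    return name, db_type


# Parse a sample command line with the option string hammiebulk now uses.
opts, args = getopt.getopt(["-p", "words.pck", "-U"], "hd:Ufg:s:p:u:r")
picked = database_type(opts)
```

Passing both switches, for example [("-p", "a"), ("-d", "b")], raises MutuallyExclusiveError in this sketch.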

Dumped the "mysql::"/"pgsql::" format for electing to use mySQL/pgSQL (I doubt anyone is
using this, anyway).  Replaced it with a proper selection, by making the persistent_use_database
option multi-choice, instead of boolean (it's backwards compatible, so that True/"True" and False/"False"
are "dbm" and "pickle", respectively).  This also provides room for a "zodb" option, at some point
<wink>.
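
The backwards-compatible mapping can be sketched in a few lines of Python 3. The function name normalize_storage_type is hypothetical; the mapping itself (True/"True" to "dbm", False/"False" to "pickle", everything else passed through) is the one described above.

```python
def normalize_storage_type(value):
    """Map legacy boolean settings onto the new multi-choice names.

    Hypothetical helper for illustration only.
    """
    if value is True or value == "True":
        return "dbm"
    if value is False or value == "False":
        return "pickle"
    # Already an explicit type name such as "dbm", "pickle", "mysql" or "pgsql".
    return value


legacy_db = normalize_storage_type(True)
legacy_pickle = normalize_storage_type("False")
explicit = normalize_storage_type("pgsql")
```

Validating that the result is a known storage type is left to whatever opens the store; this sketch only handles the legacy values.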

I've tested all the scripts except for notesfilter, which doesn't want to run here, and
hammie/hammiebulk.  (By "tested", I mean that I've run them far enough to see that they're opening
the correct db with -p/-d/none and raising an error with both -p and -d.)  That said, testing by
day-to-day users of the scripts would be great.

Fixes [824628] sb_dbexpimp.py, sb_filter.py option incompatibility
Closes [857595] Same argument on the command line
Closes [857597] On command line use -D for pickle
Closes [861656] Missing file for patch

Index: Options.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/Options.py,v
retrieving revision 1.103
retrieving revision 1.104
diff -C2 -d -r1.103 -r1.104
*** Options.py	29 Jan 2004 15:02:11 -0000	1.103
--- Options.py	5 Feb 2004 08:13:26 -0000	1.104
***************
*** 543,551 ****
    # loaded by the appropriate application only.
    "Storage" : (
!     ("persistent_use_database", "Use database for storage", True,
       """SpamBayes can use either a database (quick to score one message)
!      or a pickle (quick to train on huge amounts of messages). Set this to
!      use a database by default.""",
!      BOOLEAN, RESTORE),
  
      ("persistent_storage_file", "Storage file name", "hammie.db",
--- 543,557 ----
    # loaded by the appropriate application only.
    "Storage" : (
!     ("persistent_use_database", "Use database for storage", "dbm",
       """SpamBayes can use either a database (quick to score one message)
!      or a pickle (quick to train on huge amounts of messages). There is
!      also (currently experimental) the ability to use a mySQL or
!      PostgrepSQL database.  For historical reasons, if you set this to
!      "True" you are selecting "dbm" and if you set this to "False" you
!      are selecting "pickle".  We recommend explicitly selecting the type,
!      (i.e. changing "True" to "dbm" and "False" to "pickle", or sticking
!      with the default.""",
!      # True == "dbm", False == "pickle", "True" == "dbm", "False" == "pickle"
!      ("mysql", "pgsql", "dbm", "pickle", "True", "False", True, False), RESTORE),
  
      ("persistent_storage_file", "Storage file name", "hammie.db",

Index: OptionsClass.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/OptionsClass.py,v
retrieving revision 1.21
retrieving revision 1.22
diff -C2 -d -r1.21 -r1.22
*** OptionsClass.py	16 Dec 2003 05:06:34 -0000	1.21
--- OptionsClass.py	5 Feb 2004 08:13:27 -0000	1.22
***************
*** 245,248 ****
--- 245,252 ----
              # already the correct type
              return value
+         if type(self.allowed_values) == types.TupleType and \
+            value in self.allowed_values:
+             # already correct type
+             return value
          if self.is_boolean():
              if str(value) == "True" or value == 1:
***************
*** 512,517 ****
  
      def convert_and_set(self, section, option, value):
-         if self.multiple_values_allowed(section, option):
-             value = self.convert(section, option, value)
          value = self.convert(section, option, value)
          self.set(section, option, value)
--- 516,519 ----

Index: hammie.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/hammie.py,v
retrieving revision 1.13
retrieving revision 1.14
diff -C2 -d -r1.13 -r1.14
*** hammie.py	26 Nov 2003 23:01:24 -0000	1.13
--- hammie.py	5 Feb 2004 08:13:27 -0000	1.14
***************
*** 257,267 ****
  
  
! def open(filename, useDB=True, mode='r'):
      """Open a file, returning a Hammie instance.
  
!     If usedb is False, open as a pickle instead of a DBDict.  mode is
! 
!     used as the flag to open DBDict objects.  'c' for read-write (create
!     if needed), 'r' for read-only, 'w' for read-write.
      """
      return Hammie(storage.open_storage(filename, useDB, mode))
--- 257,265 ----
  
  
! def open(filename, useDB="dbm", mode='r'):
      """Open a file, returning a Hammie instance.
  
!     mode is used as the flag to open DBDict objects.  'c' for read-write
!     (create if needed), 'r' for read-only, 'w' for read-write.
      """
      return Hammie(storage.open_storage(filename, useDB, mode))

Index: hammiebulk.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/hammiebulk.py,v
retrieving revision 1.13
retrieving revision 1.14
diff -C2 -d -r1.13 -r1.14
*** hammiebulk.py	12 Jan 2004 14:15:38 -0000	1.13
--- hammiebulk.py	5 Feb 2004 08:13:27 -0000	1.14
***************
*** 6,22 ****
      -h
          show usage and exit
!     -d
          use the DBM store.  A DBM file is larger than the pickle and
          creating it is slower, but loading it is much faster,
          especially for large word databases.  Recommended for use with
          hammiefilter or any procmail-based filter.
!     -D
!         use the pickle store.  A pickle is smaller and faster to create,
!         but much slower to load.  Recommended for use with pop3proxy and
!         hammiesrv.
      -p FILE
!         use file as the persistent store.  loads data from this file if it
!         exists, and saves data to this file at the end.
!         Default: %(DEFAULTDB)s
      -U
          Untrain instead of train.  The interpretation of -g and -s remains
--- 6,20 ----
      -h
          show usage and exit
!     -d FILE
          use the DBM store.  A DBM file is larger than the pickle and
          creating it is slower, but loading it is much faster,
          especially for large word databases.  Recommended for use with
          hammiefilter or any procmail-based filter.
!         Default filename: %(DEFAULTDB)s
      -p FILE
!         use the pickle store.  A pickle is smaller and faster to create,
!         but much slower to load.  Recommended for use with sb_server and
!         sb_xmlrpcserver.
!         Default filename: %(DEFAULTDB)s
      -U
          Untrain instead of train.  The interpretation of -g and -s remains
***************
*** 53,57 ****
  
  from spambayes.Options import options, get_pathname_option
! from spambayes import classifier, mboxutils, hammie, Corpus
  
  Corpus.Verbose = True
--- 51,55 ----
  
  from spambayes.Options import options, get_pathname_option
! from spambayes import classifier, mboxutils, hammie, Corpus, storage
  
  Corpus.Verbose = True
***************
*** 145,149 ****
      """Main program; parse options and go."""
      try:
!         opts, args = getopt.getopt(sys.argv[1:], 'hdDUfg:s:p:u:r')
      except getopt.error, msg:
          usage(2, msg)
--- 143,147 ----
      """Main program; parse options and go."""
      try:
!         opts, args = getopt.getopt(sys.argv[1:], 'hd:Ufg:s:p:u:r')
      except getopt.error, msg:
          usage(2, msg)
***************
*** 170,179 ****
              spam.append(arg)
              mode = 'c'
-         elif opt == '-p':
-             pck = arg
-         elif opt == "-d":
-             usedb = True
-         elif opt == "-D":
-             usedb = False
          elif opt == "-f":
              do_filter = True
--- 168,171 ----
***************
*** 184,187 ****
--- 176,180 ----
          elif opt == '-r':
              reverse = 1
+     pck, usedb = storage.database_type(opts)
      if args:
          usage(2, "Positional arguments not allowed")

Index: storage.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/storage.py,v
retrieving revision 1.38
retrieving revision 1.39
diff -C2 -d -r1.38 -r1.39
*** storage.py	24 Dec 2003 04:08:38 -0000	1.38
--- storage.py	5 Feb 2004 08:13:27 -0000	1.39
***************
*** 66,70 ****
  import types
  from spambayes import classifier
! from spambayes.Options import options
  import cPickle as pickle
  import errno
--- 66,70 ----
  import types
  from spambayes import classifier
! from spambayes.Options import options, get_pathname_option
  import cPickle as pickle
  import errno
***************
*** 638,641 ****
--- 638,645 ----
          return repr(self.invalid_name)
  
+ class MutuallyExclusiveError(Exception):
+     def __str__(self):
+         return "Only one type of database can be specified"
+ 
  # values are classifier class and True if it accepts a mode
  # arg, False otherwise
***************
*** 646,650 ****
                    }
  
! def open_storage(data_source_name, useDB=True, mode=None):
      """Return a storage object appropriate to the given parameters.
  
--- 650,654 ----
                    }
  
! def open_storage(data_source_name, db_type="dbm", mode=None):
      """Return a storage object appropriate to the given parameters.
  
***************
*** 652,671 ****
      the same given the same options.
  
!     If useDB is false, a pickle will be used, otherwise if the data
!     source name includes "::", whatever is before that determines
!     the type of database.  If the source name doesn't include "::",
!     then a DBDictClassifier is used."""
!     if useDB:
!         if data_source_name.find('::') != -1:
!             db_type, rest = data_source_name.split('::', 1)
!             if _storage_types.has_key(db_type.lower()):
!                 klass, supports_mode = _storage_types[db_type.lower()]
!                 data_source_name = rest
!             else:
!                 raise NoSuchClassifierError(db_type)
!         else:
!             klass, supports_mode = _storage_types["dbm"]
!     else:
!         klass, supports_mode = _storage_types["pickle"]
      try:
          if supports_mode and mode is not None:
--- 656,666 ----
      the same given the same options.
  
!     db_type must be one of the following strings:
!       dbm, pickle, pgsql, mysql
!     """
!     try:
!         klass, supports_mode = _storage_types[db_type]
!     except KeyError:
!         raise NoSuchClassifierError(db_type)
      try:
          if supports_mode and mode is not None:
***************
*** 683,686 ****
--- 678,721 ----
              sys.exit()
  
+ # The different database types that are available.
+ # The key should be the command-line switch that is used to select this
+ # type, and the value should be the name of the type (which
+ # must be a valid key for the _storage_types dictionary).
+ _storage_options = { "-p" : "pickle",
+                      "-d" : "dbm",
+                      }
+ 
+ def database_type(opts):
+     """Return the name of the database and the type to use.  The output of
+     this function can be used as the db_type parameter for the open_storage
+     function, for example:
+ 
+         [standard getopts code]
+         db_name, db_type = database_types(opts)
+         storage = open_storage(db_name, db_type)
+ 
+     The selection is made based on the options passed, or, if the
+     appropriate options are not present, the options in the global
+     options object.
+ 
+     Currently supports:
+        -p  :  pickle
+        -d  :  dbm
+     """
+     nm, typ = None, None
+     for opt, arg in opts:
+         if _storage_options.has_key(opt):
+             if nm is None and typ is None:
+                 nm, typ = arg, _storage_options[opt]
+             else:
+                 raise MutuallyExclusiveError()
+     if nm is None and typ is None:
+         typ = options["Storage", "persistent_use_database"]
+         if typ is True or typ == "True":
+             typ = "dbm"
+         elif typ is False or typ == "False":
+             typ = "pickle"
+         nm = get_pathname_option("Storage", "persistent_storage_file")
+     return nm, typ
  
  if __name__ == '__main__':



