[spambayes-dev] sb_filter -n broke?

Kenny Pitt kennypitt at hotmail.com
Tue Sep 9 16:02:23 EDT 2003


Skip Montanaro wrote:
> Given this ini file referred to in BAYESCUSTOMIZE:
> 
>     [Storage]
>     persistent_use_database: False
>     persistent_storage_file: ~/tmp/sa.db
> 
> this command:
> 
>     sb_filter.py -n
> 
> produces this traceback:
> 
[snip (sorry, traceback just wouldn't quote properly in Outlook)]
> 
> Printing out (klass, data_source_name) yields (wrapped):
> 
>     (<class spambayes.storage.PickledClassifier at 0x712a50>,
>      ('/Users/skip/tmp/sa.db', 'n'))
> 
> Modifying the else: branch of the "if useDB" statement to
> 
>         klass = PickledClassifier
>         data_source_name = data_source_name[0:1]
> 
> solves the problem.  I've never used the pickled classifier before
> though, so I don't know if I was somehow using things wrong or if
> there is a better fix.  
> 

I'm not overly familiar with this bit of code, but I think I can tell what's
going on.  Somebody correct me if I'm off base.

In hammie.py (at line 259 in my slightly out-of-date copy), there is this
call to the open_storage function:

    return Hammie(storage.open_storage((filename, mode), useDB))

This passes the tuple (filename, mode) to open_storage as the
data_source_name param.  Because a tuple was passed, you reach the following
line in storage.py, as seen in the traceback:

    return klass(*data_source_name)

This unpacks the two elements of the tuple into two separate params to the
constructor call, which is correct if klass is a DBDictClassifier but not if
it is a PickledClassifier, since PickledClassifier takes only a single
filename parameter.
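
To illustrate the mismatch, here is a minimal sketch.  The two classes are
hypothetical stand-ins, not the real spambayes.storage classes, and their
constructor signatures are assumptions based only on the description above
(one filename for the pickle, filename plus mode for the database):

```python
# Hypothetical stand-ins for the classes in spambayes.storage.
# The signatures are assumptions taken from the description above.
class PickledClassifier:
    def __init__(self, db_name):
        self.db_name = db_name


class DBDictClassifier:
    def __init__(self, db_name, mode='c'):
        self.db_name = db_name
        self.mode = mode


data_source_name = ('/Users/skip/tmp/sa.db', 'n')

# Unpacking the tuple matches the two-parameter constructor.
db = DBDictClassifier(*data_source_name)

# The same unpacking hands the one-parameter constructor an extra
# argument, so Python raises TypeError.
try:
    PickledClassifier(*data_source_name)
except TypeError as exc:
    error = exc
else:
    error = None
```

With these stand-ins, the second call fails because the mode string arrives
as an unexpected extra positional argument.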

The problem seems to be that the value of the useDB parameter is ignored
when deciding which form of constructor call to use, so open_storage
unpacks your tuple into two parameters even when creating a
PickledClassifier.  Your solution fixes the problem for the case where a
tuple is passed for data_source_name, but breaks if a string is passed,
because slicing a string with [0:1] leaves only its first character.  I
think if you replace your "data_source_name = data_source_name[0:1]" in the
else clause with the following it will work in both cases:

    if isinstance(data_source_name, tuple):
        # For PickledClassifier, use only the filename from the tuple.
        data_source_name = data_source_name[0]
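
For context, here is a rough sketch of how that check might sit inside
open_storage.  This is a simplified reconstruction, not the actual
storage.py code: the classifier classes are hypothetical stand-ins, and the
function body is assumed from the two lines quoted above.

```python
# Hypothetical stand-ins for the classes in spambayes.storage.
class PickledClassifier:
    def __init__(self, db_name):
        self.db_name = db_name


class DBDictClassifier:
    def __init__(self, db_name, mode='c'):
        self.db_name = db_name
        self.mode = mode


def open_storage(data_source_name, useDB=True):
    """Simplified sketch of open_storage with the proposed check."""
    if useDB:
        klass = DBDictClassifier
    else:
        klass = PickledClassifier
        if isinstance(data_source_name, tuple):
            # For PickledClassifier, use only the filename from the tuple.
            data_source_name = data_source_name[0]

    if isinstance(data_source_name, tuple):
        # Only reached for DBDictClassifier: pass (filename, mode).
        return klass(*data_source_name)
    return klass(data_source_name)


# Tuple and plain-string callers both work for either storage type.
pickled_from_tuple = open_storage(('sa.db', 'n'), useDB=False)
pickled_from_string = open_storage('sa.db', useDB=False)
db_from_tuple = open_storage(('sa.db', 'n'), useDB=True)
```

In this sketch the mode is simply dropped for the pickle case, which matches
the intent of the check above.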

-- 
Kenny Pitt



