[Spambayes] Msg class broken?
T. Alexander Popiel
popiel at wolfskeep.com
Fri Feb 7 08:45:23 EST 2003
Resending, because this one seems to have gone into the bit-bucket,
too...
I'm trying to do a bit more testing (*gasp*), but I'm having a bit
of difficulty: it seems that the tokenizer doesn't like being given
a simple string anymore, as is done in the Msg class in msgs.py.
If I'm reading things right, this breaks all of the automated testing
tools. Have a traceback:
Traceback (most recent call last):
  File "testtools/Continuous.py", line 293, in ?
    main()
  File "testtools/Continuous.py", line 254, in main
    tests[j].predict([msg], isspam)
  File "testtools/Continuous.py", line 94, in predict
    prob = guess(example)
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/classifier.py", line 217, in chi2_spamprob
    clues = self._getclues(wordstream)
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/classifier.py", line 436, in _getclues
    for word in Set(wordstream):
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/compatsets.py", line 374, in __init__
    self._update(iterable)
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/compatsets.py", line 333, in _update
    for element in it:
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/tokenizer.py", line 1052, in tokenize
    for tok in self.tokenize_headers(msg):
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/tokenizer.py", line 1063, in tokenize_headers
    for w in crack_content_xyz(x):
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/tokenizer.py", line 791, in crack_content_xyz
    yield 'content-type:' + msg.get_content_type()
AttributeError: Message instance has no attribute 'get_content_type'
Please ignore the top three frames of the trace (the ones in
testtools/Continuous.py); I'm building my own driver for testing
with incremental training after each message.
(What I'm trying to do in the big picture is get graphs of how the
error rates drop off over time with various training modes.)
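For anyone unfamiliar with this kind of test, here is a minimal sketch of
the predict-then-train loop, assuming a classifier object with `predict`
and `train` methods. Those method names, `running_error_rates`, and the toy
`MajorityClassifier` are hypothetical stand-ins for illustration; none of
this is code from Continuous.py or spambayes.

```python
def running_error_rates(examples, classifier):
    """Score each message before training on it; return the cumulative error rate."""
    errors = 0
    rates = []
    for n, (msg, is_spam) in enumerate(examples, start=1):
        # Predict first, so the classifier has never seen this message.
        if classifier.predict(msg) != is_spam:
            errors += 1
        # Then train on the true label (incremental training).
        classifier.train(msg, is_spam)
        rates.append(errors / n)
    return rates


class MajorityClassifier:
    """Hypothetical stand-in: guesses spam only if it has seen more spam than ham."""

    def __init__(self):
        self.spam = 0
        self.ham = 0

    def predict(self, msg):
        return self.spam > self.ham

    def train(self, msg, is_spam):
        if is_spam:
            self.spam += 1
        else:
            self.ham += 1


examples = [("a", True), ("b", True), ("c", False), ("d", True)]
rates = running_error_rates(examples, MajorityClassifier())
print(rates)
```

The resulting list, one entry per message, is what would be plotted to see
how the error rate drops off as training proceeds.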
Anyway, it looks like either msgs.py needs to be updated to pass in
email.Message.Message objects, or tokenizer.py needs to relearn how
to accept raw strings. Am I reading this right? This seems odd
since tokenizer does seem to try to convert the string to a Message
via the auspices of mboxutils... help?
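
To make the second option concrete, here is a minimal sketch of a
tokenizer front end that accepts either a raw string or a Message object.
It assumes the `email` package from a current Python 3 standard library,
not the version that produced the traceback above; `as_message` and
`content_type_token` are hypothetical names, not functions in tokenizer.py
or mboxutils.

```python
import email
from email.message import Message


def as_message(obj):
    """Return obj unchanged if it is already a Message; otherwise parse it as raw text."""
    if isinstance(obj, Message):
        return obj
    return email.message_from_string(obj)


def content_type_token(obj):
    """Build a content-type token from a raw string or a Message (illustrative only)."""
    msg = as_message(obj)
    return 'content-type:' + msg.get_content_type()


raw = "Subject: test\nContent-Type: text/html\n\n<p>hello</p>\n"
parsed = email.message_from_string(raw)

print(content_type_token(raw))     # raw string input
print(content_type_token(parsed))  # Message input
```

Normalizing at the entry point like this would let callers such as the Msg
class keep passing strings, while the rest of the tokenizer only ever sees
Message objects.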
- Alex