[Spambayes] Re: mboxtrain croaks on spam mbox file

Andrew A. Raines aaraines at pobox.com
Thu Sep 18 14:44:15 EDT 2003


Skip Montanaro <skip at pobox.com> writes:

> Can you narrow it down to a single message and then attach it
> to a mail to the spambayes list?

This was, uh, tricky.  Because an mbox doesn't really give me
any placemarkers, and mboxtrain doesn't report the line number
of the file when it hits an error, I had to split everything up.
With a little reformail foo I put every message in the mbox into
its own single-message directory, so that running mboxtrain on
each directory would show me exactly which message it died on.
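
For reference, the same hunt can be done without splitting the
file on disk.  What follows is a minimal sketch, not what was
actually run here: it assumes Python 3's standard mailbox module,
and find_failing_messages and process are hypothetical names, with
process standing in for whatever per-message work is failing (in
this case, the tokenizing that mboxtrain does).

```python
import mailbox


def find_failing_messages(mbox_path, process):
    """Return (position, exception) pairs for messages that make `process` raise.

    `process` is a hypothetical stand-in for the per-message work
    (for example, tokenizing); it is not part of spambayes.
    Positions are counted from 1, in file order.
    """
    failures = []
    # create=False: fail loudly instead of creating an empty mbox on a typo.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for position, key in enumerate(box.iterkeys(), start=1):
            try:
                process(box[key])
            except Exception as exc:
                # Deliberately broad: the goal is only to locate the message.
                failures.append((position, exc))
    finally:
        box.close()
    return failures
```

Calling find_failing_messages(path, process) on the mbox returns
one pair per offending message, so the culprit can be pulled out
by position instead of by trial and error.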

Anyway, after running mboxtrain on all these baby MH folders, I
finally found the culprit, #1688:

,----
| From nobody Thu Aug  7 09:55:57 2003
| Return-Path: <nobody at example.com>
| X-Gnus-Mail-Source: maildir:~/Maildir/inbox/new
| Message-ID: <l6v7k5q55sj.fsf at totally-fudged-out-message-id>
| Delivered-To: aar at williams.mc.vanderbilt.edu
| Received: (qmail 24184 invoked by alias); 7 Aug 2003 06:35:45 -0000
| Delivered-To: postmaster at williams.mc.vanderbilt.edu
| Received: (qmail 24126 invoked from network); 7 Aug 2003 06:35:44 -0000
| Received: from unknown (HELO nessus) (160.129.223.39)
|   by williams.mc.vanderbilt.edu with SMTP; 7 Aug 2003 06:35:44 -0000
| From: nobody at example.com
| To: postmaster@[160.129.208.222]
| Organization: Nessus kabale
| MIME-Version: 1.0
| Subject: Nessus antivirus test 3: alternative base64 attachment
| Content-Type: multipart/mixed; boundary="=-=-="
| Xref: williams spam-archive-1:1689
| Lines: 13
| X-Gnus-Article-Number: 1689   Mon Aug 11 11:08:05 2003
| 
| 
| --=-=-=
| 
| If you can read or execute the attachment, this means that you do not
| have an antivirus, or that it was disabled.
| 
| --=-=-=
| Content-Type: application/octet-stream
| Content-Disposition: attachment; filename*1="eicar."; filename*2="com"
| Content-Description: EICAR test file
| 
| X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
| --=-=-=--
`----


> You might also try changing the msg.get_filename() call in
> tokenizer.py to
>
>     try:
>         fname = msg.get_filename()
>     except TypeError:
>         print >> sys.stderr, "Error:", msg.get_filename()
>         raise

After doing this, the resulting error becomes:

,----
| aar at packer:d-1688(607)$ ~/src/spambayes/mboxtrain.py -d ~/.hammiedb-test -s .
| Training spam (.):
|   Reading as MH mailbox
| Error:Traceback (most recent call last):
| 
| [...]
| 
|   File "/export/home/aar/src/spambayes-1.0a5/spambayes/tokenizer.py", line 1090, in tokenize
|     for tok in self.tokenize_headers(msg):
|   File "/export/home/aar/src/spambayes-1.0a5/spambayes/tokenizer.py", line 1101, in tokenize_headers
|     for w in crack_content_xyz(x):
|   File "/export/home/aar/src/spambayes-1.0a5/spambayes/tokenizer.py", line 813, in crack_content_xyz
|     print >> sys.stderr, "Error:", msg.get_filename()
|   File "/usr/lib/python2.3/email/Message.py", line 711, in get_filename
|     return unicode(newvalue[2], newvalue[0])
| TypeError: unicode() argument 2 must be string, not None
`----
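
Note that the traceback now comes from the print line itself:
the except clause calls msg.get_filename() a second time, and
that call raises the same TypeError again.  The message's
Content-Disposition header uses RFC 2231-style continuations
(filename*1, filename*2), which is presumably what trips the
decoder.  A minimal sketch of a handler that calls get_filename()
only once follows.  It assumes Python 3 syntax, safe_filename is
a hypothetical helper rather than anything in spambayes, and
returning None for an undecodable filename is an assumption about
what the tokenizer would want.

```python
import sys


def safe_filename(msg):
    """Return the attachment filename, or None if it cannot be decoded.

    Hypothetical helper, not part of spambayes.  get_filename() is
    called exactly once, so a failure inside it cannot be raised
    again from within the error handler.
    """
    try:
        return msg.get_filename()
    except (TypeError, LookupError) as exc:
        # TypeError matches the traceback above; LookupError is an
        # assumed extra guard for unknown charset names.
        print("Error: could not decode filename:", exc,
              "| raw header:", msg.get("Content-Disposition"),
              file=sys.stderr)
        return None
```

Printing the raw Content-Disposition header instead of the decoded
filename keeps the diagnostic useful even when decoding is the
thing that failed.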

Thanks,

-Drew



