[spambayes-dev] Re: [Spambayes] Observation

Skip Montanaro skip at pobox.com
Fri Jul 2 10:34:03 EDT 2004


(redirecting to spambayes-dev...)

    David> Below is a variant of an email that has been getting through a
    David> lot recently (perhaps 8 or so variants of this email have gotten
    David> through).  Usually repetitive emails don't get through for very
    David> long.  The problem is when I mark it as spam, it latches on to
    David> the gibberish hapaxes on the end, so the next one is not well
    David> recognized.

Can you post the clues?  It doesn't matter how much gibberish is in the
message.  If it's never been seen before it won't have an effect on the
outcome. 
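A rough sketch of why an unseen token can't move the score, assuming a
Robinson-style adjusted probability and a minimum-strength cutoff for clues.
The constant names and values below are assumptions chosen for illustration
(they are not read from any particular SpamBayes configuration), and
adjusted_prob/is_used_as_clue are hypothetical helpers, not the classifier's
actual API:

```python
# Illustrative constants; names and values are assumptions for this sketch.
UNKNOWN_WORD_PROB = 0.5       # prior probability for a token with no history
UNKNOWN_WORD_STRENGTH = 0.45  # weight given to that prior
MINIMUM_PROB_STRENGTH = 0.1   # tokens closer to 0.5 than this are ignored


def adjusted_prob(spam_count, ham_count, nspam, nham):
    """Spam probability for a token, pulled toward the prior when data is thin."""
    n = spam_count + ham_count
    if n == 0:
        # Never trained on: the token gets exactly the neutral prior.
        return UNKNOWN_WORD_PROB
    spam_ratio = spam_count / max(nspam, 1)
    ham_ratio = ham_count / max(nham, 1)
    raw = spam_ratio / (spam_ratio + ham_ratio)
    s, x = UNKNOWN_WORD_STRENGTH, UNKNOWN_WORD_PROB
    return (s * x + n * raw) / (s + n)


def is_used_as_clue(prob):
    """A token only counts as a clue if it is far enough from neutral."""
    return abs(prob - 0.5) >= MINIMUM_PROB_STRENGTH


unseen = adjusted_prob(0, 0, nspam=100, nham=100)
hapax = adjusted_prob(1, 0, nspam=100, nham=100)  # seen once, in one spam
print(unseen, is_used_as_clue(unseen))
print(round(hapax, 3), is_used_as_clue(hapax))
```

Under these assumptions a never-seen token sits at exactly 0.5 and is dropped,
while a gibberish token trained once as spam becomes a fairly strong clue.
That clue only helps if the same gibberish shows up again.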

I messed around a little with the message.  When I first ran it through
sb_filter.py the classification and clues left me scratching my head:

    X-Spambayes-Classification: titan-unsure; 0.52
    X-Spambayes-Evidence: '*H*': 0.51; '*S*': 0.55; 'subject:through': 0.16;
            'x-mailer:microsoft outlook express 6.00.2800.1409':
            0.23; 'header:Received:2': 0.80; 'subject:sun': 0.84

I then started poking around at the Python prompt:

    >>> len(msg.get_payload())
    4792
    >>> msg.get_payload()
    'zwmxfsrrp dvltfw dugdaeav wujir mjebdt\nrrvejn splkeiw- ...'
    >>> t = tokenizer.Tokenizer()
    >>> body = t.tokenize_body(msg)
    >>> body
    <generator object at 0xb2ee68>
    >>> list(body)
    []

That seems pretty damn odd.  I don't see any massively long html tags.  I
think it's somehow related to the fact that the content-type is
multipart/alternative but no alternatives are actually given, at least in
the version David posted.  David, can you zip the message up and mail it to
me?
(Maybe this is some Outlook damage to the message?)
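Here is a minimal sketch of that hypothesis using only the standard email
package (Python 3 here).  The message text, addresses and boundary string
are made up, and the "collect text/* parts" step is an assumption about how
a body tokenizer might pick its input, not a copy of the SpamBayes code:

```python
import email

# Made-up message: declares multipart/alternative but the body never
# contains the boundary, so there are no actual alternatives.
raw = (
    "From: sender@example.com\n"
    "Subject: test\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "\n"
    "zwmxfsrrp dvltfw dugdaeav wujir mjebdt\n"
    "rrvejn splkeiw\n"
)

msg = email.message_from_string(raw)

# The parser keeps the body as one plain string rather than a list of parts.
payload = msg.get_payload()
print(type(payload).__name__, msg.is_multipart())

# Assumed tokenizer behaviour: only parts whose main type is "text" are read.
text_parts = [part for part in msg.walk()
              if part.get_content_maintype() == "text"]
print(text_parts)

# The parser records what it found wrong with the message.
defect_names = [type(d).__name__ for d in msg.defects]
print(defect_names)
```

If body tokenizing works anything like this, the only object walk() yields is
the top-level message, whose type is still multipart/alternative, so nothing
qualifies as text and no body tokens come out even though get_payload()
returns a long string.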

Skip


