[Spambayes] Just for fun

Tim Peters tim.one@comcast.net
Mon Nov 18 17:28:32 2002


[Moore, Paul]
> ...
> I'm not good at interpreting this stuff yet, but it came out as
> solidly unsure, with some interesting features. The 'sender:no real
> name:2**0' as a solid ham clue is almost certainly due to Exchange
> (basically because Exchange doesn't do real headers, I expect)

If there is no Sender header, no sender token is generated at all.  You get
'sender:no real name:2**0' only if there *is* a Sender header (and it doesn't contain a
real name).  The Outlook client's _GetFakeHeaders() doesn't synthesize a
Sender header, either.  So that token must come from internet mail.  It may
be a ham clue for you because some mailing lists create a Sender field
without a real name.  For example, the mailing-list version of
comp.lang.python adds this to its headers:

    Sender: python-list-admin@python.org

So 'sender:no real name:2**0' is a ham clue for me too.  That's fine:  in my
corpus, it really is a ham indicator.
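
To make the rule above concrete, here is a rough sketch of how such a token
could be produced.  It is not the actual tokenizer code:  sender_tokens() and
the 'sender:name:...' token are made-up names for illustration, and bucketing
the count by log2 is an assumption suggested by the '2**0' in the token.  Only
the standard library's email package is used.

```python
import email
import math
from email.utils import getaddresses

def sender_tokens(msg_text, field="sender"):
    """Hypothetical helper: derive tokens from one address header."""
    msg = email.message_from_string(msg_text)
    values = msg.get_all(field, [])      # header lookup is case-insensitive
    if not values:
        return []                        # no Sender header: no token at all
    tokens = []
    noname_count = 0
    for name, addr in getaddresses(values):
        if name:
            # Illustrative token for a real name; the format is an assumption.
            tokens.append("%s:name:%s" % (field, name.lower()))
        else:
            noname_count += 1
    if noname_count:
        # One nameless address gives round(log2(1)) == 0, hence '2**0'.
        tokens.append("%s:no real name:2**%d"
                      % (field, round(math.log2(noname_count))))
    return tokens
```

Fed a message carrying the python-list Sender line shown above, this returns
['sender:no real name:2**0'];  fed a message with no Sender header, it returns
an empty list.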

> - I see most internet headers as good spam clues, which is mildly
> worrying, although hasn't caused any real issues yet.

If your spam comes from the internet, it's appropriate <wink>.

> The obvious implication is that getting a really good training corpus
> is *hard*. Probably beyond the means of the average user.

The best possible training corpus is the email they actually get, correctly
classified.  If they know their own judgment about ham vs spam, all the rest
should happen by magic.  It's still hard for clients to do that, though.



