python script as an emergency mailbox cleaner

Tim Peters tim.one at comcast.net
Sun Sep 21 11:02:30 EDT 2003


[Phil Weldon]
> Inboxer for Outlook is a plugin written with Python that will analyze
> collections of what you consider legitimate e-mail and what you
> consider illegitimate e-mail.

The classification engine in Inboxer comes from the free spambayes project:

    http://www.spambayes.org/

Inboxer is a commercial product (produced by some old friends of mine from
Dragon Systems, but I have no other connection to it), which can afford to
pay people to research and add ease-of-use features.  The spambayes project
is behind on that count, but for the technically-minded should perform
equally well.

> I downloaded it and ran it against a collection of 1500 messages
> generated by the Worm.Automat.AHB and the 265 latest legitimate
> e-mails I've received.

The spambayes engine works best when trained on approximately equal numbers
of ham and spam.  You should actually get better results if you train on far
*fewer* than 1500 of a particular species of spam.  In my home classifier, I've
trained on 6 slightly different instances of Worm.Automat spew, and
that's all.  All the rest I've gotten were classed as spam (but I have my
spam cutoff set to 80, and IIRC Inboxer defaults that to 90).
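To make the cutoff remark concrete, here is a minimal sketch of how a two-cutoff classifier maps a message's spam score to a label.  It assumes scores run from 0.0 to 1.0, so the "80" and "90" above become 0.80 and 0.90.  The function name `classify` and the ham cutoff of 0.20 are illustrative assumptions, not spambayes's actual API or defaults.

```python
def classify(score, ham_cutoff=0.20, spam_cutoff=0.80):
    """Map a spam score in [0, 1] to 'ham', 'unsure', or 'spam'.

    Hypothetical helper: ham_cutoff=0.20 is an assumed value; only the
    80 vs. 90 spam cutoffs come from the post.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"


# A made-up score of 0.85: spam at a cutoff of 80, only "unsure" at 90.
print(classify(0.85))                    # spam
print(classify(0.85, spam_cutoff=0.90))  # unsure
```

A lower spam cutoff catches more of a worm's output outright, at the cost of a somewhat higher chance of calling borderline ham spam.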

> After the analysis, Inboxer has detected about 250 Worm.Automat.AHB
> generated messages with no false negatives and no false positives
> (granted, there were only three new legitimate e-mails).

If you start getting some, the paradoxical best thing to do would, again, be
to train on *fewer* worm spew messages.
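One way to act on the "train on fewer" advice is to sample the spam pile down before training so that it never outnumbers the ham, with an optional smaller cap.  The sketch below is an assumption-laden illustration: the helper name `pick_spam_for_training`, the random-sampling strategy, and the `cap` parameter are hypothetical and are not part of spambayes or Inboxer.  The ham is left untouched, since the advice concerns the flood of one spam species.

```python
import random


def pick_spam_for_training(ham, spam, cap=None, seed=0):
    """Return a random subset of spam no larger than the ham set (or cap).

    Hypothetical helper: keeps a single spam species (e.g. one worm's
    output) from swamping the training data.
    """
    limit = len(ham) if cap is None else min(len(ham), cap)
    limit = min(limit, len(spam))
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    return rng.sample(spam, limit)


# Counts from the post: 265 legitimate messages, 1500 worm messages.
ham = [f"ham-{i}" for i in range(265)]
spam = [f"worm-{i}" for i in range(1500)]
print(len(pick_spam_for_training(ham, spam)))         # 265
print(len(pick_spam_for_training(ham, spam, cap=6)))  # 6
```

Capping at the ham count gives the roughly equal numbers described earlier; a much smaller cap, like the 6 samples mentioned above, may be enough for a single worm.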


More information about the Python-list mailing list