python script as an emergency mailbox cleaner

Phil Weldon pweldon at mindspring.com
Sun Sep 21 17:15:09 EDT 2003


Yes, I tend to discount your advice, because you may not be considering
three things: the messages generated by Worm.Automat.AHB are a very
restricted subset of spam; the legitimate 'undeliverable e-mail' messages
are closely related to them; and the 'undelivered e-mail' messages caused by
Worm.Automat.AHB-generated e-mail with the target e-mail address in the FROM
line are also closely related.  The current need is a quick way to counter
the 'spam' effects of Worm.Automat.AHB, not to correctly categorize Nigerian
fund transfer and Viagra spam.

To explain further, the bogus 'undeliverable e-mail' type messages keep
permuting, and the database supplying the input to the worm's generator is
growing.  There are at least two classes of bogus 'undeliverable mail':

1.  e-mail generated by the worm
2.  real 'undeliverable e-mail' messages that result when the worm uses
your e-mail address as the sender of a bogus 'undeliverable e-mail', which
in turn generates a legitimate but unwanted and useless 'undeliverable
e-mail' message.

Now, if you have the time to supply your arguments rather than your CV, I'll
be happy to learn.

And, to quote the Inboxer help file,

"The text box in the Create Filters area indicates the number of messages
that were processed to build the filters. Generally, the higher the number,
the more accurate the filters will become."

So far, the scoring Inboxer developed from the ~1500 bad and 265 good
examples has produced no false negatives or false positives, including
correctly classifying a dozen completely legitimate 'undelivered e-mail'
messages in a set of ~400 new messages.  The ~1500 bad e-mail messages have
a date spread of 18SEP03 through 20SEP03, while the 265 good e-mail messages
have a date spread of 1AUG03 through 20SEP03.  Both sets were sent to my ISP
mailbox.
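
As an illustration only, the false-positive and false-negative tally
described above could be sketched in Python as follows.  The function name
`tally_errors` and the (actual, predicted) label pairs are hypothetical, and
nothing here reflects how Inboxer itself scores or reports messages.

```python
from collections import Counter


def tally_errors(results):
    """Count classification outcomes.

    `results` is an iterable of (actual, predicted) pairs, where each label
    is either "good" or "bad".  This input format is an assumption made for
    illustration, not something any particular filter produces.
    """
    counts = Counter()
    for actual, predicted in results:
        if actual == "good" and predicted == "bad":
            counts["false_positive"] += 1  # good mail flagged as bad
        elif actual == "bad" and predicted == "good":
            counts["false_negative"] += 1  # bad mail let through
        else:
            counts["correct"] += 1
    return counts
```

Feeding it the hand-checked labels for a batch of new messages gives the
counts to compare from one training run to the next.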

I will try dividing the two sets of messages into smaller sets and test
your suggestion on new e-mails as they collect.  By the way, my current
ratio of Worm.Automat.AHB-instigated messages to legitimate e-mail (which
for my purposes includes traditional spam) is far greater than 1500:265;
it's more like 1500:50.
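
Dividing the bad set into a smaller training set could be sketched in
Python along these lines.  This is a minimal sketch under assumptions: the
helper names `subsample` and `balanced_training_sets` and the `max_ratio`
parameter are hypothetical, the messages are just items in a list, and
nothing here is part of spambayes or Inboxer.

```python
import random


def subsample(messages, size, seed=0):
    """Return a random subset of at most `size` messages, reproducibly."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    messages = list(messages)
    if size >= len(messages):
        return messages
    return rng.sample(messages, size)


def balanced_training_sets(bad, good, max_ratio=1.0, seed=0):
    """Trim the bad set so that len(bad) <= max_ratio * len(good).

    The good set is returned unchanged; only the larger bad set is cut down.
    """
    limit = int(max_ratio * len(good))
    return subsample(bad, limit, seed), list(good)
```

With roughly 1500 bad and 265 good messages, `max_ratio=1.0` would keep 265
randomly chosen bad messages; varying `max_ratio` and the seed gives several
smaller sets to compare against training on everything.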

And I guess I should download spambayes and donate to the PSF, since my
daughter is using Python in her physics classes at Carnegie-Mellon.
Coincidentally, I just happened to be looking at my loose-leaf copy of
Feynman's Lectures on Physics, with a reference manual in the back for the
FORTRAN IV I had to use for physics classes.



Phil Weldon, pweldon at mindspring.com

"Tim Peters" <tim.one at comcast.net> wrote in message
news:mailman.1064166807.8722.python-list at python.org...
> [Phil Weldon]
> > I don't think 'fewer' examples of bogus 'Undeliverable e-mail'
> > messages will be 'better' because of the permutating and morphing
> > nature of this worm generated message.  'Fewer' examples would result
> > in ALL 'Undeliverable e-mail' messages categorized as objectionable
> > because the number of valid messages of this type is so small in the
> > saved e-mail that most users have.
>
> Which is exactly why training on "too many" such unwanted messages will
> make it very difficult for the handful of legitimate messages of that sort
> to score as ham.  I started the spambayes project, and did most of the
> research for, and coding of, its tokenizer and classifier, but you're
> certainly free to ignore my ill-informed advice <wink>
>
.
.
.
> > Now, if I can just find a way to charge the cost to Earthlink because
> > of their failure to perform their implicit contract to provide
> > reliable e-mail service.
>
> I suspect they already thought of that trick <wink> -- a good start would
> be to read your service contract with them.
>
>





