Ham or Spam? (was RE: [Spambayes] RE: Central Limit Theorem??!! :))

Tim Peters tim.one@comcast.net
Fri, 27 Sep 2002 13:21:36 -0400


[Charles Cazabon]
> I'd be most curious as to how ham HTML messages vs. spam HTML
> messages compare with the above scheme if you longer strip HTML tags.

"No longer", right?

> I realize you don't have unlimited time for testing, but it might be
> useful if HTML spam messages rate as "high likelihood, high
> confidence" while HTML ham is "high likelihood, lower confidence" ...

I can run tests in the background easily enough, but this is something I
can't test at all:  there is almost no HTML ham in c.l.py traffic.  About
the only instances of it are inappropriate (but not spam) postings from
first-time posters, with bodies usually of the form "confirm" or
"unsubscribe" (i.e., newbies who can't follow directions and post their
mailing list administrivia to the newsgroup).

If anyone else can test this, be my guest.  In the absence of volunteers,
I'll appoint Charles <wink>.