Jeremy Hylton : weblog : 2003-11-14

Spambayes False Positive

Friday, November 14, 2003, 3:50 p.m.

An email I sent Tim was scored as 100% spam by his Spambayes filter. The email contained lots of charts and text graphics summarizing the contents of a Zope database. Most of the numbers were spam clues.

The message did contain a few good ham clues -- like my email address and words like "class" -- but they were outweighed by an overwhelming number of spam clues. Many of the three-digit numbers in the report were mild spam indicators -- seen in three spam and one ham, or in two spam and no ham. Tim noted that IP addresses often show up in URLs in spam, so the individual numeric components of those addresses are probably in the training data.
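A minimal sketch of why a pile of mild clues can swamp a few strong ones. It assumes Robinson-style smoothing of per-token probabilities and chi-squared combining, which is my understanding of the general approach Spambayes takes; it is not the Spambayes code itself. The corpus sizes, clue counts, the defaults s=0.45 and x=0.5, and the function names are all assumptions made for illustration.

```python
import math

def token_spam_prob(spam_count, ham_count, nspam, nham, s=0.45, x=0.5):
    """Smoothed spam probability for one token (Robinson-style estimate).

    s and x are assumed values for the prior strength and the prior
    probability given to a token that has never been seen.
    """
    n = spam_count + ham_count
    if n == 0:
        return x
    spam_ratio = spam_count / nspam
    ham_ratio = ham_count / nham
    p = spam_ratio / (spam_ratio + ham_ratio)
    return (s * x + n * p) / (s + n)

def chi2q(x2, df):
    """Probability that a chi-squared variable with even df exceeds x2."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine(probs):
    """Combine per-token probabilities into one score between 0 and 1."""
    n = len(probs)
    spam_evidence = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    ham_evidence = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (spam_evidence - ham_evidence + 1.0) / 2.0

# Hypothetical training corpus: 1000 spam and 1000 ham messages.
NSPAM = NHAM = 1000

mild_spam = token_spam_prob(3, 1, NSPAM, NHAM)     # seen in three spam, one ham
milder_spam = token_spam_prob(2, 0, NSPAM, NHAM)   # seen in two spam, no ham
strong_ham = token_spam_prob(0, 50, NSPAM, NHAM)   # e.g. a familiar address

# Three strong ham clues against sixty mild spam clues (made-up counts).
score = combine([strong_ham] * 3 + [mild_spam] * 60)
print(f"mild clue: {mild_spam:.3f}, combined score: {score:.3f}")
```

Each three-digit number on its own says very little, but the combining step treats every clue as a separate piece of evidence, so dozens of them pointing the same way push the score toward spam even with a few strong ham clues present.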

He trained the filter on one of the messages as ham, and the next report email came up as 100% ham.