[Spambayes] Spam in Images

skip at pobox.com
Thu Aug 3 05:54:51 CEST 2006


    Peter> For my spam and non-spam, a good indicator is that I very seldom
    Peter> receive non-spam messages with a .gif image attached (attachments
    Peter> are usually .jpg or various document types). And if a wanted mail
    Peter> has a .gif attachment it has much more text than the usual
    Peter> gibberish in the spam messages (because it is usually just a
    Peter> company logo or similar, and not essential to the message). So if
    Peter> spambayes can score attachment type and text size it may help.

True, but for people whose correspondents do send them mail with image
attachments ("Subject: Cute pictures of my new granddaughter"), the spam
probability attached to the presence or absence of images may fall around
the middle (near 0.5), so that clue will either not be used at all or
provide only a negligible bump in one direction or the other.
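
To make the "falls around the middle" point concrete, here is a rough
sketch of how a classifier of this kind might pick its clues.  It is not
the actual SpamBayes code: the function name, the token strings, and the
sample probabilities are made up for illustration, and the cutoff values
(0.1 and 150) are assumed defaults.

```python
def significant_clues(token_probs, min_strength=0.1, max_clues=150):
    """Return the (token, prob) pairs that would influence the score.

    token_probs maps a token to its estimated spam probability.
    Tokens closer to 0.5 than min_strength are dropped, and only the
    max_clues strongest survivors are kept.  Both limits are assumed
    values, not taken from any particular release.
    """
    scored = [
        (abs(prob - 0.5), token, prob)
        for token, prob in token_probs.items()
        if abs(prob - 0.5) >= min_strength
    ]
    # Strongest clues (farthest from 0.5) come first.
    scored.sort(reverse=True)
    return [(token, prob) for _, token, prob in scored[:max_clues]]


# Hypothetical probabilities for someone who gets family photos by mail.
probs = {
    "content-type:image/gif": 0.55,  # seen about equally in ham and spam
    "viagra": 0.99,
    "granddaughter": 0.02,
}
print(significant_clues(probs))
```

With numbers like these the image token sits too close to 0.5 to be
selected, so for that user it contributes nothing to the score.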

Approaches to scoring images run the entire gamut, from running OCR
software to (try to) extract the text they contain to ignoring them
altogether.  Right now we note the presence of images by their
content-type.  My image size patch adds another measurement.  We can
probably develop other measures.  My feeble attempts to use the open source
OCR tool ocrad yielded no useful results.  Do we want to require PIL and
start digging into the images that way?
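
For anyone who wants to experiment, the two cheap measurements mentioned
above (content-type and size) can be sketched with nothing but the standard
library.  This is only an illustration, not the tokenizer code or the patch
itself: the function name, the token formats, and the choice to bucket
sizes by powers of two are all assumptions.

```python
import math
from email.message import EmailMessage


def image_tokens(msg):
    """Yield hypothetical tokens describing each image part of a message."""
    for part in msg.walk():
        if part.get_content_maintype() != "image":
            continue
        # Token 1: the declared content-type of the image part.
        yield "image:content-type:" + part.get_content_type()
        # Token 2: the decoded size, bucketed to the nearest power of two
        # so that images of similar size share a token.
        payload = part.get_payload(decode=True) or b""
        if payload:
            yield "image:size:2**%d" % round(math.log2(len(payload)))


# Build a small test message with a fake 1000-byte GIF attachment.
demo = EmailMessage()
demo["Subject"] = "Cute pictures of my new granddaughter"
demo.set_content("See attached.")
demo.add_attachment(b"\x00" * 1000, maintype="image", subtype="gif")

print(list(image_tokens(demo)))
```

Anything deeper (dimensions, color counts, OCR) would need to decode the
image itself, which is where a PIL dependency would come in.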

Skip


