[Spambayes] Spam in Images
skip at pobox.com
Thu Aug 3 05:54:51 CEST 2006
Peter> For my spam and non-spam, a good indicator is that I very seldom
Peter> receive non-spam messages with a .gif image attached (attachments
Peter> are usually .jpg or various document types). And if a wanted mail
Peter> has a .gif attachment it has much more text than the usual
Peter> gibberish in the spam messages (because it is usually just a
Peter> company logo or similar, and not essential to the message). So if
Peter> spambayes can score attachment type and text size it may help.
True, but for those people with correspondents who do send them mail with
image attachments ("Subject: Cute pictures of my new granddaughter"), the
presence or absence of images may score near the middle (a spam probability
close to 0.5) and thus either not be used at all or provide only a negligible
bump in one direction or the other.
Scoring images can run the entire gamut, from running OCR software to (try
to) extract the text they contain to ignoring them altogether. Right now we
note the presence of images by their content-type. My image size patch adds
another measurement. We probably can develop other measures. My feeble
attempts to use the open source OCR tool ocrad yielded no useful results.
Do we want to require PIL and start digging into the images that way?
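
To make the cheap end of that range concrete, here is a rough sketch of the
kind of clues discussed in this thread: image content-type, image size, and
the amount of plain text alongside it. It uses only the Python standard
library. The function name image_tokens, the token spellings, and the
power-of-two size buckets are all made up for illustration; this is not the
SpamBayes tokenizer and not the image size patch itself.

```python
from email.message import EmailMessage


def size_bucket(n):
    """Return floor(log2(n)) for n >= 1, and 0 for n == 0 (hypothetical bucketing)."""
    return max(n, 1).bit_length() - 1


def image_tokens(msg):
    """Yield illustrative tokens for image parts and overall plain-text size.

    Assumption: token names such as 'image-type:' are invented for this sketch.
    """
    text_len = 0
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        payload = part.get_payload(decode=True) or b""
        if part.get_content_maintype() == "image":
            # Clue 1: which kind of image is attached (e.g. image/gif).
            yield "image-type:" + part.get_content_type()
            # Clue 2: coarse size of the decoded image data.
            yield "image-size:2**%d" % size_bucket(len(payload))
        elif part.get_content_type() == "text/plain":
            text_len += len(payload)
    # Clue 3: how much plain text accompanies the images.
    yield "text-size:2**%d" % size_bucket(text_len)


if __name__ == "__main__":
    demo = EmailMessage()
    demo.set_content("Cute pictures of my new granddaughter")
    demo.add_attachment(b"\x00" * 1024, maintype="image", subtype="gif",
                        filename="logo.gif")
    for token in image_tokens(demo):
        print(token)
```

Bucketing the sizes keeps the number of distinct tokens small, so each one
has a chance of being seen often enough in training to matter. Whether any
of these clues actually helps would have to be checked against real ham and
spam.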
Skip