[Spambayes] Tokenizing ideas (images, attachments)

Meyer, Tony T.A.Meyer at massey.ac.nz
Wed Aug 27 20:07:47 EDT 2003


> Why not tokenize image URLs?
[...]
> While SpamBayes detected this message just fine,

There's a reason why not ;)

> Many times the message is empty or almost
> empty, containing only an image URL.

Note that any URL, including image ones, is tokenized.  If you look at
the clues for a message like the one you used as an example, you should
see some url: tokens.
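
As a rough illustration of what url: tokens amount to, here is a minimal
sketch of splitting URLs found in a message body into clues.  This is not
the SpamBayes tokenizer code; the function name url_tokens, the regular
expressions, and the "proto:" prefix are assumptions made for the example.
Only the "url:" prefix comes from the clues mentioned above.

```python
import re

# Hypothetical sketch, not the actual SpamBayes tokenizer.
# Matches http/https/ftp URLs up to whitespace, angle brackets or a quote.
URL_RE = re.compile(r'(https?|ftp)://([^\s<>"]+)', re.IGNORECASE)
# Anything that is not a letter or digit separates URL pieces.
SEP_RE = re.compile(r"[^A-Za-z0-9]+")


def url_tokens(text):
    """Return a list of clue strings for every URL found in text."""
    tokens = []
    for match in URL_RE.finditer(text):
        scheme, rest = match.groups()
        # One token for the scheme ("proto:" is an assumed prefix).
        tokens.append("proto:" + scheme.lower())
        # One "url:" token per non-empty piece of host and path.
        for piece in SEP_RE.split(rest):
            if piece:
                tokens.append("url:" + piece.lower())
    return tokens


if __name__ == "__main__":
    print(url_tokens('<img src="http://example.com/img/offer.gif">'))
```

An image URL goes through the same path as any other URL, so a message
containing nothing but an <img> tag still produces clues.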

It has been suggested that fetching the (textual) content at the other end
of the URL and tokenizing it would be worthwhile (this includes generating
a token if the URL 404s).
We tested this out (look at the urlslurper.py file), but didn't have
enough people testing to integrate it into the main code (as a
default-to-off option).  Death2Spam (see the related page) does this,
though, and Richard swears by it.
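
For anyone wanting to experiment, the idea could be sketched roughly as
follows.  This is not the code in urlslurper.py; the names fetch_text and
slurp_tokens, the "slurp:" token prefix, and the word pattern are all
hypothetical, chosen only to show the shape of the technique: fetch the
page, tokenize its text, and emit a distinct token when the fetch fails
(for example with a 404).

```python
import re
import urllib.error
import urllib.request

# Hypothetical word pattern: runs of 3+ letters, digits, $, ' or -.
WORD_RE = re.compile(r"[A-Za-z0-9$'-]{3,}")


def fetch_text(url, timeout=5):
    """Fetch url and return its body as text (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def slurp_tokens(url, fetch=fetch_text):
    """Return tokens for the content behind url.

    fetch is injectable so the function can be tried without a network.
    """
    try:
        body = fetch(url)
    except urllib.error.HTTPError as err:
        # A failed fetch is itself a clue, e.g. "slurp:http-404".
        return ["slurp:http-%d" % err.code]
    except OSError:
        # URLError (DNS failure, refused connection) is an OSError.
        return ["slurp:unreachable"]
    # The "slurp:" prefix (an assumption) keeps these clues apart from
    # tokens taken from the message body itself.
    return ["slurp:" + word.lower() for word in WORD_RE.findall(body)]
```

Keeping the fetched tokens under their own prefix means the classifier
can learn separately how much weight the linked content deserves.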

In any case, the best thing is to try these (or any other) ideas out.
See FAQ 6.1:

<file:///D:/cvs/spambayes/website/faq.html#why-don-t-you-implement-cool-tokenizer-trick-x>

=Tony Meyer


