[spambayes-dev] spammy subject lines

Paul Sorenson sosman at users.sourceforge.net
Fri Oct 10 19:23:20 EDT 2003


I am getting quite a bit of spam with subject lines like the following:

subject: Lon.g an^d Str;ong al)l Nigh_t j-jcgzies
subject: Ch-eck ou=t ou-r sel)ection _of grea)t R_X -emgffj

Looking at the tokenizer code for subject lines, I was wondering whether
there is value in stripping punctuation and then doing the usual word
tokenisation.
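
To make the idea concrete, here is a rough sketch of what I mean. It is not
taken from the existing tokenizer: the function name tokenize_subject_stripped
is made up for illustration, and it only handles ASCII punctuation and plain
whitespace splitting.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def tokenize_subject_stripped(subject):
    """Hypothetical helper, not part of the SpamBayes codebase.

    Deletes ASCII punctuation from the subject, lowercases it, and
    yields the whitespace-separated words that remain.
    """
    cleaned = subject.translate(_PUNCT_TABLE).lower()
    for word in cleaned.split():
        yield word


if __name__ == "__main__":
    for subject in (
        "Lon.g an^d Str;ong al)l Nigh_t j-jcgzies",
        "Ch-eck ou=t ou-r sel)ection _of grea)t R_X -emgffj",
    ):
        print(list(tokenize_subject_stripped(subject)))
```

With this, "Lon.g" and "Str;ong" come out as "long" and "strong", so the
obfuscated words collapse back to tokens the classifier has probably seen
before. Presumably these would be generated in addition to the existing
subject tokens rather than replacing them, since the punctuation itself may
be a useful clue.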

It seems there are other special cases taken into account for the subject
line, so care would need to be taken not to break those.

I would be happy to have a crack at a patch. I just wanted to float the
idea first, given that I am unfamiliar with the existing codebase and
unsure whether this has already been tried.

cheers



