[spambayes-dev] New urls

Matthew Dixon Cowles matt at mondoinfo.com
Sun May 7 01:42:03 CEST 2006


Matt> It took some training for me before my SpamBayes started to
Matt> recognize those reliably, but it seems that my old hack to
Matt> tokenize URL's IPs helps:

> This doesn't seem to be in the code base. 'Zat so?

Yup! The patch is at:

http://www.mondoinfo.com/tokenizerpatch.txt

and the local cache I use it with is at:

http://www.mondoinfo.com/dnscache.py

The patch was discussed on this list some time ago. It didn't seem to
help on historical corpora, perhaps because spammers don't maintain
their DNS for long, so old URLs no longer resolve. But on current spam
it helps for me.
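
For readers who can't fetch the files above, here is a minimal sketch of
what a local lookup cache of this kind might look like. It is not the
contents of dnscache.py: the class name LookupCache, the ttl, resolver,
and clock parameters, and the choice to cache failed lookups as an empty
list are all assumptions made for illustration.

```python
import socket
import time


class LookupCache:
    """Hypothetical in-memory cache mapping a hostname to its IPv4 addresses."""

    def __init__(self, ttl=3600.0, resolver=None, clock=time.time):
        self.ttl = ttl                      # seconds an entry stays valid (assumed value)
        self.resolver = resolver or self._resolve
        self.clock = clock                  # injectable so the cache can be tested offline
        self._entries = {}                  # hostname -> (expiry time, list of IPs)

    @staticmethod
    def _resolve(hostname):
        # gethostbyname_ex returns (canonical name, aliases, IPv4 address list).
        try:
            return socket.gethostbyname_ex(hostname)[2]
        except OSError:
            # Lookup failed, e.g. the domain is no longer in DNS.
            return []

    def lookup(self, hostname):
        now = self.clock()
        hit = self._entries.get(hostname)
        if hit is not None and hit[0] > now:
            return hit[1]
        ips = self.resolver(hostname)
        # Failures (empty lists) are cached too, so dead domains are not
        # re-queried for every message that mentions them.
        self._entries[hostname] = (now + self.ttl, ips)
        return ips
```

Passing a fake resolver makes it possible to try the cache against a
historical corpus without touching live DNS.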

I haven't experimented with breaking the IP up at anything other than
byte boundaries. I also haven't looked at the related issue of
whether four tokens for an exact match is optimal.
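
To make the byte-boundary idea concrete, here is a rough sketch of how
one IPv4 address could be turned into four tokens, one per prefix. The
function name ip_prefix_tokens, the "url-ip" prefix, and the "/8" style
suffixes are assumptions for illustration and are not copied from the
patch.

```python
import ipaddress


def ip_prefix_tokens(ip, prefix="url-ip"):
    """Return one token per byte-boundary prefix of a dotted-quad IPv4 address."""
    # IPv4Address raises ValueError (AddressValueError) on malformed input.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    tokens = []
    for n in range(1, 5):
        # Keep the first n octets and record the prefix length in bits.
        tokens.append("%s:%s/%d" % (prefix, ".".join(octets[:n]), 8 * n))
    return tokens


print(ip_prefix_tokens("192.0.2.17"))
# ['url-ip:192/8', 'url-ip:192.0/16', 'url-ip:192.0.2/24', 'url-ip:192.0.2.17/32']
```

With this scheme, a URL that resolves to an address seen in training
matches all four tokens, while a new address in the same /24 still
matches three of them.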

Regards,
Matt


