[Spambayes] spam tokens IibKrw0yteNAtHyZDDw (fwd)

Atom 'Smasher' atom at suspicious.org
Mon Dec 1 21:27:50 EST 2003


two things i've noticed about spam. i'm not sure if either of them is
taken into account by SB, but maybe someone can look into this
further...  or maybe someone already has and can tell me why these
don't work...

1) so many spams have a *lot* of spaces (and tabs?) in the subject line.
(like above {taken from real spam}).

i know... multiple spaces aren't tokens, they *separate* tokens... but
when there are 20+ in a row, in the subject line, that usually means spam.

2) so many spams are filled with nonsense and random strings
	rldvlzgj coldokiue i q wfup cadrhs r cqufqc e p fnlcgv fipv
which probably don't appear in legit email.

can these be used to detect spam? are they used?
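
A minimal sketch of idea (1), assuming a tokenizer is free to emit synthetic tokens alongside ordinary words. This is not the SpamBayes tokenizer: the function name `subject_tokens`, the token prefixes, and the bucketing are hypothetical, and the threshold of 20 is simply the figure mentioned above.

```python
import re

# Hypothetical helper, not part of SpamBayes. The threshold of 20 comes
# from the "20+ in a row" observation; everything else is an assumption.
WHITESPACE_RUN = re.compile(r"[ \t]{20,}")

def subject_tokens(subject):
    """Yield one synthetic token per long whitespace run, then the words."""
    for run in WHITESPACE_RUN.finditer(subject):
        # Bucket the run length to the nearest lower multiple of 10 so that
        # runs of 23 and 25 characters share the same token.
        yield "subject:whitespace-run:%d" % (len(run.group()) // 10 * 10)
    for word in subject.split():
        yield "subject:" + word.lower()
```

With a token like this, the classifier can learn from training data whether long whitespace runs really correlate with spam in a given mailbox, instead of having the rule hard-coded.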

my understanding of bayesian filtering is that if it has never before
encountered the word "rldvlzgj", then that word scores 0.5 (or something
fairly neutral). well, after i've trained it on a few hundred or a few
thousand emails, i think it should have a good handle on my vocabulary and
maybe be less forgiving with words it hasn't seen before.
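
The "neutral score for unseen words" behaviour can be sketched with a Robinson-style smoothed estimate, f(w) = (s*x + n*p(w)) / (s + n), where n is how many trained messages contained the word. The function name and the default values for `x` (the prior for an unseen word) and `s` (how strongly that prior is weighted) are assumptions chosen for illustration, not settings read from any particular SpamBayes release.

```python
def adjusted_spam_prob(spam_count, ham_count, nspam, nham, x=0.5, s=0.45):
    """Smoothed spam probability for one word (illustrative sketch).

    spam_count / ham_count: trained spam / ham messages containing the word.
    nspam / nham: total trained spam / ham messages.
    x: assumed probability for a word never seen in training.
    s: assumed weight given to that prior.
    """
    n = spam_count + ham_count
    if n == 0:
        # Never seen in training: fall back to the prior alone.
        return x
    spam_ratio = spam_count / max(nspam, 1)
    ham_ratio = ham_count / max(nham, 1)
    p = spam_ratio / (spam_ratio + ham_ratio)
    # Blend the observed ratio with the prior; the prior fades as n grows.
    return (s * x + n * p) / (s + n)
```

In this sketch, being "less forgiving" with unseen words amounts to raising `x` above 0.5. Whether that helps or only increases false positives on legitimate mail with new vocabulary is something that would have to be tested against real data.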

i fully understand that the nature of bayesian filtering is often
counter-intuitive when it comes to what to look at and what to ignore, so
i'm fully prepared for someone to tell me exactly why these things don't
work the way my brain thinks they should.


 	...atom

 _______________________________________________
 PGP key - http://smasher.suspicious.org/pgp.txt
 3EBE 2810 30AE 601D 54B2 4A90 9C28 0BBF 3D7D 41E3
 -------------------------------------------------

	"IDEA's key length is 128 bits - over twice as long as DES.
	 Assuming that a brute force attack is the most efficient,
	 it would require 2^128 (10^38) encryptions to recover the
	 key. Design a chip that can test a billion keys per
	 second and throw a billion of them at the problem,
	 and it will still take 10^13 years - that's longer than
	 the age of the universe. An array of 10^24 such chips can
	 find the key in a day, but there aren't enough silicon
	 atoms in the universe to build such a machine. Now we're
	 getting somewhere - although I'd keep my eye on the dark
	 matter debate."
		-- Bruce Schneier, Applied Cryptography



