[Spambayes] was no subject (where can find documentation)

yahoo.de mpas1342 at yahoo.de
Mon Jun 12 10:05:27 CEST 2006

-----Original Message-----
From: Tony Meyer [mailto:tameyer at ihug.co.nz]
Sent: Monday, 12 June 2006 09:39
To: yahoo.de
Cc: Tim Peters; spambayes at python.org
Subject: Re: [Spambayes] was no subject (where can find documentation)

> OK, I will take a look at it later.

If you took a look at it now, you might not need to ask this <0.5 wink>.

> But there is a question regarding whitespace
> and how tokens are built.
> Consider this sample:
> I get an email with only this paragraph in the body:
> Sun is shining.
> If you say that, because of whitespace, there are only:
> 1-sun
> 2-is
> 3-shining
> to be checked,

In short: yes.  In reality, we skip any tokens less than three  
characters in length, and there are also many tokens from the headers.
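The rule described above can be sketched roughly as follows. This is only an illustration, not the code in tokenizer.py: the function name simple_tokens, the min_len parameter, and the lowercasing are assumptions made for the example, and header tokens are ignored entirely.

```python
def simple_tokens(body, min_len=3):
    """Split the body on whitespace and drop short words.

    Hypothetical helper for illustration only; lowercasing is an
    assumption, and punctuation is deliberately left attached.
    """
    return [word.lower() for word in body.split() if len(word) >= min_len]


# "is" has only two characters, so it is skipped.
print(simple_tokens("Sun is shining."))
```

Note that this sketch leaves the trailing period attached to "shining", because it splits on whitespace only.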

> I would then ask what happens with the substrings in sun and shining:
> 1-sun
> 2-su
> 3-un
> and all combinations for shining, like
> 4-shining
> 5-hining
> 6-ining
> 7-ning
> 8-ing
> 9-ng
> ?
> Because the spam email could contain spam words in this paragraph,  
> like this:
> sunBuy is shinigViagra
> I hope the sample is understandable :-)

Look for mention of "character n-grams" in the comments in  
tokenizer.py for discussion about this.  In short, 'words' work  
better and have the added bonus of resulting in (mostly) human- 
understandable tokens.
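To make the contrast with whole words concrete, here is a rough sketch of what character n-grams would look like for the words in question. The helper names char_ngrams and all_substrings are made up for this example and are not taken from tokenizer.py.

```python
def char_ngrams(word, n):
    """Return every run of n consecutive characters in word."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def all_substrings(word, min_len=2):
    """Return all character n-grams of word, for every n from min_len up."""
    return [
        gram
        for n in range(min_len, len(word) + 1)
        for gram in char_ngrams(word, n)
    ]


print(all_substrings("sun"))
print(len(all_substrings("shining")))
```

A single seven-letter word already yields many fragments such as "hi" or "nin", most of which mean little to a human reader, whereas splitting on whitespace gives one readable token.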

Your example (assuming there are no header tokens) would either be  
spam (another spam using these embedded words has already been  
trained), or unsure (they have never been seen before).  Your example  
is also extremely unclear - it does a very poor job at selling, which  
is the whole point, after all.  So a spammer gains little, and has  
lost a lot.

1 - And what if the sample is like this:
sunBuy is shinigViagrawww.xyx.com/dfdf.html

2 - How many tokens will there be?

Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.

