[Spambayes] was no subject (where can find documentation)

yahoo.de mpas1342 at yahoo.de
Mon Jun 12 09:31:28 CEST 2006



-----Original Message-----
From: Tim Peters [mailto:tim.peters at gmail.com]
Sent: Monday, 12 June 2006 08:21
To: yahoo.de
Cc: spambayes at python.org
Subject: Re: [Spambayes] was no subject (where can find documentation)


[mpas1342 at yahoo.de]
> I'm not a programmer and I have no experience with Python
> (a little with Java).
> What should I do in this case, go through the whole code just to
> learn the technique for creating a token?

Yes.  Tokenization is an algorithm, and you simply can't understand
the details without reading some code.  The (very) short course is
that SpamBayes tokenizes by splitting on whitespace, and ignoring case
distinctions.  Most of the time, but not all of the time.
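
To make the short course concrete, here is a minimal sketch in Python of
just that rule: lowercase the text and split on whitespace. The name
simple_tokenize is made up for this illustration and is not a function in
tokenizer.py; the real tokenizer does more than this, as the "not all of
the time" remark indicates.

```python
def simple_tokenize(text):
    """Hypothetical helper: lowercase the text, then split on whitespace.

    This only illustrates the simplified rule described above; it is not
    the actual SpamBayes tokenizer.
    """
    return text.lower().split()


# Punctuation stays attached to the word, because only whitespace
# separates tokens in this simplified version.
print(simple_tokenize("Sun is shining."))
```

Under this simplified rule a body of "Sun is shining." yields three
tokens, one per whitespace-separated word.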

> I don't even know in which file I will find information about the
> tokens!

tokenizer.py contains all the tokenization code.

> Apart from that, I'm not sure I can understand it from the code alone.
> For people like me it would be better to have some explanatory text
> to read alongside the code, I think.
> I would be happy if you have such documentation and could send it
> to me :)
> ??

There is no such documentation, although as Tim Stone said:

    Even if you're not a programmer, the comments are quite readable.

So try that.  Feel free to ask questions if you get stuck.  That _has_
to work better than continuing to ask for something that doesn't exist
:-)


----------------------------------------------------------
OK, I will take a look at it later. But I have a question regarding
whitespace and how tokens are built.
Consider this example:
I get an email with only this paragraph in the body:
Sun is shining.
If you say that, because of the whitespace, there are only:
1-sun
2-is
3-shining
to be checked, then I have to ask: what about the substrings of sun and
shining?


1-sun
2-su
3-un

and all the combinations for shining, like
4-shining
5-hining
6-ining
7-ning
8-ing
9-ng

?
Because a spam email could contain spam words in this paragraph, like this:
sunBuy is shiningViagra
I hope the example is understandable :-)
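
The two ways of looking at the example can be sketched in Python as
follows, assuming only the whitespace-and-lowercase rule from the short
course quoted above. Both helper names are hypothetical, and whether
tokenizer.py produces any substring tokens is not settled in this thread.

```python
def whitespace_tokens(text):
    """Hypothetical helper: lowercase, then split on whitespace only."""
    return text.lower().split()


def suffixes(word, min_len=2):
    """Hypothetical helper: every trailing substring of at least min_len
    characters, as in the list for "shining" (shining, hining, ..., ng)."""
    word = word.lower()
    return [word[i:] for i in range(len(word) - min_len + 1)]


spam_line = "sunBuy is shiningViagra"

# One token per whitespace-separated word; nothing inside a word is split.
print(whitespace_tokens(spam_line))

# The trailing substrings the question asks about, for one word.
print(suffixes("shining"))

# With suffixes, the embedded spam word would show up as its own entry.
print("viagra" in suffixes("shiningViagra"))
```

With whitespace splitting alone, shiningviagra is a single token and
viagra never appears on its own, which is exactly the case the question
is about.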








	

	
		

