[Spambayes] My first non-personal personal false positive

François Granger francois.granger@free.fr
Thu Nov 7 08:56:13 2002


on 7/11/02 3:16, Tim Peters at tim.one@comcast.net wrote:

> 'mediante', 'pagina', 'tiene', 'clic', 'muy', 'pero', 'saber', 'con', 'bien',
> 'eso', 'hola', 'que', 'aquí', 'les', 'por'

Here are the most probable English equivalents of these Spanish words:
'using', 'page', 'has', 'click', 'very', 'but', 'know', 'with', 'good',
'that', 'hi', 'that', 'here', 'them', 'for'

This illustrates the need for properly balanced training sets and re-raises
the question of language discrimination. At the least, detecting the language
first would allow a separate database for each language, or a systematic
"unsure" flag for languages that have not been trained. If you put my messages
in a ham training set, you will flag French spam as ham because of my French
sig ;-)
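
To make the idea concrete, here is a rough sketch of such a language gate. It
is only an illustration: the stopword lists, the min_hits threshold, the
"db-en" naming, and the guess_language and route functions are all assumptions
of mine, not anything that exists in Spambayes.

```python
import re

# Tiny illustrative stopword lists (assumed for this sketch, not from Spambayes).
STOPWORDS = {
    "en": {"the", "and", "that", "with", "for", "this", "but", "have"},
    "es": {"que", "con", "por", "pero", "muy", "les", "eso", "tiene"},
    "fr": {"les", "des", "est", "pour", "sur", "leurs", "un", "de"},
}

# Languages for which a trained database is assumed to exist.
TRAINED_LANGUAGES = {"en"}


def guess_language(text, min_hits=3):
    """Return the language whose stopwords occur most often, or None."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    hits = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(hits, key=hits.get)
    # Too few hits means we cannot tell; the caller treats that as untrained.
    return best if hits[best] >= min_hits else None


def route(text):
    """Return a per-language database name, or 'unsure' if untrained."""
    lang = guess_language(text)
    if lang in TRAINED_LANGUAGES:
        return "db-" + lang
    return "unsure"
```

With these lists, a message made up mostly of the Spanish words quoted above
would be routed to "unsure" instead of being scored against an English-only
database. A real detector would need far better language statistics than a
handful of stopwords.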

All of these words should score around 0.5, since they are among the most
common words in Spanish.
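
A small worked example of why balance matters, assuming a simple ratio-based
per-word score (a simplification for illustration, not necessarily the formula
Spambayes actually uses). The word_score function and all the counts are
hypothetical.

```python
def word_score(spam_count, ham_count, n_spam, n_ham):
    """Score a word in [0, 1]: 0.5 is neutral, 1.0 is pure spam evidence.

    spam_count / ham_count: number of spam / ham messages containing the word.
    n_spam / n_ham: total number of spam / ham messages in the training set.
    """
    spam_ratio = spam_count / n_spam
    ham_ratio = ham_count / n_ham
    if spam_ratio + ham_ratio == 0:
        return 0.5  # never seen: no evidence either way
    return spam_ratio / (spam_ratio + ham_ratio)


# Balanced training: a common word appears equally often in spam and ham.
balanced = word_score(spam_count=90, ham_count=90, n_spam=100, n_ham=100)

# Unbalanced training: the same word was only ever seen in spam.
unbalanced = word_score(spam_count=90, ham_count=0, n_spam=100, n_ham=100)
```

Under these made-up counts the balanced case gives 0.5 and the unbalanced case
gives 1.0, so a very common word becomes a strong spam clue only because no
ham in that language was ever trained.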

-- 
Le courrier est un moyen de communication. Les gens devraient
se poser des questions sur les implications politiques des choix (ou non
choix) de leurs outils et technologies. Pour des courriers propres :
<http://marc.herbert.free.fr/mail/> -- <http://minilien.com/?IXZneLoID0>



