[Spambayes] RE: solution for the "spam of the future"?

Tue Dec 16 16:07:30 EST 2003

I suppose that would also tend to filter out messages in languages you don't understand (assuming, as we've discussed before, that the orthography of the language at issue lends itself to tokenization).

Bob

MIS Department, City of Cambridge
831 Massachusetts Ave, Cambridge MA 02139  ·  617-349-4217  ·  fax 617-349-6165

> -----Original Message-----
> From: Tiago Estill de Noronha [mailto:TiagoTiago at Globo.com]
> Sent: Tuesday, December 16, 2003 2:01 PM
> To: 'SpamBayes'
> Subject: [Spambayes] solution for the "spam of the future"?
> 
> 
> I have an idea, I dunno if it will work or if it is possible to implement
> it, but my guess is yes for both, k, here it goes:
> 
> Create a "meta token" that is used every time a word not in the
> database is found in an email.
> Do the Bayesian thing when the user sends an email containing a new word to
> spam or ham.
> From then on, every time a user gets an email with new words, SpamBayes would
> classify it as ham or spam.
> After a while of receiving those random-character emails (and building up the
> database of known words, the token database itself), the score for the new-word
> "meta token" would shift toward the spam side.
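
A rough sketch of the idea quoted above, in case it helps to see it concretely. This is not SpamBayes code: the class `TinyFilter`, the token name `*unknown-word*`, the whitespace tokenizer, and the add-one smoothing are all assumptions made for illustration. The sketch only shows how a meta token could be counted at training time and how its score might drift toward spam as random-character spam is trained.

```python
from collections import Counter

# Hypothetical name for the meta token; not a real SpamBayes token.
UNKNOWN = "*unknown-word*"


class TinyFilter:
    """Toy per-token counter, only meant to illustrate the meta-token idea."""

    def __init__(self):
        self.spam = Counter()  # token -> number of spam messages containing it
        self.ham = Counter()   # token -> number of ham messages containing it
        self.nspam = 0
        self.nham = 0

    def seen(self, word):
        # A word is "known" once it appeared in any trained message.
        return word in self.spam or word in self.ham

    def train(self, text, is_spam):
        words = set(text.lower().split())  # naive whitespace tokenizer
        counts = self.spam if is_spam else self.ham
        # Count the meta token first, while the new words are still unknown.
        if any(not self.seen(w) for w in words):
            counts[UNKNOWN] += 1
        for w in words:
            counts[w] += 1
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def spam_prob(self, token):
        # Add-one smoothed ratio; an assumption, not SpamBayes' actual formula.
        s = (self.spam[token] + 1) / (self.nspam + 2)
        h = (self.ham[token] + 1) / (self.nham + 2)
        return s / (s + h)


f = TinyFilter()
for msg in ["meeting at noon tomorrow", "meeting at noon", "noon meeting tomorrow"]:
    f.train(msg, is_spam=False)
for msg in ["xqzv buy now", "kjhw buy now", "pplq buy now"]:
    f.train(msg, is_spam=True)

# The meta token leans spam because every spam message carried a new word.
print(f.spam_prob(UNKNOWN) > 0.5)
```

Note that the very first ham message also counts toward the meta token, since every word is new to an empty database. That is the same effect as with mail in an unfamiliar language: anything made up mostly of unseen words gets pulled toward whichever side such mail has been trained on.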