Any Neural Net code in Python? I want to filter out spam email

Remco Gerlich scarblac at pino.selwerd.nl
Thu Apr 19 06:37:42 EDT 2001


Alex Martelli <aleaxit at yahoo.com> wrote in comp.lang.python:
> There may not exist a vector of feature weights that performs
> perfectly, of course.  What one generally wants is a vector of
> feature weights that _optimizes_ some performance score.

Or a set of vectors, probably one that optimizes some score. A message
counts as spam if one, or enough, of the vectors trigger (pick your favorite).
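
A rough sketch of what I mean by "a set of vectors", in Python. The feature
names, weights and thresholds are all made up for illustration; a real filter
would get them from some optimization step rather than by hand.

```python
# Hypothetical features and weights, for illustration only.
def features(text):
    """Map a message to a dict of simple 0/1 features."""
    words = set(text.lower().split())
    return {
        "has_free": float("free" in words),
        "has_money": float("money" in words),
        "has_python": float("python" in words),
        "many_exclamations": float(text.count("!") >= 3),
    }

def score(weights, feats):
    """Dot product of a weight vector with the feature values."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())

# Each entry is (weight vector, threshold); a vector "triggers" when the
# score reaches its threshold.
VECTORS = [
    ({"has_free": 1.0, "has_money": 1.0, "has_python": -2.0}, 1.5),
    ({"many_exclamations": 1.0, "has_free": 0.5, "has_python": -2.0}, 1.0),
]

def triggered(text, vectors=VECTORS):
    """Count how many vectors trigger on this message."""
    feats = features(text)
    return sum(1 for weights, threshold in vectors
               if score(weights, feats) >= threshold)

def is_spam(text, min_triggers=1, vectors=VECTORS):
    """Spam if at least min_triggers vectors trigger (1 = 'one', more = 'enough')."""
    return triggered(text, vectors) >= min_triggers
```

With min_triggers=1 a single triggering vector is enough; raising it makes
the filter more conservative.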

"A learning approach to personalized information filtering" by Beerud Dilip
Sheth (<http://citeseer.nj.nec.com/sheth94learning.html>) is his MSc thesis
where he uses genetic algorithms to optimize a set of vectors for
information filtering.
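
To give an idea of what "using a genetic algorithm to optimize a vector" can
look like, here is a minimal generic sketch. It is not the method from
Sheth's thesis: the training data, the fitness function and all parameters
are made up, and it evolves a single weight vector rather than a set.

```python
import random

# Tiny made-up training set: (feature values, is_spam). Illustration only.
TRAINING = [
    ((1.0, 1.0, 0.0), True),
    ((1.0, 0.0, 0.0), True),
    ((0.0, 0.0, 1.0), False),
    ((1.0, 0.0, 1.0), False),
]

def fitness(weights, data=TRAINING):
    """Number of training messages classified correctly (score > 0 means spam)."""
    correct = 0
    for feats, is_spam in data:
        s = sum(w * f for w, f in zip(weights, feats))
        if (s > 0) == is_spam:
            correct += 1
    return correct

def evolve(generations=30, pop_size=20, n_features=3, seed=0):
    """Evolve one weight vector; returns the fittest individual found."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[:pop_size // 2]  # best half survives unchanged
        children = []
        while len(survivors) + len(children) < pop_size:
            mum, dad = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = mum[:cut] + dad[cut:]
            i = rng.randrange(n_features)       # mutate one weight slightly
            child[i] += rng.gauss(0, 0.3)
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)
```

Because the best half is carried over unchanged, the best score in the
population never gets worse from one generation to the next.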

My own thesis will be on this subject as well. The catch is that these
systems solve a different information filtering problem: given a huge
stream of articles, find the ones that are interesting to the user. The
constraints aren't as tight; most of the articles flagged as interesting
should be, and most of the interesting ones should be found, but it's no big
problem if there are some mistakes - the stream is huge, and even the stream
of interesting articles will often be too much to read anyway.

I'd hate it if my spam filter filtered out normal mail. That's why I'm
personally sticking to a few hand-made rules, and just ignoring the rest.
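
For illustration, a few hand-made rules could look something like this in
Python. The patterns and the spam.example domain are invented, not my actual
rules; anything that matches no rule is simply left alone.

```python
import email
import re

# Made-up rules for illustration: (header name, pattern to look for).
RULES = [
    ("Subject", re.compile(r"make money fast", re.IGNORECASE)),
    ("Subject", re.compile(r"!!!")),
    ("From", re.compile(r"@spam\.example\b", re.IGNORECASE)),
]

def matches_rule(raw_message, rules=RULES):
    """Return True if any hand-made rule matches the message headers."""
    msg = email.message_from_string(raw_message)
    for header, pattern in rules:
        value = msg.get(header, "")  # a missing header counts as empty
        if pattern.search(value):
            return True
    return False
```

Only messages for which matches_rule() returns True get filtered; everything
else goes to the inbox as usual, so normal mail is never touched by accident
unless a rule is too broad.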

On the plus side, the problem is simpler because spam is spam. It could
learn from a big database of spams that people put together, and a working
filter could be made public on some web site. Information filtering systems
have to learn the interests of the individual user. (But then, if it's
public, spammers will learn to avoid it :-().

My own research focuses more on the characteristics that the vectors get
after being optimized for a while. But all the code will be in Python, and I
think I'll get permission to put it online. But that's some months into the
future.

-- 
Remco Gerlich
(see! mentioned python after all)


