Any Neural Net code in Python? I want to filter out spam email

Sébastien Libert sebastien.libert at comexis.com
Thu Apr 19 03:51:13 EDT 2001


Hello,

Nice discussion!

Could the 'algorithm' that you describe use this: http://www.awaretek.com/python.html ?
If you take a look at it, let me know, and please involve me!

Seb
  
  "Ken Seehof" <kens at sightreader.com> wrote in message news:mailman.987656068.4191.python-list at python.org...
  "Dan Maas" <dmaas at nospam.dcine.com> says:

  > > I've been saving up all the spam messages I get for the past two months.
  > > I have about 1869 spam messages saved.
  > > Now I'd like to develop a neural net based filter for my email program
  > > and train it to recognize these messages as spam.
  >
  > Cool... I assume the main thing you are worrying about is accidentally
  > rejecting non-spam emails, which might happen too easily with a
  > naive keyword-based system.
  >
  > How about this - apply a whole set of tests to the message. Each test
  > gives a "spamminess" score - e.g. 10 points for being all caps, 50 points
  > for having the word 'viagara', 100 points for having a suspicious From:
  > address like *@yahoo.com. Add the scores from the different tests, and
  > if the sum exceeds, say, 200 points, then call it "spam."
  >
  > So, how do you figure out a good value for each test score? This is where
  > you could use a neural network or genetic algorithm. Pick a set of
  > scores, feed the program lots of messages (both spam and non-spam), and
  > see how accurate it is. Iterate until it rejects every spam email and
  > accepts every non-spam...
  >
  > Dan
  > --
  > http://mail.python.org/mailman/listinfo/python-list
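
A minimal sketch of the additive scoring Dan describes above, for illustration only. The message layout (a dict with "from" and "body" keys), the function names, and the three tests are assumptions; the point values and the 200-point threshold are the example figures from his message, not tuned numbers.

```python
import re

# Hypothetical tests; each takes a message dict {"from": ..., "body": ...}
# and returns True when the rule fires.
def is_all_caps(msg):
    letters = [c for c in msg["body"] if c.isalpha()]
    return bool(letters) and all(c.isupper() for c in letters)

def has_keyword(msg):
    return "viagara" in msg["body"].lower()

def suspicious_sender(msg):
    return re.search(r"@yahoo\.com$", msg["from"].strip().lower()) is not None

# (rule name, test function, points) using the example values from the post.
RULES = [
    ("all_caps", is_all_caps, 10),
    ("keyword", has_keyword, 50),
    ("sender", suspicious_sender, 100),
]
THRESHOLD = 200

def spam_score(msg, rules=RULES):
    """Sum the points of every rule that fires on the message."""
    return sum(points for _name, test, points in rules if test(msg))

def is_spam(msg, rules=RULES, threshold=THRESHOLD):
    """Call the message spam when its total score exceeds the threshold."""
    return spam_score(msg, rules) > threshold
```

With only these three rules the highest possible score is 160, so nothing crosses the 200-point line; a usable rule set would need more tests, or a threshold chosen by the training step discussed in the thread.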

  Excellent idea, Dan.  That conveniently sidesteps the most difficult
  issue: getting the neural network to actually come up with linguistic
  rules.  Once an intelligent human specifies the set of rules, the neural
  net should have no difficulty coming up with an optimal non-linear
  function of pre-processed features (i.e. the "rules") to identify spam.
  Analysis of the weights after training will help remove rules that turn
  out to be irrelevant.

  In other words, the input vector is simply the results from your
  arbitrary rule set.
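
To make the "input vector of rule results" idea concrete, here is a rough sketch that learns one weight per rule from labelled messages. It uses a single-layer perceptron, which is a linear threshold unit and so a deliberate simplification of the non-linear network discussed above. The function names, learning rate, epoch count, and toy data are all assumptions made for the example.

```python
def train_weights(samples, n_rules, epochs=50, lr=0.1):
    """Perceptron training.

    samples: list of (features, label) pairs, where features is a list of
    0/1 rule results and label is 1 for spam, 0 for non-spam.
    """
    weights = [0.0] * n_rules
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            activation = bias + sum(w * x for w, x in zip(weights, features))
            predicted = 1 if activation > 0 else 0
            error = label - predicted
            if error:
                # Nudge the bias and the weights of the rules that fired.
                bias += lr * error
                weights = [w + lr * error * x
                           for w, x in zip(weights, features)]
    return weights, bias

def classify(features, weights, bias):
    """Return 1 (spam) or 0 (non-spam) for one vector of rule results."""
    activation = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if activation > 0 else 0

# Toy data: in this made-up set, a message is spam exactly when rule 0 fires;
# rules 1 and 2 fire equally often in both classes.
SAMPLES = [
    ([1, 0, 0], 1),
    ([1, 1, 1], 1),
    ([0, 0, 1], 0),
    ([0, 1, 0], 0),
]
weights, bias = train_weights(SAMPLES, n_rules=3)
```

Printing `weights` after training is one simple way to do the weight analysis mentioned above: rules whose weights stay near zero are candidates for removal.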

  Since irrelevant rules are fairly harmless (other than decreasing
  performance), one could initialize the rule set to include a rule for every word
  that occurs in spam messages more often than in non-spam
  messages.  Then supplement it with rules like the ones you mention.
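
One possible way to build that initial word-rule set, sketched under some assumptions: "more often" is taken to mean "appears in a larger fraction of messages", words are lowercase runs of letters and apostrophes, and the function names are invented for the example.

```python
import re
from collections import Counter

def word_set(text):
    """Distinct lowercase words in one message body."""
    return set(re.findall(r"[a-z']+", text.lower()))

def candidate_word_rules(spam_bodies, ham_bodies):
    """Words found in a larger fraction of spam than of non-spam messages."""
    if not spam_bodies or not ham_bodies:
        raise ValueError("need at least one spam and one non-spam message")
    # Count, for each word, how many messages of each class contain it.
    spam_df = Counter(w for body in spam_bodies for w in word_set(body))
    ham_df = Counter(w for body in ham_bodies for w in word_set(body))
    n_spam, n_ham = len(spam_bodies), len(ham_bodies)
    return sorted(
        w for w, count in spam_df.items()
        if count / n_spam > ham_df[w] / n_ham
    )

def featurize(text, words):
    """One 0/1 input per word rule: 1 when the word occurs in the text."""
    present = word_set(text)
    return [1 if w in present else 0 for w in words]
```

The list returned by `featurize` can be concatenated with the results of hand-written rules to form the full input vector.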

  Here's another idea for acquiring sample data.  Send 'please send
  me more info' messages to everyone who has sent you spam, with
  your newly created spam recipient email address.  Your address
  will probably be sold to everyone.  BTW, make sure your spam
  recipient is on an ISP that does -not- defend against spam!

  (Technically, it's not actually spam you'd be receiving since you
  are explicitly requesting it, but close enough :-)

  I want to be involved in this project.  Let's take this offline.

  - Ken
  ----------------------------------------------------
  Copyright (c) 2001 by Ken Seehof
  This document may not be distributed, copied,
  duplicated, or replicated, or duplicated in any
  form without express permission by Ken Seehof.
  Permission is hereby granted.
  kseehof at neuralintegrator.com
  ----------------------------------------------------
  The opinions expressed herein are not necessarily
  those of George W. Bush.
