[spambayes-dev] Web filter

John Mulholland sl6dt at cc.usu.edu
Tue Jan 20 17:08:26 EST 2004


I was just rereading some of the old discussions about a Bayesian web filter.
I am going to try to write one this semester for a grad-level class project.  I
think that I can find solutions to some of the problems that you have
mentioned.  There are some very interesting characteristics of porn pages.
First of all, they are often linked through JavaScript.  It isn't difficult to
automate a process to find lots of them.  Maybe I am naive, but it seems they
are quite similar to each other and very different from most other web pages.
At least that is the case with most of them.  Since one of the purposes of my
program is to protect people from accidentally going to a porn site, a false
negative is much more serious than a false positive.  I definitely agree that
an open effort to build a base package of sites we definitely want blocked
would be very helpful.  Checking out a site is as simple as:
telnet abc.com 80
GET / HTTP/1.1
Host: abc.com

(HTTP/1.1 requires the Host header, and an empty line ends the request.)
Then you can get the HTML and analyze it.
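
For anyone who would rather script this than type it into telnet, here is a
rough Python sketch of the same request, plus a first pass at turning the page
into tokens.  Everything named here (build_request, fetch, tokenize,
PageTokens) is made up for illustration, abc.com is just the placeholder host
from above, and counting script tags and links is only my guess at features
that might be worth feeding to a classifier.  It does not handle redirects,
HTTPS, or chunked responses.

```python
import re
import socket
from html.parser import HTMLParser


def build_request(host, path="/"):
    """Return the bytes one would otherwise type into telnet."""
    lines = [
        "GET %s HTTP/1.1" % path,
        "Host: %s" % host,       # mandatory in HTTP/1.1
        "Connection: close",     # ask the server to hang up when done
        "",
        "",                      # the empty line that ends the request
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host, port=80, path="/", timeout=10):
    """Send the request over a plain socket; return (headers, body) as text.

    Assumption: the body is used as received, so a chunked response
    would still contain its chunk-size lines.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:         # server closed the connection
                break
            chunks.append(data)
    raw = b"".join(chunks)
    # Headers and body are separated by the first blank line.
    head, _, body = raw.partition(b"\r\n\r\n")
    return head.decode("latin-1"), body.decode("utf-8", errors="replace")


class PageTokens(HTMLParser):
    """Collect visible words and count <script> and <a> tags."""

    def __init__(self):
        super().__init__()
        self.words = []
        self.script_tags = 0
        self.links = 0
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
            self._in_script = True
        elif tag == "a":
            self.links += 1

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Skip the JavaScript source itself; keep only page text.
        if not self._in_script:
            self.words.extend(re.findall(r"[a-z0-9']+", data.lower()))


def tokenize(html):
    """Parse an HTML string and return the filled-in PageTokens object."""
    parser = PageTokens()
    parser.feed(html)
    parser.close()
    return parser
```

Usage would be something like head, body = fetch("abc.com") followed by
page = tokenize(body), after which page.words, page.script_tags and page.links
are available for analysis.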
If an open effort to list sites does start, we should also make sure to have
different categories, because someone may not want to look at nude art while
others may think that it is OK.
If people are interested in this, please contact me at sl6dt at cc.usu.edu and
let me know.  I would appreciate any ideas or suggestions, because I am fairly
new to the Linux world and there are many things in this project that I have
no idea how to do.  My goal is to make a very effective, robust, easy-to-use,
customizable, free web filter that most people can use, including Windows
users.

John Mulholland




