From skip at pobox.com Wed Feb 6 14:00:37 2013
From: skip at pobox.com (Skip Montanaro)
Date: Wed, 6 Feb 2013 07:00:37 -0600
Subject: [spambayes-dev] GitHub

After some discussion with people on chicago at python.org, I decided to
at least clone the SpamBayes repository into Git. I put it up on GitHub
because that's the hip git place to be. (I'm nothing if not hip.) That
git repo is disconnected from its Subversion source. I will probably
push the repo back to SF as well. I'm not sure it buys us anything,
though I'm getting a bit better at svn2git.

Skip

From najw2 at live.com Fri Feb 22 23:49:52 2013
From: najw2 at live.com (nάjwά Aℓнάrвί)
Date: Sat, 23 Feb 2013 01:49:52 +0300
Subject: [spambayes-dev] SpamBayes classifier to the arabic language _ Help

Hello,

I have a graduation project and have decided to use the SpamBayes
technique to classify Arabic spam email in a Python environment.

My questions:

How can I run it with the Python programming language, and which
packages must be used with Python?
How can I link this technique with another preprocessing technique?
How can it work with the Arabic language?
How can I pass messages to the training and testing process?

Please, I need urgent help.

Thanks.

From skip at pobox.com Sat Feb 23 00:57:00 2013
From: skip at pobox.com (Skip Montanaro)
Date: Fri, 22 Feb 2013 17:57:00 -0600
Subject: [spambayes-dev] [Spambayes] SpamBayes classifier to the arabic language _ Help

> I have a graduation project and have decided to use the SpamBayes
> technique to classify Arabic spam email in a Python environment.
> My questions: How can I run it with the Python programming language,
> and which packages must be used with Python?
> How can I link this technique with another preprocessing technique?
> How can it work with the Arabic language?
> How can I pass messages to the training and testing process?

You will need to download the SpamBayes source distribution so you get
the test environment and are able to easily make changes to the code. I
recently created a Git repository at GitHub:

https://github.com/smontanaro/spambayes

You can just clone that repository. If you make changes to the code
that you would like incorporated into SpamBayes, you can create a pull
request when you are ready.

Once you've downloaded the code you should familiarize yourself with
the tokenizer code in spambayes/spambayes/tokenizer.py. (You can ignore
everything in the website directory.) The tokenizer file contains many
detailed comments about what did and didn't work when SpamBayes was
originally developed. Arabic text will be full of non-ASCII characters.
Search for "highbit" and "8bit" to decide how you want to handle that.
I'm pretty sure you will have to modify that code. Also, if Arabic text
uses something other than an ASCII space character to separate words,
you will have to fix that.

It's unlikely you will need to modify the classifier, at least
initially, but it will pay to read through that heavily commented code
as well. The output of the tokenizer step is the input to the
classifier. Knowing how to set its parameters will help when testing.

Familiarize yourself with spambayes/TESTING.txt to learn how to test
your changes.

Finally, you will need fairly large collections of spam and ham emails.
The TESTING file should describe the requirements there.

Skip Montanaro
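
As a rough illustration of the non-ASCII and word-separator points in
the reply above, here is a minimal standalone sketch. It does not use or
modify spambayes/tokenizer.py; the function name tokenize_arabic and the
normalization choices (NFKC folding, dropping diacritics and tatweel)
are assumptions made for illustration, and it assumes Python 3, where
str.split() with no arguments splits on any Unicode whitespace. Whether
these choices actually help classification would have to be checked with
the procedure in TESTING.txt.

```python
import unicodedata

TATWEEL = "\u0640"  # Arabic elongation character, carries no meaning


def tokenize_arabic(text):
    """Split text into word tokens (hypothetical helper, not SpamBayes API)."""
    # Fold compatibility forms (e.g. Arabic presentation forms) to base letters.
    normalized = unicodedata.normalize("NFKC", text)
    # Drop nonspacing combining marks (category Mn), which include the
    # Arabic short-vowel diacritics.
    stripped = "".join(
        ch for ch in normalized if unicodedata.category(ch) != "Mn"
    )
    # Remove tatweel so stretched and unstretched spellings match.
    stripped = stripped.replace(TATWEEL, "")
    # With no argument, split() breaks on any run of Unicode whitespace,
    # not just the ASCII space.
    return stripped.split()


if __name__ == "__main__":
    for token in tokenize_arabic("\u0645\u0631\u062d\u0628\u0627 \u0628\u0627\u0644\u0639\u0627\u0644\u0645"):
        print(token)
```

Tokens produced this way could then be compared against the stock
tokenizer's output on the same messages to see which gives better
results.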