Spambayes modifications with web services

benmorganpowell at gmail.com
Fri Oct 28 04:00:25 EDT 2005


In the last few months many personal website owners (such as myself)
have found that spammers have been using their domain names to
masquerade as valid users to send spam, normally in the form of:

JoeMarkBlogs at mydomain.com

This new tactic has an annoying side effect: the bounced emails end up
back with the postmaster at the innocent person's domain. This is
normally the first time that the domain owner realises that there is a
problem.

I am one of those people and currently have nearly 3,000 bounces in my
catch-all POP3 box.

I can see two possible responses:

1) Delete the email as it arrives and ignore it. Accept that the
domain name might end up being blacklisted as a spammer's domain and be
done with it, or

2) Fight back! All of the bounced emails contain at least one URI to a
spammer website, in an effort to sell "Cheap Meds" or "Faked Rolexes" or
similar. The format is usually something like this:

http://www.sickmate.info/?a2fb9e415e74beS9cdee919d78Sa6a7d

I believe the query part of the URI ties the visit back to the email
address. Hence if you visit the website with this link, your email
address is saved in a database as one that is a) valid, and b) dumb
enough to visit the website.

The spammers rely on the fact that some people will visit this website
and buy from them. Indeed, some people must buy from these websites via
spam, otherwise the spammers would have given up a long time ago*.

So, as a web programmer and someone who specialises in getting good
results on Google, I realised that I could simply list every spammer
website on a Google-optimised page, so that a Google search for the
site would return something like:

"WARNING: DO NOT BUY FROM THIS WEBSITE. THE SPAMMER IS A RUSSIAN MAFIA
CROOK WHO WILL STEAL YOUR MONEY."

...Or something equally obvious along those lines. In this way we
attack the websites that are the link between the spam and the money.
The real necessity therefore is to:

a) Process the received bounced messages quickly and list them on the
website without delay.
b) Prevent the spammer from using the domain.

The answer to (b) I cannot find. I thought SPF might help, but it is
not a panacea. The answer to (a) I need help with!

So, I'm on Windows XP. I use Outlook 2002 and I already have the
excellent (and FREE) SpamBayes Outlook add-in** that blocks spam and
loves ham. SpamBayes is open source and as such I can modify the source
code, rebuild it and install it afresh. However, the problem is that
I'm not a Python programmer, and I'm not sure where to start. This is
what I want to do, so if anyone would like to direct me, I'd be
grateful:

1) Add a menu option to the SpamBayes add-in - "Post Spam Site to Web
Service". I'm guessing I can add a new call to addin.py such as the one
below, but how do I sink the event?

self._AddControl(popup,
	constants.msoControlButton,
	ButtonEvent, (PostSpamSite, self.manager,),
	Caption="Post Spam Site to Web Service",
	Enabled=True,
	Visible=True,
	Tag = "SpamBayesCommand.PostSpam")
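
On sinking the event, here is a minimal sketch. It assumes (a guess
about addin.py, not checked against the SpamBayes source) that
ButtonEvent stores the first item of the tuple as a handler and calls it
with the remaining items when the button is clicked. ButtonEventSketch
and the list standing in for self.manager are hypothetical, used only so
the snippet runs without Outlook:

```python
class ButtonEventSketch:
    """Stand-in for an Office button event sink (hypothetical name)."""

    def Init(self, handler, *args):
        # Remember what to call, and with which arguments, on a click.
        self.handler = handler
        self.args = args

    def OnClick(self, button, cancel):
        # Office would call this when the user presses the button.
        self.handler(*self.args)


def PostSpamSite(manager):
    """Hypothetical handler; the real one would post the selected message."""
    manager.append("post requested")


manager = []  # placeholder for self.manager
sink = ButtonEventSketch()
sink.Init(PostSpamSite, manager)
sink.OnClick(button=None, cancel=False)  # simulate a click
```

If that assumption holds, "sinking the event" would come down to writing
a PostSpamSite(manager) function and passing it in the tuple as above.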

2) Add a configuration setting, so that the web service location can be
set. I'm guessing this is in config.py. Pointers welcome.

3) Add a function to extract all links in a block of text. I have
written a good one of these for .NET, but I'm not sure whether, or how,
it would work in Python:

string hrefPattern =
@"(?<all>(?:(?<protocol>http(?:s?)|ftp)(?:\:\/\/))"
		+ @"(?<domain>[^/\r\n\:]+)?"
		+ @"(?<port>\:\d+)?"
		+ @"(?<path>[^\?#]*)?"
		+ @"(?<qrystr>\?\w*)?"
		+ @"(?<bookmark>\#\w*)?)";

// Regular Expression
Regex hrefRegex = new Regex(hrefPattern, RegexOptions.Singleline |
RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);

Any help with this welcome. Do I need a specific Python regex library
or can I use the .NET regex library in Python?
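
For what it's worth, Python ships with its own regular expression module
(re) in the standard library, so no extra library is needed. A rough
adaptation of the pattern above might look like the sketch below. It is
not a literal port: named groups use Python's (?P<name>...) syntax, and
the path, query and bookmark groups are tightened to stop at whitespace.
The function name extract_links is made up for illustration.

```python
import re

# re.VERBOSE plays the role of RegexOptions.IgnorePatternWhitespace,
# re.IGNORECASE the role of RegexOptions.IgnoreCase.
HREF_PATTERN = re.compile(
    r"""
    (?P<all>
        (?P<protocol>https?|ftp)://     # scheme
        (?P<domain>[^/\s:?\#]+)         # host name
        (?P<port>:\d+)?                 # optional port
        (?P<path>/[^\s?\#]*)?           # optional path
        (?P<qrystr>\?[^\s\#]*)?         # optional query string
        (?P<bookmark>\#\S*)?            # optional fragment
    )
    """,
    re.VERBOSE | re.IGNORECASE,
)


def extract_links(text):
    """Return one dict of named groups per URL found in text."""
    return [match.groupdict() for match in HREF_PATTERN.finditer(text)]


sample = "Buy http://www.sickmate.info/?a2fb9e415e74beS9cdee919d78Sa6a7d now"
for link in extract_links(sample):
    print(link["domain"], link["qrystr"])
```

Groups that do not take part in a match (for example the port in the
sample URL) come back as None.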

4) Connect to the web service using SOAP and consume it. The service
will provide:

a) Authorise (username, password) - returns access
b) Submit (domain) - returns success or failure

Can I use SOAPpy for this? Can anyone give me any examples or point me
in the right direction?
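
Independently of which SOAP library is used, the request itself is just
XML posted over HTTP. The sketch below builds a SOAP 1.1 envelope for
the Submit operation using only the standard library. The service
namespace, the SOAPAction value and the helper names are all invented
for illustration; the real values would come from the service's WSDL.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1
SERVICE_NS = "urn:example:spamsites"  # hypothetical service namespace


def build_envelope(operation, **params):
    """Return a SOAP 1.1 request envelope (bytes) for one operation."""
    ET.register_namespace("soap", SOAP_ENV_NS)
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def post_envelope(url, operation, envelope):
    """POST the envelope and return the raw response body (not parsed)."""
    request = urllib.request.Request(
        url,
        data=envelope,
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            # Assumed SOAPAction format; services differ on this.
            "SOAPAction": f'"{SERVICE_NS}#{operation}"',
        },
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


submit_request = build_envelope("Submit", domain="www.sickmate.info")
```

An Authorise call would be built the same way, for example
build_envelope("Authorise", username=..., password=...), with the
returned access value passed along however the service expects.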

5) Provide another option in the add-in to "Scan folder and Post Spam
Sites to Web Service", in the same manner as "Filter messages" works
now. Can I use filter.py as a model to work from?

Summary
=================================
I am not a Python programmer per se but have no problem with getting my
hands dirty. I have already got the basics of this working as a
Windows.Forms application, but running both that and Outlook together
is daft. The SpamBayes project already does the hard bit in classifying
the spam, so it makes sense to hang off the back of it.

Has anyone else had similar problems with these "phantom" email
addresses being used by spammers and would like to work with me on
this? Would anyone in the SpamBayes team like to have a go at this, or
point me in the right direction? Has anyone had a go at hacking around
with the SpamBayes source code and knows what I should do?

Basically any help is extremely welcome!

Regards

Ben

* There must obviously be enough people out there who can't get an
erection or are dumb enough to munch pills to get slim rather than
endure a bit of exercise. That being said, they will also trust their
credit card to a bunch of crooks who, even if they send you the pills,
will probably send you rat poison!

** Get the FREE SpamBayes Outlook add-in from
http://sourceforge.net/projects/spambayes



