[Spambayes] Beyond Spambayes

Thu Feb 23 23:55:55 CET 2006

On Thursday, February 23, 2006 10:28 AM -0600,
netsecurity at sound-by-design.com wrote:

> Frankly I am in agreement with Billy Y. I myself have gotten black
> holed because someone on the same netblock sent a bunch of spam.
> Getting off the list was impossible because I did not control the
> netblock. It took over three months, and I have a fixed IP!

There are plenty of situations like this around, and it's very
unfortunate when innocent third parties are affected.  This is why
DNSBLs are both loved and hated.  It sounds like you were in what Paul
Vixie called a "bad neighborhood" on the internet.  Other people in your
netblock were likely abusing outside networks and the provider was not
cooperative in fixing the problem.  The provider wound up with a
blackholed netblock, which affected all of its customers.

It's similar to a multi-unit in a big city neighborhood where there's a
lot of drug dealing.  Ideally, the police would catch the perpetrators
and the problem would be solved.  But the landlord is lax and continues
to rent to people who conduct illegal activities and create a public
nuisance.  Eventually, it may get to the point where the city condemns
the building and everyone in it loses their apartment.  Some of the
people were clearly innocent, but allowing the situation to continue was
not a good option either.  Whatever the police did, many people would be
unhappy.  Prosecuting the current perpetrators, only to watch them
quickly replaced with similar tenants, would not satisfy the neighbors'
desire for a safe neighborhood.  Seizing the building and throwing
everybody out is clearly unfair to tenants who did nothing wrong.  It's
just a bad situation.

Expanding the blacklisted netblock is a desperate move to put pressure
on the netblock owner to fix the problem, much like condemning a
building that has become a crack house when the police can't solve the
problem.  It is obviously controversial, but so is doing nothing.  There
is no good solution for individuals except to try to avoid moving into
"bad neighborhoods".  Paul Vixie's answer to this was "personal
colocation", meaning run your own server in a low-cost facility that has
a clean record.  It doesn't have to be physically close to you or
connected with your bandwidth provider in any way.  This is obviously
not for the majority of users, but it's good for some people.

> Rather than disruptive RBLs, if we did deep packet inspection to find
> the forged HELO and other headers and dumped them we would be far
> ahead.

DNSBLs are controversial, especially among those who have been burned.
I think it's useful to consider the whole situation to avoid throwing
out the baby with the bathwater.  DNSBLs are disruptive to a small
group of innocent third parties, just like condemning crack houses is.
The question is whether the use of DNSBLs in general provides something
useful with an acceptable level of breakage.  Many responsible mail
system operators think so, and some don't.

Spam is a problem for large systems because the cost of post-acceptance
filtering is high.  Unlike postal mail, this cost is borne by the
recipient, not the sender.  While MUA solutions like Spambayes do a
wonderful job of classification, most large systems find that their
users won't accept high volumes of spam coming in.  One solution is to
promiscuously accept almost everything that is sent, then run content
filtering tools like SpamAssassin to catch the obvious, and finally MUA
tools like Spambayes to catch the rest.  But the server-side filtering
is expensive.  OTOH, users don't like being left with enormous piles of
spam to filter themselves, which puts pressure on email providers.
Using a DNSBL is a very low-cost way of reducing the amount of spam by a
large factor with little collateral damage.  DNSBLs do have their place.
If you run a hobby system with a fixed bandwidth bill, then it's not
much of an issue and Spambayes may be all you need.  Relying on
Spambayes alone doesn't scale well for large systems and doesn't meet
their users' expectations.
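
To illustrate why a DNSBL check is so cheap, here is a minimal Python
sketch of the usual lookup convention: reverse the IPv4 octets, append
the list's zone, and ask for an A record.  Any answer means "listed".
The zone dnsbl.example.org and the function names are placeholders I
made up for illustration, not a real list or a real API, and the sketch
assumes IPv4 only.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str, resolver=socket.gethostbyname) -> bool:
    """Return True if the zone has an A record for this address.

    `resolver` is injectable so the logic can be exercised without
    touching the network; by default it does a real DNS lookup.
    """
    try:
        resolver(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # No such name (or the lookup failed): treat as not listed.
        return False
    return True
```

The whole test is one small DNS query made while the sending MTA is
still connected, before any message data has been transferred.  Note
that treating a failed lookup as "not listed" is a policy choice in this
sketch; it means a DNS outage lets mail through rather than blocking it.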

> While I don't run my own mail server, a friend who does says that a
> sendmail script finds all the forged headers and reports them as
> probable spam. He swears it is a default install so he doesn't know
> exactly what part of sendmail does the trick.

There are a number of heuristics that you can apply at the beginning of
the SMTP conversation that let you reject connections from some
obviously bogus mailers.  This is much harder to do after you've
accepted the message and all you have are the message headers.  Header
analysis is possible to some extent (look at the SpamCop header analysis
scripts) but it is not as effective as when you have the other MTA on
the wire.
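
As a rough example of the kind of check that is only possible while the
other MTA is on the wire, here is a Python sketch of a few HELO sanity
tests.  These particular rules are commonly used ones that I picked for
illustration; I am not claiming they are what sendmail or any other MTA
does by default.  The function name and the `my_names` parameter are
hypothetical, and `my_names` is assumed to hold lowercase hostnames.

```python
import ipaddress


def helo_looks_bogus(helo: str, client_ip: str, my_names: frozenset) -> bool:
    """Return True if the HELO argument fails some simple sanity checks."""
    name = helo.strip().lower().rstrip(".")
    if not name:
        return True  # empty HELO argument
    if name in my_names:
        return True  # the client claims to be our own server
    if name.startswith("[") and name.endswith("]"):
        # Address literal: it should match the connecting address.
        literal = name[1:-1]
        if literal.startswith("ipv6:"):
            literal = literal[len("ipv6:"):]
        try:
            return ipaddress.ip_address(literal) != ipaddress.ip_address(client_ip)
        except ValueError:
            return True  # malformed literal
    try:
        ipaddress.ip_address(name)
        return True  # bare IP address without the required brackets
    except ValueError:
        pass
    return "." not in name  # not a fully qualified domain name
```

A call might look like
helo_looks_bogus("localhost", "192.0.2.10", frozenset({"mx.example.org"})).
The second and third tests need the connecting IP address and knowledge
of who we are, which is why they work better during the SMTP
conversation than afterwards from headers alone.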

> BTW, my hosting service uses SpamAssassin instead of SpamBayes
> because of speed and server load. He says that he ran tests and
> couldn't get the performance out of Python that he needs to make
> it work well. Perhaps making a fast, light CPU usage, runtime
> server version might be in order to investigate.

Spambayes works so well because it learns what each individual user
considers spam.  I don't know what success anyone has had using this
technique with one database shared across a population of users.  The
developers normally recommend against that.  Running Spambayes on each
user's mail with
their own database on the MTA is an interesting idea.  It still means
you accept and filter all that junk, so as a large provider, you have to
pay for both the bandwidth and processing time.
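
To show what "their own database" means in practice, here is a toy
Python sketch of per-user token statistics.  It is a deliberately
simplified naive Bayes scorer with add-one smoothing, not the algorithm
Spambayes actually uses, and the class name, user names, and training
messages are all invented for the example.

```python
import math
from collections import Counter, defaultdict


class UserFilter:
    """Token counts learned from one user's own spam and ham."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()
        self.nspam = 0
        self.nham = 0

    def train(self, text: str, is_spam: bool) -> None:
        tokens = set(text.lower().split())
        if is_spam:
            self.spam.update(tokens)
            self.nspam += 1
        else:
            self.ham.update(tokens)
            self.nham += 1

    def spam_probability(self, text: str) -> float:
        # Sum per-token log-odds, using add-one smoothing on the counts.
        log_odds = 0.0
        for tok in set(text.lower().split()):
            p_spam = (self.spam[tok] + 1) / (self.nspam + 2)
            p_ham = (self.ham[tok] + 1) / (self.nham + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


# One independent database per user, keyed by a made-up user name.
filters = defaultdict(UserFilter)
filters["alice"].train("cheap mortgage offer", is_spam=True)
filters["alice"].train("python meeting notes", is_spam=False)
filters["bob"].train("cheap mortgage offer", is_spam=False)  # bob is a broker
filters["bob"].train("lottery winner claim", is_spam=True)

alice_score = filters["alice"].spam_probability("mortgage offer")
bob_score = filters["bob"].spam_probability("mortgage offer")
```

The same message scores above 0.5 for alice and below 0.5 for bob,
which is the point of keeping the databases separate.  It also shows the
cost: the server has to store and update one database per user, on top
of accepting every message first.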

--
Seth Goodman