[Moin-devel] antispam = no horror

Thomas Waldmann tw-public at gmx.de
Wed Dec 15 06:03:04 EST 2004


Just to clear up a few things:

The background
==============

For quite some time, many public wikis have been experiencing a major problem:

Wiki spammers regularly pollute numerous pages with their link spam 
(just to get a better Google ranking, or maybe some visitors).

At first they did that manually, but by now I think they have scripts 
that automatically search for and spam wikis. Some even created 
accounts when anonymous editing was not allowed.

As that annoyance got bigger and bigger, I decided to strike back and 
wrote antispam, first as an optional plugin, and since 1.2.4 as a 
built-in feature.

The problem was how to recognize such spam. I chose regular expressions 
as a powerful means to catch all of it (no matter whether it is a real 
link or just text showing some "promoted" URL). If you have antispam 
enabled in your moin_config.py (wikiconfig.py for 1.3), moin uses the 
content of the BadContent page to refuse to save any page that contains 
a match for one of those REs.
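To illustrate the idea, here is a minimal Python 3 sketch of checking page text against BadContent-style patterns. It is not moin's actual antispam code: parse_badcontent and find_spam are hypothetical names, and the assumptions that BadContent holds one regular expression per line, that blank lines and lines starting with '#' are skipped, and that matching is case-insensitive are mine.

```python
import re

# Hypothetical BadContent text. Assumption: one regular expression per line,
# with blank lines and lines starting with '#' treated as comments.
SAMPLE_BADCONTENT = r"""
# spam domains
cheap-pills\.example
casino-[a-z]+\.example
"""


def parse_badcontent(text):
    """Compile each non-empty, non-comment line of text into a regex."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumption: match case-insensitively, since spammers vary the case.
        patterns.append(re.compile(line, re.IGNORECASE))
    return patterns


def find_spam(page_text, patterns):
    """Return the first match found in page_text, or None if it is clean."""
    for pattern in patterns:
        match = pattern.search(page_text)
        if match:
            return match
    return None


if __name__ == "__main__":
    patterns = parse_badcontent(SAMPLE_BADCONTENT)
    hit = find_spam("Visit http://casino-royal.example/ now!", patterns)
    if hit:
        # A wiki would refuse to save the page at this point.
        print("save denied, matched:", hit.group(0))
```

Because any regex can go on the page, one line can cover a whole family of spam domains (like casino-[a-z]+ above), whether the URL appears as a real link or as plain text.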

I decided to switch it on by default because I have seen the damage done 
by those spammers too often (and have too often helped to fix it again, 
which costs lots of time). I knew that if it was not on by default, many 
people would not switch it on, because they simply forget or are not 
aware of the problem.

Another problem was how to keep that BadContent page up to date. Until 
we have a better solution, I chose to simply fetch it from the 
MoinMaster:BadContent master page (first checking whether the master 
copy is newer, and not fetching too often [we have to pay for traffic, 
and maybe you do too]).
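The update strategy described above (check whether the master copy is newer, but not too often) could look roughly like the following Python 3 sketch. BadContentCache, get_remote_mtime, fetch_remote_text and MIN_CHECK_INTERVAL are hypothetical names, and the one-hour interval is an arbitrary illustrative value, not the one moin uses; the two callables stand in for the real network requests to MoinMaster.

```python
import time

# Hypothetical minimum delay between update checks (illustrative value only).
MIN_CHECK_INTERVAL = 3600  # seconds


class BadContentCache:
    """Local copy of BadContent, refreshed from a master wiki on demand.

    get_remote_mtime and fetch_remote_text are hypothetical callables that
    stand in for the real network calls to the master wiki.
    """

    def __init__(self, get_remote_mtime, fetch_remote_text, clock=time.time):
        self.get_remote_mtime = get_remote_mtime
        self.fetch_remote_text = fetch_remote_text
        self.clock = clock
        self.text = ""
        self.local_mtime = 0.0
        self.last_check = None

    def maybe_update(self):
        """Return True if a newer master copy was fetched, else False."""
        now = self.clock()
        # Do not contact the master at all if we checked recently (saves traffic).
        if self.last_check is not None and now - self.last_check < MIN_CHECK_INTERVAL:
            return False
        self.last_check = now
        # Cheap check first: only download the page if the master copy is newer.
        remote_mtime = self.get_remote_mtime()
        if remote_mtime <= self.local_mtime:
            return False
        self.text = self.fetch_remote_text()
        self.local_mtime = remote_mtime
        return True
```

Injecting the clock and the two fetch functions keeps the throttling and "is it newer?" logic testable without any network access.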

So we only need to update that single page to get new spam patterns 
banned in all wikis that fetch it. Speed matters here because, similar 
to new Windows viruses, spammers often change their URLs and use new 
domains. If you see new spam, you can "report" it on 
MoinMaster:LocalBadContent and we will transfer it to the real 
(protected) page ASAP.

After release 1.2.4, I was made aware of some fundamental problems, 
which is why it is switched off by default in 1.3.x:
  * not every wiki has (or is allowed to have) internet connectivity
  * some wikis have unreliable or very slow internet connectivity

The current problems
====================

After the 1.3.1 release, I upgraded linuxwiki.org from 1.2.4 to 1.3.1 
and also moved it into the wiki farm running under Twisted (before, it 
ran as CGI). That caused quite a few load problems and crashes of that 
wiki farm, sometimes making the moinmaster wiki unavailable. I tried to 
fix this by switching to the 1.3.1 "standalone" wiki server, but we 
noticed that it doesn't work as an antispam server at all. I have now 
switched back to Twisted, hoping it won't overload again.

The 1.2.4 antispam client code simply uses xmlrpc calls from the Python 
standard library, and we noticed later that those calls have NO TIMEOUT 
at all. So if you hit save and moin wants to fetch a BadContent update 
while the moinmaster wiki is malfunctioning, down, or unreachable, it 
will wait forever (or until some other timeout kicks in, like Apache's 
or the browser's) and won't save the page.

After we noticed those problems, we developed xmlrpc code WITH a 
timeout, so the 1.3.x code has fewer problems in such a case.
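One way to get an xmlrpc call WITH a timeout is sketched below for Python 3's xmlrpc.client. Moin 1.2.4/1.3.x ran on Python 2's xmlrpclib, so this is not the code moin actually ships. It subclasses the standard Transport so that every HTTP connection it opens carries a timeout; TimeoutTransport, make_master_proxy and the 10 second default are hypothetical choices.

```python
import xmlrpc.client


class TimeoutTransport(xmlrpc.client.Transport):
    """XML-RPC transport whose HTTP connections give up after `timeout` seconds.

    Assumption: plain http; for https, subclass xmlrpc.client.SafeTransport
    the same way.
    """

    def __init__(self, timeout=10.0, **kwargs):
        super().__init__(**kwargs)
        self.timeout = timeout

    def make_connection(self, host):
        conn = super().make_connection(host)
        # http.client.HTTPConnection hands .timeout to the socket when it
        # connects, so both connecting and every later read are bounded by it.
        conn.timeout = self.timeout
        return conn


def make_master_proxy(url, timeout=10.0):
    """Create a ServerProxy for url that raises instead of hanging forever."""
    return xmlrpc.client.ServerProxy(url, transport=TimeoutTransport(timeout))
```

With such a proxy, a hanging or unreachable master makes the call raise a socket timeout (a subclass of OSError) after the given number of seconds instead of blocking the save forever. What the caller does then, for example keep using its existing BadContent copy, is up to the wiki code.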

What to do?
===========

If you run your wiki on an intranet that is not accessible from the 
internet, switch antispam off (1.2.4: default is on, see moin_config.py; 
1.3.x: default is off, see wikiconfig.py).

If your wiki can be accessed from the internet and you allow anonymous 
or user edits, you should in principle use antispam, or you risk being 
raided badly and often by spammers. OTOH, you are free to switch it off 
if you can't live with moinmaster's availability (we try hard to keep it 
available, but, as you know, shit happens).

Thanks for reading this far,

Thomas Waldmann



