[SciPy-dev] cleaning out wiki spam
Peter Wang
pwang at enthought.com
Tue Feb 24 10:47:41 EST 2009
Hi everyone,
I have gone through with a blunt grep hammer and moved ~9300 pages off
of the main scipy wiki. This seems to have helped Moin's performance
somewhat. There are still approximately 3300 pages remaining. If
folks are interested in a distributed approach to culling the rest of
the spam, I can send out an 80 KB file listing the remaining pages.
It would be helpful to have both "definite ham" and "definite spam"
lists, especially in the foreign language pages and user pages, which
are the toughest to figure out. (e.g. What is the difference between
French spam and French ham? Surely we have *some* legitimate Chinese
contributors on the wiki?)
In my wild grepping it's possible I've blown away some good pages.
I'm including my list of patterns below, so folks can identify major
or obvious problems. The sketchiest (but also the most effective) was
eliminating pages with '(2b)', but I recognize that was a pretty broad
stroke.
Of course, if anyone notices missing pages, please let me know and I
will restore the page ASAP.
-Peter
---------------------------
*\(2b\)*
*gold*
*ffxi*
*ountertop*
granite*
Gold*
guild*wars*
*Hangzhou*
*hangzhou*
Injection*Molding*
lineage*2*
liuhecai*
Louis*Vuitton*
ltage*
Mabinogi*
maple*story*
Maple*Story*
ok????*
qq\(*
replica*
Rohan*
rohan*
ROHAN*
rs*
RS*
Rs*
runescape*
Runescape*
(e2*
(e3*
(e4*
(e5*
(e6*
(e7*
(e8*
(e9*
tm?????*
xinggan*
zxcv*
cai*
*d0????*
hare*
Hj*
hj*
hk*
jack*
Lex*
seo*
SEO*
tema*
Tombstone*
usr*
*arhammer*
*arcraft*
*WoW*
*wow*
*WOW
www\(2e\)*
zg*
zhonggo*
315*
200{6,7,8,9}*
1878*
123*
13*
5*
6*
7*
Ajd*
baixiao*
China*
china*
game*
Game*
google*
Google*
GOOGLE*
kcc*
nobye*
oforu*
power*
Power*
tibet*
Tibet*
?urbocharger*
?holesale*
?rusher*
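
For anyone who wants to check whether a particular page name would have been caught, here is a minimal Python sketch of how patterns like the ones above could be tested. It is not the procedure actually used for the cleanup. It assumes the patterns are case-sensitive shell-style globs matched against the whole page name, that a backslash only escapes the next character, and that {a,b} is shell brace expansion. The helper names and the sample page names are made up for illustration, and only a handful of the patterns are included.

```python
import fnmatch
import re

# A small sample of the patterns from the list above.
PATTERNS = [r"*\(2b\)*", "*gold*", "200{6,7,8,9}*", "game*", "?holesale*"]


def expand_braces(pattern):
    # Expand {a,b,c} groups the way a shell would; fnmatch does not do this.
    m = re.search(r"\{([^{}]*)\}", pattern)
    if not m:
        return [pattern]
    head, tail = pattern[:m.start()], pattern[m.end():]
    expanded = []
    for alt in m.group(1).split(","):
        expanded.extend(expand_braces(head + alt + tail))
    return expanded


def unescape(pattern):
    # fnmatch has no backslash escapes, so drop the shell-style backslashes.
    # Parentheses are already literal characters for fnmatch.
    return re.sub(r"\\(.)", r"\1", pattern)


def matching_patterns(page_name, patterns=PATTERNS):
    # Return the original patterns that match page_name (case-sensitive).
    hits = []
    for raw in patterns:
        for pat in expand_braces(raw):
            if fnmatch.fnmatchcase(page_name, unescape(pat)):
                hits.append(raw)
                break
    return hits


if __name__ == "__main__":
    # Hypothetical page names, for illustration only.
    for name in ["Cookbook(2f)Matplotlib", "2008Conference", "Wholesale(2b)gold"]:
        print(name, matching_patterns(name))
```

A name such as the hypothetical "2008Conference" is matched by 200{6,7,8,9}*, which is the kind of false positive worth looking for when reviewing the list.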