[SciPy-dev] cleaning out wiki spam

Peter Wang pwang at enthought.com
Tue Feb 24 10:47:41 EST 2009


Hi everyone,

I have gone through with a blunt grep hammer and moved ~9300 pages off  
of the main scipy wiki.  This seems to have helped Moin's performance  
somewhat.  There are still approximately 3300 pages remaining.  If  
folks are interested in a distributed approach to culling the rest of  
the spam, I can send out an 80 KB file listing the remaining pages.
It would be helpful to have both "definite ham" and "definite spam"  
lists, especially in the foreign language pages and user pages, which  
are the toughest to figure out.  (e.g. What is the difference between  
French spam and French ham? Surely we have *some* legitimate Chinese  
contributors on the wiki?)

In my wild grepping it's possible I've blown away some good pages.   
I'm including my list of patterns below, so folks can identify major  
or obvious problems.  The sketchiest (but also the most effective) was  
eliminating pages with '(2b)', but I recognize that was a pretty broad  
stroke.

Of course, if anyone notices missing pages, please let me know and I  
will restore the page ASAP.

-Peter

---------------------------

*\(2b\)*
*gold*
*ffxi*
*ountertop*
granite*
Gold*
guild*wars*
*Hangzhou*
*hangzhou*
Injection*Molding*
lineage*2*
liuhecai*
Louis*Vuitton*
ltage*
Mabinogi*
maple*story*
Maple*Story*
ok????*
qq\(*
replica*
Rohan* rohan* ROHAN*
rs* RS* Rs*
runescape* Runescape*
(e2*
(e3*
(e4*
(e5*
(e6*
(e7*
(e8*
(e9*
tm?????*
xinggan*
zxcv*
cai*
*d0????*
hare*
Hj*
hj*
hk*
jack*
Lex*
seo*
SEO*
tema*
Tombstone*
usr*
*arhammer*
*arcraft*
*WoW*
*wow*
*WOW
www\(2e\)*
zg*
zhonggo*
315*
200{6,7,8,9}*
1878*
123*
13*
5*
6*
7*
Ajd*
baixiao*
China*
china*
game*
Game*
google*
Google*
GOOGLE*
kcc*
nobye*
oforu*
power*
Power*
tibet*
Tibet*
?urbocharger*
?holesale*
?rusher*
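
For anyone who wants to check whether a legitimate page name would have been caught by the patterns above, here is a minimal sketch in Python. It rests on assumptions that the message does not confirm: that the patterns are shell-style globs applied to Moin's on-disk page directory names, and that sequences such as "(2b)" are Moin's encoding of special characters in those names ("(2b)" would be "+"). Only a handful of the patterns are copied in, the helper names and the sample page names are hypothetical, and brace patterns such as 200{6,7,8,9}* are left out because fnmatch does not expand braces.

```python
import fnmatch

# A few patterns copied from the list above (assumed to be shell-style globs).
PATTERNS = [r"*\(2b\)*", "*gold*", "rs*", "game*", "5*", "*wow*"]


def unescape(pattern):
    """Drop shell backslash escapes; fnmatch would treat them as literal."""
    return pattern.replace("\\", "")


def matching_patterns(page_name, patterns=PATTERNS):
    """Return every pattern that would have caught page_name (case-sensitive)."""
    return [p for p in patterns
            if fnmatch.fnmatchcase(page_name, unescape(p))]


if __name__ == "__main__":
    # Hypothetical page names, used only to show how false positives arise.
    for name in ["Cookbook", "rsync_notes", "golden_section_search",
                 "C(2b)(2b)_Extensions"]:
        print(name, "->", matching_patterns(name))
```

Under these assumptions, a name like "golden_section_search" is caught by *gold* and "rsync_notes" by rs*, which is the kind of collateral damage worth reporting so the page can be restored.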



