[Python-Dev] Re: What to do about the Wiki?

Guido van Rossum guido@python.org
Wed, 31 Jul 2002 12:16:56 -0400


>     Guido> Juergen Hermann, Moinmoin's author, said he fixed a few things,
>     Guido> but also said that Moinmoin is essentially vulnerable to
>     Guido> "recursive wget" (e.g. someone trying to suck up the entire Wiki
>     Guido> by following links).  Apparently this is what brought the site
>     Guido> down this weekend -- if I understand correctly, an in-memory log
>     Guido> was growing too fast.
> 
> I'm a bit confused by these statements.  MoinMoin is a CGI script.  I don't
> understand where "recursive wget" and "in-memory log" would come into play.
> I recently fired up two Wikis on the Mojam server.  I never see any
> long-running process which would suggest there's an in-memory log which
> could grow without bound.  The MoinMoin package does generate HTTP
> redirects, but while they might coax wget into firing off another request,
> it should be handled by a separate MoinMoin process on the server side.  You
> should see the load grow significantly as the requests pour in, but
> shouldn't see any one MoinMoin process gobbling up all sorts of resources.
> Jürgen, can you elaborate on these themes a little more?

Juergen seems offline or too busy to respond.  Here's what he wrote on
the matter.  I guess he's reading the entire log into memory and
updating it there.

| Subject: [Pydotorg] wiki
| From: Juergen Hermann <jh@web.de>
| To: "pydotorg@python.org" <pydotorg@python.org>
| Date: Mon, 29 Jul 2002 20:32:31 +0200
| Hi!
| 
| I looked into the wiki, and two things killed us:
| 
| a) apart from google hits, some $!&%$""$% did a recursive wget. And the 
| wiki spans a rather wide uri space...
| 
| b) the event log grows much faster than I'm used to, thus some 
| "simple" algorithms don't hold for this size.
| 
| 
| Solutions: 
| 
| a) I just updated the wiki software, the current cvs contains a 
| robot/wget filter that forbids any access except to "view page" URIs 
| (i.e. we remain open to google, but no more open than absolutely 
| needed). If need be, we can forbid access altogether, or only allow 
| google.
| 
| b) I'll install a cron job that rotates the logs, to keep them short.
| 
| I shortened the logs manually for now. So if you all agree, we could 
| activate the wiki again.
| 
| 
| Ciao, Jürgen
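
Juergen's two fixes might look roughly like the following minimal Python sketch. It is not MoinMoin's actual code. The names allow_request, is_robot, rotate_log and ROBOT_MARKERS, the User-Agent substring test, and the assumption that a plain "view page" request carries no action= query parameter are all hypothetical.

```python
import os
from urllib.parse import parse_qs

# Hypothetical User-Agent substrings that mark a request as coming from a robot.
# "bot" also matches Googlebot, which is still allowed to view pages.
ROBOT_MARKERS = ("wget", "bot", "crawler", "spider")


def is_robot(user_agent):
    """Guess whether a request comes from a robot, by User-Agent substring."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in ROBOT_MARKERS)


def allow_request(user_agent, query_string):
    """Fix (a): humans may do anything; robots may only view pages.

    Assumes a plain page view has no "action" query parameter, while
    edit, diff, info, etc. are requested with action=...
    """
    if not is_robot(user_agent):
        return True
    return "action" not in parse_qs(query_string or "")


def rotate_log(path, backups=3):
    """Fix (b): shift path.1 -> path.2 ... and path -> path.1, keeping at
    most `backups` old copies. The application is assumed to reopen the
    log in append mode, so it starts a fresh, short log on its next write."""
    if not os.path.exists(path):
        return
    # Walk from the oldest backup down so nothing is overwritten too early;
    # os.replace overwrites the oldest copy, which drops it.
    for i in range(backups - 1, 0, -1):
        src = f"{path}.{i}"
        if os.path.exists(src):
            os.replace(src, f"{path}.{i + 1}")
    os.replace(path, f"{path}.1")
```

In a CGI script the two arguments of allow_request would come from the standard HTTP_USER_AGENT and QUERY_STRING environment variables, and a refused robot could be answered with a 403. rotate_log is meant to be run from cron; because it renames the log instead of rewriting it, it never has to read the file into memory.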

Reading this again, I think we should give it another try.

>     Guido> I believe that Juergen has fixed the log-growing problem.  Should
>     Guido> we enable the Wiki again and hope for the best?
> 
> With an XS4ALL person at the ready?  Perhaps someone can keep a window open
> on creosote running something like
> 
>     while true ; do
>         ps auxww | egrep python | sort -r -n -k 5,5 | head -1
>         sleep 15
>     done
> 
> I'm running out for the next few hours.  I'll be happy to run the while loop
> when I return.

We'll watch it here.  I know who to write to have it rebooted.

--Guido van Rossum (home page: http://www.python.org/~guido/)