[pydotorg-www] FWD: Wiki Internal Server Error

Radomir Dopieralski sheep at sheep.art.pl
Thu Aug 12 20:35:39 CEST 2010


On Wed, Aug 11, 2010 at 2:19 AM, Steve Holden <steve at holdenweb.com> wrote:
> On 8/10/2010 7:44 PM, "Martin v. Löwis" wrote:
>>>> So unless I hear any objection RSN, I start moving it.
>>>
>>> Jumping in late..
>>>
>>> Can we recreate this issue?
>>
>> Assuming "this issue" is the Wiki consuming a lot of time: sure. Just
>> edit some page, and try to save the changes.
>>
>>> Can I get sw versions of the app stack.. apache x.x, moinmoin x.x, python, etc..
>>
>> That will take a while to collect. In short, it's all "debian stable".
>>
>>> following the thread I am left to believe that it is a python process
>>> that is burning CPU and has nothing to do with moin or http
>>
>> It is definitely the moin-wsgi process that is burning the CPU.
>>
>>> I've been a volunteer for pydotorg for a couple years, but have been
>>> M.I.A for awhile. I am back and wanting to help with
>>> sysadmin/pydotorg.
>>
>> From looking at the subversion logs, I know I added your key. But I
>> don't recall why I did - can you please remind me?
>>
>> Regards,
>> Martin
>>
> Let's also remember that Radomir Dopieralski recently joined the team
> with the specific intention of nursemaiding the MoinMoin Wiki. Whether
> there is anything he can do I don't know (though I kind of doubt it),
> but he's a resource we should not ignore.

One particularly expensive thing in MoinMoin is displaying the
category pages, as it involves a full search of the whole wiki. We
have introduced a number of partial solutions that are supposed to
mitigate the issue, from Xapian indexing to using cached versions of
the search macro on the category pages. I can see that most of the
category pages on the Python wiki were created with an earlier
version of MoinMoin and still use the FullSearch macro instead of
FullSearchCached -- changing this should help considerably, at the
expense of changes in the categories not showing up on the category
pages immediately.
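For example, a category page that currently ends with the old-style
macro call could be switched to the cached variant. This is only a
sketch -- the category name is a placeholder, and the exact markup
depends on the MoinMoin version in use:

    Old markup at the bottom of the category page:
        [[FullSearch()]]

    Cached replacement:
        <<FullSearchCached(category:CategoryExample)>>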

Another thing we discovered at my company is that web-crawling robots
often account for a lot of the traffic. MoinMoin already blocks them
from most of the expensive actions, but adding some of the pages
(notably all pages with canned searches) to robots.txt should help.
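A rough sketch of such a robots.txt -- the page names below are just
examples, the real list would have to be compiled from the pages that
are actually expensive on this wiki, and the paths assume the wiki is
served from the root of the site:

    User-agent: *
    # Index and canned-search pages that trigger full searches
    Disallow: /TitleIndex
    Disallow: /WordIndex
    Disallow: /WantedPages
    Disallow: /OrphanedPages
    Disallow: /PageHits
    Disallow: /RecentChanges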

Last but not least, we have had excellent experience with Varnish, a
caching reverse proxy. It does a very good job of caching most page
views for users who are not logged in -- which is usually the
majority of the traffic.
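To give an idea, here is a minimal sketch of the kind of VCL we use,
written against Varnish 2.x; the backend address and the assumption
that the MoinMoin session cookie name contains "MOIN_SESSION" are
placeholders that would have to be adjusted to the actual setup:

    backend moin {
        .host = "127.0.0.1";
        .port = "8080";
    }

    sub vcl_recv {
        # Logged-in users carry a MoinMoin session cookie -- pass
        # them straight to the backend, uncached.
        if (req.http.Cookie ~ "MOIN_SESSION") {
            return (pass);
        }
        # Anonymous page views: drop the cookies so the responses
        # can be cached and shared between visitors.
        remove req.http.Cookie;
        return (lookup);
    }

    sub vcl_fetch {
        # Never cache responses that try to set a cookie.
        if (beresp.http.Set-Cookie) {
            return (pass);
        }
        # Cache anonymous GET responses for a few minutes.
        if (req.request == "GET") {
            set beresp.ttl = 5m;
        }
        return (deliver);
    }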

If those solutions are accepted, I can start changing the macros on
the category pages and prepare a robots.txt file listing the
potentially expensive pages. I can also help configure Varnish, or
just provide an example configuration.

-- 
Radomir Dopieralski, http://sheep.art.pl
