OT: MoinMoin and Mediawiki?

Paul Rubin
Wed Jan 12 18:54:07 EST 2005


Ian Bicking <ianb at colorstudy.com> writes:
> If the data has to be somewhere, and you have to have relatively
> random access to it (i.e., access any page; not necessarily a chunk of
> a page), then the filesystem does that pretty well, with lots of good
> features like caching and whatnot.  I can't see a reason not to use
> the filesystem, really.

For one thing, you waste lots of space on small files because of the
partially empty block at the end of each file.  Sure, disk space is
cheap, but you waste space in your RAM cache the same way, which hurts
performance and isn't so cheap.  For another, you need multiple seeks
to get to each file (scan the directory to get the inode number, read
the inode, get the list of indirect blocks from the inode, read each of
those, etc).  If the pages are packed into a few big files instead, the
inodes and indirect blocks of those files stay cached, so you only have
to seek once per page.  Finally, you lose some control over what's in
RAM and what needs to be retrieved.  You can do a
better job of tuning your cache strategy to the precise needs of your
wiki, than the file system can with its one-size-fits-all approach.
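To put rough numbers on the slack-space point and show what the big-file alternative looks like, here is a minimal Python sketch.  The 4096-byte block size, the PackedPageStore class and its put/get methods are all hypothetical.  A real store would also have to persist the offset index and reclaim space from superseded page versions.

```python
import io

BLOCK_SIZE = 4096  # assumed filesystem block size; real values vary


def slack_bytes(size, block=BLOCK_SIZE):
    """Bytes wasted in the last, partially filled block of one file."""
    remainder = size % block
    return 0 if remainder == 0 else block - remainder


class PackedPageStore:
    """Hypothetical store that appends every page to one big file.

    An in-memory dict maps page name -> (offset, length), so fetching a
    page is one seek plus one read, with no per-page directory lookup.
    Rewritten pages are appended again; old copies are not reclaimed.
    """

    def __init__(self, fileobj):
        self.f = fileobj   # any seekable binary file object
        self.index = {}    # page name -> (offset, length)

    def put(self, name, text):
        data = text.encode("utf-8")
        self.f.seek(0, io.SEEK_END)   # always append at the end
        offset = self.f.tell()
        self.f.write(data)
        self.index[name] = (offset, len(data))

    def get(self, name):
        offset, length = self.index[name]
        self.f.seek(offset)
        return self.f.read(length).decode("utf-8")


# Three hypothetical small pages, 5200 bytes of text in total.
sizes = [700, 1500, 3000]
print(sum(slack_bytes(s) for s in sizes))  # 7088 bytes of slack

# In-memory file for the demo; a real wiki would use a binary read/write file.
store = PackedPageStore(io.BytesIO())
store.put("FrontPage", "Welcome")
store.put("FrontPage", "Welcome back")
print(store.get("FrontPage"))  # Welcome back
```

The trade-off is that the wiki, not the filesystem, now has to manage free space, the index, and whatever caching policy sits on top of get().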

> For backlink indexing, that's a relatively easy index to maintain
> manually, simply by scanning pages whenever they are modified.  The
> result of that indexing can be efficiently put in yet another file
> (well, maybe one file per page).

Opening and closing the extra files imposes considerable overhead,
though it would take actual benchmarking to get precise figures.
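The backlink scheme in the quoted text does not need one index file per page: the whole index can live in one structure that is updated whenever a page is saved, and written out as a single file if it must persist.  A minimal sketch, assuming a hypothetical [[PageName]] link syntax; the BacklinkIndex class and LINK_RE pattern are illustrative and not taken from MoinMoin or MediaWiki.

```python
import re
from collections import defaultdict

# Assumed link syntax: [[PageName]] or [[PageName|label]].
LINK_RE = re.compile(r"\[\[([^\]|]+)")


class BacklinkIndex:
    """Hypothetical backlink index kept in a single in-memory structure."""

    def __init__(self):
        self.outgoing = {}                # page -> set of pages it links to
        self.incoming = defaultdict(set)  # page -> set of pages linking to it

    def page_saved(self, name, text):
        """Rescan one page's links and update the index incrementally."""
        new_links = {m.strip() for m in LINK_RE.findall(text)}
        old_links = self.outgoing.get(name, set())
        for target in old_links - new_links:  # links that were removed
            self.incoming[target].discard(name)
        for target in new_links - old_links:  # links that were added
            self.incoming[target].add(name)
        self.outgoing[name] = new_links

    def backlinks(self, name):
        # .get() avoids creating empty entries for unknown pages
        return sorted(self.incoming.get(name, ()))


idx = BacklinkIndex()
idx.page_saved("FrontPage", "See [[HelpContents]] and [[RecentChanges|changes]].")
idx.page_saved("SandBox", "Back to [[HelpContents]]")
print(idx.backlinks("HelpContents"))  # ['FrontPage', 'SandBox']
```

Only the saved page is rescanned, so the cost of an edit is proportional to that page, not to the size of the wiki.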

> For full text search, you'll want already-existing code to do it for
> you.  MySQL contains such code.  But there's also lots of that
> software that works well on the filesystem to do the same thing.

Have you ever used the MySQL fulltext search feature?  It's awful.  

> A database would be important if you wanted to do arbitrary queries
> combining several sources of data.  And that's certainly possible in a
> wiki, but that's not so much a scaling issue as a
> flexibility-in-reporting issue.

An RDBMS is a good backend for a medium-sized wiki, since it takes
care of so many issues for you.  For a very big wiki that needs high
performance, there are better approaches, though they take more work.
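As a sketch of the RDBMS option, the following uses Python's built-in sqlite3 module, with SQLite standing in for whatever database a real wiki would use.  The pages/links schema and the helper functions are hypothetical.  Saving a page and replacing its outgoing links happen in one transaction, and the index on links.target keeps backlink queries cheap; that is the kind of bookkeeping the database takes care of for you.

```python
import sqlite3

# Hypothetical schema: one row per page, one row per (source, target) link.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pages (
    name TEXT PRIMARY KEY,
    body TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS links (
    source TEXT NOT NULL,
    target TEXT NOT NULL,
    PRIMARY KEY (source, target)
);
CREATE INDEX IF NOT EXISTS links_by_target ON links (target);
"""


def open_wiki(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_page(conn, name, body, links):
    """Store a page and its outgoing links in one transaction."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute(
            "INSERT OR REPLACE INTO pages (name, body) VALUES (?, ?)",
            (name, body),
        )
        conn.execute("DELETE FROM links WHERE source = ?", (name,))
        conn.executemany(
            "INSERT OR IGNORE INTO links (source, target) VALUES (?, ?)",
            [(name, target) for target in links],
        )


def load_page(conn, name):
    row = conn.execute("SELECT body FROM pages WHERE name = ?", (name,)).fetchone()
    return None if row is None else row[0]


def backlinks(conn, name):
    rows = conn.execute(
        "SELECT source FROM links WHERE target = ? ORDER BY source", (name,)
    )
    return [r[0] for r in rows]


conn = open_wiki()
save_page(conn, "FrontPage", "Welcome", ["HelpContents"])
save_page(conn, "SandBox", "Test", ["HelpContents", "FrontPage"])
print(backlinks(conn, "HelpContents"))  # ['FrontPage', 'SandBox']
```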


