OT: MoinMoin and Mediawiki?

Ian Bicking ianb at colorstudy.com
Wed Jan 12 15:41:02 EST 2005


Paul Rubin wrote:
> Paul Rubin <http://phr.cx@NOSPAM.invalid> writes:
> 
>>>>How does it do that?  It has to scan every page in the entire wiki?!
>>>>That's totally impractical for a large wiki.
>>>
>>>So you want to say that c2 is not a large wiki? :-)
>>
>>I don't know how big c2 is.  My idea of a large wiki is Wikipedia.
>>My guess is that c2 is smaller than that.
> 
> 
> I just looked at c2; it has about 30k pages (I'd call this medium
> sized) and finds incoming links pretty fast.  Is it using MoinMoin?
> It doesn't look like other MoinMoin wikis that I know of.  I'd like to
> think it's not finding those incoming links by scanning 30k separate
> files in the file system.

c2 is the Original Wiki, i.e., the first one ever, and the system that 
coined the term.  It's written in Perl.  It's definitely not an 
advanced Wiki, and it has generally relied on social rather than 
technical solutions to its problems, which might be a Wiki principle in 
itself.  I believe it used full-text searches for things like backlinks 
in the past, but it seems to use some kind of index now.

> Sometimes I think a wiki could get by with just a few large files.
> Have one file containing all the wiki pages.  When someone adds or
> updates a page, append the page contents to the end of the big file.
> That might also be a good time to pre-render it, and put the rendered
> version in the big file as well.  Also, take note of the byte position
> in the big file (e.g. with ftell()) where the page starts.  Remember
> that location in an in-memory structure (Python dict) indexed on the
> page name.  Also, append the info to a second file.  Find the location
> of that entry and store it in the in-memory structure as well.  Also,
> if there was already a dict entry for that page, record a link to the
> old offset in the 2nd file.  That means the previous revisions of a
> file can be found by following the links backwards through the 2nd
> file.  Finally, on restart, scan the 2nd file to rebuild the in-memory
> structure.

That sounds like you'd be implementing your own filesystem ;)
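 
Still, the bookkeeping isn't much code.  Here's a very rough, untested 
sketch of the scheme as I read it (the class name, file names and the 
tab-separated index record format are just what I'd pick; I've left out 
the pre-rendered copy, and it assumes page names contain no tabs or 
newlines):

class PageStore:

    def __init__(self, datafile='pages.dat', idxfile='pages.idx'):
        self.data = open(datafile, 'a+b')   # all revisions of all pages
        self.idx = open(idxfile, 'a+b')     # one record per save
        self.latest = {}   # page name -> offset of its newest index record
        self._rebuild()

    def _rebuild(self):
        # On restart, scan the (much smaller) index file to rebuild the dict.
        self.idx.seek(0)
        while True:
            pos = self.idx.tell()
            line = self.idx.readline()
            if not line:
                break
            self.latest[line.split('\t', 1)[0]] = pos

    def save(self, name, text):
        # Append the new revision to the big data file, noting where it starts.
        self.data.seek(0, 2)
        data_off = self.data.tell()
        self.data.write(text)
        self.data.flush()
        # Append an index record that points back at the previous revision.
        prev = self.latest.get(name, -1)
        self.idx.seek(0, 2)
        idx_off = self.idx.tell()
        self.idx.write('%s\t%d\t%d\t%d\n' % (name, data_off, prev, len(text)))
        self.idx.flush()
        self.latest[name] = idx_off

    def load(self, name, idx_off=None):
        # Returns (text, offset of the previous index record); pass that
        # offset back in to walk backwards through the page's history.
        if idx_off is None:
            idx_off = self.latest[name]
        self.idx.seek(idx_off)
        rec = self.idx.readline().rstrip('\n').split('\t')
        data_off, prev, length = int(rec[1]), int(rec[2]), int(rec[3])
        self.data.seek(data_off)
        return self.data.read(length), prev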

If you are just trying to avoid too many files in a directory, another 
option is to put files in subdirectories like:

import os, struct

# hash the page name into a short base64 directory name:
base = struct.pack('i', hash(page_name))
base = base.encode('base64').strip().strip('=')
filename = os.path.join(base, page_name)
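 
Each page then lands in one of many small hash-named directories 
instead of one huge one.  You do have to create the directory before 
writing (and note the base64 string can itself contain a '/', which 
just nests things one level deeper), but the OS does the lookup for you.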

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org


