[Moin-user] Creating a static copy of a MoinMoin site

Paul Boddie paul at boddie.org.uk
Wed Jul 25 12:15:13 EDT 2018


On Tuesday 12. April 2016 17.03.07 Paul Waring wrote:
> On Tue, Apr 12, 2016 at 02:21:16PM +0000, Roger Haase wrote:
> >    It has been a few years since I have used this, but
> >    try: HelpOnMoinCommand/ExportDump - MoinMoin
> 
> I've tried that but there are two big problems with the export dump:
> 
> 1. It adds a .html extension to all pages, meaning that all external
> links (i.e. those linking to the wiki from another site) break.
> 
> 2. It changes a forward slash in URLs to (2f), e.g. About/Contact
> becomes About(2f)Contact.html (again breaking URLs)

Sorry to dredge this thread up again after more than two years, but having
needed to look into this kind of thing myself, I have to admit that the output
is not particularly easy to work with. Hopefully, some more remarks might be
helpful to others in a similar situation.

My original reply to this indicated that it might be possible to configure the 
software to do the right thing, but in the situation where one just needs to 
get the content exported, it is not so easy to figure this out. Indeed, I 
ended up making changes to the export dump program just to get things done:

MoinMoin/script/export/dump.py

First of all, the .html extension can be removed by setting HTML_SUFFIX to "". 
I think that should deal with this particular problem decisively. However, we 
are then left with the page name rewriting that also affects the links in 
documents.

Changing the fundamental mechanism for encoding page names can have the 
undesirable result of creating phantom page directories in the wiki itself. I 
also strongly dislike the way the export dump program "monkeypatches" 
wikiutil:

wikiutil.quoteWikinameURL = lambda pagename, qfn=wikiutil.quoteWikinameFS: (qfn(pagename) + HTML_SUFFIX)

I removed this, in fact, because it seems to me that there is some conflation 
between page names and link targets, and indeed the source states the 
following assumption:

# we have the same name in URL and FS

So the export dump program treats the files containing the page data and the 
names used in links as one and the same thing. This obviously exposes ".html" 
all over the place (until we fix that, as noted above) and inserts the "(xx)" 
sequences, thus breaking all external links to the content.
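To make the conflation concrete, the following Python sketch approximates the two naming schemes. It is an illustration, not code from dump.py or Moin: it assumes that the file-system encoding turns each run of characters outside [A-Za-z0-9_] into its UTF-8 bytes written as hex inside parentheses (consistent with the (2f) seen for slashes). dump_link mimics the monkeypatched quoteWikinameURL, while proper_link is a hypothetical alternative that keeps page names, slashes included, as link targets.

```python
import re
from urllib.parse import quote

# Assumption: an approximation of Moin 1.x file-system quoting, where each run
# of characters outside [A-Za-z0-9_] becomes "(" + hex UTF-8 bytes + ")".
UNSAFE = re.compile(r"[^A-Za-z0-9_]+")


def quote_fs(pagename: str) -> str:
    """File-system form of a page name, e.g. About/Contact -> About(2f)Contact."""
    return UNSAFE.sub(lambda m: "(" + m.group().encode("utf-8").hex() + ")", pagename)


def unquote_fs(filename: str) -> str:
    """Inverse of quote_fs: decode each (hex) group back into characters."""
    return re.sub(r"\(([0-9a-f]+)\)",
                  lambda m: bytes.fromhex(m.group(1)).decode("utf-8"), filename)


def dump_link(pagename: str, html_suffix: str = ".html") -> str:
    """Link target as produced by the monkeypatched quoteWikinameURL."""
    return quote_fs(pagename) + html_suffix


def proper_link(pagename: str) -> str:
    """Hypothetical link target that keeps the page name's URL form."""
    return quote(pagename, safe="/")
```

With html_suffix set to "", dump_link("About/Contact") gives "About(2f)Contact": the extension disappears, but the (2f) escaping remains, whereas proper_link("About/Contact") returns "About/Contact".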

(I suppose that Web server rewrite rules might be able to help with this, but 
such functionality is generally horrid as well as being rather poorly 
documented.)

The slash-escaping is unhelpful but perhaps necessary in the sense that
Apache, at least, always treats slashes as path separators, and there is no
obvious easy way to rewrite them: the interpretation happens too early in
the request processing. I spent ages looking at this because the documentation
doesn't clearly mention it.

There are also issues with providing appropriate URLs for the static resources 
and generating the logo. I also don't particularly see the need to generate 
all the underlay pages, given that most of them are help for Moin or other 
things that don't contribute to a static site, or indeed the navigation bar.
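Skipping the underlay pages could be as simple as filtering the page list before export. The patterns below are hypothetical examples of Moin help and system page names, not a complete or authoritative list, and the function name is made up for illustration.

```python
import fnmatch

# Hypothetical patterns for help/system pages a static copy would not need.
# These are illustrative guesses, not a definitive list of underlay pages.
EXCLUDE = ["Help*", "*/Help*", "SystemPagesIn*", "WikiSandBox"]


def pages_to_export(pagenames, exclude=EXCLUDE):
    """Return the page names matching none of the exclusion patterns, sorted."""
    return sorted(
        name for name in pagenames
        if not any(fnmatch.fnmatchcase(name, pattern) for pattern in exclude)
    )
```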

In the end, despite wanting only static files to deploy, I had to accept 
filesystem-encoded names, and so I wrote a script to rewrite URLs from their 
proper form to the filesystem encoding, loading and sending the content within 
my script. The solution becomes more like a simple script backed by static 
resources than a proper static site.
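A URL-rewriting script of the kind described above can be sketched as a small WSGI application. This is not the script behind the site mentioned below: the export directory, the front page name and the flat one-file-per-page layout are assumptions, and the file-system quoting is an approximation of Moin's scheme (runs of characters outside [A-Za-z0-9_] become hex UTF-8 bytes in parentheses).

```python
import os
import re

# Assumption: approximation of Moin 1.x file-system quoting, not Moin's own code.
UNSAFE = re.compile(r"[^A-Za-z0-9_]+")


def quote_fs(pagename):
    return UNSAFE.sub(lambda m: "(" + m.group().encode("utf-8").hex() + ")", pagename)


def make_app(export_dir, front_page="FrontPage", suffix=""):
    """Return a WSGI app serving exported pages under their proper URLs.

    export_dir and front_page are hypothetical settings; suffix should match
    the HTML_SUFFIX used when the dump was produced.
    """
    def app(environ, start_response):
        # PEP 3333 passes PATH_INFO as latin-1-decoded bytes; recover UTF-8.
        raw = environ.get("PATH_INFO", "/").encode("latin-1")
        pagename = raw.decode("utf-8", errors="replace").strip("/") or front_page
        # Quoting also encodes "." and "/", so "../" cannot escape export_dir.
        path = os.path.join(export_dir, quote_fs(pagename) + suffix)
        if not os.path.isfile(path):
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not found\n"]
        with open(path, "rb") as f:
            body = f.read()
        start_response("200 OK", [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return app
```

For local testing, the application could be run with wsgiref.simple_server.make_server("", 8000, make_app("/path/to/dump")).serve_forever(). Static resources such as stylesheets and the logo are not handled here and would need their own mapping.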

Here is an example of the result:

http://projects.boddie.org.uk/

(I reserve the right to migrate this to content not produced using export 
dump, obviously, but since the source content is in Moin format, I intend the 
output to be broadly similar no matter what I end up using.)

Now for some background and another, related topic...

What made me investigate this were some worrying problems with cache files
being created within my locked-down wikis, plus general log file growth. 
Although I arrived at a solution using export dump, it feels like a nasty 
hack. Consequently, I have been working on tools to parse Moin content and to 
produce static output.

At some point, I will start using these new tools actively for sites where I 
don't need the through-the-Web editing facilities of Moin. I also intend to 
release the code and maybe build on it for other projects. Providing people 
with acceptable migration paths and independent tools is the right thing to 
do, I feel.

Anyway, I hope this was helpful, and apologies if this is an unwelcome return 
to a topic no longer of interest.

Paul

