[Moin-user] Creating a static copy of a MoinMoin site
Paul Boddie
paul at boddie.org.uk
Wed Jul 25 12:15:13 EDT 2018
On Tuesday 12. April 2016 17.03.07 Paul Waring wrote:
> On Tue, Apr 12, 2016 at 02:21:16PM +0000, Roger Haase wrote:
> > It has been a few years since I have used this, but
> > try: HelpOnMoinCommand/ExportDump - MoinMoin
>
> I've tried that but there are two big problems with the export dump:
>
> 1. It adds a .html extension to all pages, meaning that all external
> links (i.e. those linking to the wiki from another site) break.
>
> 2. It changes a forward slash in URLs to (2f), e.g. About/Contact
> becomes About(2f)Contact.html (again breaking URLs)
Sorry to dredge this thread up again from over two years ago, but after having
the need to look into this kind of thing myself, I will admit that the output
is not particularly easy to work with. Hopefully, some more remarks might be
helpful to others in a similar situation.
My original reply to this indicated that it might be possible to configure the
software to do the right thing, but in the situation where one just needs to
get the content exported, it is not so easy to figure this out. Indeed, I
ended up making changes to the export dump program just to get things done:
MoinMoin/script/export/dump.py
First of all, the .html extension can be removed by setting HTML_SUFFIX to "".
I think that should deal with this particular problem decisively. However, we
are then left with the page name rewriting that also affects the links in
documents.
Changing the fundamental mechanism for encoding page names can have the
undesirable result of creating phantom page directories in the wiki itself. I
also strongly dislike the way the export dump program "monkeypatches"
wikiutil:
wikiutil.quoteWikinameURL = lambda pagename, qfn=wikiutil.quoteWikinameFS: (qfn(pagename) + HTML_SUFFIX)
I removed this, in fact, because it seems to me that there is some conflation
between page names and link targets, and indeed the source states the
following assumption:
# we have the same name in URL and FS
So the export dump program treats the files containing the page data and the
names used in links as one and the same thing. This obviously exposes ".html"
all over the place (until we fix that, as noted above) and inserts the "(xx)"
sequences, thus breaking all external links to the content.
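To illustrate where the "(xx)" sequences come from, here is a rough Python 3
sketch of the kind of filesystem encoding involved. It is an approximation
written for illustration, not Moin's own code: the names fs_encode and
fs_decode are made up, and the set of "safe" characters and the handling of
spaces are assumptions.

```python
import re

# Assumption: anything other than ASCII letters, digits and underscore
# is treated as unsafe in a filename.
UNSAFE = re.compile(r"[^a-zA-Z0-9_]+")

def fs_encode(pagename):
    """Encode a page name so each run of unsafe characters becomes "(xx...)"."""
    # Assumption: spaces become underscores before encoding.
    name = pagename.replace(" ", "_")

    def quote(match):
        # Hex-encode the UTF-8 bytes of the whole run of unsafe characters.
        return "(" + match.group().encode("utf-8").hex() + ")"

    return UNSAFE.sub(quote, name)

def fs_decode(filename):
    """Reverse fs_encode, apart from the space/underscore substitution."""
    def unquote(match):
        return bytes.fromhex(match.group(1)).decode("utf-8")

    return re.sub(r"\(((?:[0-9a-f]{2})+)\)", unquote, filename)

print(fs_encode("About/Contact"))
print(fs_decode("About(2f)Contact"))
```

The point is that the encoded form is a filename, whereas a link target
should really be the decoded form.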
(I suppose that Web server rewrite rules might be able to help with this, but
such functionality is generally horrid as well as being rather poorly
documented.)
The slash-escaping is unhelpful but perhaps necessary, in the sense that
Apache, at least, always treats slashes as path separators, and there is
apparently no easy way to rewrite them: the interpretation happens too early in
the request processing. I spent ages looking at this because the documentation
doesn't obviously mention it.
There are also issues with providing appropriate URLs for the static resources
and generating the logo. I also don't particularly see the need to generate
all the underlay pages, given that most of them are help for Moin or other
things that don't contribute to a static site, or indeed the navigation bar.
In the end, despite wanting only static files to deploy, I had to accept
filesystem-encoded names, and so I wrote a script to rewrite URLs from their
proper form to the filesystem encoding, loading and sending the content within
my script. The solution becomes more like a simple script backed by static
resources than a proper static site.
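For anyone wanting to try the same approach, the idea can be sketched as a
small WSGI application in Python 3. This is not the script I actually use,
only a minimal illustration: DUMP_DIR, url_path_to_filename and the
"FrontPage" default are hypothetical, and it assumes the dump was produced
with HTML_SUFFIX set to "" so that the files have no extension.

```python
import os
import re

# Hypothetical location of the export dump output.
DUMP_DIR = "/var/www/wiki-dump"

# Assumption: only ASCII letters, digits and underscore are left unencoded.
UNSAFE = re.compile(r"[^a-zA-Z0-9_]+")

def url_path_to_filename(path):
    """Map a request path such as "/About/Contact" to "About(2f)Contact"."""
    name = path.strip("/").replace(" ", "_")
    return UNSAFE.sub(
        lambda m: "(" + m.group().encode("utf-8").hex() + ")", name)

def application(environ, start_response):
    """Serve exported pages under their proper, unencoded URLs."""
    # WSGI supplies PATH_INFO as a latin-1 decoded string; recover UTF-8.
    raw = environ.get("PATH_INFO", "")
    path = raw.encode("latin-1").decode("utf-8", "replace")
    # Assumption: the front page was exported as "FrontPage".
    filename = url_path_to_filename(path) or "FrontPage"
    full_path = os.path.join(DUMP_DIR, filename)
    if not os.path.isfile(full_path):
        start_response("404 Not Found",
                       [("Content-Type", "text/plain; charset=utf-8")])
        return [b"Not found\n"]
    with open(full_path, "rb") as f:
        body = f.read()
    start_response("200 OK",
                   [("Content-Type", "text/html; charset=utf-8"),
                    ("Content-Length", str(len(body)))])
    return [body]
```

Since slashes and dots in the request path are themselves encoded, the
resulting filename cannot refer to anything outside DUMP_DIR. For local
testing, something like wsgiref.simple_server.make_server from the standard
library should be enough to run it.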
Here is an example of the result:
http://projects.boddie.org.uk/
(I reserve the right to migrate this to content not produced using export
dump, obviously, but since the source content is in Moin format, I intend the
output to be broadly similar no matter what I end up using.)
Now for some background and another, related topic...
What made me investigate this was some worrying problems with cache files
being created within my locked-down wikis, plus general log file growth.
Although I arrived at a solution using export dump, it feels like a nasty
hack. Consequently, I have been working on tools to parse Moin content and to
produce static output.
At some point, I will start using these new tools actively for sites where I
don't need the through-the-Web editing facilities of Moin. I also intend to
release the code and maybe build on it for other projects. Providing people
with acceptable migration paths and independent tools is the right thing to
do, I feel.
Anyway, I hope this was helpful, and apologies if this is an unwelcome return
to a topic no longer of interest.
Paul