[Moin-user] question about encoding inside macros - moin 1.3.4
Nir Soffer
nirs at actcom.net.il
Sun Jul 10 12:46:39 EDT 2005
On 10 Jul, 2005, at 22:24, Alan Ezust wrote:
> I have a macro which gets run from moin, which reads an HTML file,
> does some transformations on it, and then outputs it. I think it
> worked fine with moin 1.2 but with 1.3, I get these little diamonds
> wherever there were non-breaking spaces in the input file.
>
> My transformation is just some simple regexps on attribute values.
> You can see the result page at
> http://cartan.cas.suffolk.edu/moin/OopDocbook
>
> There seems to be something lost in the translation however, because
> non-breaking space characters in the input file (charset=ISO-8859-1) show up as
> \xa0 when I print them out from python, and after I return it from the
> macro, they appear as little diamonds with question marks inside them
> in the resulting wiki page.
>
> What is the right way to read and write out a file so that HTML
> entities are preserved?
You must work with Unicode texts.
Let's assume your HTML uses the iso-8859-1 charset:

    html = unicode(html, 'iso-8859-1', 'replace')

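To see what that decode step changes, here is a minimal sketch. It is written for Python 3, where `bytes.decode` plays the role of the `unicode()` call above, and the sample bytes are made up for illustration. It also shows the most likely source of the diamonds: a raw 0xA0 byte is not valid UTF-8, so anything that reads it as UTF-8 falls back to the replacement character.

```python
# Hypothetical sample: ISO-8859-1 bytes containing a non-breaking space (0xA0).
raw = b"<p>foo\xa0bar</p>"

# Decoding with the file's real charset maps byte 0xA0 to U+00A0.
html = raw.decode("iso-8859-1", "replace")

# Reading the same bytes as UTF-8 fails on the lone 0xA0 byte; with 'replace'
# it becomes U+FFFD, which browsers usually draw as a diamond with a "?".
wrong = raw.decode("utf-8", "replace")

print(repr(html))
print(repr(wrong))
```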
Now process your HTML. When you process the Unicode text, you might
want to compile your regular expressions with re.U. Finally, write out
the output. Moin will encode it to UTF-8 for you, so you don't have to
worry about that.
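Put together, the steps could look roughly like the sketch below. It is Python 3 and not Moin-specific: the `transform` function, the `href` pattern, and the `/docs/` prefix are all hypothetical stand-ins for your own attribute rewriting.

```python
import re

# Hypothetical transformation: prefix every href attribute value.
# re.U makes \w, \s and friends follow Unicode rules. It is already the
# default for str patterns in Python 3, so here it only documents intent.
HREF_RE = re.compile(r'href="([^"]*)"', re.U)

def transform(raw, charset="iso-8859-1", base="/docs/"):
    """Decode raw bytes, rewrite href values, and return Unicode text."""
    text = raw.decode(charset, "replace")
    return HREF_RE.sub(lambda m: 'href="%s%s"' % (base, m.group(1)), text)

# Made-up input with an e-acute (0xE9) and a non-breaking space (0xA0).
result = transform(b'<a href="ch1.html">Caf\xe9\xa0menu</a>')

# Shown only to illustrate the final step; per the message above, Moin
# does this encoding itself when the macro returns Unicode text.
encoded = result.encode("utf-8")
```

The point of the design is that bytes are decoded once at the edge, everything in between works on Unicode text, and encoding happens once on the way out.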
Best Regards,
Nir Soffer