[Moin-user] question about encoding inside macros - moin 1.3.4

Nir Soffer nirs at actcom.net.il
Sun Jul 10 12:46:39 EDT 2005


On 10 Jul, 2005, at 22:24, Alan Ezust wrote:

> I have a macro which gets run from moin, which reads an HTML file, 
> does some transformations onto it, and then outputs it. I think it 
> worked fine with moin 1.2 but with 1.3, I get these little diamonds 
> wherever there were &nbsp; in the input file.
>
>  My transformation is just some simple regexps on attribute values. 
> You can see the result page at 
> http://cartan.cas.suffolk.edu/moin/OopDocbook
>
>  There seems to be something lost in the translation however, because 
> &nbsp; characters in the input file (charset=ISO-8859-1) show up as 
> \xa0 when I print them out from python, and after I return it from the 
> macro, they appear as little diamonds with question marks inside them, 
> from the resultant wiki page.
>
>  What is the right way to read and write out a file so that HTML 
> entities are preserved?

You must work with Unicode text.

Let's assume your HTML uses the iso-8859-1 charset:
html = unicode(html, 'iso-8859-1', 'replace')

Now process your HTML. When you process the Unicode text, you might 
want to compile your regular expressions with re.U. Last, write out 
the output. Moin will encode it to utf-8 for you, so you don't have to 
worry about that.
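
The steps above can be sketched roughly as follows. This is only an 
illustration, not your macro: the function name transform_html, the 
href pattern and the "/docs/" prefix are made up for the example. The 
diamonds appear because a raw \xa0 byte is not valid utf-8, so the 
browser shows a replacement character. Decoding first avoids that. The 
sketch uses raw_bytes.decode(...), which does the same thing as 
unicode(raw_bytes, ...) but also runs on newer Pythons.

```python
import re

# Hypothetical pattern standing in for "simple regexps on attribute
# values". re.U makes \w match non-ASCII letters in Unicode text.
HREF = re.compile(r'href="(\w+)"', re.U)

def transform_html(raw_bytes, charset='iso-8859-1'):
    """Decode the raw HTML bytes to Unicode, then transform the text."""
    # Decode first, so that \xa0 becomes the character U+00A0
    # instead of staying a lone byte that is invalid in utf-8.
    text = raw_bytes.decode(charset, 'replace')
    # Made-up transformation: prefix every href value with /docs/
    return HREF.sub(r'href="/docs/\1"', text)

raw = b'<a href="caf\xe9">x\xa0y</a>'
result = transform_html(raw)
# Return `result` (Unicode text) from the macro as is. Encoding it
# here only shows what a utf-8 page would contain for U+00A0.
encoded = result.encode('utf-8')
```

In the encoded output the non-breaking space is the two-byte sequence 
\xc2\xa0, which is what a utf-8 page expects.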


Best Regards,

Nir Soffer




