From python to LaTeX in emacs on windows

Benjamin Niemann b.niemann at betternet.de
Mon Aug 30 07:55:18 EDT 2004


Brian Elmegaard wrote:
> Hi group
> 
> I hope this is not a faq...
> 
> I am trying to understand how to use the new way of specifying a
> file's encoding, but no matter what I do I get strange characters in
> the output.
> 
> I have a text file which I have generated in python by parsing some
> html.
> 
> In the file there are international characters like é and ó.
> When I view the file in emacs, it is encoded as
> mule-utf-8-dos
> 
> I read the file into python as a string and suddenly the characters,
> when printed, look strange and each consists of two characters.
> 
> First problem: How do I avoid this?
> 
> Second problem is that I make some string replacements and more in
> the string to write a latex output file. When I open this file in
> emacs the characters now are not the same?
> 
> Second problem: How do I avoid this?

When you read the file contents in Python, you get the "raw" byte 
sequence, in this case the UTF-8 encoding of Unicode text. UTF-8 stores 
é and ó as two bytes each, which is why each one prints as two strange 
characters. What you probably want is a unicode string. Use 
"text = unicode(data, 'utf-8')", where "data" is the file content you 
read. After processing, you probably want to write the result back to a 
file. Before you do this, you have to convert the unicode string back 
to a byte sequence. Use "data = text.encode('utf-8')".
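
A minimal sketch of the decode / process / encode round trip described 
above. It assumes a current Python 3, where "unicode(data, 'utf-8')" is 
spelled "data.decode('utf-8')". The sample text, the LATEX_ACCENTS table 
and the to_latex() helper are made up for illustration; they are not 
taken from the original script.

```python
# Bytes as they would come from open(path, "rb").read().
# The sample content is an assumption, built here so the sketch is self-contained.
data = "caf\u00e9 and Asunci\u00f3n\r\n".encode("utf-8")

# Decode the raw UTF-8 bytes into a text string
# (Python 2 spelling: text = unicode(data, 'utf-8')).
text = data.decode("utf-8")

# Hypothetical replacement table: accented character -> LaTeX accent macro.
LATEX_ACCENTS = {"\u00e9": r"\'{e}", "\u00f3": r"\'{o}"}


def to_latex(s):
    """Replace each character listed in LATEX_ACCENTS with its LaTeX macro."""
    for char, macro in LATEX_ACCENTS.items():
        s = s.replace(char, macro)
    return s


latex_text = to_latex(text)

# Encode back to bytes before writing, e.g. open(path, "wb").write(out).
out = latex_text.encode("utf-8")
```

The byte string is two bytes longer than the decoded text here, one 
extra byte for each of é and ó. That is the "two characters" effect from 
the question. Doing the replacements on the decoded text rather than on 
the raw bytes means each accented letter is matched as a single 
character.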

Handling character encodings correctly *is* difficult. It's no shame if 
you don't get it right on the first attempt.


