Finding a \u0096
Gustaf Liljegren
gustafl at algonet.se
Wed Dec 4 09:28:14 EST 2002
I'm using Python to automate some steps in a Word-to-XML
conversion. The XML file should be encoded in UTF-8. Since Word uses
Microsoft's "ANSI" character set and I want Unicode in UTF-8,
some characters need to be replaced. All of these characters fall in
Unicode's C1 range (U+0080 to U+009F, i.e. between DEL and NBSP in Latin-1).
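The replacement described above can be sketched in modern Python 3 as a translation table. This is only a sketch under one assumption: the C1 characters come from cp1252 ("ANSI") bytes that were decoded as Latin-1, so each code point U+0080 to U+009F equals the original byte value. `build_c1_table`, `C1_TO_UNICODE` and `fix_c1` are hypothetical names, not part of the original post.

```python
# Hypothetical helper: map C1 code points (U+0080..U+009F), which appear when
# cp1252 ("ANSI") bytes are mis-decoded as Latin-1, to the characters that
# cp1252 actually assigns to those bytes.
def build_c1_table() -> dict:
    table = {}
    for byte in range(0x80, 0xA0):
        try:
            # Latin-1 maps byte N to code point N, so the key is both.
            table[byte] = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in cp1252;
            # those code points are left unchanged.
            pass
    return table


C1_TO_UNICODE = build_c1_table()


def fix_c1(text: str) -> str:
    """Replace every C1 character that has a cp1252 meaning."""
    return text.translate(C1_TO_UNICODE)


# U+0096 (cp1252 byte 0x96) becomes U+2013 EN DASH.
print(fix_c1("1990\u00962002"))
```

A single table handles all the Word characters at once (dashes, curly quotes, ellipsis, euro sign) instead of one `replace` call per character.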
When I try to replace these characters,
text = string.replace(text, '\u0096', '–') # En dash
Python doesn't recognize them. I have to write it in Greek to get
Python to understand what I mean:
text = string.replace(text, 'â€“', '–') # En dash
The first quoted string is an 'a' with circumflex, a 'euro' sign and a
right-slanted double quote. It works, but it's ugly. Isn't there a better
way to write this?
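For comparison, a sketch in modern Python 3, which postdates this post. In the Python 2 of the time, a plain '...' string did not interpret \u escapes at all (only a u'...' literal did), which is why '\u0096' above never matched. The sketch assumes the text contains UTF-8 en dashes that were read as cp1252, and shows how escapes can spell the three "Greek" characters without pasting them into the source. `EN_DASH`, `MOJIBAKE_EN_DASH` and `fix_en_dash` are hypothetical names.

```python
# U+2013 EN DASH, written as an escape instead of a pasted character.
EN_DASH = "\u2013"

# Encoded as UTF-8, the en dash is the three bytes E2 80 93 ...
utf8_bytes = EN_DASH.encode("utf-8")

# ... and those bytes read back as cp1252 give a-circumflex (U+00E2),
# the euro sign (U+20AC) and a curly double quote (U+201C).
MOJIBAKE_EN_DASH = utf8_bytes.decode("cp1252")


def fix_en_dash(text: str) -> str:
    """Replace the mis-decoded en dash with the real character."""
    return text.replace(MOJIBAKE_EN_DASH, EN_DASH)


print(utf8_bytes)  # b'\xe2\x80\x93'
print(fix_en_dash("pages 10\u00e2\u20ac\u201c12"))
```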
Gustaf