Finding a \u0096

Gustaf Liljegren gustafl at algonet.se
Wed Dec 4 09:28:14 EST 2002


I'm using Python to automate parts of a Word-to-XML conversion. The
XML file should be encoded in UTF-8. Since Word uses Microsoft's
"ANSI" character set (Windows-1252) and I want Unicode in UTF-8,
some characters need to be replaced. All of these characters fall in
the C1 range of Unicode (U+0080 to U+009F, i.e. between DEL and NBSP
in Latin-1).
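
To make the mismatch concrete, here is a minimal sketch in present-day
Python 3 (an assumption; it is not the Python that was current when this
was posted). It assumes the Word text arrives as Windows-1252 bytes, and
the sample bytes are made up for illustration.

```python
# Hypothetical sample: "10", an en dash, "20" as Windows-1252 bytes.
# In Windows-1252 the en dash is the single byte 0x96.
raw = b"10\x9620"

# Decoding as Latin-1 leaves the byte in the C1 range (U+0096).
as_latin1 = raw.decode("latin-1")

# Decoding as Windows-1252 yields the intended character, U+2013 EN DASH.
as_cp1252 = raw.decode("cp1252")

# Encoding the correctly decoded text gives UTF-8 bytes for the XML file.
utf8_bytes = as_cp1252.encode("utf-8")

print(hex(ord(as_latin1[2])))
print(hex(ord(as_cp1252[2])))
print(utf8_bytes)
```

The point of the sketch is only that the same byte ends up in the C1
range or as a real en dash depending on which codec is used to decode it.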

When I try to replace these characters,

  text = string.replace(text, '\u0096', '–')  # En dash

Python doesn't recognize them. I have to write it in Greek to get
Python to understand what I mean:

  text = string.replace(text, 'â€“', '–')  # En dash

The first quoted string is: 'a' with circumflex, the euro sign, and a
right-slanted double quote. It works, but it's ugly. Isn't there a
better way to write this?

Gustaf



