[Baypiggies] Handling unwanted Unicode \u2019 characters in XML

Wed Jul 2 04:17:58 CEST 2008

On Tue, Jul 1, 2008 at 6:37 PM, Stephen McInerney
<spmcinerney at hotmail.com> wrote:

> For me, it's:
>>>> import locale
>>>> locale.getdefaultlocale()
> ('en_US', 'ISO8859-1')
>
> But should I be changing setdefaultlocale() ?

You need to execute all of the statements.  I'm having difficulty
understanding how the Unicode literal U+2019 could map to U+00E2, as
you describe.

Cut and paste all of these statements into the interpreter and send us the results:

a = u'\u2019'
b = u'\u00E2'
print a
print b
print a.encode('utf-8')
print b.encode('utf-8')
ord(a)
ord(b)
unichr(ord(a))
unichr(ord(b))
import sys
sys.maxunicode
sys.byteorder

It might be something trivial that I'm overlooking...  You also
mentioned an exception when trying to print the literal.  I assume it
was a UnicodeEncodeError, but I'd like to see the traceback in any case.
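For reference, here is a minimal sketch of why printing the literal can fail. It assumes the console encoding really is ISO-8859-1, as the getdefaultlocale() output above suggests. U+2019 (RIGHT SINGLE QUOTATION MARK) has no ISO-8859-1 code, so encoding it to that codec raises UnicodeEncodeError. Windows-1252 maps it to the single byte 0x92, and UTF-8 gives the three bytes E2 80 99. The helper name try_encode is hypothetical.

```python
from __future__ import print_function  # no-op on Python 3; lets this also run on Python 2.6+

a = u'\u2019'  # RIGHT SINGLE QUOTATION MARK


def try_encode(text, codec):
    """Hypothetical helper: return the encoded bytes, or the exception if the codec can't represent text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError as exc:
        return exc


results = {}
for codec in ('iso-8859-1', 'cp1252', 'utf-8'):
    results[codec] = try_encode(a, codec)
    # repr() keeps the output ASCII-safe regardless of the console encoding.
    print(codec, repr(results[codec]))
```

If the ISO-8859-1 line shows an exception while cp1252 succeeds, that points to the console or output encoding rather than to the data itself.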

Also, I assume you're on Windows (since the encoding is ISO8859-1)?
Could it somehow be related to this?

http://en.wikipedia.org/wiki/ISO_8859-1#The_ISO-8859-1.2FWindows-1252_mixup
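One plausible route from U+2019 to U+00E2 is the classic mojibake that article describes: the UTF-8 bytes E2 80 99 get decoded with a single-byte codec. The sketch below assumes the XML was written as UTF-8 and read back as ISO-8859-1 or Windows-1252. Which codec the original program actually used isn't known from this thread.

```python
from __future__ import print_function  # no-op on Python 3; lets this also run on Python 2.6+

a = u'\u2019'            # RIGHT SINGLE QUOTATION MARK
raw = a.encode('utf-8')  # the three bytes E2 80 99

# Misread the UTF-8 bytes with the wrong single-byte codec.
as_latin1 = raw.decode('iso-8859-1')  # each byte becomes U+0000..U+00FF directly
as_cp1252 = raw.decode('cp1252')      # 0x80 -> EURO SIGN, 0x99 -> TRADE MARK SIGN

for label, text in (('latin-1', as_latin1), ('cp1252', as_cp1252)):
    # Print code points rather than glyphs so the console encoding can't interfere.
    print(label, ['U+%04X' % ord(ch) for ch in text])

# Undo the mistake: re-encode with the wrong codec, then decode as UTF-8.
repaired = as_cp1252.encode('cp1252').decode('utf-8')
print('repaired:', 'U+%04X' % ord(repaired))
```

Both misreadings start with U+00E2. Rendered as glyphs, the cp1252 version is the familiar "â€™" sequence.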

C