[Baypiggies] Handling unwanted Unicode \u2019 characters in XML
Chad Netzer
chad.netzer at gmail.com
Wed Jul 2 04:17:58 CEST 2008
On Tue, Jul 1, 2008 at 6:37 PM, Stephen McInerney
<spmcinerney at hotmail.com> wrote:
> For me, it's:
>>>> import locale
>>>> locale.getdefaultlocale()
> ('en_US', 'ISO8859-1')
>
> But should I be changing setdefaultlocale() ?
You need to execute all the statements. I'm having difficulty
understanding how the Unicode character U+2019 can map to U+00E2, as
you say it does.
Please cut and paste all of these statements into the interpreter and
send us the results:
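a = u'\u2019'   # RIGHT SINGLE QUOTATION MARK
b = u'\u00E2'   # LATIN SMALL LETTER A WITH CIRCUMFLEX
print a
print b
print a.encode('utf-8')
print b.encode('utf-8')
ord(a)
ord(b)
unichr(ord(a))
unichr(ord(b))
import sys
sys.maxunicode  # 65535 on narrow builds, 1114111 on wide builds
sys.byteorder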
It might be something trivial that I'm overlooking. Also, you
mentioned an exception when trying to print the literal. I assume it
was a UnicodeEncodeError? In any case, I'd like to see exactly what it was.
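For reference, here is a minimal sketch of how such an error can arise. It assumes the output encoding really is ISO8859-1, as your locale suggests; I haven't seen your actual traceback, so treat it as a guess. U+2019 has no representation in ISO8859-1, so encoding it fails, while U+00E2 encodes to a single byte. The helper name try_encode is made up for illustration, and the snippet is written to run on a current Python rather than exactly as you would type it in your session.

```python
# Sketch: U+2019 (RIGHT SINGLE QUOTATION MARK) cannot be encoded as ISO8859-1.
# try_encode is a hypothetical helper, not part of any library.

def try_encode(text, encoding):
    """Return the encoded bytes, or the UnicodeEncodeError that was raised."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc

a = u'\u2019'
b = u'\u00E2'

result_a = try_encode(a, 'iso-8859-1')   # fails: the character is not in ISO8859-1
result_b = try_encode(b, 'iso-8859-1')   # succeeds: the single byte 0xE2

print(repr(result_a))
print(repr(result_b))
```

If your exception looks like the first result, then printing is failing at the encoding step, which is a separate issue from U+2019 turning into U+00E2.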
Also, you're on Windows, I assume (since it's ISO8859-1)? Could it
somehow be related to this:
http://en.wikipedia.org/wiki/ISO_8859-1#The_ISO-8859-1.2FWindows-1252_mixup
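One way such a mixup could produce what you describe, assuming (and this is only a guess on my part) that the XML bytes are actually UTF-8 but are being decoded as ISO8859-1 or Windows-1252: U+2019 is three bytes in UTF-8, and the first of them is 0xE2, which both single-byte encodings decode as U+00E2. A small sketch, written for a current Python:

```python
# Sketch: the UTF-8 bytes for U+2019 misread as a single-byte encoding.
a = u'\u2019'
utf8_bytes = a.encode('utf-8')                # three bytes: E2 80 99

as_latin1 = utf8_bytes.decode('iso-8859-1')   # U+00E2 plus two C1 control characters
as_cp1252 = utf8_bytes.decode('cp1252')       # U+00E2, U+20AC, U+2122

# Show the code point of each character produced by the wrong decoding.
print([hex(ord(ch)) for ch in as_latin1])
print([hex(ord(ch)) for ch in as_cp1252])
```

In both cases the first character that comes out is U+00E2, which would match the symptom if only that character is visible or survives later processing.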
C