how can I convert invalid ASCII string to Unicode?
skip at pobox.com
Tue May 8 23:29:20 EDT 2001
I have been blissfully ignoring Unicode. Alas, my bliss has been so rudely
interrupted...
Suppose I have this string:
s = "ö" # "o" with an umlaut
and I'd like to convert it to UTF-8. (I know I can preface string literals
with 'u', but that's not an option here. Pretend s was assigned from a file
read.)
Simply executing
u = unicode(s)
fails because ord(s) is > 127, outside the ASCII range that the default codec
accepts. I eventually figured out that the following would work:
u = "".join([unichr(ord(c)) for c in s])
but this seems a bit obscure. Is there a cleaner way to convert plain
strings containing characters > 127 to UTF-8? Ideally I guess I'd like
plain strings to be interpreted as Latin-1 instead of ASCII by default, even
though my locale is 'murican.
Thx,
--
Skip Montanaro (skip at pobox.com)
(847)971-7098
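
The per-character workaround in the message above works because Latin-1 byte values coincide with the first 256 Unicode code points, so it should be equivalent to decoding the string as Latin-1. The following is a minimal sketch of that idea, not a definitive answer. It is written in Python 3 syntax, which is an assumption (the post uses Python 2), and the names raw, ascii_ok, via_join, via_decode and utf8_bytes are hypothetical, chosen for illustration. It also treats "Unicode string" and "UTF-8 bytes" as two separate steps.

```python
# Python 3 sketch; `raw` stands in for the Python 2 byte string `s` in the post.
raw = b"\xf6"  # the Latin-1 byte for "o" with an umlaut

# Decoding as ASCII fails because 0xF6 is greater than 127.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# The per-character workaround: map each byte value to the code point
# with the same number.
via_join = "".join(chr(b) for b in raw)

# Decoding as Latin-1 performs the same mapping in a single call.
via_decode = raw.decode("latin-1")

# Producing UTF-8 is a separate step: encode the Unicode string to bytes.
utf8_bytes = via_decode.encode("utf-8")
```

In the Python 2 of the original post, the corresponding spelling would be unicode(s, "latin-1"), followed by .encode("utf-8") on the result if UTF-8 bytes are what is needed.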