Q: a simple(?) raw-utf-8 conversion to internal type unicode "\304\246\311\231\316\257\316\271\303\222"

NevilleDNZ nevillednz at gmail.com
Sun Dec 31 21:07:41 EST 2006


Hi,

Apologies first, as I am not a Unicode expert... indeed the details
probably totally elude me.  Notwithstanding: how can I convert a
binary string containing UTF-8 bytes into a Python unicode string?

cutdown example:
$ cat ./uc.py
#!/usr/bin/env python
imported="\304\246\311\231\316\257\316\271\303\222 \317\216\317\203\305\224\304\271\304\220"
print "English/ASCII quoting:",'"'+imported+'"',"SUCCEEDS :-)" # if xterm encoding is UTF8
print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+"test"+u"\N{runic cross punctuation}","AOK :-)"
print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+imported+u"\N{runic cross punctuation}","FAILS :-("

$ ./uc.py
English/ASCII quoting: "ĦəίιÒ ώσŔĹĐ" SUCCEEDS :-)
German/ALCOR quoting: ᛭test᛭ AOK :-)
German/ALCOR quoting:
Traceback (most recent call last):
  File "./uc.py", line 5, in <module>
    print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+imported+u"\N{runic cross punctuation}","FAILS :-("
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)

The last print statement fails because Python tries to decode "imported"
as ASCII before joining it to the unicode strings, but its bytes are
8-bit encoded UTF-8 and it doesn't know it! How do I tell "imported" that
it is actually already UTF-8 encoded?
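
For reference, one way this might be approached is to decode the bytes
explicitly rather than rely on the implicit ASCII conversion. The sketch
below assumes the bytes really are valid UTF-8; the names decoded, cross,
quoted and output_bytes are made up for illustration and are not part of
uc.py. The b"" and u"" prefixes are only there so the same lines should
also run on later Python versions.

```python
# Raw UTF-8 bytes, written as octal escapes as in uc.py.
imported = b"\304\246\311\231\316\257\316\271\303\222 \317\216\317\203\305\224\304\271\304\220"

# Decode explicitly as UTF-8 instead of letting Python assume ASCII.
decoded = imported.decode("utf-8")

# Both operands are now unicode, so concatenation needs no implicit decode.
cross = u"\N{RUNIC CROSS PUNCTUATION}"
quoted = cross + decoded + cross

# Encode back to UTF-8 bytes only when writing to a UTF-8 terminal or file.
output_bytes = quoted.encode("utf-8")
```

The idea is to keep text as unicode inside the program and to convert
to and from bytes only at the input and output boundaries.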

Cheers
NevilleDNZ



