encoding - arabic(IBM 864) to UNICODE

Peter Otten __peter__ at web.de
Sun Mar 18 09:26:44 EDT 2007


Madhu Alagu wrote:

> How to convert IBM 864,IBM 420 & Nafitha(Arabic)  to UNICODE.

You can treat them like any other encoding, as long as Python has a codec
for them (cp864 is included):

>>> s = '\xe1\xec\xf0\xe8\xe1' # *
>>> print s.decode("cp864") # convert from cp864 to unicode
ﻓﻌﹽﻭﻓ
>>> s.decode("cp864").encode("utf8") # convert from cp864 to utf-8
'\xef\xbb\x93\xef\xbb\x8c\xef\xb9\xbd\xef\xbb\xad\xef\xbb\x93'
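
The same approach only works for the other encodings in the question if a
codec is installed under some name. A minimal sketch of how one might check,
assuming nothing beyond the standard codecs module; has_codec is a helper
name made up for this example, and "cp420" is only a guess at what a codec
for IBM 420 would be called:

```python
import codecs

def has_codec(name):
    """Return True if a codec is registered under this name (hypothetical helper)."""
    try:
        codecs.lookup(name)
    except LookupError:
        # codecs.lookup() raises LookupError for unknown encoding names
        return False
    return True

# "cp420" is an assumed name; the result depends on the Python installation.
for name in ("cp864", "cp420"):
    print("%s: %s" % (name, has_codec(name)))
```

If the lookup fails, the bytes cannot be decoded this way until a codec for
that character set is written or installed.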

To read from or write to a file you can use codecs.open(), which allows you
to specify an encoding.
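
A rough sketch of what that could look like for converting a whole file
from cp864 to UTF-8. The transcode function and the file names are made up
for illustration, and the sample bytes are the same arbitrary ones used
above:

```python
import codecs
import os
import tempfile

def transcode(src_path, dst_path, src_encoding="cp864", dst_encoding="utf-8"):
    """Copy src_path to dst_path, re-encoding the text (illustrative helper)."""
    with codecs.open(src_path, "r", encoding=src_encoding) as src:
        with codecs.open(dst_path, "w", encoding=dst_encoding) as dst:
            for line in src:       # each line arrives as decoded unicode text
                dst.write(line)    # and is encoded again on write

# Demo: write the arbitrary cp864 bytes to a temporary file, then convert it.
raw = b"\xe1\xec\xf0\xe8\xe1"
tmpdir = tempfile.mkdtemp()
src_path = os.path.join(tmpdir, "sample.cp864")   # hypothetical file names
dst_path = os.path.join(tmpdir, "sample.utf8")
with open(src_path, "wb") as f:
    f.write(raw)

transcode(src_path, dst_path)

with open(dst_path, "rb") as f:
    converted = f.read()
```

Here converted should hold the same UTF-8 bytes as
s.decode("cp864").encode("utf8") in the session above.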

Peter

(*) these are arbitrary characters generated by 
"".join(chr(ord(c)+128) for c in "alpha") 
since I don't know Arabic.


