encoding - arabic(IBM 864) to UNICODE
Peter Otten
__peter__ at web.de
Sun Mar 18 09:26:44 EDT 2007
Madhu Alagu wrote:
> How to convert IBM 864,IBM 420 & Nafitha(Arabic) to UNICODE.
You can treat them like every other encoding:
>>> s = '\xe1\xec\xf0\xe8\xe1' # *
>>> print s.decode("cp864") # convert from cp864 to unicode
ﻓﻌﹽﻭﻓ
>>> s.decode("cp864").encode("utf8") # convert from cp864 to utf-8
'\xef\xbb\x93\xef\xbb\x8c\xef\xb9\xbd\xef\xbb\xad\xef\xbb\x93'
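The session above is Python 2. As a rough sketch of the same conversion in Python 3, where byte strings and text are separate types, something like the following should work. The helper name has_codec is made up for this example; it only asks the running interpreter whether a codec is registered, which may be a useful first check for the other encodings mentioned in the question, since not every IBM code page necessarily ships with Python.

```python
import codecs

# Same sample bytes as in the session above (arbitrary, not meaningful Arabic).
raw = b'\xe1\xec\xf0\xe8\xe1'

text = raw.decode("cp864")     # cp864 bytes -> str (Unicode text)
utf8 = text.encode("utf-8")    # str -> UTF-8 bytes

def has_codec(name):
    """Hypothetical helper: True if this Python knows a codec called `name`."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False

print(has_codec("cp864"))
```

If has_codec() returns False for an encoding you need, decode() will raise LookupError for that name, and the mapping has to come from somewhere else.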
To read from or write to a file you can use codecs.open(), which allows you to
specify an encoding.
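A minimal sketch of such a file conversion, assuming Python 3, where the built-in open() takes an encoding argument in much the same way as codecs.open(). The function name transcode and the sample file names are invented for illustration, and the sample bytes are the arbitrary ones from the session above.

```python
import os
import tempfile

def transcode(src_path, dst_path, src_encoding="cp864", dst_encoding="utf-8"):
    """Hypothetical helper: re-encode a text file line by line."""
    # newline="" leaves line endings exactly as they are in the source file.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        for line in src:
            dst.write(line)

# Small demonstration using temporary files.
tmpdir = tempfile.mkdtemp()
src_file = os.path.join(tmpdir, "sample.cp864")
dst_file = os.path.join(tmpdir, "sample.utf8")

with open(src_file, "wb") as f:
    f.write(b'\xe1\xec\xf0\xe8\xe1\n')

transcode(src_file, dst_file)

with open(dst_file, "rb") as f:
    converted = f.read()
```

Reading line by line keeps memory use small for large files; for small files, reading the whole content with read() and writing it back in one call would do just as well.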
Peter
(*) these are arbitrary characters generated by
"".join(chr(ord(c)+128) for c in "alpha")
since I don't know Arabic.