Is this a bug? BOM decoded with UTF8
pekka niiranen
pekka.niiranen at wlanmail.com
Thu Feb 10 11:58:50 EST 2005
Hi there,
I have two files, "my.utf8" and "my.utf16", which
both contain a BOM and two "a" characters.
Contents of "my.utf8" in HEX:
EFBBBF6161
Contents of "my.utf16" in HEX:
FEFF6161
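As a rough sketch, the two hex dumps can be checked against the BOM constants in the standard `codecs` module. It uses Python 3 syntax (`bytes.fromhex`), which is an assumption since the session in this post is Python 2.4, and the variable names are only illustrative. One thing it suggests: after the big-endian BOM `FEFF`, the bytes `6161` make up a single 16-bit code unit (U+6161), not two "a" characters.

```python
import codecs

# Byte contents copied from the hex dumps above (illustrative names).
utf8_bytes = bytes.fromhex("EFBBBF6161")
utf16_bytes = bytes.fromhex("FEFF6161")

# codecs exposes the BOM byte sequences as constants.
has_utf8_bom = utf8_bytes.startswith(codecs.BOM_UTF8)
has_utf16_be_bom = utf16_bytes.startswith(codecs.BOM_UTF16_BE)
print(has_utf8_bom, has_utf16_be_bom)  # True True

# The "utf-16" codec consumes the BOM to pick the byte order, so the
# remaining two bytes 61 61 decode to one character, U+6161.
utf16_text = utf16_bytes.decode("utf-16")
print(ascii(utf16_text), len(utf16_text))  # '\u6161' 1
```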
For some reason Python 2.4 decodes the BOM for UTF-8
but not for UTF-16. See below:
>>> import codecs
>>> fh = codecs.open("my.utf8", "rb", "utf8")
>>> fh.readlines()
[u'\ufeffaa'] # BOM is decoded, why?
>>> fh.close()
>>> fh = codecs.open("my.utf16", "rb", "utf16")
>>> fh.readlines()
[u'\u6161'] # No BOM here
>>> fh.close()
Is there a trick to read a UTF-8 encoded file so that the BOM does not
end up in the decoded text?
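A minimal sketch of two possible approaches, written in Python 3 syntax and working on the raw bytes rather than on a file. The first strips a leading U+FEFF by hand after decoding and should carry over to any version. The second relies on the "utf-8-sig" codec, which was added in Python 2.5 and so is not available in the 2.4 interpreter used in this post. The variable names are illustrative only.

```python
data = bytes.fromhex("EFBBBF6161")  # same bytes as "my.utf8"

# The plain "utf-8" codec keeps the BOM as the character U+FEFF.
plain = data.decode("utf-8")

# Approach 1: drop a leading U+FEFF by hand after decoding.
stripped = plain[1:] if plain.startswith("\ufeff") else plain

# Approach 2: "utf-8-sig" skips a leading BOM while decoding
# (assumes Python 2.5 or later).
sig = data.decode("utf-8-sig")

print(ascii(plain), ascii(stripped), ascii(sig))
```

The same codec name can be passed to `codecs.open` in place of "utf8" when reading from a file.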
-pekka-