[issue7651] Python3: guess text file charset using the BOM
STINNER Victor
report at bugs.python.org
Fri Jan 8 00:18:51 CET 2010
STINNER Victor <victor.stinner at haypocalc.com> added the comment:
open_bom.patch is a proof of concept. It only works in read mode. The idea is to delay the creation of the encoding and the decoder until just after the first read_chunk() call, once the first bytes of the file are available.
The patch changes the default behaviour of open(): if the file starts with a BOM, the BOM is used to choose the encoding and is then skipped, so it does not appear in the decoded text. Example:
-------------
from _pyio import open
with open('test.txt', 'w', encoding='utf-8-sig') as fp:
    print("abc", file=fp)
    print("d\xe9f", file=fp)
with open('test.txt', 'r') as fp:
    print("open().read(): {!r}".format(fp.read()))
-------------
Unpatched Python displays '\ufeffabc\ndéf\n', whereas patched Python displays 'abc\ndéf\n'.
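For readers without the patch applied, the same idea can be approximated outside of open(). The following is only a rough sketch, not the code from open_bom.patch: the names guess_encoding() and read_text(), the "utf-8" fallback and the 4096-byte chunk size are assumptions made for illustration. It reads the first chunk in binary mode, picks a codec from the BOM, and only then creates the decoder. Unlike open() in text mode, it does no newline translation.

```python
import codecs

# Hypothetical lookup table: (BOM bytes, name of a codec that consumes the BOM).
# UTF-32 is tested before UTF-16 because BOM_UTF32_LE starts with the same
# two bytes as BOM_UTF16_LE.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def guess_encoding(first_chunk, default="utf-8"):
    """Return a codec name based on the BOM, or `default` if there is none."""
    for bom, name in _BOMS:
        if first_chunk.startswith(bom):
            return name
    return default


def read_text(path, default="utf-8", chunk_size=4096):
    """Read a whole file, creating the decoder only after the first chunk."""
    with open(path, "rb") as raw:
        first_chunk = raw.read(chunk_size)
        # The decoder is created late, once the first bytes are known.
        encoding = guess_encoding(first_chunk, default)
        decoder = codecs.getincrementaldecoder(encoding)()
        parts = [decoder.decode(first_chunk)]
        for chunk in iter(lambda: raw.read(chunk_size), b""):
            parts.append(decoder.decode(chunk))
        # Flush the decoder so that truncated input raises an error.
        parts.append(decoder.decode(b"", final=True))
    return "".join(parts)
```

With the test.txt file written by the example above, read_text('test.txt') should return the text without the leading '\ufeff'. One known limitation of this kind of guess: a UTF-16-LE file whose first character is U+0000 starts with the same four bytes as the UTF-32-LE BOM and would be misdetected.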
----------
keywords: +patch
Added file: http://bugs.python.org/file15782/open_bom.patch
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue7651>
_______________________________________