Guessing the encoding from a BOM
Chris Angelico
rosuav at gmail.com
Thu Jan 16 13:06:16 EST 2014
On Fri, Jan 17, 2014 at 5:01 AM, Björn Lindqvist <bjourne at gmail.com> wrote:
> 2014/1/16 Steven D'Aprano <steve+comp.lang.python at pearwood.info>:
>> def guess_encoding_from_bom(filename, default):
>>     with open(filename, 'rb') as f:
>>         sig = f.read(4)
>>     if sig.startswith((b'\xFE\xFF', b'\xFF\xFE')):
>>         return 'utf_16'
>>     elif sig.startswith((b'\x00\x00\xFE\xFF', b'\xFF\xFE\x00\x00')):
>>         return 'utf_32'
>>     else:
>>         return default
>
> You might want to add the utf8 bom too: '\xEF\xBB\xBF'.
I'd actually rather not. It would tempt people to pollute UTF-8 files
with a BOM, which is not necessary unless you are MS Notepad.
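
One thing about the quoted function as posted: the UTF-32 little-endian BOM (FF FE 00 00) begins with the UTF-16 little-endian BOM (FF FE), so testing for UTF-16 first means a UTF-32-LE file would be reported as 'utf_16'. Here is a minimal sketch with the checks reordered. It keeps the name and return values from the quoted code; using the constants from the codecs module in place of the byte literals is my own choice, not something from the original post.

```python
import codecs

def guess_encoding_from_bom(filename, default):
    """Return 'utf_32' or 'utf_16' if the file starts with that BOM, else default."""
    with open(filename, 'rb') as f:
        sig = f.read(4)
    # Test UTF-32 first: its little-endian BOM (FF FE 00 00) starts with
    # the UTF-16 little-endian BOM (FF FE), so the longer match must win.
    if sig.startswith((codecs.BOM_UTF32_BE, codecs.BOM_UTF32_LE)):
        return 'utf_32'
    elif sig.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return 'utf_16'
    else:
        return default
```

This is still only a guess: a UTF-16-LE file whose first character after the BOM is U+0000 would look like UTF-32-LE. The result can be passed straight to open(), e.g. open(filename, encoding=guess_encoding_from_bom(filename, 'latin_1')), where 'latin_1' is just an example default.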
ChrisA