Guessing the encoding from a BOM
Chris Angelico
rosuav at gmail.com
Thu Jan 16 00:01:56 EST 2014
On Thu, Jan 16, 2014 at 1:13 PM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
>     if sig.startswith((b'\xFE\xFF', b'\xFF\xFE')):
>         return 'utf_16'
>     elif sig.startswith((b'\x00\x00\xFE\xFF', b'\xFF\xFE\x00\x00')):
>         return 'utf_32'
I'd swap the order of these two checks. The UTF-32-LE BOM (FF FE 00
00) begins with the UTF-16-LE BOM (FF FE), so if the file starts with
FF FE 00 00, your code will guess that it's UTF-16 and begins with a
U+0000.
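
A minimal sketch of the swapped order, assuming sig holds the first few
bytes of the file. The function name and the None fallback are
placeholders of mine, not part of the quoted code:

```python
def guess_encoding_from_bom(sig):
    """Guess an encoding from the leading bytes of a file.

    Hypothetical helper: returns None when no UTF-16/UTF-32 BOM is found.
    """
    # Test the 4-byte UTF-32 BOMs first, because the UTF-32-LE BOM
    # (FF FE 00 00) starts with the UTF-16-LE BOM (FF FE).
    if sig.startswith((b'\x00\x00\xFE\xFF', b'\xFF\xFE\x00\x00')):
        return 'utf_32'
    elif sig.startswith((b'\xFE\xFF', b'\xFF\xFE')):
        return 'utf_16'
    return None
```

The trade-off is that a UTF-16-LE file whose first character after the
BOM really is U+0000 would now be guessed as UTF-32, which seems much
the less likely case.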
ChrisA