tail

MRAB python at mrabarnett.plus.com
Sat May 7 12:58:05 EDT 2022


On 2022-05-07 17:28, Marco Sulla wrote:
> On Sat, 7 May 2022 at 16:08, Barry <barry at barrys-emacs.org> wrote:
>> You need to handle the file in bin mode and do the handling of line endings and encodings yourself. It’s not that hard for the cases you wanted.
> 
> >>> "\n".encode("utf-16")
> b'\xff\xfe\n\x00'
> >>> "".encode("utf-16")
> b'\xff\xfe'
> >>> "a\nb".encode("utf-16")
> b'\xff\xfea\x00\n\x00b\x00'
> >>> "\n".encode("utf-16").lstrip("".encode("utf-16"))
> b'\n\x00'
> 
> Can I use the last trick to get the encoding of a LF or a CR in any encoding?

In the case of UTF-16, it's 2 bytes per code unit, but those 2 bytes 
could be little-endian or big-endian.

As you didn't specify which you wanted, it used the machine's native 
byte order (little-endian here) and added a BOM (U+FEFF) at the start 
so that a reader can tell which order was used.
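
As a quick check, and assuming CPython's standard codecs module, you can 
compare those leading 2 bytes against the BOM constants it provides. The 
name "prefix" is just mine for illustration:

```python
import codecs
import sys

# What the plain "utf-16" codec puts in front of the text.
prefix = "".encode("utf-16")

# codecs.BOM_UTF16 is the BOM in this machine's native byte order.
print(sys.byteorder)
print(prefix == codecs.BOM_UTF16)
print(prefix in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))
```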

If you specify which endianness you want with "utf-16le" or "utf-16be", 
it won't add the BOM:

 >>> # Little-endian.
 >>> "\n".encode("utf-16le")
b'\n\x00'
 >>> # Big-endian.
 >>> "\n".encode("utf-16be")
b'\x00\n'
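
For the more general question, here is a rough sketch rather than a 
definitive answer. It assumes the codec emits the same prefix for an 
empty string as it does for any other text, which holds for the 
BOM-writing codecs I know of (utf-16, utf-32, utf-8-sig) but may not hold 
for every codec. The helper name encoded_without_bom is made up for this 
example. It removes the prefix as a whole instead of using lstrip(), 
because bytes.lstrip() treats its argument as a set of bytes to strip, 
not as a prefix:

```python
def encoded_without_bom(text, encoding):
    """Encode text and drop whatever the codec emits for an empty string."""
    prefix = "".encode(encoding)  # BOM or signature, if the codec writes one
    data = text.encode(encoding)
    # Slice the prefix off as a unit; lstrip() would keep stripping any
    # leading bytes that happen to be in the prefix, which is not the same.
    if prefix and data.startswith(prefix):
        return data[len(prefix):]
    return data


for enc in ("utf-8", "utf-8-sig", "utf-16", "utf-16le", "utf-16be", "utf-32"):
    print(enc, encoded_without_bom("\n", enc), encoded_without_bom("\r", enc))
```

If you already know the byte order you need, naming it explicitly 
("utf-16le" or "utf-16be") is simpler than stripping anything.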

