tail

Marco Sulla Marco.Sulla.Python at gmail.com
Sat May 7 14:35:34 EDT 2022


On Sat, 7 May 2022 at 19:02, MRAB <python at mrabarnett.plus.com> wrote:
>
> On 2022-05-07 17:28, Marco Sulla wrote:
> > On Sat, 7 May 2022 at 16:08, Barry <barry at barrys-emacs.org> wrote:
> >> You need to handle the file in bin mode and do the handling of line endings and encodings yourself. It’s not that hard for the cases you wanted.
> >
> > >>> "\n".encode("utf-16")
> > b'\xff\xfe\n\x00'
> > >>> "".encode("utf-16")
> > b'\xff\xfe'
> > >>> "a\nb".encode("utf-16")
> > b'\xff\xfea\x00\n\x00b\x00'
> > >>> "\n".encode("utf-16").lstrip("".encode("utf-16"))
> > b'\n\x00'
> >
> > Can I use the last trick to get the encoding of a LF or a CR in any encoding?
>
> In the case of UTF-16, it's 2 bytes per code unit, but those 2 bytes
> could be little-endian or big-endian.
>
> As you didn't specify which you wanted, it defaulted to little-endian
> and added a BOM (U+FEFF).
>
> If you specify which endianness you want with "utf-16le" or "utf-16be",
> it won't add the BOM:
>
>  >>> # Little-endian.
>  >>> "\n".encode("utf-16le")
> b'\n\x00'
>  >>> # Big-endian.
>  >>> "\n".encode("utf-16be")
> b'\x00\n'

Well, ok, but I need a generic method to get LF and CR for any
encoding a user can input.
Do you think that

"\n".encode(encoding).lstrip("".encode(encoding))

is good for any encoding? Furthermore, is there a way to get the
encoding of an opened file object?
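
To make the question concrete, here is a rough sketch of what I mean
(it assumes Python 3.9+ for bytes.removeprefix, and "example.txt" is
just an illustrative file name; common Unicode codecs are what I have
in mind, stateful codecs may behave differently):

def newline_bytes(encoding, newline="\n"):
    """Byte sequence for `newline` in `encoding`, with any BOM removed.

    Encoding an empty string yields just the BOM (if the codec writes
    one), so removing that prefix from the encoded newline leaves the
    bare line-ending bytes.  bytes.removeprefix (Python 3.9+) is used
    instead of lstrip, because lstrip strips *any* leading byte found
    in its argument, not the exact prefix.
    """
    bom = "".encode(encoding)
    return newline.encode(encoding).removeprefix(bom)

print(newline_bytes("utf-16"))    # b'\n\x00' (little-endian + BOM default)
print(newline_bytes("utf-8"))     # b'\n'
print(newline_bytes("utf-32be"))  # b'\x00\x00\x00\n'

# A file opened in text mode reports the codec it was opened with;
# a file opened in binary mode has no encoding attribute at all.
with open("example.txt", "w", encoding="utf-16") as f:
    f.write("a\nb\n")
with open("example.txt", encoding="utf-16") as f:
    print(f.encoding)              # 'utf-16'
with open("example.txt", "rb") as f:
    print(hasattr(f, "encoding"))  # False

That last part is the second question: is f.encoding on a text-mode
file object the right way to get at this, given that a binary file
object carries no encoding information?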

