tail

Barry barry at barrys-emacs.org
Sun May 8 04:04:13 EDT 2022



> On 7 May 2022, at 17:29, Marco Sulla <Marco.Sulla.Python at gmail.com> wrote:
> 
> On Sat, 7 May 2022 at 16:08, Barry <barry at barrys-emacs.org> wrote:
>> You need to handle the file in binary mode and do the handling of line endings and encodings yourself. It’s not that hard for the cases you wanted.
> 
> >>> "\n".encode("utf-16")
> b'\xff\xfe\n\x00'
> >>> "".encode("utf-16")
> b'\xff\xfe'
> >>> "a\nb".encode("utf-16")
> b'\xff\xfea\x00\n\x00b\x00'
> >>> "\n".encode("utf-16").lstrip("".encode("utf-16"))
> b'\n\x00'
> 
> Can I use the last trick to get the encoding of a LF or a CR in any encoding?

In a word, no.

There are cases where you just have to know the encoding you are working
with. UTF-16 is one of them, because you have to deal with the data in
2-byte units and know whether it is big-endian or little-endian.
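
To illustrate the UTF-16 point, here is a minimal sketch, not a complete
solution. It shows that the bytes for LF depend on the encoding and the
byte order, and that a match only counts when it starts on a 2-byte
boundary. The helper names encoded_newline and find_aligned are made up
for this example.

```python
def encoded_newline(encoding: str) -> bytes:
    """Return the bytes for LF in an explicit, BOM-free encoding."""
    return "\n".encode(encoding)


def find_aligned(data: bytes, pattern: bytes, unit: int) -> int:
    """Find pattern in data, accepting only offsets that are multiples of unit.

    Returns -1 if there is no aligned match.
    """
    pos = data.find(pattern)
    while pos != -1 and pos % unit != 0:
        pos = data.find(pattern, pos + 1)
    return pos


# The same character has a different byte pattern in each encoding.
for name in ("utf-8", "latin-1", "utf-16-le", "utf-16-be"):
    print(name, encoded_newline(name))

# U+0A00 followed by U+0100 contains the bytes b"\n\x00" at an odd offset,
# although neither character is a line feed. Only the aligned match is real.
data = "\u0a00\u0100\n".encode("utf-16-le")
newline = encoded_newline("utf-16-le")
print(data.find(newline))              # naive search: offset 1, a false hit
print(find_aligned(data, newline, 2))  # aligned search: offset 4, the real LF
```

Also note that bytes.lstrip() removes any of the given byte values, not a
prefix, so the lstrip() trick in the quoted session is not a reliable way
to drop the BOM.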

There will be other encodings that are also difficult.

But if you are working with encodings that use ASCII as a base, like
Unicode encoded as UTF-8 or the ISO-8859 series, then you can just look
for NL and CR using the ASCII values of the bytes.
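
For the ASCII-based case, a tail can be sketched roughly like this. It
is only an outline under the assumption that the file uses UTF-8,
ISO-8859-x or another encoding where byte 0x0A always means NL. The name
tail_bytes and its parameters are invented for the example.

```python
import os


def tail_bytes(path: str, n: int = 10, chunk_size: int = 4096) -> list[bytes]:
    """Return the last n lines of a file as raw bytes, without line endings.

    Reads backwards in chunks from the end of the file. Assumes an
    ASCII-based encoding, so NL is the single byte 0x0A.
    """
    if n <= 0:
        return []
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        buf = b""
        # Keep reading until the buffer holds more than n newlines (so the
        # first kept line is complete) or the start of the file is reached.
        # A file with CR-only line endings is simply read in full.
        while pos > 0 and buf.count(b"\n") <= n:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            buf = f.read(step) + buf
    # bytes.splitlines() splits on b"\n", b"\r" and b"\r\n".
    return buf.splitlines()[-n:]
```

Decoding is then a separate step once you know the encoding, for example
[line.decode("utf-8") for line in tail_bytes("app.log", 5)], where
app.log is a placeholder file name. This is safe for UTF-8 because every
byte of a multi-byte sequence is 0x80 or above, so 0x0A can only be a
real NL.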

In short, once you set your requirements, you will know which problems
you can avoid and which you must solve.

Is UTF-16 important to you? If not, there is no need to solve its issues.

Barry




