tail

MRAB python at mrabarnett.plus.com
Sat May 7 15:19:04 EDT 2022


On 2022-05-07 19:35, Marco Sulla wrote:
> On Sat, 7 May 2022 at 19:02, MRAB <python at mrabarnett.plus.com> wrote:
> >
> > On 2022-05-07 17:28, Marco Sulla wrote:
> > > On Sat, 7 May 2022 at 16:08, Barry <barry at barrys-emacs.org> wrote:
> > >> You need to handle the file in binary mode and do the handling of line endings and encodings yourself. It’s not that hard for the cases you wanted.
> > >
> > > >>> "\n".encode("utf-16")
> > > b'\xff\xfe\n\x00'
> > > >>> "".encode("utf-16")
> > > b'\xff\xfe'
> > > >>> "a\nb".encode("utf-16")
> > > b'\xff\xfea\x00\n\x00b\x00'
> > > >>> "\n".encode("utf-16").lstrip("".encode("utf-16"))
> > > b'\n\x00'
> > >
> > > Can I use the last trick to get the encoding of a LF or a CR in any encoding?
> >
> > In the case of UTF-16, it's 2 bytes per code unit, but those 2 bytes
> > could be little-endian or big-endian.
> >
> > As you didn't specify which you wanted, it defaulted to little-endian
> > and added a BOM (U+FEFF).
> >
> > If you specify which endianness you want with "utf-16le" or "utf-16be",
> > it won't add the BOM:
> >
> >  >>> # Little-endian.
> >  >>> "\n".encode("utf-16le")
> > b'\n\x00'
> >  >>> # Big-endian.
> >  >>> "\n".encode("utf-16be")
> > b'\x00\n'
>
> Well, ok, but I need a generic method to get LF and CR for any
> encoding a user can input.
> Do you think that
>
> "\n".encode(encoding).lstrip("".encode(encoding))
>
> is good for any encoding?

'.lstrip' is the wrong method to use because it treats its argument as a
set of byte values to strip, not as a prefix, so it might strip off too
many bytes. A better choice is '.removeprefix' (available since Python 3.9).

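
To make the difference concrete, here is a minimal sketch. The helper name
'newline_bytes' is made up for illustration, and it assumes Python 3.9+ and
that encoding the empty string yields exactly the prefix (such as a BOM) that
the codec prepends. That holds for the UTF codecs shown here, but I haven't
checked it for every codec.

```python
def newline_bytes(encoding: str, newline: str = "\n") -> bytes:
    """Encode `newline`, dropping whatever prefix the codec emits for ""."""
    prefix = "".encode(encoding)  # e.g. the BOM for "utf-16" or "utf-8-sig"
    return newline.encode(encoding).removeprefix(prefix)


# Why .lstrip is risky: it strips any leading bytes found in its argument.
encoded = "\uff01".encode("utf-8-sig")  # BOM b'\xef\xbb\xbf' + b'\xef\xbc\x81'
bom = "".encode("utf-8-sig")            # b'\xef\xbb\xbf'

too_much = encoded.lstrip(bom)          # also eats the character's first byte
correct = encoded.removeprefix(bom)     # removes the BOM only

print(too_much)                   # b'\xbc\x81'
print(correct)                    # b'\xef\xbc\x81'
print(newline_bytes("utf-16le"))  # b'\n\x00'
print(newline_bytes("utf-16be"))  # b'\x00\n'
```

With '.lstrip' the first byte of the character is lost because it happens to
be one of the bytes in the BOM.
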
> Furthermore, is there a way to get the encoding of an opened file object?
>
How was the file opened?


If it was opened as a text file, use the '.encoding' attribute (which 
just tells you what encoding was specified when it was opened, and you'd 
be assuming that it's the correct one).
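
A small sketch of that caveat, using a throwaway file (the file name and the
sample text are made up for illustration): the file is written as UTF-8 but
reopened as Latin-1, and '.encoding' simply reports what 'open' was told.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical file name

    # Write the text as UTF-8 ...
    with open(path, "w", encoding="utf-8") as f:
        f.write("caf\u00e9\n")

    # ... but reopen it claiming Latin-1.
    with open(path, "r", encoding="latin-1") as f:
        declared = f.encoding  # what was specified, not what the file "is"
        text = f.read()        # decodes without error, but to the wrong text

print(declared)
print(repr(text))
```

The read succeeds, but the 'é' comes back as two characters, so the reported
encoding is only as trustworthy as whoever opened the file.
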


If it was opened as a binary file, all you know is that it contains 
bytes, and determining the encoding (assuming that it is a text file) is 
down to heuristics (i.e. guesswork).
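
As an illustration of the kind of guesswork involved, here is one possible
heuristic, not a definitive method: look for a BOM first, then fall back to
checking whether the bytes happen to be valid UTF-8. The function name
'guess_encoding' is made up for this sketch, and returning None just means
"no idea".

```python
import codecs
from typing import Optional

# UTF-32 must be checked before UTF-16: the UTF-32-LE BOM begins with
# the same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes) -> Optional[str]:
    """Guess an encoding from raw bytes; None if the heuristic gives up."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    # Valid UTF-8 (this includes plain ASCII), though other encodings
    # could produce the same bytes.
    return "utf-8"


print(guess_encoding("a\nb".encode("utf-16")))  # utf-16
print(guess_encoding(b"\xff\x00"))              # None
```

Files without a BOM in a legacy 8-bit encoding will defeat this, which is why
it remains a guess.
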


