UTF-8 and latin1

Tue Oct 25 17:05:09 EDT 2022

On Wed, 26 Oct 2022 at 05:09, Barry Scott <barry at barrys-emacs.org> wrote:
>
> > On 25 Oct 2022, at 11:16, Stefan Ram <ram at zedat.fu-berlin.de> wrote:
> >
> > ram at zedat.fu-berlin.de (Stefan Ram) writes:
> >> You can let Python guess the encoding of a file.
> > >> def encoding_of( name ):
> > >>     path = pathlib.Path( name )
> > >>     for encoding in( "utf_8", "cp1252", "latin_1" ):
> > >>         try:
> > >>             with path.open( encoding=encoding, errors="strict" )as file:
> >
> >  I also read a book which claimed that the tkinter.Text
> >  widget would accept bytes and guess whether these are
> >  encoded in UTF-8 or "ISO 8859-1" and decode them
> >  accordingly. However, today I found that here it does
> >  accept bytes but it always guesses "ISO 8859-1".
>
> The best you can do is assume that if the text cannot decode as utf-8 it may be 8859-1.
>

Except when it's Windows-1252.

ChrisA
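Below is a minimal sketch of the fallback order discussed in this thread, assuming Python 3.9+. It tries the same candidates as Stefan's snippet (`utf_8`, then `cp1252`, then `latin_1`). Unlike that snippet, it reads the file once as bytes instead of reopening it for each encoding. The names `CANDIDATES`, `guess_decode`, and this version of `encoding_of` are illustrative and do not come from the thread. A strict decode that succeeds only shows that the bytes are *valid* in that encoding. It does not prove that the encoding is the right one.

```python
import pathlib

# Candidate encodings, tried in order (hypothetical constant for illustration).
# utf_8 comes first because non-ASCII bytes from a legacy 8-bit encoding rarely
# form valid UTF-8 by accident. cp1252 comes next because it agrees with latin_1
# everywhere except 0x80-0x9F, where latin_1 has C1 control characters and
# cp1252 has curly quotes, the euro sign, and similar. latin_1 comes last because
# it maps every byte to a character and therefore never fails.
CANDIDATES = ("utf_8", "cp1252", "latin_1")


def guess_decode(data: bytes) -> tuple[str, str]:
    """Return (text, encoding) for the first candidate that decodes strictly."""
    for encoding in CANDIDATES:
        try:
            return data.decode(encoding, errors="strict"), encoding
        except UnicodeDecodeError:
            # Not valid in this encoding; try the next candidate.
            continue
    # Not reachable in practice: latin_1 accepts any byte sequence.
    raise AssertionError("latin_1 should never fail to decode")


def encoding_of(name: str) -> str:
    """Guess the encoding of a file by reading its raw bytes once."""
    data = pathlib.Path(name).read_bytes()
    return guess_decode(data)[1]
```

For example, `guess_decode(b"\x93quoted\x94")` returns curly quotes decoded as `cp1252`. Decoding the same bytes as `latin_1` would instead produce the control characters U+0093 and U+0094, which is the Windows-1252 case Chris raises. A byte such as 0x81 is undefined in Python's `cp1252` codec, so that input falls through to `latin_1`.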