UTF-8 and latin1

Tue Oct 25 13:59:06 EDT 2022

> On 25 Oct 2022, at 11:16, Stefan Ram <ram at zedat.fu-berlin.de> wrote:
> 
> ram at zedat.fu-berlin.de (Stefan Ram) writes:
>> You can let Python guess the encoding of a file.
>> def encoding_of( name ):
>> path = pathlib.Path( name )
>> for encoding in( "utf_8", "cp1252", "latin_1" ):
>> try:
>> with path.open( encoding=encoding, errors="strict" )as file:
> 
>  I also read a book which claimed that the tkinter.Text
>  widget would accept bytes and guess whether these are
>  encoded in UTF-8 or "ISO 8859-1" and decode them 
>  accordingly. However, today I found that here it does 
>  accept bytes but it always guesses "ISO 8859-1".

The best you can do is assume that if the text cannot decode as utf-8 it may be 8859-1.

Barry

> 
>  main.py
> 
> import tkinter
> 
> text = tkinter.Text()
> text.insert( tkinter.END, "AÄäÖöÜüß".encode( encoding='ISO 8859-1' ))
> text.insert( tkinter.END, "AÄäÖöÜüß".encode( encoding='UTF-8' ))
> text.pack()
> print( text.get( "1.0", "end" ))
> 
>  output
> 
> AÄäÖöÜüßAÃ„Ã¤Ã–Ã¶ÃœÃ¼ÃŸ
> 
> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list