UTF-8 and latin1

Dennis Lee Bieber wlfraed at ix.netcom.com
Thu Aug 18 21:38:55 EDT 2022


On Thu, 18 Aug 2022 11:33:59 -0700, Tobiah <toby at tobiah.org> declaimed the
following:

>
>So how does this break down?  When a person enters
>Montréal, Quebéc into a form field, what are they
>doing on the keyboard to make that happen?  As the
>string sits there in the text box, is it latin1, or utf-8
>or something else?  How does the browser know what
>sort of data it has in that text box?
>

	If this were my ancient Amiga -- most of the accented characters in
ISO-Latin-1 were entered by pressing one of the meta/alt keys together with
one of five or six designated "dead keys" (in the days of typewriters, a
dead key was one that did not advance the carriage to the next character
space). The dead key indicated which accent mark was to be applied to the
subsequent "regular" character.

	On Windows, many of the characters can be entered using <alt>####
(where #### are digits typed on the numeric pad!), such as <alt>1254 => µ.
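
	A rough sketch of why <alt>1254 can produce µ, under two assumptions
that are mine and not verified against any particular application: that
the program wraps Alt codes above 255 modulo 256, and that the OEM code
page is CP437 (typical for US-English Windows). Some applications instead
treat large Alt codes as Unicode code points, in which case this does not
apply. The variable names are only illustrative.

```python
# Assumption: the application wraps Alt codes above 255 modulo 256.
alt_code = 1254
oem_byte = alt_code % 256
assert oem_byte == 230

# Assumption: the OEM code page is CP437. Alt codes typed without a
# leading zero are looked up in the OEM code page.
oem_char = bytes([oem_byte]).decode("cp437")
assert oem_char == "\u00b5"  # MICRO SIGN

# With a leading zero (<alt>0181) the lookup uses the "ANSI" code page,
# assumed here to be CP1252; byte 181 is the same character there.
ansi_char = bytes([181]).decode("cp1252")
assert ansi_char == oem_char
```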

	As for what the browser receives? Unless the browser is asking for raw
key codes and translating them internally to some encoding, it is likely
receiving characters in whatever encoding has been defined for the
computer/OS (on Windows, most likely CP1252, which as I recall matches
latin-1 except that it puts printable characters in the 0x80-0x9F range,
where latin-1 has control codes). Whether the browser then re-encodes that
to UTF-8 is something I can't answer.
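
	Whatever the browser does, the difference between the encodings is
easy to see from Python. This is a minimal sketch, not a statement about
what any browser actually sends; the sample string is taken from the
question, and the variable names are only illustrative.

```python
text = "Montr\u00e9al"  # "Montréal"; \u00e9 is e with an acute accent

latin1_bytes = text.encode("latin-1")
cp1252_bytes = text.encode("cp1252")
utf8_bytes = text.encode("utf-8")

# latin-1 and CP1252 agree here: the accented letter is the single byte 0xE9.
assert latin1_bytes == cp1252_bytes == b"Montr\xe9al"

# UTF-8 needs two bytes for the same character.
assert utf8_bytes == b"Montr\xc3\xa9al"

# Decoding UTF-8 bytes as latin-1 yields the familiar mojibake:
# 0xC3 becomes A-tilde and 0xA9 becomes the copyright sign.
mojibake = utf8_bytes.decode("latin-1")
assert mojibake == "Montr\u00c3\u00a9al"

# The two single-byte encodings differ in the 0x80-0x9F range:
# CP1252 has printable characters there, latin-1 has C1 control codes.
assert b"\x80".decode("cp1252") == "\u20ac"  # euro sign
assert b"\x80".decode("latin-1") == "\x80"   # control character
```

	The text in the form field is just characters; the choice of latin-1,
CP1252 or UTF-8 only matters once those characters are turned into bytes.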



-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
	wlfraed at ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/

