UTF-8 and latin1

Jon Ribbens jon+usenet at unequivocal.eu
Thu Aug 18 12:53:31 EDT 2022


On 2022-08-18, Tobiah <toby at tobiah.org> wrote:
>> Generally speaking browser submissions were/are supposed to be sent
>> using the same encoding as the page, so if you're sending the page
>> as "latin1" then you'll see that a fair amount I should think. If you
>> send it as "utf-8" then you'll get 100% utf-8 back.
>
> The only trick I know is to use <meta charset="utf-8">.  Would
> that 'send' the post as utf-8?  I always expected it had more
> to do with the way the user entered the characters.  How, by the
> way, do they enter things like Montréal, Québec?  When they
> enter that into a text box on a web page can we say it's in
> a particular encoding at that time?  At submit time?

You configure the web server to send:

    Content-Type: text/html; charset=...

in the HTTP header when it serves HTML files. Another way is to put:

    <meta http-equiv="content-type" content="text/html; charset=...">

or:

    <meta charset="...">

in the <head> section of your HTML document. The HTML "standard"
nowadays says that you are only allowed to use the "utf-8" encoding,
but if you use another encoding then browsers will generally use that
as both the encoding to use when reading the HTML file and the encoding
to use when submitting form data.
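Below is a minimal Python sketch of the point about form submissions. It assumes a form with a single field named `city` (hypothetical). `urllib.parse.urlencode` stands in for the way a browser percent-encodes an application/x-www-form-urlencoded form body, and `parse_qs` plays the part of the server. The helpers `content_type`, `encode_form` and `decode_form` are made-up names for illustration only. It shows that the same typed text, "Montréal", leaves the browser as different bytes depending on the page's charset, and that the server has to decode it with that same charset:

```python
from urllib.parse import parse_qs, urlencode


def content_type(charset: str) -> str:
    """Build the Content-Type header value for an HTML page (hypothetical helper)."""
    return f"text/html; charset={charset}"


def encode_form(fields: dict, charset: str) -> str:
    """Roughly mimic how a browser percent-encodes form fields
    using the charset of the page that contained the form."""
    return urlencode(fields, encoding=charset)


def decode_form(body: str, charset: str) -> dict:
    """Decode a submitted form body, assuming it used the page's charset."""
    return parse_qs(body, encoding=charset)


for charset in ("utf-8", "latin1"):
    print(content_type(charset), "->", encode_form({"city": "Montréal"}, charset))
# text/html; charset=utf-8 -> city=Montr%C3%A9al
# text/html; charset=latin1 -> city=Montr%E9al

# Decoding UTF-8 form data as if it were latin1 garbles it (mojibake):
print(decode_form("city=Montr%C3%A9al", "latin1"))
# {'city': ['MontrÃ©al']}
```

One caveat: browsers treat the "latin1" label as windows-1252, which differs from Python's latin-1 codec for bytes 0x80 to 0x9F (the euro sign, curly quotes and so on), so a server decoding such submissions may be better off using Python's `cp1252` codec.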

