UTF-8 and latin1

Jon Ribbens jon+usenet at unequivocal.eu
Wed Aug 17 20:11:28 EDT 2022


On 2022-08-17, Tobiah <toby at tobiah.org> wrote:
>> That has already been decided, as much as it ever can be. UTF-8 is
>> essentially always the correct encoding to use on output, and almost
>> always the correct encoding to assume on input absent any explicit
>> indication of another encoding. (e.g. the HTML "standard" says that
>> all HTML files must be UTF-8.)

> I got an email from a client with blast text that
> was in French with stuff like: Montréal, Quebéc.
> latin1 did the trick.

There's no accounting for the Québécois. They think they speak French.

> Also, whenever I get a spreadsheet from a client and save as .csv,
> or take browser data through PHP, it always seems to work with latin1,
> but not UTF-8.

That depends on how you "saved as .csv" and what you did with PHP.
Generally speaking, browser submissions were/are supposed to be sent
using the same encoding as the page, so if you're sending the page
as "latin1" then I should think you'll get a fair amount of latin1
back. If you send it as "utf-8" then you'll get 100% utf-8 back.
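
As a rough illustration of why latin1 can "always seem to work": every
byte value is a valid latin1 character, so decoding as latin1 never raises
an error, even when the bytes are really UTF-8. It just silently produces
the wrong text. UTF-8 decoding is strict, so it fails loudly on bytes that
are not UTF-8. The sketch below uses only Python's built-in str.encode and
bytes.decode; the decode_guess helper is a hypothetical name invented for
this example, not something from PHP or any library.

```python
text = "Montréal, Québec"

utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

# latin-1 maps every byte 0x00-0xFF to a code point, so this never fails,
# but UTF-8 input decoded this way comes out as mojibake.
mojibake = utf8_bytes.decode("latin-1")
print(mojibake)  # MontrÃ©al, QuÃ©bec

# UTF-8 is strict: latin-1 bytes containing an accented letter are rejected.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)


def decode_guess(data: bytes) -> str:
    """Hypothetical helper: try UTF-8 first, fall back to latin-1."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is treated as latin-1.
        return data.decode("latin-1")


print(decode_guess(utf8_bytes))    # Montréal, Québec
print(decode_guess(latin1_bytes))  # Montréal, Québec
```

Trying UTF-8 first is the safer order, since the reverse order would accept
everything as latin1 and never get as far as UTF-8.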

