translating foreign data

Marko Rauhamaa marko at pacujo.net
Sat Jun 23 10:10:08 EDT 2018


Richard Damon <Richard at Damon-Family.org>:

> On 6/23/18 9:05 AM, Marko Rauhamaa wrote:
>> Richard Damon <Richard at Damon-Family.org>:
>>
>>> On 6/23/18 8:03 AM, Marko Rauhamaa wrote:
>>>> I always know my locale. The locale is tied to the human user.
>>> No, it should be tied to the data you are processing.
>>    In computing, a locale is a set of parameters that defines the user's
>>    language, region and any special variant preferences that the user
>>    wants to see in their user interface.
>>
>>    <URL: https://en.wikipedia.org/wiki/Locale_(computer_software)>
>>
>> The data should not depend on the locale.
> So no one foreign ever gives you data?

Never in my decades of computer programming have I found any use for
locales.

In particular, they have never helped me decode "foreign" data, whether
in ASCII, Latin-1, Latin-3, Latin-9, JIS or UTF-8.
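
To make the point concrete in Python: you decode such data by naming
the codec explicitly, and the locale never enters into it. A minimal
sketch, where the encoding name is something the sender has to tell
you:

    # Decode bytes by naming the codec explicitly; the process
    # locale plays no part. The name must come from the data's
    # producer: a protocol header, a spec, an agreement.
    raw = b'\xc2\xa3100'            # bytes of unknown provenance
    print(raw.decode('utf-8'))      # '£100' -- right only if the
                                    # sender really used UTF-8
    print(raw.decode('latin-1'))    # 'Â£100' -- silent mojibake,
                                    # not even an error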

> Note, that Wikipedia article is focused on the SYSTEM locale, which,
> yes, should reflect what the user wants in his interface.

I don't think locales have anything to do with anything else.


>>> If an English user is feeding a program Chinese documents, while
>>> processing those documents the program should be using the
>>> appropriate Chinese Locale.
>> Not true.
> How else is the program going to understand the Chinese data?

If someone gives me a file, they had better indicate the file format.

> The fact that locale issues leak into data is the reason that the
> single immutable global locale doesn't work.

Locales don't work. Period.

> You really want to imbue into data streams what locale their data
> represents (and use that in some of the later processing of data from
> that stream).

Can you refer to a standard for that kind of imbuement?

Of course, you have document types, schema definitions and other
implicit and explicit format indicators. You shouldn't call them
locales, though.
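
XML is the textbook case: the encoding declaration travels inside the
document itself, and the parser honors it without consulting any
locale. A minimal sketch:

    import xml.etree.ElementTree as ET

    # The encoding is declared in the document, not the environment;
    # the parser obeys the declaration regardless of the locale.
    doc = b"<?xml version='1.0' encoding='iso-8859-1'?><p>caf\xe9</p>"
    print(ET.fromstring(doc).text)   # 'café', per the declaration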

>>> Data presented to the user should normally use his locale (unless he
>>> has specified something different).
>> Ok. Here's a value for you:
>>
>>     100€
>>
>> I see '1', '0', '0', '€'. What do you see in your locale (LC_MONETARY)?
> If I processed that on my system I would either get $100 or an error
> about a wrong currency symbol, depending on the error checking.

Don't forget to convert the amount as well...
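
For anyone curious what LC_MONETARY actually buys you, Python's locale
module shows it plainly. A minimal sketch, assuming the de_DE.UTF-8
and en_US.UTF-8 locales are installed on the system:

    import locale

    # LC_MONETARY changes only the *presentation* of a number; it
    # knows nothing about exchange rates.
    locale.setlocale(locale.LC_MONETARY, 'de_DE.UTF-8')
    print(locale.currency(100))   # e.g. '100,00 €'

    locale.setlocale(locale.LC_MONETARY, 'en_US.UTF-8')
    print(locale.currency(100))   # e.g. '$100.00' -- the same 100,
                                  # dressed in a different symbol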

>> The single global is due to what the locale was introduced for. It
>> came about around the time when Unix applications were being made
>> "8-bit clean." Along with UCS-2 and XML, it's one of those things you
>> wish you'd never have to deal with.
>
> Locale predates UCS-2; it was the early attempt to provide
> internationalization for C code, so that even programmers who didn't
> think about it could add the line setlocale(LC_ALL, "") and have
> their code work at least mostly right in more places. A single global
> was quick and simple, and since threads didn't exist, it was not an
> issue.
>
> In many ways it was the first attempt that should have been thrown
> away, but got too intertwined. C++ made a significant improvement to
> it by having streams remember their own locale.

No one should breathe any new life into locales.

And yes, add C++ to the list of things you wish you'd never have to deal
with...
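
For the record, Python wraps the very same C machinery, so the
single-global problem is easy to demonstrate. A minimal sketch,
assuming a de_DE.UTF-8 locale is installed; setlocale(locale.LC_ALL,
'') would be the exact analogue of the quoted C call, adopting the
environment's settings:

    import locale

    print(locale.str(3.14))       # '3.14' in the startup C locale

    # One mutable, process-wide global: change it anywhere and
    # every caller in the process sees the change.
    locale.setlocale(locale.LC_ALL, 'de_DE.UTF-8')
    print(locale.str(3.14))       # '3,14' -- the decimal point
                                  # changed under everyone's feet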


Marko


