translating foreign data

Marko Rauhamaa marko at pacujo.net
Sat Jun 23 09:05:07 EDT 2018


Richard Damon <Richard at Damon-Family.org>:

> On 6/23/18 8:03 AM, Marko Rauhamaa wrote:
>> I always know my locale. The locale is tied to the human user.
> No, it should be tied to the data you are processing.

   In computing, a locale is a set of parameters that defines the user's
   language, region and any special variant preferences that the user
   wants to see in their user interface.

   <URL: https://en.wikipedia.org/wiki/Locale_(computer_software)>

The data should not depend on the locale.

> If an English user is feeding a program Chinese documents, while
> processing those documents the program should be using the appropriate
> Chinese Locale.

Not true.

> Again, no, a locale is tied to the data, not the user (unless you want
> to require the user to translate all data to his locale conventions
> (without using a program that can use locale information) before
> providing it to a program). Yes, the default for the interpretation
> should be the user's default/current locale, but you really want them
> to be able to say "I got this file from someone whose locale was
> different than mine."

The locale is not directly related to data or data formats. Of course,
locales leak into data and create the sorry mess we are talking about.
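
One way to keep the locale out of the data, as a rough Python sketch:
serialize and parse with locale-independent functions, and consult the
locale only when rendering for the user. The function names serialize,
deserialize and display are made up for illustration.

```python
import locale


def serialize(x: float) -> str:
    # repr() always uses '.' as the decimal separator, whatever LC_NUMERIC is.
    return repr(x)


def deserialize(text: str) -> float:
    # float() ignores the locale as well; locale.atof() would not.
    return float(text)


def display(x: float) -> str:
    # Only the user-facing rendering consults the current locale.
    return locale.format_string("%.2f", x, grouping=True)


stored = serialize(1234.5)
print(stored)
print(deserialize(stored))
print(display(1234.5))
```

The stored form round-trips identically on every machine; only the
last line may differ from one user to the next.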

> Data presented to the user should normally use his locale (unless he
> has specified something different).

Ok. Here's a value for you:

    100€

I see '1', '0', '0', '€'. What do you see in your locale (LC_MONETARY)?
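
To make that concrete, a small Python 3 sketch. It assumes the value is
held as a str and, for the byte form, that UTF-8 is named explicitly
rather than taken from the environment. Nothing in it consults the
locale.

```python
import unicodedata

value = "100€"

# The value is the same sequence of characters under every locale.
characters = list(value)
names = [unicodedata.name(ch) for ch in value]

# Serialized with an explicitly named encoding, the bytes are fixed too.
encoded = value.encode("utf-8")

print(characters)
print(names)
print(encoded)
```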

>> BTW, I think the locale is a terrible invention.
>
> The locale is a lot better than the alternative, where every
> application that needs to deal with internationalization needs to
> recreate (and debug) all of the mechanism. I agree it isn't perfect,
> and for small simple programs it would be nice to be able to say "I
> don't want all this stuff, make it go away".

The locale doesn't solve a single problem in practice and often trips up
programs. For example, a customer-visible bug was once caused by:

   sort <identifiers.txt

producing different results on different customers' machines.

Mental note: *always* prefix GNU textutils commands with LC_ALL=C
(LANG=C alone is overridden by any LC_ALL or LC_COLLATE setting).
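
The same trap exists in Python as soon as locale.strxfrm is used as a
sort key. A rough sketch of the difference, with made-up identifiers
and hypothetical helper names (stable_sort, locale_sort):

```python
import locale

identifiers = ["b_id", "A_id", "a_id", "B_id", "_id"]


def stable_sort(items):
    """Sort by Unicode code point, independent of the process locale."""
    return sorted(items)


def locale_sort(items, locale_name):
    """Sort with the collation rules of locale_name.

    Returns None if that locale is not installed on this machine.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        return sorted(items, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


print(stable_sort(identifiers))
print(locale_sort(identifiers, "en_US.UTF-8"))
```

The first line of output is the same everywhere. The second depends on
which locales the machine has installed and how they collate, which is
exactly the kind of difference that bit the customers.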

> Python took its locale (at least initially) from C, which was a single
> global which does have more issues because of this.

The single global is due to what the locale was introduced for. It came
about around the time when Unix applications were being made "8-bit
clean." Along with UCS-2 and XML, it's one of those things you wish
you'd never have to deal with.


Marko


