translating foreign data

Peter J. Holzer hjp-python at hjp.at
Sat Jun 23 14:09:08 EDT 2018


On 2018-06-23 12:11:34 -0400, Richard Damon wrote:
> On 6/23/18 10:05 AM, Peter J. Holzer wrote:
> > On 2018-06-23 08:41:38 -0400, Richard Damon wrote:
> >> Once you open the Locale can of worms, EVERYTHING has a locale, to say
> >> you aren't using a locale is to say you are writing
> >> something unintelligible, as you can think of the locale as the set of
> >> rules to interpret
> > I don't think that's a useful way to look at it. "Locale" in
> > (non-technical) English means "place" or "site". The idea behind the
> > locale concept is that some conventions (e.g. how to write numbers or
> > how to write strings) depend on the place where the program runs (or
> > maybe where the user is sitting or grew up or maybe where a file was
> > produced).
> >
> > For stuff which doesn't depend on the place (e.g. how a Python program
> > should be parsed), the locale concept doesn't apply.
> >
> The Locale should NOT be the place the computer is running in (at least
> not anymore), but where the data and the user are from (which can be
> different).

Yes, it can be different, but for some *very* common cases (PCs,
smartphones most of the time) it isn't. More importantly for the concept,
when the concept was developed (in the late 1980s) it was very common
(probably more common than 10 years earlier).

> Do you really mean that when I travel to a place that uses
> . as the thousands separator and , as the decimal separator (instead of
> my normal environment when they are the other way around) all my
> programs should immediately change how they read all my data files and
> how I need to enter data? I hope not.

Sometimes, yes. If you want to work with your colleagues at that place
they might thank you to use the local conventions.

> I want my computer to use the Locale of where "I" came from (not
> where I currently am) to talk to me,

That's why I wrote "or grew up".

> and to be able to set the Locale to interpret data to match the rules
> the person who generated them used to generate them,

And that's why I wrote "where a file was produced".

So many words to repeat what I already wrote ...


> so if they swap . and , compared to me, I can tell the program that.
> Your last parenthetical comment in the first paragraph is my key
> point,

I think it is the weakest point. The locale is useful for interactive
use (input and output) and also for output intended for human users. For
parsing files it is woefully inadequate (also for generating files
intended to be parsed).

> the locale used to read data should match the locale used to generate
> it, and that can easily be different than the locale being used to
> interact with the user.

Which is basically why "locale" is a rather useless concept with files.
When I get a CSV file, I don't want to say "use locale en_US.cp437",
because the location "US" is almost completely irrelevant, the language
"English" is somewhat relevant but much too specific, and the list
separator isn't there at all. I want to tell it: Decode using CP437, a
decimal point, tabs as a list separator, CRLF as the record separator,
no quoting.
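
A minimal Python sketch of such an explicit specification, with no locale
involved. The helper name parse_table and the sample bytes are made up for
illustration; only the rules themselves (CP437, decimal point, tabs, CRLF,
no quoting) come from the paragraph above.

```python
def parse_table(raw):
    """Parse bytes using explicitly stated rules instead of a locale.

    Rules: CP437 encoding, CRLF record separator, tab list separator,
    no quoting. (parse_table is a hypothetical helper.)
    """
    text = raw.decode("cp437")
    records = text.split("\r\n")
    # A trailing CRLF leaves one empty record at the end; drop it.
    if records and records[-1] == "":
        records.pop()
    # No quoting: every tab separates fields, quote characters are data.
    return [record.split("\t") for record in records]


# Made-up sample data; byte 0x82 is "é" in CP437.
sample = b"name\tprice\r\nCaf\x82\t1234.5\r\n"
rows = parse_table(sample)

# float() always expects a decimal point, whatever the current locale is.
price = float(rows[1][1])
```

Every convention is stated in the code itself, so the result does not
change when the program runs under a different locale.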

> If a program doesn't care about the locale it is running in, like a
> Python compiler, then either it needs to use routines that totally ignore
> the locale or it needs to set the locale to one that matches the rules
> it wants.

The former. Because locales are in general opaque, so you can never be
sure that a given locale will use the rules you want ("C" is the
exception, but not very useful).
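
A small Python sketch of both options, assuming the standard locale module.
The helper name parse_decimal is hypothetical. Python's float() does not
consult the locale, and the "C" locale is the one case where the numeric
rules are known in advance.

```python
import locale


def parse_decimal(text):
    """Locale-independent parsing: float() always expects '.' as the
    decimal separator. (parse_decimal is a hypothetical helper.)"""
    return float(text)


# The "C" locale is the exception: its numeric rules are fixed.
locale.setlocale(locale.LC_NUMERIC, "C")
conv = locale.localeconv()
c_rules = (conv["decimal_point"], conv["thousands_sep"])

value = parse_decimal("1234.5")
```

For any other locale name, the only way to find out which separators it
uses is to set it and inspect localeconv(), and even that covers only
part of the rules.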

        hp

-- 
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp at hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>

