translating foreign data

Peter J. Holzer peter.j..holzer at 1
Sat Jun 23 21:09:08 EDT 2018


From: "Peter J. Holzer" <hjp-python at hjp.at>


--b2wbudmypdkmv7il
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On 2018-06-23 12:11:34 -0400, Richard Damon wrote:
> On 6/23/18 10:05 AM, Peter J. Holzer wrote:
> > On 2018-06-23 08:41:38 -0400, Richard Damon wrote:
> >> Once you open the Locale can of worms, EVERYTHING has a locale, to say
> >> you aren't using a locale is to say you are writing
> >> something unintelligible, as you can thing of the locale as the set of
> >> rules to interpret
> > I don't think that's a useful way to look at it. "Locale" in
> > (non-technical) English means "place" or "site". The idea behind the
> > locale concept is that some conventions (e.g. how to write numbers or
> > how to write strings) depend on the place where the program runs (or
> > maybe where the user is sitting or grew up or maybe where a file was
> > produced).
> >
> > For stuff which doesn't depend on the place (e.g. how a Python program
> > should be parsed), the locale concept doesn't apply.
> >
> The Locale should NOT be the place the computer is running in (at least
> not anymore), but where the data and the user are from (which can be
> different).

Yes, it can be different, but for some *very* common cases (PCs, smartphones
most of the time) it isn't. More imporantly for the concept, when the concept
was developed (in the late 1980's) is was very common (probably more common
than 10 years earlier).

> Do your really mean that when I travel to a place that uses
> . as the thousands separator and , as the decimal separator (instead of
> my normal environment when they are the other way around) all my
> programs should immediately change how they read all my data files and
> how I need to enter data? I hope not.

Sometimes, yes. If you want to work with your colleagues at that place they
might thank you to use the local conventions.

> I want my computer to use the Locale of where "I" came from (not
> current am) to talk to me,

That's why I wrote "or grew up".

> and to be able to set the Locale to interpret data to match the rules
> the person who generated them used to generate them,

And that's why I wrote "where a file was produced".

So many words to repeat what I already wrote ...


> so if they swap . and , compared to me, I can tell the program that.
> Your last parenthetical comment in the first paragraph is my key
> point,

I think it is the weakest point. The locale is useful for interactive use
(input and output) and also for output intended for human users. For parsing
files it is woefully inadequate (also for generating files intended to be
parsed).

> the locale used to read data should match the locale used to generate
> it, and that can easily be different than the locale being used to
> interact with the user.

Which is basically why "locale" is a rather useless concept with files. When I
get a CSV file, I don't want to say "use locale en_US.cp437", because the
location "US" is almost completely irrelevant, the language "English" is
somewhat relevant but much too specific", and the list separator isn't there at
 all. I want to tell it: Decode using CP437, a decimal point, tabs as a list
separator, CRLF as the record separator, no quoting.

> If a program doesn't care about the locale it is running in, like a
> Python compiler, the either it needs to use routines that totally ignore
> the locale or it needs to set the locale to one that matches the rules
> it wants.

The former. Because locales are in general opaque, so you can never be sure
that a given locale will use the rules you want ("C" is the exception, but not
very useful).

        hp

--=20
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp at hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>

--b2wbudmypdkmv7il
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAABCAAdFiEETtJbRjyPwVTYGJ5k8g5IURL+KF0FAlsujL8ACgkQ8g5IURL+
KF0dVxAAryd5Ew15J/aco4xdLAa+EGzIwFKoPQjIb90C4vbIXCbmrVwVYmbJN/yK
J9AhrnUoyipGbK2IMamCUpCp7XVfkCgnMvGmf8ZFeflwaw5NCJLvv42JqAgl+lUP
I1H/hEz/RoR8NFRWGseXGmTSt6KMwAUJjdDzK5eQru25U0vxTPkGmXLXE8wqmynA
lft/CbsPy374Dda99033UJpG9QDZubmhfnt8j4xuHX0u8ZJmY7LJWGQ+zt06RtP1
RA7m+IxyKRRLlRiVSoS5XslRMKSEGfUhqt+jYjVASE5nOgtPPmQswjpKb3fzTJ13
wVCQJGKTu9mOO7xPwhxms8bKBegxjDbrGF0G8FJYW/ty/brItUkhhtb+Z7Pkj1iq
d4xKRBQmux0tz5/kdFFUkz0u9CpFfB0+pzHGfK1edsAGh+lwzN2KgNcBff6H+5FL
er+FZ1HbicXecq2XgTgpq8UYdHeANQI5yMEPjrCiHG4ybZ9T+bVanayji1vC2xqC
DNZ7cWGTFz5AtSDC1fgRupG6ZX5BK/kZjdQz8Hx0AdgW1i4fKu28/giz6mbB+NnC
XgcyXRjj1Jr3aCThjaZ7bq5dYwLzZKNLRQ3shnJE+Tfm3HfvDjvw5Wqf3lOIkcaQ
Wf40K7bivvEZfdZzy/QkmtHjevVutrMZclj/e3NXPevBTWle6eI=
=W2Ut
-----END PGP SIGNATURE-----

--b2wbudmypdkmv7il--

--- BBBS/Li6 v4.10 Toy-3
 * Origin: Prism bbs (1:261/38)



More information about the Python-list mailing list