translating foreign data

Richard Damon Richard at Damon-Family.org
Sun Jun 24 15:47:54 EDT 2018


On 6/23/18 10:44 PM, Steven D'Aprano wrote:
> On Sat, 23 Jun 2018 17:52:55 -0400, Richard Damon wrote:
>
>> If you have more than just a number representing a value in the locale
>> currency, you can't ask the locale how to present/accept it.
> You're the only one saying that it has to be handled by the locale.
>
>
Actually, it was part of the problem statement by Marko, since he said
to use LC_MONETARY, which is the part of the locale machinery that deals
with monetary quantities (and it can ONLY handle the currency defined by
the locale). What would you think of someone who answered a problem
statement asking for a Python program with a program written in, say, Java?

I suppose he could have just meant to use the number alone, which would
be like asking to interpret the value of 100 euros using math.pi.

Or it could have just been a bad question, like asking how heavy blue
is (since, by definition, a locale only knows how to handle a single type
of currency, and assumes any value is of that type).

My answer was in part meant to point out the problem with the problem
statement (and people seem to want to jump on me for pointing out the
strengths and weaknesses of the locale system).

This also goes back to the very original question at the beginning of
the thread: the OP had a bunch of data with numbers written in varying
locale conventions (he didn't use those words), with various decimal
separators, and some people asked about digits other than the 'Arabic'
ones (0-9).
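
On the digits point: Python's own `int()` and `float()` already accept any Unicode decimal digit (general category Nd), so those are less of a problem than the separators. When the text has to be handed to something that only understands ASCII, a small helper built on `unicodedata.decimal()` can normalize it first. A minimal sketch; `to_ascii_digits` is a hypothetical name:

```python
import unicodedata


def to_ascii_digits(text):
    """Replace each Unicode decimal digit (category Nd) with its ASCII digit.

    Hypothetical helper; all other characters, including decimal and
    grouping separators, are left untouched.
    """
    return "".join(
        str(unicodedata.decimal(ch)) if unicodedata.category(ch) == "Nd" else ch
        for ch in text
    )


# ASCII, Arabic-Indic, Devanagari and fullwidth spellings of 42.
samples = ["42", "\u0664\u0662", "\u096a\u0968", "\uff14\uff12"]
for s in samples:
    # int() parses every spelling directly; the helper rewrites it as ASCII.
    print(repr(s), int(s), to_ascii_digits(s))
```

Only the digits are normalized; the separators still have to be interpreted according to the conventions the file was written in.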

This also goes back to some of the comments about file formats. Most
file formats are designed to be 'machine read' (even if they use text
formatting), and as such do NOT use localization facilities, so when
processing them you want the I/O system to be in a non-localized mode
(numbers typically always use . as the decimal separator, and usually
nothing as the thousands separator). While such text files might be
opened in a text editor, the file format doesn't cater to making things
pretty for the user. Some programs, on the other hand, create
input/output/storage files that the user is expected to open, look at,
and maybe even edit. Numbers in those files will follow the locale's
conventions for currency and decimal/thousands separators. If you have
such a system, changing the locale rules used for these files may cause
the values to be misinterpreted.
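
For the machine-read case, Python already behaves as described: `float()` ignores the locale entirely and always expects '.', so reading such a file needs no locale handling, and a localized value fails loudly instead of being misread. (For user-facing files, `locale.atof()` and `locale.delocalize()` parse according to the current LC_NUMERIC setting instead.) A minimal sketch with made-up CSV data:

```python
import csv
import io

# Made-up machine-read data: '.' is the decimal separator and there is no
# thousands separator, whatever locale the reader happens to run in.
data = io.StringIO("item,price\nwidget,1234.50\ngadget,0.99\n")
prices = {row["item"]: float(row["price"]) for row in csv.DictReader(data)}
print(prices)  # {'widget': 1234.5, 'gadget': 0.99}

# A value written with German user-facing conventions is rejected
# rather than silently misread.
try:
    float("1.234,50")
except ValueError as exc:
    print("rejected:", exc)
```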

If you are bringing such files in from a 'foreign' system, you need to be
able to indicate which locale to use when reading each file. This sounds
very much like the category of problem that the OP was dealing with.
They apparently have a large number of files, presumably organized in
some consistent manner so that the values in them make sense, but the
numbers are written in different locale conventions, and this was
causing the simplistic processing to fail.
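
One way to indicate the conventions for each file, without depending on which locales happen to be installed on the reading machine, is to record the separators per source explicitly. This is a hypothetical sketch: the convention names, the `CONVENTIONS` table and `parse_number()` are illustrative, and it ignores signs, currency symbols and non-ASCII digits. It returns `decimal.Decimal` so monetary values are not rounded through binary floats.

```python
from decimal import Decimal

# Hypothetical per-source conventions; in practice this would come from
# whatever tells you where each file was produced.
CONVENTIONS = {
    "us": {"decimal": ".", "group": ","},  # 1,234.50
    "de": {"decimal": ",", "group": "."},  # 1.234,50
    "plain": {"decimal": ".", "group": ""},  # 1234.50 (machine-read style)
}


def parse_number(text, convention):
    """Parse a number written under the named convention into a Decimal."""
    conv = CONVENTIONS[convention]
    cleaned = text.strip()
    if conv["group"]:
        # Drop grouping separators first so they can't be mistaken
        # for the decimal separator.
        cleaned = cleaned.replace(conv["group"], "")
    return Decimal(cleaned.replace(conv["decimal"], "."))


# The same text means different things depending on its source.
print(parse_number("1.234", "us"))  # 1.234
print(parse_number("1.234", "de"))  # 1234
print(parse_number("1.234,50", "de") == parse_number("1,234.50", "us"))  # True
```

The "1.234" case is exactly the silent failure described above: both readings are valid numbers, so only knowledge of the file's origin can pick the right one.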

-- 
Richard Damon



