translating foreign data
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Fri Jun 22 06:01:26 EDT 2018
On Fri, 22 Jun 2018 01:43:56 -0700, Ethan Furman wrote:
>> You say in a followup that you don't need to worry about digit grouping
>> marks (like thousands separators) so I'm not sure what the problem is.
>> Can't you just replace ',' with '.' and proceed as if you had only one
>> representation?
>
> I could, and that would work right up until a third decimal separator
> was found. I'd like to solve the problem just once if possible.
I don't know of any already existing solution, but there's only a limited
number of decimal separators in common use around the world. There's
probably nothing you can do ahead of time if somebody decides to start
using (say) 5 as a decimal separator within Hindi numerals, except cry,
but you can probably start by transforming all of the following into
decimal points:
- interpunct (middle dot) · U+00B7
- comma ,
- Arabic decimal separator ٫ U+066B
https://en.wikipedia.org/wiki/Decimal_separator
Those three cover pretty much the whole world, using Hindu-Arabic
numerals (1234...) and Eastern Arabic numerals (what the Arabs and
Persians use). Other numeral systems seem to have either adopted Arabic
numerals, or introduced the decimal point/comma into their own numeral
system, or just don't use a decimal place value system.
Either way, I expect that the period . plus the three above will cover
anything you are likely to find in real data.
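A minimal sketch of that approach in Python. It assumes, as stated earlier in the thread, that no digit grouping marks (thousands separators) appear in the data. The names `DECIMAL_SEPARATORS` and `parse_decimal` are hypothetical, made up for illustration. The sketch relies on `float()` accepting any Unicode decimal digit (category Nd), so Eastern Arabic digits need no separate mapping, only the separator does:

```python
# Hypothetical table: map each known decimal separator to '.'.
# str.translate() takes a dict keyed by code point.
DECIMAL_SEPARATORS = {
    ord("\u00b7"): ".",  # interpunct (middle dot)
    ord(","): ".",       # comma
    ord("\u066b"): ".",  # Arabic decimal separator
}


def parse_decimal(text: str) -> float:
    """Parse a number written with any of the known decimal separators.

    Assumes no digit grouping marks are present; a string with more
    than one separator is rejected rather than guessed at.
    """
    normalized = text.strip().translate(DECIMAL_SEPARATORS)
    if normalized.count(".") > 1:
        raise ValueError(f"more than one decimal separator in {text!r}")
    # float() accepts Unicode decimal digits (e.g. Eastern Arabic digits).
    # Any separator not in the table above makes it raise ValueError.
    return float(normalized)
```

For example, `parse_decimal("3,14")`, `parse_decimal("3·14")` and `parse_decimal("٣٫١٤")` should all return `3.14`. An unrecognised separator makes `float()` raise `ValueError` rather than silently producing a wrong number, so a new separator shows up as an error to be added to the table.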
--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson