translating foreign data

Steven D'Aprano steve+comp.lang.python at pearwood.info
Fri Jun 22 06:01:26 EDT 2018


On Fri, 22 Jun 2018 01:43:56 -0700, Ethan Furman wrote:

>> You say in a followup that you don't need to worry about digit grouping
>> marks (like thousands separators) so I'm not sure what the problem is.
>> Can't you just replace ',' with '.' and proceed as if you had only one
>> representation?
> 
> I could, and that would work right up until a third decimal separator
> was found.  I'd like to solve the problem just once if possible.

I don't know of any already existing solution, but there's only a limited 
number of decimal separators in common use around the world. There's 
probably nothing you can do ahead of time if somebody decides to start 
using (say) 5 as a decimal separator within Hindi numerals, except cry, 
but you can probably start by transforming all of the following into 
decimal points:

- interpunct (middle dot) · U+00B7
- comma ,
- Arabic decimal separator ٫ U+066B
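A minimal sketch of that transformation, assuming (as stated earlier in the thread) that no digit-grouping marks are present. The names `_TO_PERIOD` and `parse_decimal` are hypothetical and only for illustration. It relies on the built-in `float()` accepting any Unicode decimal digit (category Nd), so Eastern Arabic digits need no translation of their own; only the separator does.

```python
# Hypothetical translation table: each non-period separator listed above -> '.'
_TO_PERIOD = str.maketrans({
    "\u00b7": ".",  # MIDDLE DOT (interpunct)
    ",": ".",       # COMMA
    "\u066b": ".",  # ARABIC DECIMAL SEPARATOR
})


def parse_decimal(text: str) -> float:
    """Convert text to float after normalising its decimal separator to '.'.

    Assumes the input contains no digit-grouping (thousands) separators.
    float() accepts Unicode decimal digits (category Nd), so Eastern Arabic
    digits such as U+0663 are handled without extra translation.
    """
    return float(text.strip().translate(_TO_PERIOD))


print(parse_decimal("3,14"))                      # 3.14
print(parse_decimal("3\u00b714"))                 # 3.14
print(parse_decimal("\u0663\u066b\u0661\u0664"))  # Eastern Arabic digits: 3.14
```

Using `str.translate` keeps all the separator knowledge in one table, so supporting another separator later means adding one entry rather than another `replace()` call.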


https://en.wikipedia.org/wiki/Decimal_separator


Those three cover pretty much the whole world, using Hindu-Arabic 
numerals (1234...) and Eastern Arabic numerals (what the Arabs and 
Persians use). Other numeral systems seem to have either adopted Arabic 
numerals, or introduced the decimal point/comma into their own numeral 
system, or just don't use a decimal place-value system.

Either way, I expect that the period . plus the three above will cover 
anything you are likely to find in real data.
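Addressing the worry about a third separator turning up: one way to "solve it once" is to accept only the period plus the three separators above and fail loudly on anything else, so a new separator is noticed instead of silently mis-parsed. This is a sketch under stated assumptions, not an established API: `KNOWN_SEPARATORS` and `find_separator` are hypothetical names, and the input is assumed to be an unsigned number with no grouping marks.

```python
import unicodedata
from typing import Optional

# Assumed to be exhaustive for real data: the period plus the three above.
KNOWN_SEPARATORS = {".", ",", "\u00b7", "\u066b"}


def find_separator(text: str) -> Optional[str]:
    """Return the decimal separator used in text, or None if there is none.

    Raises ValueError for any character that is neither a Unicode decimal
    digit nor a known separator, or if more than one separator appears.
    Signs and grouping marks are not handled (assumption).
    """
    found = None
    for ch in text.strip():
        if ch.isdecimal():
            continue  # any Unicode decimal digit, e.g. 0-9 or U+0660..U+0669
        if ch not in KNOWN_SEPARATORS:
            name = unicodedata.name(ch, "unnamed character")
            raise ValueError(f"unrecognised character {ch!r} ({name}) in {text!r}")
        if found is not None:
            raise ValueError(f"more than one decimal separator in {text!r}")
        found = ch
    return found


print(find_separator("3,14"))  # ,
print(find_separator("42"))    # None
```

The error message includes the character's Unicode name, which makes it easy to decide whether the newcomer is a genuine decimal separator that belongs in the set.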


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson