translating foreign data

Ethan Furman ethan at stoneleaf.us
Fri Jun 22 04:43:56 EDT 2018


On 06/21/2018 01:20 PM, Ben Bacarisse wrote:
> Ethan Furman writes:
>
>> I need to translate numeric data in a string format into a binary
>> format.  I know there are at least two different methods of
>> representing parts less than 1, such as "10.5" and "10,5".  The data
>> is encoded using code pages, and can vary depending on the file being
>> read (so I can't rely on current locale settings).
>>
>> I'm sure this is a solved problem, but I'm not finding those
>> solutions.  Any pointers?
>
> You say "at least two" and give two but not knowing the others will hamper
> anyone trying to help.  (I appreciate that you may not yet know if there
> are to be any others.)

Yes, I don't know if there are others -- I have not studied the various ways different peoples represent decimal numbers.  ;)

> You say in a followup that you don't need to worry about digit grouping
> marks (like thousands separators) so I'm not sure what the problem is.
> Can't you just replace ',' with '.' and proceed as if you had only one
> representation?

I could, and that would work right up until a third decimal separator was found.  I'd like to solve the problem just 
once if possible.
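
A minimal sketch of one way to avoid hard-coding the separator, assuming (as stated above) that the fields contain no grouping marks and also no exponents: treat the single character that is neither a digit nor a leading sign as the decimal separator, whatever it happens to be.  The name to_decimal is made up for illustration.

```python
from decimal import Decimal


def to_decimal(text):
    """Parse a number whose decimal separator is not known in advance.

    Assumption: no grouping marks and no exponent, so at most one
    character that is neither a digit nor a leading sign may appear.
    """
    text = text.strip()
    if not text:
        raise ValueError("empty field")

    # Peel off an optional leading sign.
    sign = ""
    if text[0] in "+-":
        sign, text = text[0], text[1:]

    # Anything that is not a decimal digit is a separator candidate.
    separators = [ch for ch in text if not ch.isdecimal()]
    if len(separators) > 1:
        raise ValueError("more than one separator in %r" % text)
    if separators:
        text = text.replace(separators[0], ".")

    return Decimal(sign + text)


print(to_decimal("10.5"))   # 10.5
print(to_decimal("10,5"))   # 10.5
```

This rejects anything with two or more non-digit characters rather than guessing, so a field that does turn out to contain grouping marks fails loudly instead of being misread.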

> The code page remark is curious.  Will some "code pages" have digits
> that are not ASCII digits?

Good question.  I have no idea.  I get the appropriate decoder/encoder based on the code page contained in the file, 
then decode to Unicode and go from there.  Unfortunately, that doesn't convert the decimal comma to the decimal point.  :(

I was hoping to map the code page to a locale that would properly translate the numbers for me.  So far, though, my 
reading suggests that to use the locale option I would have to actually change the active locale, and potentially mess 
up every other part of the program whenever a file is opened in a locale that's different from its code page.
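
To illustrate the decode step described above, here is a small sketch.  The codec name "cp1252" and the helper ascii_digits are assumptions for the example, not what the real files use.  It shows that decoding leaves the separator untouched, and how any non-ASCII decimal digits could be folded to ASCII after decoding without touching the process-wide locale.

```python
import unicodedata


def ascii_digits(text):
    """Replace every Unicode decimal digit with its ASCII equivalent.

    Characters that are not decimal digits (signs, separators) are
    passed through unchanged.
    """
    return "".join(
        str(unicodedata.decimal(ch)) if ch.isdecimal() else ch
        for ch in text
    )


raw = b"10,5"                  # bytes as they might appear in a file
text = raw.decode("cp1252")    # codec chosen from the file's code page
print(text)                    # 10,5  (decoding keeps the comma)

# Arabic-Indic digits, written with escapes to keep this ASCII-only.
print(ascii_digits("\u0661\u0660,\u0665"))   # 10,5
```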

Worst case scenario is I manually create a map from each code page to its decimal separator, but there are more than a 
few and I'd rather not if there is already a prebuilt solution out there.
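
The worst-case manual map could look roughly like the sketch below.  The table entries are placeholders for illustration only, not verified mappings; a single code page can be shared by locales that use different separators, so real entries would have to come from knowledge of the actual files.  The names DECIMAL_SEPARATOR and parse_number are hypothetical.

```python
from decimal import Decimal

# Placeholder entries for illustration only; NOT verified mappings.
DECIMAL_SEPARATOR = {
    "cp437": ".",
    "cp852": ",",
}


def parse_number(text, codepage, default="."):
    """Convert text to Decimal using the separator mapped to codepage.

    Falls back to `default` when the code page is not in the table.
    Assumes the field contains no grouping marks.
    """
    sep = DECIMAL_SEPARATOR.get(codepage, default)
    return Decimal(text.strip().replace(sep, "."))


print(parse_number("10,5", "cp852"))   # 10.5
print(parse_number("10.5", "cp437"))   # 10.5
```

Passing the separator in through a lookup like this keeps the conversion local to each file and leaves the active locale alone.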

--
~Ethan~



