translating foreign data
Ethan Furman
ethan at stoneleaf.us
Fri Jun 22 04:43:56 EDT 2018
On 06/21/2018 01:20 PM, Ben Bacarisse wrote:
> Ethan Furman writes:
>
>> I need to translate numeric data in a string format into a binary
>> format. I know there are at least two different methods of
>> representing parts less than 1, such as "10.5" and "10,5". The data
>> is encoded using code pages, and can vary depending on the file being
>> read (so I can't rely on current locale settings).
>>
>> I'm sure this is a solved problem, but I'm not finding those
>> solutions. Any pointers?
>
> You say "at least two" and give two but not knowing the others will hamper
> anyone trying to help. (I appreciate that you may not yet know if there
> are to be any others.)
Yes, I don't know if there are others -- I have not studied the various
ways different peoples represent decimal numbers. ;)
> You say in a followup that you don't need to worry about digit grouping
> marks (like thousands separators) so I'm not sure what the problem is.
> Can't you just replace ',' with '.' and proceed as if you had only one
> representation?
I could, and that would work right up until a third decimal separator was
found. I'd like to solve the problem just once if possible.
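A minimal sketch of that replace-based approach, for reference. The name parse_decimal_text is hypothetical, and the sketch assumes what has been said in this thread: no digit-grouping marks, and ',' or '.' as the only decimal separators. A third separator would mean extending the separators argument, which is exactly the limitation noted above.

```python
from decimal import Decimal, InvalidOperation


def parse_decimal_text(text, separators=",."):
    """Parse text such as '10.5' or '10,5' into a Decimal.

    Assumption: there are no grouping marks, so at most one separator
    character may appear in the input.
    """
    cleaned = text.strip()
    found = [ch for ch in cleaned if ch in separators]
    if len(found) > 1:
        # More than one separator suggests grouping marks; refuse to guess.
        raise ValueError(f"ambiguous number (several separators): {text!r}")
    if found:
        cleaned = cleaned.replace(found[0], ".")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        raise ValueError(f"not a decimal number: {text!r}") from None
```

Decimal is used here instead of float only to avoid binary rounding of the parsed value; float(cleaned) would work the same way if a float is what the binary format needs.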
> The code page remark is curious. Will some "code pages" have digits
> that are not ASCII digits?
Good question. I have no idea. I get the appropriate decoder/encoder based
on the code page recorded in the file, then decode to Unicode and go from
there. Unfortunately, that doesn't convert the decimal comma to a decimal
point. :( So I was hoping to map the code page to a locale that would
properly translate the numbers for me. From what I have read so far, though,
using the locale machinery means actually changing the active locale, which
could mess up every other part of the program whenever the file being opened
has a code page that doesn't match the locale the program is running in.
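A small illustration of the two halves of that problem, using only built-ins. The sample bytes and the choice of cp1252 are assumptions made up for the example. Once text is decoded, int() and float() accept any Unicode decimal digit, so non-ASCII digits should not be an issue after decoding; the separator is what gets in the way.

```python
# Bytes as they might appear in a cp1252-encoded file (illustrative only).
raw = b"10,5"
text = raw.decode("cp1252")  # decoding leaves the comma untouched


def is_plain_float(s):
    """Return True if float() accepts s without any preprocessing."""
    try:
        float(s)
    except ValueError:
        return False
    return True


# "10.5" written with Arabic-Indic digits (U+0661, U+0660, U+0665).
arabic_indic = "\u0661\u0660.\u0665"

comma_ok = is_plain_float(text)           # the decimal comma is rejected
digits_ok = is_plain_float(arabic_indic)  # non-ASCII digits are accepted
```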
Worst case scenario is that I manually create a map from each code page to
its decimal separator, but there are more than a few of them and I'd rather
not if there is already a prebuilt solution out there.
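For what it's worth, the worst-case version might look roughly like the sketch below. The table name, the function name, and especially the table entries are hypothetical placeholders, not researched values; a code page does not strictly determine a decimal separator, so every entry would need checking against the real data.

```python
from decimal import Decimal

# Hypothetical, illustrative entries only. Real values would have to be
# verified for each code page that actually occurs in the files.
DECIMAL_SEPARATOR_BY_CODEPAGE = {
    "cp437": ".",
    "cp1252": ".",
    "cp850": ",",
    "cp1251": ",",
}


def to_decimal(raw, codepage):
    """Decode raw bytes with the file's code page and parse them as a Decimal.

    Assumes no digit-grouping marks are present in the data.
    """
    text = raw.decode(codepage).strip()
    separator = DECIMAL_SEPARATOR_BY_CODEPAGE[codepage]
    if separator != ".":
        text = text.replace(separator, ".")
    return Decimal(text)
```

This keeps the active locale untouched, at the cost of maintaining the table by hand; an unknown code page raises KeyError rather than silently guessing.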
--
~Ethan~