[issue10557] Malformed error message from float()

Mon Nov 29 07:23:09 CET 2010

Alexander Belopolsky <belopolsky at users.sourceforge.net> added the comment:

After a bit of svn archeology, it does appear that Arabic-Indic digits' support was deliberate at least in the sense that the feature was tested for when the code was first committed. See r15000.

The test migrated from file to file over the last 10 years, but it is still present in test_float.py:

        self.assertEqual(float(b"  \u0663.\u0661\u0664  ".decode('raw-unicode-escape')), 3.14)

(It should probably be now rewritten using a string literal.)

I am now attaching the patch (issue10557.diff) that fixes the bug without sacrificing non-ASCII digit support.

If this approach is well-received, I would like to replace all calls to PyUnicode_EncodeDecimal() with the calls to the new _PyUnicode_EncodeDecimalUTF8() and deprecate Latin-1-oriented PyUnicode_EncodeDecimal().

For the future, I note that starting with Unicode 6.0.0, the Unicode Consortium promises that

"""
Characters with the property value Numeric_Type=de (Decimal) only occur in contiguous ranges of 10 characters, with ascending numeric values from 0 to 9 (Numeric_Value=0..9).
"""

This makes it very easy to check a numeric string does not contain a mix of digits from different scripts.

I still believe that proper API should require explicit choice of language or locale before allowing digits other than 0-9 just as int() would not accept hexadecimal digits without explicit choice of base >= 16.  But this would be a subject of a feature request.

----------
Added file: http://bugs.python.org/file19865/issue10557.diff

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10557>
_______________________________________