[issue10581] Review and document string format accepted in numeric data type constructors

Wed Jun 12 09:05:08 CEST 2013

Marc-Andre Lemburg added the comment:

On 12.06.2013 07:32, Alexander Belopolsky wrote:
> 
> Alexander Belopolsky added the comment:
> 
> It looks like we are approaching consensus on some points:
> 
> 1. Mixed script numerals should be disallowed.
> 2. '\N{MINUS SIGN}' should be accepted as an alternative to '\N{HYPHEN-MINUS}'
> 
> Open question: should we accept fullwidth + and -, sub/superscript variants etc.?  I believe rather than debating variant codepoints one by one, we should consider applying NFKC (compatibility) normalization to unicode strings to be interpreted as numbers.  This would allow parsing strings like this:
> 
> >>> float(normalize('NFKC', '\N{FULLWIDTH HYPHEN-MINUS}\N{DIGIT ONE FULL STOP}\N{FULLWIDTH DIGIT TWO}'))
> -1.2

While that would solve these cases, I think applying normalization
to every input would cause a significant performance hit.

Perhaps we could do this in two phases:
1. detect whether the string uses non-ASCII digits and symbols
2. if it does, apply normalization and then use the decimal codec
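
A minimal sketch of the two phases, in pure Python rather than in the C
conversion code. The name parse_float is hypothetical, float() stands in
for the decimal codec, and str.isascii() assumes Python 3.7 or later.
Replacing MINUS SIGN by hand is also an assumption of the sketch: NFKC
leaves that character unchanged. The mixed-script check discussed above
is not covered here.

```python
import unicodedata


def parse_float(text):
    """Hypothetical helper: normalize only when the input is not pure ASCII."""
    # Phase 1: pure-ASCII input takes the fast path and skips normalization.
    if text.isascii():
        return float(text)
    # Phase 2: fold compatibility variants (fullwidth signs and digits,
    # DIGIT ONE FULL STOP, ...) to their plain equivalents.
    folded = unicodedata.normalize('NFKC', text)
    # NFKC does not change MINUS SIGN, so map it to HYPHEN-MINUS explicitly.
    folded = folded.replace('\N{MINUS SIGN}', '-')
    return float(folded)
```

With this, the string from the quoted example would take the slow path,
while plain input such as '-1.2' never reaches normalize().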

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10581>
_______________________________________