[Python-Dev] Python and the Unicode Character Database

M.-A. Lemburg mal at egenix.com
Thu Dec 2 21:05:21 CET 2010


"Martin v. Löwis" wrote:
>>> Now, one may wonder what precisely a "possibly signed floating point
>>> number" is, but most likely, this refers to
>>>
>>> floatnumber   ::=  pointfloat | exponentfloat
>>> pointfloat    ::=  [intpart] fraction | intpart "."
>>> exponentfloat ::=  (intpart | pointfloat) exponent
>>> intpart       ::=  digit+
>>> fraction      ::=  "." digit+
>>> exponent      ::=  ("e" | "E") ["+" | "-"] digit+
>>> digit         ::=  "0"..."9"
>>
>> I don't see why the language spec should limit the wealth of number
>> formats supported by float().
> 
> If it doesn't, there should be some other specification of what
> is correct and what is not. It must not be unspecified.

True.

>> It is not uncommon for Asians and other non-Latin script users to
>> use their own native script symbols for numbers. Just because these
>> digits may look strange to someone doesn't mean that they are
>> meaningless or should be discarded.
> 
> Then these users should speak up and indicate their need, or somebody
> should speak up and confirm that there are users who actually want
> '١٢٣٤.٥٦' to denote 1234.56. To my knowledge, there is no writing
> system in which '١٢٣٤.٥٦e4' means 12345600.0.

I'm not sure what you're after here.

>> Please also remember that Python3 now allows Unicode names for
>> identifiers for much the same reasons.
> 
> No no no. Addition of Unicode identifiers has a well-designed,
> deliberate specification, with a PEP and all. The support for
> non-ASCII digits in float appears to be ad-hoc, and not founded
> on actual needs of actual users.

Please note that we didn't have PEPs and the PEP process at the
time. The Unicode proposal predates and in some respects inspired
the PEP process.

The decision to add this support was deliberate, based on the desire
to support as many of the nice features of Unicode in Python as
we could. At least that was what was driving me at the time.

Regarding actual needs of actual users: I don't buy that as an
argument when it comes to supporting a standard that is meant
to attract users with non-ASCII origins.

Some references you may want to read up on:

http://en.wikipedia.org/wiki/Numbers_in_Chinese_culture
http://en.wikipedia.org/wiki/Vietnamese_numerals
http://en.wikipedia.org/wiki/Korean_numerals
http://en.wikipedia.org/wiki/Japanese_numerals

Even MS Office supports them:

http://languages.siuc.edu/Chinese/Language_Settings.html

>> Note that the support in float() (and the other numeric constructors)
>> to work with Unicode code points was explicitly added when Unicode
>> support was added to Python and has been available since Python 1.6.
> 
> That doesn't necessarily make it useful. Alexander's complaint is that
> it makes Python unstable (i.e. changing as the UCD changes).

If that were true, then all Unicode database (UCD) changes would make
Python unstable. However, most changes to existing code points in the UCS
are bug fixes, so they actually have a stabilizing quality more than
a destabilizing one.
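
To make the dependency on the UCD concrete, here is a minimal sketch,
assuming CPython 3 and its bundled unicodedata module. The helper name
to_ascii_digits is made up for this illustration and is not how float()
is implemented internally; it only shows that the decimal value of a
code point is looked up in whatever UCD version the interpreter ships:

```python
import unicodedata


def to_ascii_digits(text):
    """Replace each character that has a UCD decimal value with its ASCII digit.

    Hypothetical helper for illustration only.
    """
    out = []
    for ch in text:
        # unicodedata.decimal() returns the default (None here) when the
        # character has no decimal digit value in the bundled UCD.
        value = unicodedata.decimal(ch, None)
        out.append(ch if value is None else str(value))
    return "".join(out)


# ARABIC-INDIC digits one to six around an ASCII full stop: '١٢٣٤.٥٦'
ARABIC_INDIC = "\u0661\u0662\u0663\u0664.\u0665\u0666"

print(unicodedata.unidata_version)    # UCD version bundled with this interpreter
print(to_ascii_digits(ARABIC_INDIC))  # 1234.56
print(float(ARABIC_INDIC))            # 1234.56
```

The printed UCD version differs between Python releases, which is the
only part of this sketch that is expected to change over time.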

>> It is not a bug by any definition of "bug"
> 
> Most certainly it is: the documentation is either underspecified,
> or deviates from the implementation (when taking the most plausible
> interpretation). This is the very definition of "bug".

The implementation is not a bug and neither was this a bug in the
2.x series of the Python documentation. The Python 3.x docs apparently
introduced a reference to the language spec which is clearly not
capturing the wealth of possible inputs.

So, yes, we're talking about a documentation bug, but not an
implementation bug.
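
The gap between the quoted grammar and the implementation can be shown
with a minimal sketch, assuming CPython 3. The regular expression below
is my own transcription of the floatnumber productions quoted above,
with an optional sign added for the "possibly signed" part; the names
FLOATNUMBER and matches_grammar are made up for this illustration:

```python
import re

# Transcription of the quoted productions, restricted to ASCII digits.
DIGITS = r"[0-9]+"                                        # intpart ::= digit+
EXPONENT = rf"[eE][+-]?{DIGITS}"                          # exponent
POINTFLOAT = rf"(?:(?:{DIGITS})?\.{DIGITS}|{DIGITS}\.)"   # pointfloat
EXPONENTFLOAT = rf"(?:{DIGITS}|{POINTFLOAT}){EXPONENT}"   # exponentfloat
FLOATNUMBER = re.compile(rf"[+-]?(?:{EXPONENTFLOAT}|{POINTFLOAT})")


def matches_grammar(text):
    """Return True if text is a (possibly signed) floatnumber per the grammar."""
    return FLOATNUMBER.fullmatch(text) is not None


# ARABIC-INDIC digits one to six around an ASCII full stop: '١٢٣٤.٥٦'
ARABIC_INDIC = "\u0661\u0662\u0663\u0664.\u0665\u0666"

for candidate in ("1234.56", "1234.56e4", ARABIC_INDIC, ARABIC_INDIC + "e4"):
    # float() accepts all four strings; the grammar only matches the ASCII ones.
    print(ascii(candidate), matches_grammar(candidate), float(candidate))
```

The two ASCII strings satisfy both the grammar and float(), while the
two strings with non-ASCII digits are accepted by float() only.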

-- 
Marc-Andre Lemburg
eGenix.com

