[Python-Dev] Python and the Unicode Character Database

Nick Coghlan ncoghlan at gmail.com
Mon Nov 29 13:43:26 CET 2010


On Mon, Nov 29, 2010 at 9:02 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> If we would go down that road, we would also have to disable other
> Unicode features based on locale, e.g. whether to apply non-ASCII
> case mappings, what to consider whitespace, etc.
>
> We don't do that for a good reason: Unicode is supposed to be
> universal and not limited to a single locale.

Numbers are different, though: parsing them is about more than just the
characters used for the individual digits. There are additional
semantics associated with digit ordering (for any number), and with
decimal separators and exponential notation (for floating point
numbers), and those vary by locale. We deliberately chose to make the
builtin numeric parsers unaware of all of those things, so assuming
that we can simply parse other digits as if they were their ASCII
equivalents, while otherwise assuming a C locale, seems questionable.
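To make the mismatch concrete, here is a minimal sketch, assuming a
current CPython 3 interpreter. The try_float helper and the variable
names are made up for illustration only. It feeds the builtin parsers
Arabic-Indic digits, first with an ASCII '.' and then with the Arabic
decimal separator (U+066B), plus an ASCII string using a comma as the
decimal separator:

```python
# Arabic-Indic digits written as escapes to avoid bidi display issues.
arabic_int = "\u0661\u0662\u0663\u0664"          # digits for 1234
arabic_frac = "\u0665\u0666"                     # digits for 56

with_ascii_point = arabic_int + "." + arabic_frac
with_arabic_point = arabic_int + "\u066b" + arabic_frac  # U+066B separator
with_comma = "1234,56"                           # comma as decimal separator


def try_float(text):
    """Return float(text), or None if the builtin parser rejects it."""
    try:
        return float(text)
    except ValueError:
        return None


print(int(arabic_int))               # digits alone are accepted
print(try_float(with_ascii_point))   # non-ASCII digits with an ASCII '.'
print(try_float(with_arabic_point))  # the script's own separator
print(try_float(with_comma))         # a locale-style comma separator
```

On the interpreters I'd expect this to be run on, the first two calls
succeed and the last two are rejected: the digits are treated as
universal, but the separator is fixed at the C locale's '.'.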

If the existing semantics can be adequately defined, documented and
defended, then retaining them would be fine. However, the language
reference needs to define the behaviour properly so that other
implementations know what they need to support and what can be chalked
up as being just an implementation accident of CPython. (As a point in
the plus column, both decimal.Decimal and fractions.Fraction were able
to handle the '١٢٣٤.٥٦' example in a manner consistent with the int
and float handling.)
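For anyone wanting to repeat that consistency check, a minimal sketch
follows, assuming a current CPython 3 interpreter. The text and results
names are only illustrative. It passes the same string to float,
decimal.Decimal and fractions.Fraction, and compares each result with
what the ASCII spelling '1234.56' produces:

```python
from decimal import Decimal
from fractions import Fraction

# The example string: Arabic-Indic digits for 1234 and 56 around an ASCII '.'.
text = "\u0661\u0662\u0663\u0664.\u0665\u0666"
ascii_text = "1234.56"

# Parse the same input with each numeric constructor.
results = {
    "float": float(text),
    "Decimal": Decimal(text),
    "Fraction": Fraction(text),
}

# Compare against the ASCII spelling of the same number.
expected = {
    "float": float(ascii_text),
    "Decimal": Decimal(ascii_text),
    "Fraction": Fraction(ascii_text),
}

for name, value in results.items():
    print(name, repr(value), value == expected[name])
```

Whether another implementation gives the same answers is exactly the
part the language reference would need to pin down.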

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

