[Python-Dev] Python and the Unicode Character Database

Alexander Belopolsky alexander.belopolsky at gmail.com
Fri Dec 3 00:54:10 CET 2010


On Thu, Dec 2, 2010 at 4:14 PM, M.-A. Lemburg <mal at egenix.com> wrote:
..
> Some examples:
>
> http://www.bdl.gov.lb/circ/intpdf/int123.pdf

I looked at this one more closely.  While I cannot understand what it
says, it appears that Arabic-Indic digits are used in dates.  It looks
like Python won't be able to deal with those:

>>> datetime.strptime('١٩٩٩/١٠/٢٩', '%Y/%m/%d')
..
ValueError: time data '١٩٩٩/١٠/٢٩' does not match format '%Y/%m/%d'

Interestingly,

>>> datetime.strptime('١٩٩٩', '%Y')
datetime.datetime(1999, 1, 1, 0, 0)

which further suggests that support of such numerals is accidental.

As I think more about it, though, I am becoming less averse to accepting
these numerals for base 10 integers.  Integers can be easily extracted
from text using a simple regex, and '\d' accepts all category Nd
characters.  I would require, though, that all digits be from the same
block, which is not hard to check because Unicode now promises to only
have them in contiguous blocks of 10.  This rule seems to address some
of the security issues because it is unlikely that a system that can
display some of the local digits would not be able to display all of
them properly.
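
A rough sketch of what the same-block rule could look like, assuming
Python 3 str semantics.  The names parse_decimal and extract_integers
are made up for illustration and are not an existing or proposed API.
The check relies on the contiguity promise: subtracting a digit's value
from its code point gives the code point of that block's zero, so all
digits in a run must agree on it.

```python
import re
import unicodedata


def parse_decimal(digits):
    """Parse a base-10 integer, requiring all digits to share one block.

    Hypothetical helper: it rejects strings that mix digit blocks,
    such as an ASCII '1' followed by an Arabic-Indic digit.
    """
    if not digits:
        raise ValueError("empty digit string")
    zero = None   # code point of the zero digit of the block in use
    value = 0
    for ch in digits:
        if unicodedata.category(ch) != "Nd":
            raise ValueError("not a decimal digit: %r" % ch)
        d = unicodedata.decimal(ch)
        # Nd digits come in contiguous runs 0..9, so ord(ch) - d
        # identifies the block the digit belongs to.
        block_zero = ord(ch) - d
        if zero is None:
            zero = block_zero
        elif block_zero != zero:
            raise ValueError("digits from different blocks: %r" % digits)
        value = value * 10 + d
    return value


def extract_integers(text):
    # For str patterns, \d matches every category Nd character.
    return [parse_decimal(run) for run in re.findall(r"\d+", text)]
```

With this, parse_decimal('\u0661\u0669\u0669\u0669') gives 1999, while
a mixed string such as '1\u0669' raises ValueError.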

I still don't think it makes any sense to accept them in float().

