[issue26483] docs unclear on difference between str.isdigit() and str.isdecimal()

Anna Koroliuk report at bugs.python.org
Sat Mar 12 09:50:06 EST 2016


Anna Koroliuk added the comment:

Hi, all!

At the Helsinki Python sprint, with the kind help of Ezio, I found two things.

1) The code below produces the results in the attached file. Here I will show just a few interesting cases where isdigit() and isdecimal() give different results.

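import unicodedata
# Print every character that has a Unicode digit value.
for c in map(chr, range(0x110000)):  # 0x110000 so that U+10FFFF is included
    if unicodedata.digit(c, None) is not None:
        print(c, c.isdigit(), c.isdecimal())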

0 True True
1 True True
2 True True
² True False
³ True False
¹ True False
፩ True False
፪ True False
፫ True False
፬ True False
① True False
② True False
③ True False

So these are different methods, although for the ordinary digits 0-9 (as opposed to superscripts, circled digits and so on) they give the same results. The full output is attached as command_comparison.txt.
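To make the difference easier to see for a few individual characters, here is a minimal sketch that puts both methods next to unicodedata.digit() and unicodedata.decimal(). The helper name classify() and the sample characters are my own choice for illustration and are not part of the attached file.

```python
import unicodedata

def classify(ch):
    """Return (isdigit, isdecimal, digit value, decimal value) for one character."""
    return (
        ch.isdigit(),
        ch.isdecimal(),
        unicodedata.digit(ch, None),    # None when the character has no digit value
        unicodedata.decimal(ch, None),  # None when the character has no decimal value
    )

# ASCII two, superscript two, Ethiopic one, circled one, Arabic-Indic two
for ch in "2\u00b2\u1369\u2460\u0662":
    print(f"U+{ord(ch):04X}", ch, *classify(ch))
```

In this sample only the ASCII and Arabic-Indic digits have a decimal value; the other three have a digit value but no decimal value, which matches the True/False pairs listed above.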

2) Both isdigit() and isdecimal() can be traced back to a test of the character's type-record flags against a bit mask, but the masks are different: for isdigit() it is DIGIT_MASK = 0x04 and for isdecimal() it is DECIMAL_MASK = 0x02.

Here is how each method is traced down to its mask.

A) isdecimal()

./Objects/unicodeobject.c:    {"isdecimal", (PyCFunction) unicode_isdecimal, METH_NOARGS, isdecimal__doc__},

./Objects/unicodeobject.c:
static PyObject*
unicode_isdecimal(PyObject *self)
....
    if (length == 1)
        return PyBool_FromLong(
            Py_UNICODE_ISDECIMAL(PyUnicode_READ(kind, data, 0)));

./Include/unicodeobject.h:#define Py_UNICODE_ISDECIMAL(ch) _PyUnicode_IsDecimalDigit(ch)

./Objects/unicodectype.c:
int _PyUnicode_IsDecimalDigit(Py_UCS4 ch)
{
    if (_PyUnicode_ToDecimalDigit(ch) < 0)
        return 0;
    return 1;
}

int _PyUnicode_ToDecimalDigit(Py_UCS4 ch)
{
    const _PyUnicode_TypeRecord *ctype = gettyperecord(ch);

    return (ctype->flags & DECIMAL_MASK) ? ctype->decimal : -1;
}

./Objects/unicodectype.c:#define DECIMAL_MASK 0x02

B) isdigit()

./Objects/unicodeobject.c:    {"isdigit", (PyCFunction) unicode_isdigit, METH_NOARGS, isdigit__doc__},

./Objects/unicodeobject.c:
static PyObject*
unicode_isdigit(PyObject *self)
...
    if (length == 1) {
        const Py_UCS4 ch = PyUnicode_READ(kind, data, 0);
        return PyBool_FromLong(Py_UNICODE_ISDIGIT(ch));
    }

./Include/unicodeobject.h:#define Py_UNICODE_ISDIGIT(ch) _PyUnicode_IsDigit(ch)

./Objects/unicodectype.c:
int _PyUnicode_IsDigit(Py_UCS4 ch)
{
    if (_PyUnicode_ToDigit(ch) < 0)
        return 0;
    return 1;
}

int _PyUnicode_ToDigit(Py_UCS4 ch)
{
    const _PyUnicode_TypeRecord *ctype = gettyperecord(ch);

    return (ctype->flags & DIGIT_MASK) ? ctype->digit : -1;
}

./Tools/unicode/makeunicodedata.py:DIGIT_MASK = 0x04
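
The two traces have the same shape, so the flag test can be modelled in a few lines of Python. This is only a sketch of the logic quoted above, not the real implementation: TypeRecord and get_type_record() are hypothetical stand-ins for _PyUnicode_TypeRecord and gettyperecord(), and here the flags are derived from unicodedata instead of being read from the generated tables.

```python
import unicodedata
from typing import NamedTuple

DECIMAL_MASK = 0x02  # value quoted from Objects/unicodectype.c
DIGIT_MASK = 0x04    # value quoted from Tools/unicode/makeunicodedata.py

class TypeRecord(NamedTuple):
    # Hypothetical stand-in for _PyUnicode_TypeRecord (only three fields modelled).
    flags: int
    decimal: int
    digit: int

def get_type_record(ch: str) -> TypeRecord:
    """Illustrative replacement for gettyperecord(), built from unicodedata."""
    decimal = unicodedata.decimal(ch, None)
    digit = unicodedata.digit(ch, None)
    flags = 0
    if decimal is not None:
        flags |= DECIMAL_MASK
    if digit is not None:
        flags |= DIGIT_MASK
    return TypeRecord(flags, decimal or 0, digit or 0)

def to_decimal_digit(ch: str) -> int:
    # Mirrors _PyUnicode_ToDecimalDigit: the value if the flag is set, else -1.
    rec = get_type_record(ch)
    return rec.decimal if rec.flags & DECIMAL_MASK else -1

def to_digit(ch: str) -> int:
    # Mirrors _PyUnicode_ToDigit: the value if the flag is set, else -1.
    rec = get_type_record(ch)
    return rec.digit if rec.flags & DIGIT_MASK else -1

def is_decimal(ch: str) -> bool:
    return to_decimal_digit(ch) >= 0

def is_digit(ch: str) -> bool:
    return to_digit(ch) >= 0

print(is_digit("\u00b2"), is_decimal("\u00b2"))  # superscript two
```

With this model a character such as superscript two has only the DIGIT_MASK bit set, so the digit test succeeds and the decimal test fails, while an ordinary digit has both bits set.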

BR,
Anna

----------
nosy: +Anna Koroliuk
Added file: http://bugs.python.org/file42149/command_comparison.txt

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue26483>
_______________________________________

