[issue26483] docs unclear on difference between str.isdigit() and str.isdecimal()
Anna Koroliuk
report at bugs.python.org
Sat Mar 12 09:50:06 EST 2016
Anna Koroliuk added the comment:
Hi, all!
At the Helsinki Python sprint, with the kind help of Ezio, I found two things.
1) The code below produces the results in the attached file. Here are some interesting cases where isdigit() and isdecimal() give different results:
import unicodedata
# 0x110000 so that the last code point, U+10FFFF, is included too
for c in map(chr, range(0x110000)):
    if unicodedata.digit(c, None) is not None:
        print(c, c.isdigit(), c.isdecimal())
...
0 True True
1 True True
2 True True
² True False
³ True False
¹ True False
፩ True False
፪ True False
፫ True False
፬ True False
① True False
② True False
③ True False
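So these are two different methods: for the ordinary digits 0-9 they give the same results, but for characters such as superscripts (², ³, ¹), Ethiopic digits (፩, ፪, ፫, ፬) and circled digits (①, ②, ③), isdigit() returns True while isdecimal() returns False. The full output is attached as command_comparison.txt.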
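A narrower check on a few characters from the listing, assuming Python 3. Besides the two methods it prints the values returned by the stdlib functions unicodedata.digit() and unicodedata.decimal(), which shows that a superscript or circled digit still carries a digit value (2 for ², 1 for ①) but no decimal value.

```python
import unicodedata

# Sample characters from the listing above:
# "0", SUPERSCRIPT TWO, ETHIOPIC DIGIT ONE, CIRCLED DIGIT ONE.
samples = ["0", "\u00b2", "\u1369", "\u2460"]

for c in samples:
    # unicodedata.digit()/decimal() return the default (None here) when the
    # character has no digit/decimal value.
    print(
        c,
        c.isdigit(),
        c.isdecimal(),
        unicodedata.digit(c, None),
        unicodedata.decimal(c, None),
    )
```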
2) Both isdigit() and isdecimal() trace back to the same kind of check: the character's type record is looked up and its flags are tested against a bit mask. The masks, however, are different: isdigit() uses DIGIT_MASK = 0x04 and isdecimal() uses DECIMAL_MASK = 0x02.
Here is how each method traces down to its mask:
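A minimal Python sketch of the mask test. The two mask values are the ones quoted from CPython; the TypeRecord class and the two example records are hypothetical simplifications of _PyUnicode_TypeRecord (real records carry many more flag bits), used only to show why the same character can pass one test and fail the other.

```python
from typing import NamedTuple

# Mask values as quoted from CPython's Unicode type tables.
DECIMAL_MASK = 0x02
DIGIT_MASK = 0x04


class TypeRecord(NamedTuple):
    """Hypothetical, simplified stand-in for _PyUnicode_TypeRecord."""
    flags: int
    decimal: int
    digit: int


def to_decimal(rec: TypeRecord) -> int:
    # Mirrors _PyUnicode_ToDecimalDigit: the value counts only if DECIMAL_MASK is set.
    return rec.decimal if rec.flags & DECIMAL_MASK else -1


def to_digit(rec: TypeRecord) -> int:
    # Mirrors _PyUnicode_ToDigit: the value counts only if DIGIT_MASK is set.
    return rec.digit if rec.flags & DIGIT_MASK else -1


# Hypothetical records: an ASCII "3" has both bits set,
# a superscript "³" has only the digit bit set.
THREE = TypeRecord(flags=DECIMAL_MASK | DIGIT_MASK, decimal=3, digit=3)
SUPERSCRIPT_THREE = TypeRecord(flags=DIGIT_MASK, decimal=0, digit=3)

for name, rec in [("3", THREE), ("\u00b3", SUPERSCRIPT_THREE)]:
    print(name, to_digit(rec) >= 0, to_decimal(rec) >= 0)
```

The printed pairs follow the same pattern as the listing: the record with only DIGIT_MASK set passes the digit test and fails the decimal one.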
A) isdecimal()
./Objects/unicodeobject.c: {"isdecimal", (PyCFunction) unicode_isdecimal, METH_NOARGS, isdecimal__doc__},
./Objects/unicodeobject.c:
static PyObject*
unicode_isdecimal(PyObject *self)
....
    if (length == 1)
        return PyBool_FromLong(
            Py_UNICODE_ISDECIMAL(PyUnicode_READ(kind, data, 0)));
./Include/unicodeobject.h:#define Py_UNICODE_ISDECIMAL(ch) _PyUnicode_IsDecimalDigit(ch)
./Objects/unicodectype.c:
int _PyUnicode_IsDecimalDigit(Py_UCS4 ch)
{
    if (_PyUnicode_ToDecimalDigit(ch) < 0)
        return 0;
    return 1;
}

int _PyUnicode_ToDecimalDigit(Py_UCS4 ch)
{
    const _PyUnicode_TypeRecord *ctype = gettyperecord(ch);
    return (ctype->flags & DECIMAL_MASK) ? ctype->decimal : -1;
}
./Objects/unicodectype.c:#define DECIMAL_MASK 0x02
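To double-check the isdecimal() chain from Python, a minimal sketch assuming CPython 3: as far as I can tell from Modules/unicodedata.c, the module-level unicodedata.decimal() goes through Py_UNICODE_TODECIMAL (i.e. _PyUnicode_ToDecimalDigit) for the current Unicode version, so it should agree with str.isdecimal() on every code point. is_decimal_char is a hypothetical helper name used only for this illustration.

```python
import sys
import unicodedata


def is_decimal_char(c: str) -> bool:
    # Python-level analogue of _PyUnicode_IsDecimalDigit(): unicodedata.decimal()
    # returns the given default (-1 here) exactly where
    # _PyUnicode_ToDecimalDigit() would return -1.
    return unicodedata.decimal(c, -1) >= 0


# Compare the analogue with str.isdecimal() for every code point.
mismatches = [
    hex(cp)
    for cp in range(sys.maxunicode + 1)
    if chr(cp).isdecimal() != is_decimal_char(chr(cp))
]
print("mismatches:", mismatches)
```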
B) isdigit()
./Objects/unicodeobject.c: {"isdigit", (PyCFunction) unicode_isdigit, METH_NOARGS, isdigit__doc__},
./Objects/unicodeobject.c:
static PyObject*
unicode_isdigit(PyObject *self)
...
    if (length == 1) {
        const Py_UCS4 ch = PyUnicode_READ(kind, data, 0);
        return PyBool_FromLong(Py_UNICODE_ISDIGIT(ch));
    }
./Include/unicodeobject.h:#define Py_UNICODE_ISDIGIT(ch) _PyUnicode_IsDigit(ch)
./Objects/unicodectype.c:
int _PyUnicode_IsDigit(Py_UCS4 ch)
{
    if (_PyUnicode_ToDigit(ch) < 0)
        return 0;
    return 1;
}

int _PyUnicode_ToDigit(Py_UCS4 ch)
{
    const _PyUnicode_TypeRecord *ctype = gettyperecord(ch);
    return (ctype->flags & DIGIT_MASK) ? ctype->digit : -1;
}
./Tools/unicode/makeunicodedata.py:DIGIT_MASK = 0x04
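The digit side can be checked the same way with unicodedata.digit(), which in CPython calls Py_UNICODE_TODIGIT (i.e. _PyUnicode_ToDigit). The sketch below, assuming CPython 3, also counts characters that have only one of the two flags. Since UnicodeData.txt fills in the digit field for every character that has a decimal value, one would expect every decimal character to also be a digit, so the last count should be zero. The names is_digit_char and all_chars are illustrative, not CPython API.

```python
import sys
import unicodedata


def is_digit_char(c: str) -> bool:
    # Analogue of _PyUnicode_IsDigit(): unicodedata.digit() falls back to the
    # default (-1 here) where _PyUnicode_ToDigit() would return -1.
    return unicodedata.digit(c, -1) >= 0


def all_chars():
    # Generator over every code point, so the full range is never held in memory.
    return (chr(cp) for cp in range(sys.maxunicode + 1))


# Characters where the analogue disagrees with str.isdigit().
mismatches = [c for c in all_chars() if c.isdigit() != is_digit_char(c)]

# Characters with the digit flag but not the decimal flag (², ①, ...).
digit_only = [c for c in all_chars() if c.isdigit() and not c.isdecimal()]

# Characters with the decimal flag but not the digit flag.
decimal_only = [c for c in all_chars() if c.isdecimal() and not c.isdigit()]

print("mismatches:", len(mismatches))
print("digit but not decimal:", len(digit_only), "".join(digit_only[:12]))
print("decimal but not digit:", len(decimal_only))
```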
BR,
Anna
----------
nosy: +Anna Koroliuk
Added file: http://bugs.python.org/file42149/command_comparison.txt
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue26483>
_______________________________________