[Tutor] three numbers for one

Steven D'Aprano steve at pearwood.info
Sat Jun 8 07:25:48 CEST 2013


On 08/06/13 13:11, Jim Mooney wrote:
> I'm puzzling out the difference between isdigit, isdecimal, and
> isnumeric. But at this point, for simple  practice programs, which is
> the best to use for plain old 0123456589 , without special characters?

Context?

Are you talking about the string methods, isdigit, isdecimal and isnumeric in Python 3? You should say so rather than have us guess.

Assuming this is what you mean, the short answer is, use isdecimal. Or better still, don't use any of them -- as the saying goes, it is better to ask forgiveness than permission. Instead of:

if s.isdecimal():
    n = int(s)
else:
    print("not a number")


it is usually better to just call int(s) and catch the ValueError it raises when s is not a valid integer. But occasionally it is handy to "Look Before You Leap" and find out whether a string is numeric first, and for that Python 3 provides three methods. They correspond to three categories of "numeric character" defined by the Unicode standard, described below.
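Before getting to those categories, here is a minimal sketch of the "ask forgiveness" version of the snippet above. The name parse_int is hypothetical, and returning None on failure is just one choice; you could print a message instead, as the isdecimal() version does. One practical benefit shows up in the last line: int() accepts a leading sign and surrounding whitespace, which isdecimal() rejects.

```python
def parse_int(s):
    """Hypothetical helper: return int(s), or None if s is not a valid integer."""
    try:
        return int(s)
    except ValueError:
        # int() raises ValueError for strings such as "4x2" or ""
        return None

print(parse_int("42"))      # 42
print(parse_int(" -7 "))    # -7 (sign and surrounding whitespace are allowed)
print(parse_int("4x2"))     # None
print("-7".isdecimal())     # False: isdecimal() rejects the minus sign
```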


=== Decimal Digit, or category 'Nd' ===

This includes the familiar "Arabic numerals" we use in English and most European languages:

0123456789

plus actual Arabic numerals:

٠١٢٣٤٥٦٧٨٩

(which, ironically, are called "Indian numerals" in the parts of the Arab world that use them), and digits from many other scripts, such as Tamil, Bengali and Thai. Here is a full list:

http://www.fileformat.info/info/unicode/category/Nd/list.htm

(Note: if the Arabic-Indic digits above look like garbage or mojibake, tell your email client to use the UTF-8 encoding. Most good email programs do this automatically, but if yours doesn't, there is usually a way to set the encoding by hand. If they look like square boxes, try changing the font you are using for display.)

The str.isdecimal() method returns True if the string is non-empty and contains only characters in the 'Nd' Unicode category. isdecimal() is the most specific of the three methods. Python's int() and float() functions will happily convert strings made of such characters into numbers:

py> s = '\N{TIBETAN DIGIT THREE}\N{TIBETAN DIGIT SEVEN}'
py> int(s)
37
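To see this for several scripts at once, here is a small sketch using the standard unicodedata module. The sample characters are my own picks; each is in category 'Nd', so isdecimal() accepts it and int() converts it. The last two lines show that whole strings of non-ASCII decimal digits work too, including with float() and an ASCII decimal point.

```python
import unicodedata

# Three "seven" digits from different scripts; all are in category 'Nd'.
samples = ['7', '\N{ARABIC-INDIC DIGIT SEVEN}', '\N{THAI DIGIT SEVEN}']

for ch in samples:
    # Unicode name, general category, isdecimal(), and the value int() gives
    print(unicodedata.name(ch), unicodedata.category(ch), ch.isdecimal(), int(ch))

# int() and float() also accept multi-character strings of such digits.
print(int('\N{ARABIC-INDIC DIGIT FOUR}\N{ARABIC-INDIC DIGIT TWO}'))   # 42
print(float('\N{THAI DIGIT THREE}.\N{THAI DIGIT FIVE}'))               # 3.5
```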


=== Other Number, or category 'No' ===

These are characters which represent numerals, but in some context other than "ordinary" numbers. For example, they include fractions, special currency numerals, superscripts and subscripts. The full list is here:

http://www.fileformat.info/info/unicode/category/No/list.htm

The str.isdigit() method returns True for everything that str.isdecimal() accepts, plus *some* characters in the 'No' category. To be precise, it returns True for characters whose Unicode Numeric_Type property is Digit or Decimal.

For example, superscript digits count as digits, but fractions do not.

py> '²'.isdigit()
True
py> '¼'.isdigit()
False
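Python does not expose Numeric_Type directly, but a rough approximation can be built from unicodedata.decimal(), unicodedata.digit() and unicodedata.numeric(), each of which returns a supplied default when the character lacks that kind of value. The helper name numeric_type is hypothetical. For these samples its result agrees with isdigit(): True exactly for 'Decimal' and 'Digit'.

```python
import unicodedata

def numeric_type(ch):
    """Hypothetical helper: approximate ch's Unicode Numeric_Type
    ('Decimal', 'Digit', 'Numeric', or None) via the unicodedata module."""
    if unicodedata.decimal(ch, None) is not None:
        return 'Decimal'   # e.g. '7': accepted by isdecimal(), isdigit(), isnumeric()
    if unicodedata.digit(ch, None) is not None:
        return 'Digit'     # e.g. '²': accepted by isdigit() and isnumeric()
    if unicodedata.numeric(ch, None) is not None:
        return 'Numeric'   # e.g. '¼': accepted by isnumeric() only
    return None

for ch in ['7', '²', '¼', 'x']:
    print(repr(ch), numeric_type(ch), ch.isdigit())
```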

int() and float() do *not* convert such characters to numbers, even when isdigit() accepts them. If you want to support them, you have to convert them yourself, for example with the unicodedata.numeric() function.
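This is also why isdigit() is a poor guard before int(): '²' passes the test and then int() raises ValueError. A minimal sketch of a fallback, assuming you only need to handle ordinary integers plus *single* special characters (unicodedata.numeric() works on one character at a time, so a string like '²³' is still rejected). The helper name to_number is hypothetical.

```python
import unicodedata

def to_number(s):
    """Hypothetical helper: convert s with int(); if that fails and s is a
    single character with a Unicode numeric value, use unicodedata.numeric()."""
    try:
        return int(s)
    except ValueError:
        pass
    if len(s) == 1:
        value = unicodedata.numeric(s, None)   # a float, or None if s has no numeric value
        if value is not None:
            return value
    raise ValueError(f"not a number: {s!r}")

print('²'.isdigit())     # True, yet int('²') raises ValueError
print(to_number('42'))   # 42
print(to_number('²'))    # 2.0
print(to_number('¼'))    # 0.25
```

Note that the fallback returns a float even for whole-number characters like '²', since that is what unicodedata.numeric() returns.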


=== Letter Number, or category 'Nl' ===

This includes characters which are technically letters, but are used as numbers. Examples include Ancient Roman, Ancient Greek, Cuneiform and Hangzhou numerals. The full list is here:

http://www.fileformat.info/info/unicode/category/Nl/list.htm

As with the 'No' characters, int() and float() do not convert these.

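The str.isnumeric() method is the least specific of the three methods. Like isdigit(), it is defined by the Numeric_Type property: it returns True for characters whose Numeric_Type is Digit, Decimal or Numeric. In practice that means characters from any of the three numeric categories, 'Nd', 'No' and 'Nl', plus a few outside them, such as the CJK ideograph 五 ("five").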
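To compare all three methods side by side, here is a short sketch that prints each sample character's name, general category and the three method results. The sample characters are my own choices, one or two per category. You should see the two 'Nd' rows pass all three tests, superscript two pass isdigit() and isnumeric(), and the fraction and the 'Nl' characters pass only isnumeric().

```python
import unicodedata

samples = [
    '7',                                # DIGIT SEVEN, category Nd
    '\N{TIBETAN DIGIT THREE}',          # Nd
    '\N{SUPERSCRIPT TWO}',              # No
    '\N{VULGAR FRACTION ONE QUARTER}',  # No
    '\N{ROMAN NUMERAL TWELVE}',         # Nl
    '\N{HANGZHOU NUMERAL FIVE}',        # Nl
]

for ch in samples:
    # name, general category, then isdecimal() / isdigit() / isnumeric()
    print(f"{unicodedata.name(ch):32} {unicodedata.category(ch)}"
          f"  decimal={ch.isdecimal()!s:5}"
          f"  digit={ch.isdigit()!s:5}"
          f"  numeric={ch.isnumeric()}")
```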




-- 
Steven

