[Python-Dev] Unicode character property methods
M.-A. Lemburg
mal@lemburg.com
Mon, 06 Mar 2000 23:04:14 +0100
Guido van Rossum wrote:
>
> > As you may have noticed, the Unicode objects provide
> > new methods .islower(), .isupper() and .istitle(). Finn Bock
> > mentioned that Java also provides .isdigit() and .isspace().
> >
> > Question: should Unicode also provide these character
> > property methods: .isdigit(), .isnumeric(), .isdecimal()
> > and .isspace() ? Plus maybe .digit(), .numeric() and
> > .decimal() for the corresponding decoding ?
>
> What would be the difference between isdigit, isnumeric, isdecimal?
> I'd say don't do more than Java. I don't understand what the
> "corresponding decoding" refers to. What would "3".decimal() return?
These originate in the Unicode database; see
ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.html
Here are the descriptions:
"""
6  Decimal digit value (normative)
   This is a numeric field. If the character has the decimal digit
   property, as specified in Chapter 4 of the Unicode Standard, the
   value of that digit is represented with an integer value in this
   field.

7  Digit value (normative)
   This is a numeric field. If the character represents a digit, not
   necessarily a decimal digit, the value is here. This covers digits
   which do not form decimal radix forms, such as the compatibility
   superscript digits.

8  Numeric value (normative)
   This is a numeric field. If the character has the numeric
   property, as specified in Chapter 4 of the Unicode Standard, the
   value of that character is represented with an integer or
   rational number in this field. This includes fractions as, e.g.,
   "1/5" for U+2155 VULGAR FRACTION ONE FIFTH. Also included are
   numerical values for compatibility characters such as circled
   numbers.
"""
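To see how the three fields relate, here is a minimal sketch in Python 3 that reads them through the unicodedata module. The helper name unicode_number_fields is hypothetical. Passing None as the default makes an empty field come back as None instead of raising ValueError. Note that recent Unicode versions no longer give U+00B2 SUPERSCRIPT TWO a decimal digit value, so its first field is None here, while the 2000-era session further down reports 2.

```python
import unicodedata

def unicode_number_fields(ch):
    """Hypothetical helper: return the (decimal, digit, numeric) fields
    of the Unicode database for one character, None where empty."""
    # Each lookup raises ValueError for an empty field unless a
    # default is supplied as the second argument.
    return (
        unicodedata.decimal(ch, None),  # field 6: decimal digit value
        unicodedata.digit(ch, None),    # field 7: digit value
        unicodedata.numeric(ch, None),  # field 8: numeric value (float)
    )

for ch in ["3", "\u00b2", "\u2155"]:
    print(repr(ch), unicodedata.name(ch), unicode_number_fields(ch))
```

The fields nest: every decimal digit is also a digit, and every digit is also numeric, but not the other way round.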
u"3".decimal() would return 3. u"\u2155".
Some more examples from the unicodedata module (which makes
all fields of the database available in Python):
>>> import unicodedata
>>> unicodedata.decimal(u"3")
3
>>> unicodedata.decimal(u"²")
2
>>> unicodedata.digit(u"²")
2
>>> unicodedata.numeric(u"²")
2.0
>>> unicodedata.numeric(u"\u2155")
0.2
>>> unicodedata.numeric(u'\u215b')
0.125
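The predicate methods asked about (.isdecimal(), .isdigit(), .isnumeric()) can be sketched on top of the same lookups. The helpers is_decimal, is_digit and is_numeric below are hypothetical: each is true for a non-empty string whose characters all have the corresponding database field filled in. Python 3's str type provides isdecimal(), isdigit() and isnumeric() with the same three-way split, so the loop prints both for comparison.

```python
import unicodedata

# Hypothetical helpers: true when the string is non-empty and every
# character has the corresponding Unicode database field set.
def is_decimal(s):
    return bool(s) and all(unicodedata.decimal(c, None) is not None for c in s)

def is_digit(s):
    return bool(s) and all(unicodedata.digit(c, None) is not None for c in s)

def is_numeric(s):
    return bool(s) and all(unicodedata.numeric(c, None) is not None for c in s)

samples = ["123", "\u00b2", "\u2155", "12a", ""]
for s in samples:
    # Helper results first, then Python 3's built-in str methods.
    print(repr(s), is_decimal(s), is_digit(s), is_numeric(s),
          s.isdecimal(), s.isdigit(), s.isnumeric())
```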
> > Similar APIs are already available through the unicodedata
> > module, but could easily be moved to the Unicode object
> > (they cause the builtin interpreter to grow a bit in size
> > due to the new mapping tables).
> >
> > BTW, string.atoi et al. are currently not mapped to
> > string methods... should they be ?
>
> They are mapped to int() c.s.
Hmm, I just noticed that int() and friends don't accept
Unicode objects... shouldn't they use the "t" parser marker
instead of requiring a string or a tp_int-compatible
type?
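The "corresponding decoding" for a whole string amounts to reading the decimal digit value of each character and accumulating in base 10. A minimal sketch, assuming a hypothetical helper decode_decimal that uses only the decimal digit field; in Python 3, int() itself accepts str input made of any Unicode decimal digits, so the two results can be compared directly.

```python
import unicodedata

def decode_decimal(s):
    """Hypothetical helper: convert a string of Unicode decimal digits
    (in any script) to an int using only the decimal digit field."""
    if not s:
        raise ValueError("empty string")
    value = 0
    for ch in s:
        # unicodedata.decimal() raises ValueError for characters without
        # a decimal digit value, e.g. U+2155 VULGAR FRACTION ONE FIFTH.
        value = value * 10 + unicodedata.decimal(ch)
    return value

arabic_indic = "\u0663\u0664"  # ARABIC-INDIC DIGIT THREE, DIGIT FOUR
print(decode_decimal(arabic_indic))
print(int(arabic_indic))  # Python 3's int() accepts the same input
```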
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/