[Python-Dev] Python and the Unicode Character Database

Stephen J. Turnbull stephen at xemacs.org
Thu Dec 2 08:49:24 CET 2010


Ben Finney writes:

 > Input from an existing text file, as I said earlier. Or any other way of
 > text data making its way into a Python program.

 > Direct entry at the console is a red herring.

I don't think it is.  Not at all.  Here's why: '''print "%d" %
some_integer''' doesn't now, and never will (unless Kristján gets his
Python 2.8<wink>), produce Arabic or Han numerals.  Not in any
language I know of, not in Microsoft Excel, and definitely not in
Python 2.  *Somebody* typed that text at some point.  If it's Han,
that somebody had *way* too much time on his hands, not a working
accountant nor a graduate assistant in a research lab for sure.
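
A minimal sketch of the asymmetry being argued here, assuming Python 3
(the post itself uses the Python 2 print statement).  The helper names
format_digits and parse_any_digits are hypothetical, introduced only
for illustration: formatting an integer emits ASCII digits only, while
the built-in int() also accepts other Unicode decimal digits on input.

```python
import unicodedata


def format_digits(n):
    """Return the decimal string that %-formatting produces for integer n."""
    return "%d" % n


def parse_any_digits(s):
    """Parse a string of Unicode decimal digits with the built-in int()."""
    return int(s)


formatted = format_digits(1234)
# Output side: every character produced is an ASCII digit.
assert all(ch in "0123456789" for ch in formatted)

arabic_indic = "\u0661\u0662\u0663\u0664"  # ARABIC-INDIC DIGIT ONE..FOUR
# Input side: these characters are in Unicode category Nd (decimal digit),
# which is what int() accepts.
assert all(unicodedata.category(ch) == "Nd" for ch in arabic_indic)
print(parse_any_digits(arabic_indic))  # prints 1234

han = "\u4e00\u4e8c\u4e09"  # Han numerals for one, two, three (category Lo)
try:
    int(han)
except ValueError:
    print("int() rejects Han numerals")
```

So the question in this thread only concerns the input side; nothing in
the formatting path generates non-ASCII digits.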

How about old archived texts, copied and recopied?  At least for
Japanese, the digits in old archival (text) data will *all* be ASCII, because the
earliest implementations of Japanese language text used JIS X 0201 (or
its predecessor), which doesn't have Han digits (and kana digits don't
exist even if you write with a brush and ink AFAIK).  Ditto Arabic, I
would imagine; ISO 8859-6 (aka Latin/Arabic) does not contain the
Arabic digits that have been presented here earlier AFAICT.  Note that
there's plenty of space for them in that code table (eg, 0xB0-0xB9 is
empty).  Apparently nobody *ever* thought it was useful to have them!
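
The ISO 8859-6 claim can be checked roughly as follows.  This is a
sketch that assumes Python 3 and its bundled iso8859_6 codec, and that
"the Arabic digits" means the Arabic-Indic digits U+0660..U+0669; the
helper name encodable is hypothetical.

```python
# Arabic-Indic digits U+0660..U+0669 (assumed to be the digits meant above).
ARABIC_INDIC_DIGITS = "".join(chr(cp) for cp in range(0x0660, 0x066A))


def encodable(text, codec):
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# ASCII digits are representable in ISO 8859-6 ...
ascii_ok = encodable("0123456789", "iso8859_6")
# ... but none of the Arabic-Indic digits has a code point in that table.
arabic_ok = any(encodable(d, "iso8859_6") for d in ARABIC_INDIC_DIGITS)

print(ascii_ok, arabic_ok)  # prints: True False
```

This only tests what Python's codec table can encode; it says nothing
about what other legacy Arabic encodings provide.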

So, which culture, using which script and in which application, inputs
numeric data in other than ASCII digits?  Or would want to, if only
somebody would tell them they can do it in Python?  Hearsay will do,
for starters.

