[Python-Dev] Python and the Unicode Character Database

M.-A. Lemburg mal at egenix.com
Thu Dec 2 22:14:34 CET 2010


"Martin v. Löwis" wrote:
>> [...]
>> For direct entry by an interactive user, yes. Why are some people in
>> this discussion thinking only of direct entry by an interactive user?
> 
> Ultimately, somebody will have entered the data.

I don't think you really believe that all data processed by a
computer was ultimately entered manually by someone :-)

I already gave you a couple of examples of how such data can
end up being input for Python number constructors. If you are
still curious, please see the Wikipedia pages I linked to,
or have a look at these keyboards:

http://en.wikipedia.org/wiki/File:KB_Arabic_MAC.svg
http://en.wikipedia.org/wiki/File:Keyboard_Layout_Sanskrit.png
http://en.wikipedia.org/wiki/File:800px-KB_Thai_Kedmanee.png
http://en.wikipedia.org/wiki/File:Tibetan_Keyboard.png
http://en.wikipedia.org/wiki/File:KBD-DZ-noshift-2009.png

(all referenced on http://en.wikipedia.org/wiki/Keyboard_layout)

and then compare these to:

http://www.unicode.org/Public/5.2.0/ucd/extracted/DerivedNumericType.txt
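
For anyone who wants to check a character against the numeric types
listed in that file, here is a minimal sketch using the standard
unicodedata module. The helper name numeric_type is made up for this
illustration, and the result depends on the Unicode database version
shipped with the interpreter, so treat it as a rough check only.

```python
import unicodedata

def numeric_type(ch):
    """Classify one character as 'Decimal', 'Digit', 'Numeric', or None.

    Hypothetical helper: it probes the three unicodedata lookups from the
    most specific (decimal) to the most general (numeric).
    """
    if unicodedata.decimal(ch, None) is not None:
        return "Decimal"
    if unicodedata.digit(ch, None) is not None:
        return "Digit"
    if unicodedata.numeric(ch, None) is not None:
        return "Numeric"
    return None

# A few sample characters: Arabic-Indic three, Thai three,
# superscript two, vulgar fraction one half, and a plain letter.
for ch in ["\u0663", "\u0e53", "\u00b2", "\u00bd", "a"]:
    print(repr(ch), numeric_type(ch))
```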

Arabic numerals are being used a lot nowadays in Asian countries,
but that doesn't mean that the native script versions are not
being used anymore.

Furthermore, data can well originate from texts that were written
hundreds or even thousands of years ago, so there is plenty of
material available for processing.

Even if not entered directly, there are plenty of ways to convert
Arabic numerals (or other numeral systems) to the above forms,
e.g. in MS Office for Thai:

http://office.microsoft.com/en-us/excel-help/convert-arabic-numbers-to-thai-text-format-HP003074364.aspx

Anyway, as mentioned before, all this is really beside the point:

If we want to support Unicode in Python, we have to also support
conversion of numerals declared in Unicode into a form that can
be processed by Python, regardless of where such data originates.
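
As a small illustration of that conversion, the sketch below feeds
decimal digits from several scripts to the int() and float()
constructors. It assumes Python 3 str semantics; the sample strings and
variable names are chosen just for this example.

```python
# Each sample spells "123" with the decimal digits of a different script.
samples = {
    "Arabic-Indic": "\u0661\u0662\u0663",
    "Devanagari": "\u0967\u0968\u0969",
    "Thai": "\u0e51\u0e52\u0e53",
    "Tibetan": "\u0f21\u0f22\u0f23",
}

# int() accepts any characters with the Unicode decimal digit property.
converted = {name: int(text) for name, text in samples.items()}

# float() does the same; the decimal point itself is still ASCII ".".
as_float = float("\u0661\u0662.\u0665")

for name, value in converted.items():
    print(name, value)
print(as_float)
```

Only characters with a decimal digit value are accepted this way; other
numeric characters, such as fractions, have to be looked up through
unicodedata.numeric() instead.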

If we were not to follow this approach, we could just as well
decide not to support reading Egyptian Hieroglyphs, based
on the argument that there's no keyboard to enter them...

http://www.unicode.org/charts/PDF/U13000.pdf  :-)

(from http://www.unicode.org/charts/)

>> Input from an existing text file, as I said earlier.
> 
> Which *specific* existing text file? Have you actually *seen* such a
> text file?

Have you tried Google?

http://www.google.com/search?q=١٢٣
http://www.google.com/search?q=٣+site%3Agov.lb

Some examples:

http://www.bdl.gov.lb/circ/intpdf/int123.pdf
http://www.cdr.gov.lb/study/sdatl/Arabic/Chapter3.PDF
http://www.batroun.gov.lb/PDF/Waredat2006.pdf

(these all use http://en.wikipedia.org/wiki/Eastern_Arabic_numerals)

>> Direct entry at the console is a red herring.
> 
> And we don't need powerhouses because power comes out of the socket.

Martin, the argument simply doesn't fit well with the discussion
about Python and Unicode.

We introduced Unicode in Python not because there was a need
for each and every code point in Unicode, but because we wanted
to adopt a standard which doesn't prefer any one way of writing
things over another.

-- 
Marc-Andre Lemburg
eGenix.com