[Python-Dev] Python and the Unicode Character Database

Alexander Belopolsky alexander.belopolsky at gmail.com
Thu Dec 2 04:28:49 CET 2010


On Wed, Dec 1, 2010 at 10:11 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> On 12/1/2010 7:44 PM, Alexander Belopolsky wrote:
>
>> it.  The argument was that if there was a use case for parsing Eastern
>> Arabic numerals, it would be better served by a module written by
>> someone who speaks one of the Arabic languages and knows the details
>> of how Eastern Arabic numerals are written.  So far nobody has even
>> claimed to know conclusively that Arabic-Indic digits are always
>> written left-to-right.
>
> Both my personal observations when travelling from Turkey to India and
> Wikipedia say yes. "When representing a number in Arabic, the lowest-valued
> position is placed on the right, so the order of positions is the same as in
> left-to-right scripts."
> https://secure.wikimedia.org/wikipedia/en/wiki/Arabic_language#Numerals

This matches my limited research on this topic as well.  However, I am
not sure that when these digits are embedded in Arabic text, their
logical order always matches their display order.  It seems to me that
it can go either way, depending on the surrounding text and/or the
presence of explicit formatting codes.  Also, I don't understand why
Eastern Arabic-Indic digits (U+06F0..U+06F9) have the same Bidi-Class
as European digits ("EN"), while Arabic-Indic digits (U+0660..U+0669)
and the Arabic decimal and thousands separators have Bidi-Class "AN".

http://www.unicode.org/reports/tr9/tr9-23.html#Bidirectional_Character_Types
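
The Bidi-Class difference described above can be checked from Python
itself.  The following is only a minimal sketch: the `samples` table and
the `bidi_classes` helper are names made up for this illustration, and
the values returned depend on the version of the Unicode Character
Database bundled with the interpreter.

```python
import unicodedata

# One representative character from each group discussed above.
# The dictionary keys are the official Unicode character names.
samples = {
    "DIGIT FOUR": "\u0034",
    "ARABIC-INDIC DIGIT FOUR": "\u0664",
    "EXTENDED ARABIC-INDIC DIGIT FOUR": "\u06f4",
    "ARABIC DECIMAL SEPARATOR": "\u066b",
    "ARABIC THOUSANDS SEPARATOR": "\u066c",
}


def bidi_classes(chars):
    """Map each character name to its Bidi-Class as reported by unicodedata."""
    return {name: unicodedata.bidirectional(ch) for name, ch in chars.items()}


for name, bidi_class in bidi_classes(samples).items():
    print("%-34s %s" % (name, bidi_class))
```

This reports only the per-character property.  It does not run the
bidirectional algorithm, so it says nothing about how a run of such
digits is ordered for display inside right-to-left text.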

