print arabic characters

Martin v. Loewis martin at v.loewis.de
Mon Dec 22 15:31:35 EST 2003


Peter Otten wrote:
> Disclaimer: As I know nothing about right-to-left printing languages, it's
> likely that I have got it at least partially wrong.

Indeed. First of all, each Unicode character has a directionality,
available as unicodedata.bidirectional. This is L, R, or AL for most
characters; some characters have weak (EN, ES, ET, ...) or neutral
(B, S, ...) directionality. You need to find runs of characters with
the same directionality, extending each run into adjacent weak or
neutral characters. Then you need to reverse only the RTL runs,
leaving the LTR runs intact.
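
A rough sketch of that run-splitting step, assuming Python 3 (where str
is already Unicode). The names split_runs and visual_order are made up
for illustration, and attaching weak/neutral characters to the preceding
run is a simplification of the rules in TR9, not the full algorithm:

```python
import unicodedata

def split_runs(text):
    """Split text into (direction, substring) runs.

    Strong R/AL characters start or continue an RTL run, strong L
    characters an LTR run. Weak and neutral characters simply extend
    the current run (a simplification; TR9 has more detailed rules).
    """
    runs = []
    current = "LTR"  # assumed base direction for leading neutrals
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            current = "RTL"
        elif bidi == "L":
            current = "LTR"
        # any other class: keep the direction of the current run
        if runs and runs[-1][0] == current:
            runs[-1][1].append(ch)
        else:
            runs.append((current, [ch]))
    return [(direction, "".join(chars)) for direction, chars in runs]

def visual_order(text):
    """Reverse only the RTL runs, leaving the LTR runs intact."""
    return "".join(s[::-1] if d == "RTL" else s for d, s in split_runs(text))

print(split_runs("abc \u05d0\u05d1\u05d2"))
```

Note that a space between an RTL run and a following LTR run ends up
inside the reversed run with this shortcut; resolving such neutrals
properly is one of the things TR9 spells out.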

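Next, in the process of reversing, you may need to mirror some
characters (parentheses, brackets, ...), replacing each with its
mirror-image counterpart. unicodedata.mirrored tells you whether a
character has this property; it does not return the counterpart itself.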
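
As an illustration of the mirroring step (Python 3 assumed): the
unicodedata module only reports the mirrored property as a flag, so the
sketch below uses a small hand-written table. MIRROR_PAIRS and
reverse_with_mirroring are hypothetical names, and the table is far from
complete:

```python
import unicodedata

# Hand-written, deliberately tiny table of mirror pairs (an assumption
# for this sketch; the Unicode data files list many more).
MIRROR_PAIRS = {
    "(": ")", ")": "(",
    "[": "]", "]": "[",
    "{": "}", "}": "{",
    "<": ">", ">": "<",
}

def reverse_with_mirroring(run):
    """Reverse an RTL run, swapping mirrored characters we know about."""
    out = []
    for ch in reversed(run):
        if unicodedata.mirrored(ch):
            # Unknown mirrored characters are left unchanged.
            ch = MIRROR_PAIRS.get(ch, ch)
        out.append(ch)
    return "".join(out)

print(reverse_with_mirroring("(\u05d0\u05d1)"))
```

Without the swap, a parenthesised RTL phrase would come out as ")...(".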

Then, for AL runs, you need to replace European numerals with Arabic
numerals (but keeping the LTR order).
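
A minimal sketch of the digit replacement, assuming Python 3 and
assuming that "Arabic numerals" here means the Arabic-Indic digits
U+0660 to U+0669. The name to_arabic_indic is made up for this example:

```python
# Map ASCII digits 0-9 to ARABIC-INDIC DIGIT ZERO..NINE (U+0660..U+0669).
ARABIC_INDIC = {ord("0") + i: chr(0x0660 + i) for i in range(10)}

def to_arabic_indic(text):
    """Replace European digits one for one; the digit order is unchanged."""
    return text.translate(ARABIC_INDIC)

print(to_arabic_indic("2003"))
```

Because the mapping is one to one and nothing is reordered, the most
significant digit stays on the left, as required.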

Finally, and again for Arabic characters, you need to perform glyph
shaping, replacing the first character of a word with the INITIAL
FORM, the last character with the FINAL FORM, all other characters
of a word with the MEDIAL FORM, and all remaining characters with
the ISOLATED FORM. This, of course, assumes your font has glyphs
for these forms.
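
A naive sketch of the shaping rule as just described, assuming Python 3.
It looks up the presentation forms by character name (for example
"ARABIC LETTER BEH INITIAL FORM"); presentation_form and shape_word are
hypothetical helpers. Letters that lack a given form are left unchanged,
and the sketch ignores the fact that some letters do not join to the
following letter, so treat it as an illustration only:

```python
import unicodedata

def presentation_form(ch, form):
    """Return the character named '<name of ch> <form> FORM', or ch itself."""
    try:
        return unicodedata.lookup(f"{unicodedata.name(ch)} {form} FORM")
    except (KeyError, ValueError):
        # No such presentation form (or ch has no name): keep ch as is.
        return ch

def shape_word(word):
    """Apply INITIAL / MEDIAL / FINAL / ISOLATED forms by position in the word."""
    if not word:
        return word
    if len(word) == 1:
        return presentation_form(word, "ISOLATED")
    shaped = [presentation_form(word[0], "INITIAL")]
    shaped += [presentation_form(ch, "MEDIAL") for ch in word[1:-1]]
    shaped.append(presentation_form(word[-1], "FINAL"))
    return "".join(shaped)

for ch in shape_word("\u0628\u0628\u0628"):
    print(unicodedata.name(ch))
```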

This is specified in more detail in

http://www.unicode.org/reports/tr9/

> Can anybody point me to a way to iterate over characters with a varying
> number of bytes?

There is no trivial algorithm. Your best option is to decode the string
into Unicode, reverse it, then encode it again in the original encoding.
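
For example, in Python 3 terms (bytes in, bytes out), a minimal sketch
could look like this. The function name reverse_encoded and the UTF-8
default are assumptions for illustration:

```python
def reverse_encoded(data, encoding="utf-8"):
    """Reverse a byte string character by character, not byte by byte."""
    text = data.decode(encoding)   # bytes -> Unicode string
    return text[::-1].encode(encoding)

raw = "\u0645\u0631\u062d\u0628\u0627".encode("utf-8")  # 5 characters, 10 bytes
print(reverse_encoded(raw).decode("utf-8"))

# Once decoded, iterating yields whole characters regardless of byte width.
for ch in raw.decode("utf-8"):
    print(hex(ord(ch)))
```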

Regards,
Martin




