urwid with multi-byte encoded and bidirectional text?

Thu Nov 4 23:17:07 EST 2004

I hope to add support for multi-byte encoded and bidirectional text to
my curses-based UI library:
http://excess.org/urwid/

I would like to support whatever encoding the user likes.  Are there
functions for:
- querying the preferred encoding
- splitting encoded strings into characters based on an encoding
- determining the direction (L to R, R to L) of each character
- determining the number of columns used by each character when written
  to the terminal
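
To make the four items concrete, here is a rough sketch of the kind of
answer I am after, using only Python's standard locale and unicodedata
modules (modern Python, where decoding bytes gives a Unicode string).
The describe() helper and the width rule in it are my own assumptions:
the width is only a crude approximation of what a terminal does, not a
replacement for a real wcwidth().

```python
import locale
import unicodedata

def preferred_encoding():
    # Ask the locale machinery which encoding the user prefers.
    return locale.getpreferredencoding(False)

def describe(raw, encoding):
    """Return (char, bytes_used, bidi_class, columns) for each character.

    Hypothetical helper. The column rule is an approximation: combining
    marks and format characters count as 0 columns, East Asian wide and
    fullwidth characters as 2, and everything else as 1.
    """
    result = []
    for ch in raw.decode(encoding):
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            columns = 0
        elif unicodedata.east_asian_width(ch) in ("W", "F"):
            columns = 2
        else:
            columns = 1
        # Re-encoding one character gives its byte length in stateless
        # encodings such as UTF-8; stateful encodings would need more care.
        nbytes = len(ch.encode(encoding))
        result.append((ch, nbytes, unicodedata.bidirectional(ch), columns))
    return result

# Latin "a", Hebrew alef, and a CJK ideograph, encoded as UTF-8.
sample = "a\u05d0\u4e2d".encode("utf-8")
for entry in describe(sample, "utf-8"):
    print(entry)
```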

I currently use a "line translation" structure to store instructions
for mapping a text string to a two-dimensional "canvas". Its current,
simple format is described here:
http://excess.org/urwid/reference.html#Text-get_line_translation

The line translation structures describe the result of
word-wrapping/clipping and justification applied to the source text. A
*new* line translation format would have to support characters that are
N bytes in the string and M columns wide when displayed, as well as text
that is displayed in a different order than it appears in the string.
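
As a sketch of the "N bytes, M columns" part only, a new format might
record byte offsets and column counts separately for each wrapped line.
The (start_byte, end_byte, columns) tuples, the function names, and the
width rule below are all hypothetical. This is not the existing urwid
format, and it ignores word boundaries, justification, and reordering.

```python
import unicodedata

def char_columns(ch):
    # Approximate display width: 0 for combining marks, 2 for East Asian
    # wide/fullwidth characters, 1 otherwise. (Assumption, not wcwidth().)
    if unicodedata.combining(ch):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def wrap_by_columns(text, width, encoding="utf-8"):
    """Break text into lines of at most `width` columns.

    Returns a list of (start_byte, end_byte, columns) tuples, where the
    byte offsets index into text.encode(encoding).
    """
    lines = []
    start = offset = cols = 0
    for ch in text:
        nbytes = len(ch.encode(encoding))
        w = char_columns(ch)
        # Start a new line if this character would overflow a non-empty one.
        if cols + w > width and offset > start:
            lines.append((start, offset, cols))
            start, cols = offset, 0
        offset += nbytes
        cols += w
    if offset > start:
        lines.append((start, offset, cols))
    return lines

# Two 1-column letters, two 2-column ideographs, one more letter.
print(wrap_by_columns("ab\u4e2d\u4e2dc", 4))
```

The point of keeping bytes and columns apart is that neither can be
derived from the other once characters vary in both size and width.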

Is normalizing bidirectional text orthogonal to wrapping/clipping and
aligning that text?  Could I create a "direction translation" structure
that describes how a given string can be reordered Left-to-Right, then
solve the wrapping and alignment with this normalized version?
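
To illustrate what I mean by a "direction translation", here is a
deliberately naive sketch: a list of source indices in display order,
built by reversing each maximal run of strongly right-to-left
characters and leaving everything else in place. The function names are
hypothetical, and this is NOT the Unicode Bidirectional Algorithm: it
assumes a left-to-right base direction and ignores neutrals, digits,
embedding levels, and mirroring.

```python
import unicodedata

def direction_translation(text):
    """Return source indices in left-to-right display order.

    Naive rule (an assumption for illustration): each maximal run of
    characters with bidi class R or AL is reversed as a unit.
    """
    order = []
    run = []  # indices of the right-to-left run being collected
    for i, ch in enumerate(text):
        if unicodedata.bidirectional(ch) in ("R", "AL"):
            run.append(i)
        else:
            order.extend(reversed(run))
            run = []
            order.append(i)
    order.extend(reversed(run))
    return order

def apply_translation(text, order):
    # Rebuild the string in display order from the index list.
    return "".join(text[i] for i in order)

# "ab", three Hebrew letters (alef, bet, gimel), then "cd".
source = "ab\u05d0\u05d1\u05d2cd"
order = direction_translation(source)
print(order)
print(apply_translation(source, order).encode("unicode_escape"))
```

Wrapping and alignment would then run over the reordered string, using
the index list to map each displayed cell back to the source. Whether
that is valid in general is exactly my question.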

In what situations are characters modified, removed, or inserted as part
of displaying them? (e.g., punctuation being reversed when it surrounds
R to L text)
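
The punctuation case is the one I can partly sketch. Python's
unicodedata.mirrored() reports whether a character is flagged as
mirrored in bidirectional text, but as far as I know the standard
library does not say which character to substitute, so the PAIRS table
and mirror_for_rtl() below are a small hypothetical stand-in that only
covers ASCII brackets.

```python
import unicodedata

# Hypothetical, incomplete mirroring table (ASCII brackets only).
PAIRS = {
    "(": ")", ")": "(",
    "[": "]", "]": "[",
    "{": "}", "}": "{",
    "<": ">", ">": "<",
}

def mirror_for_rtl(ch):
    """Return the character to draw when ch appears in right-to-left text."""
    if unicodedata.mirrored(ch):
        # Fall back to the character itself if the table has no entry.
        return PAIRS.get(ch, ch)
    return ch

print([mirror_for_rtl(ch) for ch in "(a]"])
```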

TIA

Ian Ward <ian#excess,org>