Newbie question about text encoding

random832 at fastmail.us random832 at fastmail.us
Fri Mar 6 08:33:00 EST 2015


On Fri, Mar 6, 2015, at 04:06, Rustom Mody wrote:
> Also:
> Can a programmer who is away from UTF-16 in one part of the system (say
> by using python3)
> assume he is safe all over?

The most common failure of UTF-16 support, supposedly, is in programs
misusing the number of code units (for length or random access) as a
proxy for the number of characters.

However, when do you _really_ want the number of characters? You may
want to use it for, for example, the number of columns in a 'monospace'
font, which you've already screwed up because you haven't accounted for
double-wide characters or combining marks. Or you may want the position
that pressing an arrow key or backspace or forward-delete a number of
times will reach, which has its own rules in e.g. Indic languages (and
also fails on Latin with combining marks).



More information about the Python-list mailing list