How to get a "screen" length of a multibyte string?
Evan Driscoll
driscoll at cs.wisc.edu
Sun Nov 25 19:58:55 EST 2012
On 11/25/2012 07:48 AM, kobayashi wrote:
> Encoding is utf-8.
> By "screen length" I mean this: an ASCII character counts as 1, and a character twice as wide as an ASCII character counts as 2.
> It's not bytes, but drawing width.
> As you say, it depends on the font. I'll consider it carefully.
>
Don't forget also that there are combining characters. To wit:
>>> "\u00e1"
'á'
>>> "\u0061\u0301"
'á'
(U+00e1 is an 'a' with acute accent; U+0061 is an unaccented 'a'; U+0301
is a combining acute accent.)
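To make the difference between the two spellings concrete, here is a small sketch, assuming Python 3 (where str is a sequence of code points) and the standard unicodedata module. The variable names are only for illustration.

```python
import unicodedata

precomposed = "\u00e1"        # one code point: 'a' with acute accent
decomposed = "\u0061\u0301"   # 'a' followed by a combining acute accent

# Both draw as one glyph, but len() counts code points, not glyphs.
print(len(precomposed))            # 1
print(len(decomposed))             # 2
print(precomposed == decomposed)   # False

# NFC normalization composes the pair into the single code point.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

Normalizing to NFC helps for pairs like this one, but not every base-plus-accent sequence has a precomposed form, so it does not remove the problem in general.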
So far the discussion has been about single Unicode code points which
appear as a double-wide glyph (I did not know about those!). Combining
characters are the opposite case: depending on how you want to look at
it, either a sequence of Unicode code points results in a single glyph,
or the combining characters are zero-width code points.
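Taking the "zero-width code points" view, a rough width estimate might look like the sketch below. It assumes Python 3 and uses only unicodedata.combining and unicodedata.east_asian_width from the standard library; the function name screen_width is made up for this example. It treats East Asian "W" (wide) and "F" (fullwidth) code points as two columns and everything else as one, which is only an approximation: the real width still depends on the font and terminal, and control characters and "ambiguous" width code points are not handled.

```python
import unicodedata

def screen_width(text):
    """Approximate the number of columns needed to draw text (hypothetical helper)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are drawn on the previous character: zero columns.
            continue
        # Wide ('W') and fullwidth ('F') code points take two columns.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width

print(screen_width("abc"))               # 3
print(screen_width("\u00e1"))            # 1
print(screen_width("\u0061\u0301"))      # 1
print(screen_width("\u65e5\u672c\u8a9e"))  # 6 (three wide characters)
```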
Evan