How to get a "screen" length of a multibyte string?

Evan Driscoll driscoll at cs.wisc.edu
Sun Nov 25 19:58:55 EST 2012


On 11/25/2012 07:48 AM, kobayashi wrote:
> Encoding is utf-8.
> By "screen length" I mean this: an ASCII character has length 1, and a character twice as wide as an ASCII character has length 2.
> It's not bytes, but drawing width.
> As you say, it depends on the font. I'll consider that carefully.
>

Don't forget also that there are combining characters. To wit:

 >>> "\u00e1"
'á'
 >>> "\u0061\u0301"
'á'

(U+00e1 is an 'a' with acute accent; U+0061 is an unaccented 'a'; U+0301
is a combining acute accent.)
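
To make the difference visible, here is a minimal sketch (assuming Python 3
and only the standard unicodedata module; the variable names are just for
illustration) that compares the two spellings by code point count and then
normalizes one into the other:

```python
import unicodedata

precomposed = "\u00e1"        # one code point: 'a' with acute accent
decomposed = "\u0061\u0301"   # 'a' followed by a combining acute accent

# len() counts code points, not glyphs, so the two strings differ.
print(len(precomposed), len(decomposed))   # 1 2

# Equality also compares code points, so these are not equal as-is.
print(precomposed == decomposed)           # False

# NFC normalization composes the pair into the single precomposed code point.
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
```

Normalizing helps only when a precomposed form exists; other base plus
combining-mark sequences stay as several code points even after NFC.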


So far the discussion has been about single Unicode code points that
appear as a double-wide glyph (I did not know about those!). Combining
characters can be looked at in two ways: either a sequence of Unicode
code points renders as a single glyph, or the combining characters
themselves are zero-width code points.
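
Taking the "zero-width code points" view, a rough width estimate could be
sketched like this. It is only an approximation under stated assumptions:
screen_width is a hypothetical helper name, wide and fullwidth East Asian
characters are assumed to take two columns, combining marks zero, and
everything else one. Ambiguous-width characters, control characters, and
zero-width characters that are not combining marks are not handled, and the
real result still depends on the font and terminal.

```python
import unicodedata

def screen_width(text):
    """Estimate the number of terminal columns text occupies (rough sketch)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining mark: drawn on top of the previous character.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            # Wide or Fullwidth: assumed to take two columns.
            width += 2
        else:
            # Everything else is assumed to take one column.
            width += 1
    return width

print(screen_width("abc"))                 # 3
print(screen_width("\u00e1"))              # 1
print(screen_width("\u0061\u0301"))        # 1
print(screen_width("\u65e5\u672c\u8a9e"))  # 6
```

With this, both spellings of the accented 'a' come out as width 1, while
each of the three CJK characters in the last line counts as 2.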

Evan



