[ python-Bugs-989185 ] unicode.width broken for combining characters

Mon Jul 12 11:45:29 CEST 2004

Bugs item #989185, was opened at 2004-07-12 05:59
Message generated for change (Comment added) made by lemburg
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=989185&group_id=5470

Category: Unicode
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Matthew Mueller (donut)
Assigned to: M.-A. Lemburg (lemburg)
Summary: unicode.width broken for combining characters

Initial Comment:
Python 2.4a1+ (#38, Jul 11 2004, 20:36:10) 
[GCC 3.3.4 (Debian 1:3.3.4-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> u'\u3060'.width()
2
>>> u'\u305f\u3099'.width()
4

Width should be two in both cases.
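
For readers without a 2.4a1 build, the numbers above can be reproduced with a rough sketch built on unicodedata.east_asian_width(). The helper name summed_width and the rule "W and F count as 2 columns, everything else as 1" are assumptions made for illustration, not the actual unicode.width() source.

```python
import unicodedata

def summed_width(text):
    """Sum a per-code-point column width over the whole string.

    Assumed rule (not taken from the real implementation):
    East Asian Width 'W' or 'F' -> 2 columns, anything else -> 1.
    """
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )

# Precomposed HIRAGANA LETTER DA: one wide code point.
print(summed_width("\u3060"))        # 2
# TA + COMBINING VOICED SOUND MARK: both are 'W', so the sum doubles.
print(summed_width("\u305f\u3099"))  # 4
```

The combining mark U+3099 is itself classified as wide, which is why the decomposed spelling is counted twice.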

----------------------------------------------------------------------

>Comment By: M.-A. Lemburg (lemburg)
Date: 2004-07-12 11:45

Message:
Logged In: YES 
user_id=38388

To be honest: I don't really know how .width() ended up as a
method. Its use seems rather limited, in that it only
applies to East Asian code points according to Unicode
Standard Annex #11.

I'd suggest moving the whole implementation to unicodedata
instead (and then applying normalization before looking up the
width).
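
A minimal sketch of the "normalize first, then look up the width" idea, assuming NFC as the normalization form and the same hypothetical 2-column rule for 'W' and 'F' code points. The function name normalized_width is made up for this example.

```python
import unicodedata

def normalized_width(text):
    """Compose the string to NFC, then sum per-code-point widths.

    Assumptions: NFC is the form applied, and 'W'/'F' code points
    take 2 columns while all others take 1.
    """
    composed = unicodedata.normalize("NFC", text)
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in composed
    )

# TA + combining voiced sound mark composes to the single code point DA.
print(unicodedata.normalize("NFC", "\u305f\u3099") == "\u3060")  # True
print(normalized_width("\u305f\u3099"))                          # 2
```

This only helps where a precomposed form exists; a combining sequence without one would still be counted code point by code point.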

Reading UAX #11 (http://www.unicode.org/reports/tr11/),
I also have a feeling that taking the sum of all
widths in a string of Unicode code points is not a very useful
approach. Since the width is mainly used for rendering East
Asian text, only the per code point information is useful.
I think that it would be more appropriate to raise an
exception if you pass in more than one code point to the
function.
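
As a rough illustration of such a per code point function, the sketch below rejects anything other than a single code point. The name code_point_width, the choice of TypeError, and the 2-versus-1 column mapping are all assumptions for the example.

```python
import unicodedata

def code_point_width(ch):
    """Return the column width of exactly one code point.

    Hypothetical helper: raises TypeError for any other input length
    instead of silently summing over a longer string.
    """
    if len(ch) != 1:
        raise TypeError("expected a single code point, got %d" % len(ch))
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

print(code_point_width("\u3060"))  # 2

try:
    code_point_width("\u305f\u3099")
except TypeError as exc:
    print("rejected:", exc)
```

With this shape, callers decide themselves how to segment and normalize text before asking for a width.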

----------------------------------------------------------------------

Comment By: Hye-Shik Chang (perky)
Date: 2004-07-12 06:46

Message:
Logged In: YES 
user_id=55188

It sounds like we need to normalize to NFC before
evaluating unicode.width().
So I think we'll need to choose how the width() method gets
access to the normalization database.

1. export the normalization C API functions from the unicodedata
module, like ucnhash_CAPI, and have unicodeobject use them when
width() is first called.

2. move unicode.width() to the unicodedata module and use the
normalization functions statically.

I would prefer 2. ;)

----------------------------------------------------------------------
