[ python-Bugs-1067294 ] Incorrect length of unicode strings using .encode('utf-8')
SourceForge.net
noreply at sourceforge.net
Tue Nov 16 13:12:45 CET 2004
Bugs item #1067294, was opened at 2004-11-16 12:58
Message generated for change (Comment added) made by lemburg
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1067294&group_id=5470
Category: Unicode
Group: Python 2.4
>Status: Closed
>Resolution: Works For Me
Priority: 5
Submitted By: Ed Schofield (edschofield)
Assigned to: M.-A. Lemburg (lemburg)
Summary: Incorrect length of unicode strings using .encode('utf-8')
Initial Comment:
Python 2.3.4 and Python 2.4b2:
print "x = %-15s" %(x.encode('utf-8'),) + " more text"
gives an incorrect number of spaces when x is a character
whose UTF-8 encoding is two bytes long, such as à. There is
no such problem if x is used directly rather than the result
of its encode(...) method.
The reason seems to be this: if x = u'\u00e0' (the
character à) and s = x.encode('utf-8'), then len(s) == 2,
so the field is padded by counting bytes rather than
characters, which breaks the print statement above on a
UTF-8 terminal.
A slightly longer example is attached.
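The length mismatch described above can be reproduced in a few lines. This is only a sketch, and it assumes Python 3.5 or later, where text is str and encoded data is bytes; the report itself concerns Python 2.3.4 and 2.4b2, where the text type is written u'...'. The names padded_bytes and padded_text are illustrative and do not come from the attached example.

```python
# Assumption: Python 3.5+ (str is text, bytes is encoded data).
x = "\u00e0"             # the character à: one code point
s = x.encode("utf-8")    # b'\xc3\xa0': two bytes

# len() counts code points for text and bytes for encoded data.
text_length = len(x)     # 1
byte_length = len(s)     # 2

# Padding the encoded value pads to 15 *bytes*, so once a UTF-8
# terminal renders the two bytes as one character, the field is
# only 14 columns wide.
padded_bytes = b"%-15s" % s
columns_from_bytes = len(padded_bytes.decode("utf-8"))   # 14

# Padding the text value pads to 15 *characters*.
padded_text = "%-15s" % x
columns_from_text = len(padded_text)                     # 15
```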
----------------------------------------------------------------------
>Comment By: M.-A. Lemburg (lemburg)
Date: 2004-11-16 13:12
Message:
As you already noted, the problem is that you are mixing Unicode
and 8-bit strings in a way that is bound to fail.
You should use:
print (u"x = %-15s" %x + u" more text").encode('utf-8')
i.e. stay with Unicode as long as you can and only call encode
as the last step when doing I/O, just before passing the string
to an 8-bit stream.
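The advice above (format as Unicode, encode once at the I/O boundary) might look like this in Python 3, which is an assumption on my part since the thread concerns Python 2. The helper name format_line is hypothetical and used only for illustration.

```python
# Assumption: Python 3, where str is the Unicode text type.
def format_line(x):
    """Pad and concatenate as text, then encode once at the end."""
    text = "x = %-15s" % x + " more text"   # padding counts characters
    return text.encode("utf-8")             # encode as the final step

line = format_line("\u00e0")

# The decoded line has the intended width: 4 + 15 + 10 = 29 characters.
decoded_width = len(line.decode("utf-8"))

# The encoded line is one byte longer because à occupies two bytes.
encoded_size = len(line)
```

The resulting bytes can then be written to an 8-bit stream, for example a file opened in binary mode.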
----------------------------------------------------------------------