how to make format operator % work with unicode as expected

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Sun Jan 27 12:14:51 EST 2008


On Sun, 27 Jan 2008 16:00:42 +0000, Peter Pei wrote:

> "Marc 'BlackJack' Rintsch" <bj_666 at gmx.net> wrote in message 
> news:6031q9F1oh17aU1 at mid.uni-berlin.de...
>> On Sun, 27 Jan 2008 05:32:40 +0000, Peter Pei wrote:
>>
>>> You didn't understand my question, but thanks any way.
>>>
>>> Yes, it is true that %s already support unicode, and I did not contradict
>>> that. But it counts the number of bytes instead of characters, and makes
>>> things like %-20s out of alignment. If you don't understand my assertion,
>>> please don't argue back and I am only interested in answers from those 
>>> who are qualified.
>>
>> I have the impression from your original post
>>
>>  […] because it is unicode, and one byte is not necessarily one character.
>>
>> that you confuse unicode and utf-8.  Are you sure you are qualified to
>> ask such a question in the first place!?  :-þ
> so you are saying, with utf-8 encoding a byte is a character, shame on
> you.

No, I am not saying that.  I am saying that unicode has no bytes, only
code points.  And with unicode objects Python counts characters, not bytes.
So I guess you are trying to format utf-8 encoded byte strings instead of
unicode strings, because with unicode strings your "problem" simply does
not exist, as several people have already *shown* you with examples.  Once
again:

In [346]: u = u'sm\xf8rebr\xf8d'

In [347]: s = u.encode('utf-8')

In [348]: print '%-20s+\n%-20s+' % (s, 'spam')
smørebrød         +
spam                +

In [349]: print '%-20s+\n%-20s+' % (u, 'spam')
smørebrød           +
spam                +

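348 is what you are doing: formatting utf-8 encoded byte strings.  Each ø
takes two bytes in utf-8, so the string is 11 bytes long and gets only 9
spaces of padding instead of 11.  Yet you claim that's a problem with
unicode.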

And 349 is formatting unicode.  See, no problem -- lines up nicely.
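
For readers on Python 3, here is a rough equivalent of the session above.
It is only a sketch and assumes Python 3.5 or later, where the % operator
also works on bytes objects.  The variable names are made up for
illustration; str plays the role of the old unicode type and bytes the
role of the utf-8 encoded byte string.

```python
# Python 3 sketch: str holds code points, bytes holds encoded bytes.
text = 'sm\xf8rebr\xf8d'        # 9 code points
raw = text.encode('utf-8')      # 11 bytes, because each ø takes two bytes

# Padding a str counts code points, so the line is 20 columns plus '+'.
text_line = '%-20s+' % text

# Padding bytes counts bytes; once decoded, the line is two columns short.
raw_line = (b'%-20s+' % raw).decode('utf-8')

print(len(text), len(raw))
print(text_line)
print(raw_line)
```

The first line printed shows the two lengths, and the two padded lines
differ in width by exactly the two extra bytes of the encoded form.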

Instead of embarrassing yourself and being rude to people, you should take
some time to learn something about unicode and encodings, especially that
utf-8 ≠ unicode.
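
In practice that means decoding byte strings into text before formatting
them.  A minimal sketch in Python 3 syntax follows; the helper names
to_text and left_align are hypothetical, and utf-8 is only assumed as the
encoding of the incoming bytes.

```python
def to_text(value, encoding='utf-8'):
    """Return value as str, decoding it first if it is bytes.

    Hypothetical helper; the utf-8 default is an assumption about the input.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def left_align(value, width=20):
    """Left-align value in a field of width characters, followed by '+'."""
    return '%-*s+' % (width, to_text(value))


encoded = 'sm\xf8rebr\xf8d'.encode('utf-8')
print(left_align(encoded))
print(left_align('spam'))
```

Because the padding is computed on decoded text, both lines come out at
the same width whether the input arrived as bytes or as text.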

Ciao,
	Marc 'BlackJack' Rintsch


