how to make format operator % work with unicode as expected
Marc 'BlackJack' Rintsch
bj_666 at gmx.net
Sun Jan 27 12:14:51 EST 2008
On Sun, 27 Jan 2008 16:00:42 +0000, Peter Pei wrote:
> "Marc 'BlackJack' Rintsch" <bj_666 at gmx.net> wrote in message
> news:6031q9F1oh17aU1 at mid.uni-berlin.de...
>> On Sun, 27 Jan 2008 05:32:40 +0000, Peter Pei wrote:
>>
>>> You didn't understand my question, but thanks any way.
>>>
>>> Yes, it is true that %s already supports unicode, and I did not contradict
>>> that. But it counts the number of bytes instead of characters, and makes
>>> things like %-20s out of alignment. If you don't understand my assertion,
>>> please don't argue back and I am only interested in answers from those
>>> who are qualified.
>>
>> I have the impression from your original post
>>
>> […] because it is unicode, and one byte is not necessarily one character.
>>
>> that you confuse unicode and utf-8. Are you sure you are qualified to
>> ask such a question in the first place!? :-þ
> so you are saying, with utf-8 encoding a byte is a character, shame on
> you.

No, I don't say that. I say unicode has no bytes, only code points. And with
unicode objects Python counts characters, not bytes. So I guess you are
trying to format utf-8 encoded byte strings instead of unicode strings,
because with unicode strings your "problem" simply does not exist, as
several people have already *shown* you with examples. Once again:
In [346]: u = u'sm\xf8rebr\xf8d'
In [347]: s = u.encode('utf-8')
In [348]: print '%-20s+\n%-20s+' % (s, 'spam')
smørebrød         +
spam                +
In [349]: print '%-20s+\n%-20s+' % (u, 'spam')
smørebrød           +
spam                +
348 is what you are doing: formatting utf-8 encoded byte strings. Yet you
claim that's a problem with unicode.
And 349 is formatting unicode. See, no problem -- it lines up nicely.
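For readers on Python 3, here is a rough re-run of the session above. It is a sketch under one assumption: bytes %-formatting (available since Python 3.5, PEP 461) stands in for formatting a Python 2 byte string, and plain str stands in for a Python 2 unicode object. The names `byte_lines` and `text_lines` are just illustrative.

```python
# Python 3 sketch of In [346]..In [349].
# Assumption: bytes %-formatting (Python 3.5+) plays the role of Python 2 str.

u = "sm\xf8rebr\xf8d"     # text: 9 code points
s = u.encode("utf-8")     # bytes: 11 bytes, because each ø takes two bytes

# Like [348]: the field width is counted in bytes, so the word gets 2 spaces too few.
byte_lines = b"%-20s+\n%-20s+" % (s, b"spam")

# Like [349]: the field width is counted in code points, so both lines line up.
text_lines = "%-20s+\n%-20s+" % (u, "spam")

print(byte_lines.decode("utf-8"))
print(text_lines)
```

In the byte version the first line ends two columns early, one column per extra UTF-8 byte of each "ø"; the text version pads both lines to exactly 20 characters.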
Instead of embarrassing yourself and being rude to people, you should take
some time to learn something about unicode and encodings. In particular,
that utf-8 ≠ unicode.
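The practical consequence of "utf-8 ≠ unicode" is to decode bytes to text before formatting them. A minimal Python 3 sketch follows. `pad_row` is a hypothetical helper, not anything from the standard library, and it assumes any bytes it receives are UTF-8.

```python
def pad_row(label, width=20):
    """Left-justify label in a field of `width` code points, then append '+'.

    Hypothetical helper: bytes input is assumed to be UTF-8 and is decoded
    first, so the padding counts characters instead of bytes.
    """
    if isinstance(label, bytes):
        label = label.decode("utf-8")
    return "%-*s+" % (width, label)  # '*' takes the width from the tuple


word = "sm\xf8rebr\xf8d"
print(len(word))                    # 9 code points
print(len(word.encode("utf-8")))    # 11 bytes
print(pad_row(word.encode("utf-8")))
print(pad_row("spam"))
```

Decoding at the boundary keeps all the formatting on text, which is the same thing In [349] does with a unicode object. Note that a code point count is still not always a display width (combining marks or wide East Asian characters, for example), but for text like "smørebrød" it is.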
Ciao,
Marc 'BlackJack' Rintsch
More information about the Python-list mailing list