Print formatted Strings with Umlauts

Jeff Epler jepler at unpythonic.net
Wed Feb 11 17:01:06 EST 2004


If you work with Unicode strings instead of byte strings in the UTF-8
encoding, you'll get the desired results for characters in the German
character set, because the padding is then counted in characters rather
than in bytes:

>>> b = '123'
>>> a = u'\344\366\374'
>>> print (u"%-5s %-5s\n%-5s %-5s" % (a, a, b, b)).encode("utf-8")
äöü   äöü  
123   123  
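
To see why the byte string misaligns, here is a small sketch in Python 3
(an assumption; the session above is Python 2), where str holds Unicode
text and bytes holds the encoded form. The variable names are only
illustrative.

```python
# Python 3 sketch: compare padding by code points with padding by bytes.
text = "\u00e4\u00f6\u00fc"        # the three umlauts used above
encoded = text.encode("utf-8")     # each umlaut becomes two bytes

padded_text = "%-5s" % text        # 3 code points, so 2 spaces are added
padded_bytes = encoded.ljust(5)    # already 6 bytes long, so nothing is added

print(len(text), len(encoded))
print(repr(padded_text), repr(padded_bytes))
```

The byte string is already longer than the field width, so no padding
is added and the columns drift.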

However, this isn't good enough in general.  For instance, a Unicode
combining character counts toward the string's length but takes up no
column of its own on screen, so the padding comes out too short:
>>> u = u'\N{COMBINING DIAERESIS}'
>>> a = u'a%so%su%s' % (u, u, u)
>>> print a.encode("utf-8")
äöü
>>> print (u"%-5s %-5s\n%-5s %-5s" % (a, a, b, b)).encode("utf-8")
äöü äöü
123   123  
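
One possible workaround, sketched in Python 3 (an assumption; the
session above is Python 2), is to pad by the number of non-combining
code points instead of len().  The helper names visible_len and
ljust_visible are made up for this example.  unicodedata.combining()
returns 0 for some marks that still don't take a column of their own,
so treat this as an approximation.

```python
import unicodedata

def visible_len(text):
    """Count code points whose canonical combining class is 0."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)

def ljust_visible(text, width):
    """Left-justify, padding by visible length rather than len()."""
    return text + " " * max(0, width - visible_len(text))

d = "\N{COMBINING DIAERESIS}"
a = "a%so%su%s" % (d, d, d)
b = "123"

print(len(a), visible_len(a))
print("%s %s" % (ljust_visible(a, 5), ljust_visible(a, 5)))
print("%s %s" % (ljust_visible(b, 5), ljust_visible(b, 5)))
```

Normalizing to NFC with unicodedata.normalize() would also help in this
particular case, but only for sequences that have a precomposed form.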


You'll also run into problems with characters that have "Wide",
"Fullwidth" or "Ambiguous" East Asian Width properties in Unicode,
since many terminals draw them two columns wide.  For example,
>>> a = u'\N{FULLWIDTH LATIN SMALL LETTER U}' * 3
>>> print (u"%-5s %-5s\n%-5s %-5s" % (a, a, b, b)).encode("utf-8")
ｕｕｕ   ｕｕｕ  
123   123  
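
A rough way to handle both problems, again sketched in Python 3 (an
assumption), is to estimate the display width of each character from
unicodedata.east_asian_width() and pad to that.  The names
display_width and ljust_display are invented for this sketch, and
treating "Ambiguous" characters as one column wide is a guess that
depends on the terminal and font.

```python
import unicodedata

def display_width(text):
    """Estimate how many terminal columns text occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks add no column of their own
        # "W" (Wide) and "F" (Fullwidth) are assumed to take two columns;
        # everything else, including "A" (Ambiguous), is assumed to take one.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

def ljust_display(text, width):
    """Left-justify text to the given number of columns."""
    return text + " " * max(0, width - display_width(text))

a = "\N{FULLWIDTH LATIN SMALL LETTER U}" * 3
b = "123"

print("%s %s" % (ljust_display(a, 8), ljust_display(a, 8)))
print("%s %s" % (ljust_display(b, 8), ljust_display(b, 8)))
```

Whether the columns actually line up still depends on how the terminal
renders these characters.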

Jeff



