Formatting string with accented characters for printing

Sun Jan 18 13:28:52 EST 2015

Jerry Rocteur wrote:

> When I try and format output when there are accented characters the
> output does not look right.
> 
> e.g.
> 
> 27 Angie Dickons                       67,638
> 28 Anne MÉRESSE                 64,825
> 
> So the strings containing accented characters print one less than
> those that don't.
> 
> I've tried both:
> 
>     print '{0:2} {1:25} {2} '.format( cnt, nam[num].encode('utf-8'), steps[ind1])
>     print "%3d %-25s %-7s" % ( cnt, nam[num].encode('utf-8'), steps[ind1])
> 
> I've searched but I can't see a solution..
> 
> I guess it is the way I'm printing nam[num].encode('utf-8') perhaps I
> have to convert it first ?

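If you have a byte string (the standard in Python 2), you have to decode() it,
i.e. convert it to unicode, before you format it. In a byte string the field
width counts bytes, and in UTF-8 an accented letter such as "é" or "É" takes
two bytes, so each one leaves the padding one column short. Calling
.encode('utf-8') produces exactly such a byte string, so if nam[num] is
already unicode, format it as it is and encode the finished line only if you
need to. Compare: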

>>> names = "Angie Dickons", "Anne Méresse"
>>> for name in names:
...     print "|{:20}|".format(name)
... 
|Angie Dickons       |
|Anne Méresse       |
>>> for name in names:
...     name = name.decode("utf-8")
...     print u"|{:20}|".format(name)
... 
|Angie Dickons       |
|Anne Méresse        |
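For readers on Python 3, here is a minimal sketch of the same comparison. It assumes Python 3.5 or later, where bytes support %-formatting; the helper names pad_encoded and pad_text are made up for illustration. The first helper does what the question does (encode, then pad), the second pads the text itself.

```python
def pad_encoded(name: str) -> str:
    # Mimics the question: encode to UTF-8 first, then pad.
    # Here the width of 20 counts bytes, and 'é' is two bytes in UTF-8.
    padded = b"|%-20s|" % name.encode("utf-8")
    return padded.decode("utf-8")

def pad_text(name: str) -> str:
    # Pads the text itself: the width of 20 counts characters (code points).
    return "|{:20}|".format(name)

# "\u00e9" is 'é', written as an escape so the example does not depend on
# how the source file happens to store the accented letter.
for name in ["Angie Dickons", "Anne M\u00e9resse"]:
    print(pad_encoded(name))  # one column short for the accented name
    print(pad_text(name))     # always 20 characters between the bars
```

Only the accented name is affected, because for pure ASCII text the byte count and the character count are the same.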

The best approach is to convert your data to unicode as soon as you read it 
and to perform all string operations on unicode. This also keeps slicing from 
cutting a multi-byte character in half:

>>> print "Méresse"[:2]
M�
>>> print u"Méresse"[:2]
Mé
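The "decode early, encode late" advice can be sketched in Python 3, where bytes and str are separate types. This is only an illustration: the helpers read_name and truncate are hypothetical, and the incoming data is assumed to be UTF-8.

```python
def read_name(raw: bytes) -> str:
    # Decode once, at the point where the data enters the program
    # (assumed to be UTF-8 here).
    return raw.decode("utf-8")

def truncate(name: str, n: int) -> str:
    # Slicing a str counts code points, so 'é' is never split.
    return name[:n]

raw = "M\u00e9resse".encode("utf-8")  # b'M\xc3\xa9resse'
print(raw[:2])                         # b'M\xc3': only half of the two-byte 'é'
print(raw[:2].decode("utf-8", errors="replace"))  # 'M' plus U+FFFD, the "�" seen above
print(truncate(read_name(raw), 2))     # Mé
```

Decoding the sliced bytes strictly (without errors="replace") raises UnicodeDecodeError instead, which is another reason to slice text rather than bytes.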

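There are still problems (e.g. with narrow builds, where a character outside 
the Basic Multilingual Plane is stored as two code units and counts as two in 
len()), and the best way to avoid a number of string-related inconveniences 
is to switch to Python 3.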
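As a rough sketch of what the question's table looks like after such a switch: the format spec below is the one from the question, while the helper format_row is hypothetical and the two rows are copied from the question's output. It assumes Python 3.3 or later, where PEP 393 removed the narrow/wide build distinction.

```python
import sys

def format_row(cnt: int, name: str, steps: str) -> str:
    # Same format spec as in the question, but name is text (str), not
    # UTF-8 bytes, so the width of 25 counts characters.
    return "{0:2} {1:25} {2}".format(cnt, name, steps)

# Sample rows taken from the question; "\u00c9" is 'É'.
rows = [(27, "Angie Dickons", "67,638"), (28, "Anne M\u00c9RESSE", "64,825")]
for row in rows:
    print(format_row(*row))  # the step counts now line up

# In Python 3.3+ every build covers all of Unicode, and a character outside
# the Basic Multilingual Plane has a length of 1.
print(sys.maxunicode == 0x10FFFF)
print(len("\U0001F600"))
```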