Formatting string with accented characters for printing
Peter Otten
__peter__ at web.de
Sun Jan 18 13:28:52 EST 2015
Jerry Rocteur wrote:
> When I try and format output when there are accented characters the
> output does not look right.
>
> e.g.
>
> 27 Angie Dickons 67,638
> 28 Anne MÉRESSE 64,825
>
> So the strings containing accented characters print one less than
> those that don't.
>
> I've tried both:
>
> print '{0:2} {1:25} {2} '.format( cnt, nam[num].encode('utf-8'),
> steps[ind1])
> print "%3d %-25s %-7s" % ( cnt, nam[num].encode('utf-8'), steps[ind1])
>
> I've searched but I can't see a solution..
>
> I guess it is the way I'm printing nam[num].encode('utf-8') perhaps I
> have to convert it first ?
If you have a byte string (the standard in Python 2), you have to decode() it
(i. e. convert it to unicode) before you format it. Compare:
>>> names = "Angie Dickons", "Anne Méresse"
>>> for name in names:
...     print "|{:20}|".format(name)
...
|Angie Dickons       |
|Anne Méresse       |
>>> for name in names:
...     name = name.decode("utf-8")
...     print u"|{:20}|".format(name)
...
|Angie Dickons       |
|Anne Méresse        |
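The one-column difference comes from how the width is counted: for a byte string it is counted in bytes, and é takes two bytes in UTF-8. A minimal sketch that checks this, with the accented letter written as an escape so the snippet does not depend on the source file's encoding (the variable names are only illustrative):

```python
# Sketch: the same name as text (unicode) and as UTF-8 bytes.
name = u"Anne M\xe9resse"         # \xe9 is the character é
raw = name.encode("utf-8")        # byte string, as stored in a UTF-8 file

n_chars = len(name)               # number of characters
n_bytes = len(raw)                # number of bytes; é occupies two of them

# Formatting the unicode value pads by characters, so the columns line up.
padded = u"|{:20}|".format(name)
```

Here n_bytes is one larger than n_chars, which is exactly the missing space in the first loop.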
The best approach is to convert your data to unicode as soon as you read it
and perform all string operations with unicode. This also avoids breaking
characters:
>>> print "Méresse"[:2]
M�
>>> print u"Méresse"[:2]
Mé
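A rough sketch of the "decode as soon as you read it" advice, assuming the names are stored one per line in a UTF-8 text file. The helpers read_names() and format_row() are hypothetical, not part of the original poster's code; io.open() is used because it decodes while reading in both Python 2 and Python 3.

```python
import io

def read_names(path, encoding="utf-8"):
    """Return the lines of a text file as unicode strings (hypothetical helper)."""
    # io.open() decodes while reading, so no byte strings reach the rest of
    # the program. The UTF-8 default is an assumption about the input file.
    with io.open(path, encoding=encoding) as f:
        return [line.rstrip(u"\n") for line in f]

def format_row(rank, name, steps):
    """Format one output row; widths are counted in characters."""
    # The template is unicode and no .encode() is applied before formatting.
    return u"{0:2} {1:25} {2}".format(rank, name, steps)
```

With this arrangement, encoding back to bytes (if needed at all) happens only at the point of output, after the padding has been applied.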
There are still problems (e. g. with narrow builds), and the best way to
avoid a few string-related inconveniences is to switch to Python 3.
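To illustrate the narrow-build issue: on a narrow Python 2 build, a character above U+FFFF is stored as two code units, so len() and slicing see two items instead of one. A small sketch that works out which case applies on the running interpreter (the example character is an arbitrary choice):

```python
import sys

clef = u"\U0001D11E"  # MUSICAL SYMBOL G CLEF, a code point above U+FFFF

# sys.maxunicode is 0xFFFF on narrow Python 2 builds and 0x10FFFF on wide
# builds and on Python 3.3 or later.
is_narrow = sys.maxunicode == 0xFFFF

# A narrow build stores the character as a surrogate pair.
expected_len = 2 if is_narrow else 1
```

Since Python 3.3 there is no narrow/wide distinction, so len(clef) is always 1 there.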