python 2.7 and unicode (one more time)

Peter Otten __peter__ at web.de
Thu Nov 20 07:35:17 EST 2014


Francis Moreau wrote:

> Hello,
> 
> My application is using gettext module to do the translation
> stuff. Translated messages are unicode on both python 2 and
> 3 (with python2.7 I had to explicitly ask for unicode).
> 
> A problem arises when formatting those messages before logging
> them. For example:
> 
>   log.debug("%s: %s" % (header, _("will return an unicode string")))

This is only problematic if header is a non-ascii bytestring.

> Indeed on python2.7, "%s: %s" is 'str' whereas _() returns
> unicode.
> 
> My question is: how should this be fixed properly ?
> 
> A simple solution would be to force all strings passed to the
> logger to be unicode:
> 
>   log.debug(u"%s: %s" % ...)
> 
> and more generally force all strings in my code to be unicode by
> using the 'u' prefix.
> 
> or is there a proper solution ?

You don't need to change an all-ascii bytestring to unicode. 
Lo and behold:

>>> "%s %s" % (u"üblich", u"ähnlich")
u'\xfcblich \xe4hnlich'
>>> u"%s %s" % (u"üblich", u"ähnlich")
u'\xfcblich \xe4hnlich'
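
The reason this works: when a bytestring meets unicode in a formatting
operation, Python 2 first decodes the bytestring with the default codec
(ascii, unless someone changed it). A rough sketch of the explicit
equivalent, written so that it also runs on Python 3:

```python
# Explicit version of the implicit str -> unicode coercion in Python 2.
template = b"%s %s"                    # all-ascii bytestring
as_unicode = template.decode("ascii")  # what Python 2 does behind the scenes
result = as_unicode % (u"\xfcblich", u"\xe4hnlich")
print(repr(result))
```

The decode step can only succeed if every byte is below 128.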

Only non-ascii bytestrings mean trouble, either noisily

>>> u"%s nötig %s" % (u"üblich", "ähnlich")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
>>> "%s nötig %s" % (u"üblich", u"ähnlich")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

or silently until you have to decipher the logfile contents. It's best to 
stay away from them, and the

from __future__ import unicode_literals

that Chris mentioned is a convenient way to achieve that.
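
If a non-ascii bytestring does sneak in from outside (a file name, command
output, whatever ends up in header), decoding it explicitly before
formatting avoids both failure modes. A minimal sketch, assuming the bytes
are UTF-8; to_text is a made-up helper name, not something from the stdlib
or from your code:

```python
def to_text(value, encoding="utf-8"):
    """Return value as unicode, decoding bytestrings (hypothetical helper)."""
    # On Python 2, bytes is an alias for str, so this catches bytestrings.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

header = b"\xc3\xbcblich"  # UTF-8 encoded bytestring, assumed input
message = u"%s: %s" % (to_text(header), u"\xe4hnlich")
```

The encoding has to match whatever produced the bytes; utf-8 is only a
guess here.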



