python 2.7 and unicode (one more time)
Peter Otten
__peter__ at web.de
Thu Nov 20 07:35:17 EST 2014
Francis Moreau wrote:
> Hello,
>
> My application is using gettext module to do the translation
> stuff. Translated messages are unicode on both python 2 and
> 3 (with python2.7 I had to explicitly ask for unicode).
>
> A problem arises when formatting those messages before logging
> them. For example:
>
> log.debug("%s: %s" % (header, _("will return an unicode string")))
This is only problematic if header is a non-ascii bytestring.
> Indeed on python2.7, "%s: %s" is 'str' whereas _() returns
> unicode.
>
> My question is: how should this be fixed properly ?
>
> A simple solution would be to force all strings passed to the
> logger to be unicode:
>
> log.debug(u"%s: %s" % ...)
>
> and more generally force all strings in my code to be unicode by
> using the 'u' prefix.
>
> or is there a proper solution ?
You don't need to change an all-ascii bytestring to unicode.
Lo and behold:
>>> "%s %s" % (u"üblich", u"ähnlich")
u'\xfcblich \xe4hnlich'
>>> u"%s %s" % (u"üblich", u"ähnlich")
u'\xfcblich \xe4hnlich'
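As a rough sketch of why that works: when a bytestring template meets unicode arguments, Python 2 in effect decodes the template with the ascii codec before formatting. The helper name py2_style_format below is made up for illustration and only approximates that coercion; it is written with escapes so it runs on both Python 2 and 3.

```python
def py2_style_format(template_bytes, args):
    """Approximate Python 2's implicit coercion (hypothetical helper):
    decode the bytestring template as ascii, then format as unicode."""
    return template_bytes.decode("ascii") % args


# An all-ascii template decodes cleanly, so the result is unicode.
ascii_result = py2_style_format(b"%s %s", (u"\xfcblich", u"\xe4hnlich"))

# A template containing UTF-8 encoded non-ascii bytes cannot be decoded.
failure_position = None
try:
    py2_style_format(b"%s n\xc3\xb6tig %s", (u"\xfcblich", u"\xe4hnlich"))
except UnicodeDecodeError as exc:
    # Offset of the first byte the ascii codec rejected.
    failure_position = exc.start
```

The all-ascii template passes through unchanged, which is why such literals do not need a u prefix.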
Only non-ascii bytestrings mean trouble, either noisy
>>> u"%s nötig %s" % (u"üblich", "ähnlich")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
>>> "%s nötig %s" % (u"üblich", u"ähnlich")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
or silently until you have to decipher the logfile contents. It's best to
stay away from them, and the
from __future__ import unicode_literals
that Chris mentioned is a convenient way to achieve that.
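A minimal sketch of how that could look in practice, assuming any bytestring that still reaches the formatting code (a header read from a file, say) is UTF-8 encoded. The names to_text and format_message are made up for this example and are not part of gettext or logging.

```python
from __future__ import unicode_literals


def to_text(value, encoding="utf-8"):
    """Return unicode text; decode bytestrings (hypothetical helper).

    The utf-8 default is an assumption, use whatever your input really is.
    """
    if isinstance(value, bytes):  # bytes is str on Python 2
        return value.decode(encoding)
    return value


def format_message(header, message):
    # With unicode_literals the template itself is already unicode.
    return "%s: %s" % (to_text(header), to_text(message))


# A non-ascii bytestring header no longer raises UnicodeDecodeError.
line = format_message(b"\xc3\xa4hnlich", "\xfcblich")
```

Decoding once at the point where bytes enter the program keeps everything that is passed on to the logger as unicode.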