python 2.7 and unicode (one more time)

Francis Moreau francis.moro at gmail.com
Fri Nov 21 11:11:02 EST 2014


On 11/20/2014 04:15 PM, Chris Angelico wrote:
> On Fri, Nov 21, 2014 at 1:14 AM, Francis Moreau <francis.moro at gmail.com> wrote:
>> Hi,
>>
>> Thanks for the "from __future__ import unicode_literals" trick, it makes
>> that switch much less intrusive.
>>
>> However it seems that I will suddenly be trapped by all modules which
>> are not prepared to handle unicode. For example:
>>
>>  >>> from __future__ import unicode_literals
>>  >>> import locale
>>  >>> locale.setlocale(locale.LC_ALL, 'fr_FR')
>>  Traceback (most recent call last):
>>    File "<stdin>", line 1, in <module>
>>    File "/usr/lib64/python2.7/locale.py", line 546, in setlocale
>>      locale = normalize(_build_localename(locale))
>>    File "/usr/lib64/python2.7/locale.py", line 453, in _build_localename
>>      language, encoding = localetuple
>>  ValueError: too many values to unpack
>>
>> Is the locale module an exception, in which case I'll fix it by doing:
>>
>>  >>> locale.setlocale(locale.LC_ALL, b'fr_FR')
>>
>> or is a (big) part of the modules in Python 2.7 still not ready for
>> unicode, in which case I have to decide which prefix (u or b) I should
>> manually add?
> 
> Sadly, there are quite a lot of parts of Python 2 that simply don't
> handle Unicode strings. But you can probably keep all of those down to
> just a handful of explicit b"whatever" strings; most places should
> accept unicode as well as str. What you're seeing here is a prime
> example of one of this author's points (caution, long post):
> 
> http://unspecified.wordpress.com/2012/04/19/the-importance-of-language-level-abstract-unicode-strings/
> 
> """The lesson of Python 3 is: give programmers a Unicode string type,
> *make it the default*, and encoding issues will /mostly/ go away."""
> 
> There's a whole ecosystem to Python 2 - some in the standard library,
> heaps more in the rest of the world - and a lot of it was written on
> the assumption that a byte is a character is an octet. When you pass
> Unicode strings to functions written to expect byte strings, sometimes
> you win, and sometimes you lose... even with the standard library
> itself. But the Python 3 ecosystem has been written on the assumption
> that strings are Unicode. It's only a narrow set of programs
> ("boundary code", where you're moving text across networks and stuff
> like that) where the Python 2 model is easier to work with; and the
> recent Py3 releases have been progressively working to relieve that
> pain.
> 
> The absolute worst case is a function which exists in Python 2 and 3,
> and requires a byte string in Py2 and a text string in Py3. Sadly,
> that may be exactly what locale.setlocale() is. For that, I would
> suggest explicitly passing stuff through str():
> 
> locale.setlocale(locale.LC_ALL, str('fr_FR'))
> 
> In Python 3, 'fr_FR' is already a str, so passing it through str()
> will have no significant effect. (Though it would be worth commenting
> that, to make it clear to a subsequent reader that this is Py2 compat
> code.) In Python 2 with unicode_literals active, 'fr_FR' is a unicode,
> so passing it through str() will encode it to ASCII, producing a byte
> string that setlocale should be happy with.
> 
> By the way, the reason for the strange error message is clearer in
> Python 3, which chains in another exception:
> 
> >>> locale.setlocale(locale.LC_ALL, b'fr_FR')
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.5/locale.py", line 498, in _build_localename
>     language, encoding = localetuple
> ValueError: too many values to unpack (expected 2)
> 
> During handling of the above exception, another exception occurred:
> 
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/lib/python3.5/locale.py", line 594, in setlocale
>     locale = normalize(_build_localename(locale))
>   File "/usr/local/lib/python3.5/locale.py", line 507, in _build_localename
>     raise TypeError('Locale must be None, a string, or an iterable of two strings -- language code, encoding.')
> TypeError: Locale must be None, a string, or an iterable of two strings -- language code, encoding.
> 
> So when it gets the wrong type of string, it attempts to unpack it as
> an iterable; it yields five values (the five bytes or characters,
> depending on which way it's the wrong type of string), but it's
> expecting two. Fortunately, str() will deal with this. But make sure
> you don't have the b prefix, or str() in Py3 will give you quite a
> different result!
> 
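
To make the "five values" point concrete, here is a minimal sketch of the
unpacking step. The function name unpack_pair is hypothetical; it only
mimics the two-name assignment shown in the traceback and is not the actual
code in locale.py.

```python
def unpack_pair(localetuple):
    # Same shape as the assignment in the traceback: exactly two items
    # are expected, a language code and an encoding.
    language, encoding = localetuple
    return language, encoding


# A real (language, encoding) pair unpacks cleanly.
pair = unpack_pair(('fr_FR', 'UTF-8'))

# A five-character string is iterated character by character, so the
# assignment receives five items instead of two and raises ValueError.
message = None
try:
    unpack_pair(u'fr_FR')
except ValueError as exc:
    message = str(exc)
```

The same failure occurs with any string of the "wrong" type whose length is
not two, which is why the original error message looks unrelated to
unicode.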

Yes, I ended up using str(), since setlocale() was the only call in my
application that had problems with unicode_literals active.
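
For the archives, a minimal sketch of the str() approach, written so that it
runs on both Python 2.7 and Python 3. The helper name set_locale_compat is
hypothetical, and the u'' literal stands in for what a bare 'fr_FR' becomes
when unicode_literals is active. The sketch also assumes the fr_FR locale
may not be installed, so it reports failure instead of raising.

```python
import locale
import sys


def set_locale_compat(name, category=locale.LC_ALL):
    """Hypothetical helper: pass setlocale() the native str type.

    On Python 2, str() encodes a unicode name to an ASCII byte string.
    On Python 3, str() returns the text string unchanged.
    """
    native_name = str(name)  # Py2/Py3 compatibility shim
    try:
        locale.setlocale(category, native_name)
    except locale.Error:
        # The requested locale is not available on this system.
        return False
    return True


text_name = u'fr_FR'          # what 'fr_FR' is under unicode_literals
native_name = str(text_name)  # native str on either major version
applied = set_locale_compat(text_name)

# The caveat about the b prefix: on Python 3, str() of a bytes object
# returns its repr, "b'fr_FR'", not the locale name.
bytes_as_str = str(b'fr_FR')
```

Keeping the str() call inside one small helper leaves a single place to
delete once Python 2 support is dropped.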

Thanks Chris for your useful insight.



