[Python-Dev] readd u'' literal support in 3.3?

Nick Coghlan ncoghlan at gmail.com
Sat Dec 10 06:55:45 CET 2011


On Sat, Dec 10, 2011 at 5:58 AM, PJ Eby <pje at telecommunity.com> wrote:
> In fact, I'm not sure why people are bringing it into this discussion at
> all: PEP 3333 was designed to work well with 2to3, which does the right
> thing for WSGI code: it converts 2.x "str" to 3.x "str", as it should.  If
> you're writing 2.x WSGI code with 'u' literals, *your code is broken*.
>
> WSGI doesn't need 'u' literals and never has.  It *does* need b'' literals
> for stuff that refers to request and response bodies, but everything else
> should be plain old string literals for the appropriate Python version.

It came up because "from __future__ import unicode_literals" doesn't
obviously help with single codebase style ports of a lot of WSGI
related code, since such code actually has *3* string types to deal
with:

Actual text (u'', unicode -> str)
Native strings for WSGI ('', str -> str)
Binary data (b'', str -> bytes)

That works fine with 2to3, since 2to3 will strip out the leading 'u'
from the actual text literals, but presents a potential hassle for the
single codebase approach. Most other contexts only need the
binary->binary and text->text conversion, so the future import really
helps out.

However, I just realised that there actually *is* a relatively clear
way to spell this for all 2.6+ versions: the future import *doesn't*
change the meaning of the 'str' builtin (it's still the 8-bit string
type in 2.x), so the native way to spell the above distinction when
"from __future__ import unicode_literals" is in effect is as follows:

Actual text: ''
Native strings for WSGI: str('')
Binary data: b''

Calling a builtin is much lower overhead than calling a helper from a
compatibility module, and this also makes it clear that native strings
are the odd ones out.
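
As a rough illustration of the three spellings above, here is a minimal sketch intended to run unchanged on 2.6+ and 3.x. The variable names and the text_type helper are made up for this example. Note that on 2.x, str('') encodes the unicode literal with the default ASCII codec, so this is only assumed to be safe for ASCII-only native strings.

```python
from __future__ import unicode_literals  # must be the first statement; no effect on 3.x

# Actual text: unicode on 2.x (because of the future import), str on 3.x
text = 'caf\xe9'

# Native string: the 8-bit str on 2.x, the text str on 3.x.
# On 2.x this encodes via ASCII, so keep such literals ASCII-only.
native = str('text/plain')

# Binary data: str on 2.x, bytes on 3.x
binary = b'hello'

# Hypothetical helper name: the type of a bare literal under the future import
text_type = type('')

assert isinstance(text, text_type)
assert type(native) is str
assert isinstance(binary, bytes)
```

On 3.x all of text, native and text_type end up as plain str, so the str() call is effectively a no-op there; it only changes behaviour on 2.x.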

So I'm back to being -1 on the idea of adding back u'' literals for
3.3. Instead, people should explicitly call str() on any literals that
they want to be actual str instances both in 3.x and in 2.x when the
unicode literals future import is in effect.
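
For WSGI specifically, that suggestion might look something like the sketch below. The application and the fake_start_response stand-in are hypothetical and only there to make the example self-contained; a real server would supply its own start_response.

```python
from __future__ import unicode_literals  # bare literals are text on 2.6+ and 3.x

def app(environ, start_response):
    # Body: start from actual text, then encode explicitly to binary data
    body = 'Hello, world'.encode('utf-8')
    # Status and headers must be native strings, so wrap the literals in str()
    status = str('200 OK')
    headers = [
        (str('Content-Type'), str('text/plain; charset=utf-8')),
        (str('Content-Length'), str(len(body))),
    ]
    start_response(status, headers)
    return [body]

# Hypothetical stand-in for a server's start_response, just to exercise app()
captured = {}

def fake_start_response(status, headers, exc_info=None):
    captured['status'] = status
    captured['headers'] = headers

result = app({}, fake_start_response)
```

Only the status line and header names/values need the str() wrapper; ordinary text elsewhere in the application can stay as bare literals.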

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

