Problem with -3 switch

Mon Jan 12 08:38:51 EST 2009

On Jan 13, 12:06 am, Christian Heimes <li... at cheimes.de> wrote:
> >> Perhaps you also like to hear from a developer who has worked on Python
> >> 3.0 itself and who has done lots of work with internationalized
> >> applications. If you want to get it right you must
>
> >> * decode incoming text data to unicode as early as possible
> >> * use unicode for all internal text data
> >> * encode outgoing unicode as late as possible.
>
> >> where incoming data is read from the file system, database, network etc.
>
> >> This rule applies not only to Python 3.0 but to *any* application
> >> written in *any* language.
>
> > The above is a story with which I'm quite familiar. However it is
> > *not* the issue!! The issue is why would anyone propose changing a
> > string constant "foo" in working 2.x code to u"foo"?
>
> Do I really have to repeat "use unicode for all internal text data"?
>
> "foo" and u"foo" are two totally different things. The former is a byte
> sequence "\x66\x6f\x6f" while the latter is the text 'foo'. It just
> happens that "foo" and u"foo" are equal in Python 2.x because
> "foo".decode("ascii") == u"foo". Python 3.x does it right: b"foo" is
> unequal to "foo".
>

Again, all very true, but irrelevant. b"foo" is *not* involved.

You're ignoring the effect of 2to3:

Original 2.x code: assert "foo" == u"foo" # works
output from 2to3: assert "foo" == "foo" # works

Original 2.x code with u prepended: assert u"foo" == u"foo" # works
output from 2to3: assert "foo" == "foo" # works
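
To make the comparison concrete, here is a minimal sketch of what both
outputs amount to once they run under 3.x. It assumes Python 3.3 or later
(where the u prefix is accepted again); the names plain and prefixed are
mine, purely for illustration:

```python
# Runs under Python 3.3+, i.e. the semantics that 2to3 output executes with.
plain = "foo"        # an unprefixed 2.x literal, left untouched by 2to3
prefixed = u"foo"    # the u prefix is legal again from Python 3.3 on

# Both spellings produce the same type and the same value.
assert type(plain) is str and type(prefixed) is str
assert plain == prefixed

# The bytes/text mismatch only appears with an explicit b prefix.
assert b"foo" != "foo"

# The 2.x equality relied on an implicit ASCII decode, made explicit here.
assert b"foo".decode("ascii") == "foo"
```

Either way the ASCII constant ends up as the same text object, so the
prefix changes nothing after conversion.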

I say again, show me a case of working 2.5 code where prepending u to
an ASCII string constant that is intended to be used in a text context
is actually worth the keystrokes.