Problem with -3 switch

Mon Jan 12 07:05:03 EST 2009

John Machin wrote:
> And therefore irrelevant.

No, Carl is talking about the very same issue.

> I would like to hear from someone who has actually started with
> working 2.x code and changed all their text-like "foo" to
> u"foo" [except maybe unlikely suspects like open()'s mode arg]:
> * how many places where the 2.x code broke and so did the 3.x code
> [i.e. the problem would have been detected without prepending u]
> * how many places where the 2.x code broke but the 3.x code didn't
> [i.e. prepending u did find the problem]
> * whether they thought it was worth the effort

Perhaps you would also like to hear from a developer who has worked on
Python 3.0 itself and who has done a lot of work on internationalized
applications. If you want to get it right, you must

* decode incoming text data to unicode as early as possible
* use unicode for all internal text data
* encode outgoing unicode as late as possible.

Here, incoming data means anything read from the file system, a
database, the network, etc.
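
The three steps can be sketched roughly as follows in Python 3 syntax.
The function name normalize_title and the UTF-8 default are only
assumptions for illustration; in real code the encoding has to come
from whatever the data source declares.

```python
def normalize_title(raw, encoding="utf-8"):
    """Hypothetical helper: bytes in, bytes out, unicode in between."""
    # 1. Decode incoming bytes to unicode as early as possible.
    text = raw.decode(encoding)
    # 2. Do all internal work on unicode text, never on raw bytes.
    cleaned = text.strip().title()
    # 3. Encode outgoing unicode as late as possible.
    return cleaned.encode(encoding)


incoming = "  caf\u00e9 menu \n".encode("utf-8")  # stands in for file/network input
outgoing = normalize_title(incoming)
```

Only the two boundaries of the function know about encodings; everything
in between sees text.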

This rule applies not only to Python 3.0 but to *any* application
written in *any* language. The urlopen example is a very good
illustration of the issue. The author didn't think of decoding the
incoming bytes to unicode. In Python 2.x the code works fine as long as
the site contains only ASCII. In Python 3.0, however, an error is raised
because binary data is no longer implicitly converted to unicode.
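
A minimal sketch of that failure in Python 3, without touching the
network: the bytes literal below merely stands in for what a call like
urlopen(...).read() would return, and UTF-8 is an assumed encoding.

```python
# Stand-in for the raw response body; a real body comes from the network.
body = "Gr\u00fc\u00dfe".encode("utf-8")

try:
    # Mixing str and bytes: Python 3 refuses to convert implicitly.
    message = "Page says: " + body
except TypeError:
    mixed_ok = False
else:
    mixed_ok = True

# The fix is to decode explicitly before treating the data as text.
message = "Page says: " + body.decode("utf-8")
```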

Christian