2to3 chokes on bad character

Thu Feb 24 15:59:26 EST 2011

On Feb 25, 12:00 am, Peter Otten <__pete... at web.de> wrote:
> John Machin wrote:

> > Your Python 2.x code should be TESTED before you poke 2to3 at it. In
> > this case just trying to run or import the offending code file would
> > have given an informative syntax error (you have declared the .py file
> > to be encoded in UTF-8 but it's not).
>
> The problem is that Python 2.x accepts arbitrary bytes in string constants.

Ummm ... isn't that a bug? According to section 2.1.4 of the Python
2.7.1 Language Reference Manual: """The encoding is used for all
lexical analysis, in particular to find the end of a string, and to
interpret the contents of Unicode literals. String literals are
converted to Unicode for syntactical analysis, then converted back to
their original encoding before interpretation starts ..."""

How do you reconcile "used for all lexical analysis" and "String
literals are converted to Unicode for syntactical analysis" with the
actual (astonishing to me) behaviour?