Differences between Python 2.2 and Python 2.3

Wed Dec 3 14:02:42 EST 2003

Fredrik Lundh wrote:

>Enrique wrote:
>
>>Running a script under Python 2.3 that works fine under Python 2.2, I get
>>something like:
>>
>>UnicodeDecodeError: 'ascii' codec can't decode byte 0xed in position
>>37: ordinal not in range(128)
>>
>>New major versions of Python have usually been courteous to previous
>>versions...
>
>0xED has never been a valid 7-bit ASCII character.
>
Sure, but Python used to accept 8-bit characters in the platform's 
default encoding as part of string literals...

Most likely Enrique has a \xED somewhere in a string literal in his code 
that is intended to be an i with an acute accent (í).  That would have 
worked fine in all versions of Python before 2.3, but started failing in 
2.3 because of the decision that all string literals would be converted 
to Unicode and back, with ASCII as the default encoding for those 
conversions (whereas previously the behaviour was best approximated as 
"the platform's local 256-character encoding").

PythonWin 2.2.3 (#42, May 30 2003, 18:12:08) [MSC 32 bit (Intel)] on win32.
Portions Copyright 1994-2001 Mark Hammond (mhammond at skippinet.com.au) - 
see 'Help/About PythonWin' for further copyright information.
 >>> print '23\xED'
23í
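To see why that byte trips up the ASCII codec while 2.2 happily printed it, here is a minimal sketch in modern Python 3 (an assumption; 2.3's str/unicode split behaves differently) that decodes the same bytes with ASCII, Latin-1 and cp1252. `try_decode` is a hypothetical helper written for this example.

```python
# Sketch in Python 3, where bytes and text are separate types (unlike 2.x).
raw = b"23\xed"  # the same bytes as the '23\xED' literal above


def try_decode(data, encoding):
    """Hypothetical helper: decoded text, or the error message on failure."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "error: " + str(exc)


results = {enc: try_decode(raw, enc) for enc in ("ascii", "latin-1", "cp1252")}

for enc, text in results.items():
    # !a escapes non-ASCII output so this prints on any terminal;
    # '\xed' is U+00ED, LATIN SMALL LETTER I WITH ACUTE.
    print(f"{enc}: {text!a}")
```

Only the ASCII decode fails; Latin-1 and cp1252 both map 0xED to í.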

So, Enrique, what you're probably looking for is this:
# -*- coding: ISO-8859-1 -*-

for latin-1, or

# -*- coding: cp1252 -*-

for the Windows code page.
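As a rough illustration of what the declaration buys you, the sketch below compiles the same source bytes with and without the Latin-1 declaration. It assumes modern Python 3, whose default source encoding is UTF-8 rather than ASCII, so the exact error differs from what 2.3 reports; `run_source` is a hypothetical helper.

```python
# Sketch in Python 3: compile() honours a PEP 263 coding declaration
# when it is handed the source as bytes.
latin1_source = b"# -*- coding: ISO-8859-1 -*-\ns = '23\xed'\n"
undeclared_source = b"s = '23\xed'\n"  # same literal, no declaration


def run_source(source):
    """Hypothetical helper: compile and run source bytes, return s."""
    namespace = {}
    exec(compile(source, "<demo>", "exec"), namespace)
    return namespace["s"]


# With the declaration, byte 0xED is read as Latin-1 'í' (U+00ED).
print(ascii(run_source(latin1_source)))

# Without it, Python 3 assumes UTF-8, and a lone 0xED byte is not valid
# UTF-8, so compilation fails with a SyntaxError.
try:
    run_source(undeclared_source)
    undeclared_error = None
except SyntaxError as exc:
    undeclared_error = exc.msg
    print("SyntaxError:", undeclared_error)
```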

You add one of these "magic" comments at the top of each Python source 
file to tell the interpreter which encoding your string literals use.  
Even if you're only using string literals to store binary data, you'll 
still need to declare a dummy encoding, such as latin-1.
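Latin-1 works as a dummy encoding for binary data because it maps each byte value 0x00-0xFF to the code point with the same number, so nothing is lost on a round trip. A quick Python 3 check (a sketch, not something taken from the 2.3 docs) also shows why cp1252 is a poorer choice for raw bytes: Python's cp1252 codec leaves a handful of bytes in the 0x80-0x9F range undefined.

```python
# Sketch in Python 3: Latin-1 maps byte N to code point N for all 256
# byte values, so every possible byte survives a decode/encode round trip.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("latin-1")

assert len(as_text) == 256
assert all(ord(ch) == b for ch, b in zip(as_text, all_bytes))
assert as_text.encode("latin-1") == all_bytes

# cp1252 is not a safe stand-in: some bytes (e.g. 0x81) have no mapping,
# so decoding arbitrary binary data can fail.
try:
    all_bytes.decode("cp1252")
    cp1252_lossless = True
except UnicodeDecodeError:
    cp1252_lossless = False

print("latin-1 round trip ok; cp1252 decodes every byte:", cp1252_lossless)
```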

Yes, it's a bit of a pain, but the decision was made, so we have to deal 
with it :) .  I'm assuming that somewhere in the "new in 2.3" pages there 
is a huge warning that this breaks lots of old code, but Enrique can be 
forgiven for missing it, as I seem to have missed it too; all I found 
was this:

    *Encoding declarations* - you can put a comment of the form "# -*-
    coding: <encodingname> -*-" in the first or second line of a Python
    source file to indicate the encoding (e.g. utf-8). (PEP 263
    <http://www.python.org/peps/pep-0263.html> phase 1)
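The "first or second line" rule is easy to check with the standard library's `tokenize.detect_encoding`, which applies it in modern Python 3. The sketch assumes Python 3 (this function does not exist in 2.3), and `declared_encoding` is a hypothetical wrapper: a declaration on line 2 (after a `#!` line) is honoured, while one on line 3 is ignored and the UTF-8 default is used.

```python
import io
import tokenize


def declared_encoding(source):
    """Hypothetical wrapper: the encoding Python 3's tokenizer would use."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


first_line = b"# -*- coding: ISO-8859-1 -*-\nx = 1\n"
second_line = b"#!/usr/bin/env python\n# -*- coding: cp1252 -*-\nx = 1\n"
third_line = b"#!/usr/bin/env python\n\n# -*- coding: cp1252 -*-\nx = 1\n"

for name, src in [("first", first_line), ("second", second_line),
                  ("third", third_line)]:
    # Only the first two lines are searched for a coding declaration.
    print(name, "line ->", declared_encoding(src))
```

Note that the tokenizer normalizes "ISO-8859-1" to "iso-8859-1".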

That doesn't actually mention the code breakage that results.  True, in 
theory the code was never valid, but *lots* of people used 8-bit 
encodings quite happily with earlier versions and are finding their code 
breaking in 2.3 because of this.

Have fun,
Mike

_______________________________________
  Mike C. Fletcher
  Designer, VR Plumber, Coder
  http://members.rogers.com/mcfletch/