Changing default encoding

Mike C. Fletcher mcfletch at rogers.com
Thu Oct 9 01:11:45 EDT 2003


Martin v. Löwis wrote:

>"jean.moser" <jean.moser at wanadoo.fr> writes:
>
>>Word is my word-processing tool. I can save the files in txt format
>>but special characters like é are transformed into \xe9 when I read
>>the files in Python.
>
>That is not the case. They are not transformed to \xe9. Why do you
>believe such a transformation happens?

He is likely looking at the repr() of the value and seeing its (safe) 
representation with hexadecimal escapes, which confuses many people new 
to programming.  That is, he sees this:

 >>> 'áí' # implicit repr
'\xe1\xed'
 >>>

and doesn't realise that those escapes are simply how repr displays the 
latin-1 bytes for the accented characters; he was expecting the accented 
characters themselves to show up.  Printing the string instead shows the 
characters and makes clear that the data is still an ordinary byte 
string (not unicode):

 >>> print 'áí'
áí
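The same effect can be reproduced in modern Python 3, where a file opened 
in binary mode gives a `bytes` object rather than a `str`.  A minimal 
sketch, assuming the Word export stored é as the single byte 0xE9 (as both 
latin-1 and cp1252 do); the sample value `b"caf\xe9"` is made up for 
illustration.  It shows that the `\xe9` escape exists only in the repr and 
that decoding recovers the accented character:

```python
# Python 3 sketch. Assumption: the file stores "é" as the single byte 0xE9.
data = b"caf\xe9"          # hypothetical bytes, as read from a file opened with "rb"

# repr() escapes 0xE9 because it is outside printable ASCII...
print(repr(data))          # b'caf\xe9'

# ...but that escape is only a display form: the data is still 4 bytes.
print(len(data))           # 4
print(data[3] == 0xE9)     # True

# Decoding turns the byte into the character the user expected to see.
text = data.decode("cp1252")
print(text == "caf\u00e9")  # True
print(len(text))            # 4
```

Printing `text` directly would show "café"; the comparison against the 
`\u00e9` escape is used here only so the output stays plain ASCII.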

Knowing that Python supports unicode, new programmers may very easily 
get confused by the escapes and assume they are part of some weird 
"unicode encoding".

BTW, original poster, the actual encoding is quite probably not latin-1, 
but the Windows default code page for your locale, such as 'cp1252' 
(Western European).  Luckily, as long as you're not trying to convert to 
Unicode, you don't have to care :).
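The difference only matters once the bytes are decoded.  A minimal Python 3 
sketch with made-up sample bytes: é (0xE9) decodes to the same character 
under latin-1 and cp1252, but bytes in the 0x80 to 0x9F range, which 
cp1252 uses for characters such as the euro sign and curly quotes, do not.  
The temporary file and its name `document.txt` are hypothetical stand-ins 
for a text file saved from Word:

```python
import os
import tempfile

# Hypothetical bytes as Word might save them on Windows: é, euro sign, curly quotes.
raw = b"\xe9 \x80 \x93quoted\x94"

as_cp1252 = raw.decode("cp1252")
as_latin1 = raw.decode("latin-1")

# é (0xE9) is the same character in both encodings.
print(as_cp1252[0] == as_latin1[0] == "\u00e9")         # True

# 0x80 is the euro sign in cp1252 but an invisible C1 control code in latin-1.
print(hex(ord(as_cp1252[2])), hex(ord(as_latin1[2])))  # 0x20ac 0x80

# When reading a real file as text, name the encoding explicitly.
path = os.path.join(tempfile.mkdtemp(), "document.txt")  # hypothetical file
with open(path, "wb") as f:
    f.write(raw)
with open(path, encoding="cp1252") as f:
    text = f.read()
print(text == as_cp1252)                                 # True
```

Decoding with the wrong one of these two encodings raises no error here, 
which is why a cp1252 file misread as latin-1 can go unnoticed until 
characters like the euro sign or curly quotes show up wrong.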

Enjoy,
Mike

_______________________________________
  Mike C. Fletcher
  Designer, VR Plumber, Coder
  http://members.rogers.com/mcfletch/