Python strings outside the 128 range

Diez B. Roggisch deets at nospam.web.de
Thu Jul 13 06:35:10 EDT 2006


Sébastien Boisgérault wrote:
> Hi,
> 
> Could anyone explain to me how the python string "é" is mapped to
> the binary code "\xe9" in my python interpreter ?
> 
> "é" is not present in the 7-bit ASCII table that is the default
> encoding, right ? So is the mapping "é" -> "\xe9" portable ?
> (site-)configuration dependent ? Can anyone get something
> different from "é" when 'print "\xe9"' is executed ? If the process
> is config-dependent, what kind of config info is used ?

The default encoding has nothing to do with this. "\xe9" is just a byte.
You can write it into a file (which is basically what the terminal is),
and no default encoding is involved at any point.
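
A minimal sketch of that point, assuming Python 3 syntax, where the
byte string "\xe9" from the question is spelled b"\xe9". The in-memory
stream `sink` is only a stand-in for the file or terminal:

```python
import io

# Assumption: Python 3, so the byte string is written as a bytes literal.
raw = b"\xe9"

# A binary stream stands in for the file (or terminal).
sink = io.BytesIO()
sink.write(raw)

# The byte arrives unchanged; no encoding was consulted on the way.
written = sink.getvalue()
print(len(written), hex(written[0]))
```

What shows up on screen for that byte is then up to whatever reads it.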

The default encoding comes into play when you write unicode(!) strings
to a file. The unicode string is then converted to a byte string using
the default encoding, which will fail miserably if the default encoding
is ascii (as it is supposed to be) and your unicode string contains any
"funny" (non-ASCII) characters.
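
The failure can be reproduced roughly like this. It is a sketch in
Python 3 syntax, and the explicit "ascii" codec stands in for the
implicit default-encoding conversion described above, which is an
assumption made for illustration:

```python
# The character é as a unicode string.
text = "\u00e9"

try:
    # Stand-in for the implicit conversion with an ascii default encoding.
    text.encode("ascii")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print("encode failed:", exc.reason)
```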

But even if you encode the unicode string explicitly with an encoding
like latin1 or utf-8, the resulting byte string will just be written to
the file as is. Whether the terminal interprets those bytes correctly is
a totally different question (and actually not controllable by
you/python).
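
To illustrate, here is a small sketch (Python 3 syntax again) of how
the same character becomes different bytes under the two encodings,
and what a reader assuming the wrong one would make of them. Decoding
with the "wrong" codec is only a simulation of a mismatched terminal:

```python
text = "\u00e9"  # the character é

# Explicit encoding: same character, different bytes.
latin1_bytes = text.encode("latin-1")
utf8_bytes = text.encode("utf-8")
print(latin1_bytes, utf8_bytes)

# Simulate a latin-1 terminal that receives the UTF-8 bytes:
# each byte is shown as its own character.
misread = utf8_bytes.decode("latin-1")

# Simulate a UTF-8 terminal that receives the latin-1 byte:
# the lone byte is not valid UTF-8, so it is replaced.
unreadable = latin1_bytes.decode("utf-8", errors="replace")
```

Python wrote exactly the bytes it was asked to write in both cases;
only the reader's assumption about the encoding differs.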

Diez


