What encoding does u'...' syntax use?

Terry Reedy tjreedy at udel.edu
Fri Feb 20 16:35:47 EST 2009


Ron Garret wrote:
> I would have thought that the answer would be: the default encoding 
> (duh!)  But empirically this appears not to be the case:
> 
>>>> unicode('\xb5')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xb5 in position 0: 
> ordinal not in range(128)

The unicode function is usually used to decode bytes read from *external 
sources*, each of which can have its own encoding.  So the function 
(actually, the developers) refuses to guess and uses ASCII, the subset 
that nearly all encodings have in common.
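
A minimal sketch of the same point in Python 3, which postdates this thread: the caller names the encoding of the external bytes explicitly. In Python 2 the equivalent is passing an encoding argument, as in unicode('\xb5', 'latin-1'). The two encodings below are only examples, chosen because they map the same byte to different characters. Note that Python 3's bytes.decode() defaults to UTF-8, not ASCII.

```python
# Python 3 sketch: raw bytes carry no encoding of their own, so the
# caller names the encoding the external source actually used.
raw = b"\xb5"  # a single byte, e.g. read from a file or socket

# ASCII covers only 0x00-0x7F, so byte 0xb5 cannot be decoded with it.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print("ascii failed:", exc.reason)  # ordinal not in range(128)

# Two example encodings that map the same byte to different characters.
as_latin1 = raw.decode("latin-1")  # U+00B5 MICRO SIGN
as_cp437 = raw.decode("cp437")     # U+2561, a box-drawing character

print(ascii(as_latin1))  # '\xb5'
print(ascii(as_cp437))   # '\u2561'
```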

>>>> u'\xb5'
> u'\xb5'
>>>> print u'\xb5'
> µ

Unicode literals are *in the source file*, which can only have one 
encoding (for a given source file).
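
Because a source file has a single declared encoding, the same raw bytes inside a literal can produce different strings depending on that declaration. A minimal Python 3 sketch; literal_from_source is a hypothetical helper that compiles a tiny module from raw bytes and returns the value it assigns to s.

```python
# Python 3 sketch: the declared source encoding decides how raw
# non-ASCII bytes inside a string literal become characters.

def literal_from_source(source: bytes) -> str:
    """Hypothetical helper: compile a tiny module and return its `s`."""
    namespace = {}
    exec(compile(source, "<demo>", "exec"), namespace)
    return namespace["s"]

# The bytes C2 B5 are the UTF-8 encoding of U+00B5 (MICRO SIGN).
body = b"s = '\xc2\xb5'\n"

as_utf8 = literal_from_source(b"# -*- coding: utf-8 -*-\n" + body)
as_latin1 = literal_from_source(b"# -*- coding: latin-1 -*-\n" + body)

print(ascii(as_utf8))    # '\xb5'      (one code point)
print(ascii(as_latin1))  # '\xc2\xb5'  (two code points)
```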

> (That last character shows up as a micron sign despite the fact that my 
> default encoding is ascii, so it seems to me that that unicode string 
> must somehow have picked up a latin-1 encoding.)

I think latin-1 was the default without a coding cookie line.  (In 3.0 
the default is utf-8.)
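
It is also worth separating raw bytes from escapes: the four ASCII characters \xb5 in u'\xb5' form an escape that names code point U+00B5 directly, so the source encoding never touches it. A minimal Python 3 sketch of both cases, using a hypothetical helper literal_from_source that compiles a tiny module from raw bytes and returns the value it assigns to s:

```python
# Python 3 sketch: the default source encoding, and escapes vs. raw bytes.

def literal_from_source(source: bytes) -> str:
    """Hypothetical helper: compile a tiny module and return its `s`."""
    namespace = {}
    exec(compile(source, "<demo>", "exec"), namespace)
    return namespace["s"]

# No coding cookie: Python 3 reads source as UTF-8 (PEP 3120), so the
# raw bytes C2 B5 become the single character U+00B5.
no_cookie = literal_from_source(b"s = '\xc2\xb5'\n")

# The ASCII characters \xb5 form an escape naming U+00B5 directly,
# so the declared encoding does not affect the result.
esc_latin1 = literal_from_source(b"# coding: latin-1\ns = '\\xb5'\n")
esc_utf8 = literal_from_source(b"# coding: utf-8\ns = '\\xb5'\n")

print(ascii(no_cookie), ascii(esc_latin1), ascii(esc_utf8))
# '\xb5' '\xb5' '\xb5'
```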



