What encoding does u'...' syntax use?

Fri Feb 20 15:55:25 EST 2009

Ron Garret wrote:
> I would have thought that the answer would be: the default encoding 
> (duh!)  But empirically this appears not to be the case:
> 
> >>> unicode('\xb5')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xb5 in position 0: 
> ordinal not in range(128)
> >>> u'\xb5'
> u'\xb5'
> >>> print u'\xb5'
> µ
> 
> (That last character shows up as a micro sign despite the fact that my 
> default encoding is ascii, so it seems to me that that unicode string 
> must somehow have picked up a latin-1 encoding.)

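You are mixing up console output and internal data representation. The
literal u'\xb5' does not decode anything: inside a unicode literal, \xb5
denotes the code point U+00B5 directly, so no encoding is involved. What you
see in the last line is the result of the Python interpreter encoding your
unicode string for stdout, which in your case seems to use a latin-1
encoding (check sys.stdout.encoding and your environment settings for that).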
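
To make the distinction concrete, here is a minimal sketch. The variable
names are only illustrative, and the snippet avoids printing the character
itself so that it does not depend on your terminal settings. It should run
unchanged on Python 2.6+ and Python 3.3+.

```python
import sys

# Inside a unicode literal, \xb5 names the code point U+00B5 directly.
# No byte string is decoded here, so no codec is involved.
s = u'\xb5'
code_point = ord(s)

# Decoding a byte string is a different operation: a codec must be chosen.
raw = b'\xb5'
from_latin1 = raw.decode('latin-1')   # yields the same character as s
try:
    raw.decode('ascii')               # 0xb5 is outside the ASCII range
    ascii_decodes = True
except UnicodeDecodeError:
    ascii_decodes = False

# Output goes the other way: the unicode string is encoded into bytes.
as_latin1 = s.encode('latin-1')       # a single byte
as_utf8 = s.encode('utf-8')           # two bytes

# The encoding the interpreter picked for stdout; it can be None,
# for example when output is piped under Python 2.
stream_encoding = getattr(sys.stdout, 'encoding', None)
print(repr(stream_encoding))
```

If stream_encoding reports a latin-1 variant on your machine, that would
explain the output you saw, independent of the default encoding.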

BTW, Unicode is not an encoding. Wikipedia will tell you more.

Stefan