What encoding does u'...' syntax use?

Fri Feb 20 19:10:15 EST 2009

In article <499F3A8F.9010200 at v.loewis.de>,
 "Martin v. Löwis" <martin at v.loewis.de> wrote:

> >> >>> u'\xb5'
> >> u'\xb5'
> >> >>> print u'\xb5'
> >> µ
> > 
> > Unicode literals are *in the source file*, which can only have one
> > encoding (for a given source file).
> > 
> >> (That last character shows up as a micro sign despite the fact that
> >> my default encoding is ascii, so it seems to me that that unicode
> >> string must somehow have picked up a latin-1 encoding.)
> > 
> > I think latin-1 was the default without a coding cookie line.  (May be
> > utf-8 in 3.0).
> 
> It is, but that's irrelevant for the example. In the source
> 
>   u'\xb5'
> 
> all characters are ASCII (i.e. all of "letter u", "single
> quote", "backslash", "letter x", "letter b", "digit 5").
> As a consequence, this source text has the same meaning in all
> supported source encodings (as source encodings must be ASCII
> supersets).
> 
> The Unicode literal shown here does not get its interpretation
> from Latin-1. Instead, it directly gets its interpretation from
> the Unicode coded character set. The string is a short-hand
> for
> 
>  u'\u00b5'
> 
> and this denotes character U+00B5 (just as u'\u20ac' denotes
> U+20AC; the same holds for any other u'\uXXXX').
> 
> HTH,
> Martin

Ah, that makes sense.  Thanks!
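
For anyone finding this in the archive, here is a minimal sketch that
checks the two points above: the source text of the literal is pure
ASCII, so it decodes the same way under any ASCII-superset encoding,
and the escape names code point U+00B5 directly. The variable names
are only illustrative. I believe it runs unchanged on Python 2.6+ and
on Python 3.3+, where the u'' prefix is accepted again.

```python
import unicodedata

# The source text of the literal as raw bytes: u ' \ x b 5 '
source_bytes = b"u'\\xb5'"

# Every byte is ASCII, so each of these source encodings decodes it identically.
decoded = [source_bytes.decode(enc) for enc in ("ascii", "latin-1", "utf-8")]
same_in_all_encodings = all(d == decoded[0] for d in decoded)

# The escape \xb5 in a Unicode literal names a code point, not a Latin-1 byte.
s = u'\xb5'
code_point = ord(s)                  # numeric value of the single character
name = unicodedata.name(s)           # its Unicode character name
same_as_u_escape = (s == u'\u00b5')  # \xb5 is shorthand for \u00b5

# An encoding only comes into play when the character is turned into bytes.
as_latin1 = s.encode('latin-1')
as_utf8 = s.encode('utf-8')
```

The last two lines are where Latin-1 and UTF-8 actually differ: the
same character becomes one byte in the first and two in the second.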

rg