Unicode codepoints

Wed Jun 22 06:00:54 EDT 2011

That seems to me correct.

>>> '\\u{:04x}'.format(ord(u'é'))
\u00e9
>>> '\\U{:08x}'.format(ord(u'é'))
\U000000e9
>>>

because

>>> u'\U00e9'
  File "<eta last command>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes
in position 0-5: end of string in escape sequence
>>> u'\U000000e9'
é
>>> u'\u00e9'
é
>>>

from this:

>>> u'éléphant\N{EURO SIGN}'
éléphant€
>>> u = u'éléphant\N{EURO SIGN}'
>>> ''.join(['\\u{:04x}'.format(ord(c)) for c in u])
\u00e9\u006c\u00e9\u0070\u0068\u0061\u006e\u0074\u20ac
>>>

Skipping surrogate pairs is a little bit a non sense,
because the purpose is to display code points!