Short questions wrt Python & Unicode

John Machin sjmachin at lexicon.net
Fri Jun 9 08:59:45 EDT 2006


On 9/06/2006 10:04 PM, KvS wrote:

> 2) How do I get a representation of a unic. object in terms of Unicode
> code points? repr() doesn't do that, it sometimes parses or encodes the
> code points right:
> 
>|>>> s=u"\u0040\u0166\u00e6"
>|>>> s
> u'@\u0166\xe6'

|>>> ' '.join('U+%04X' % ord(c) for c in s)
'U+0040 U+0166 U+00E6'

If you'd prefer it more Pythonic than unicode.orgic, adjust the format 
string and separator to suit your taste.
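
As a rough sketch of that adjustment, the one-liner can be wrapped in a small helper. The name codepoints and its fmt and sep parameters are made up for illustration; nothing here is a standard library function.

```python
def codepoints(text, fmt='U+%04X', sep=' '):
    """Format each code point of text with fmt and join the results with sep.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    return sep.join(fmt % ord(c) for c in text)


s = u"\u0040\u0166\u00e6"

# unicode.org style
print(codepoints(s))                          # U+0040 U+0166 U+00E6

# Python escape style, no separator
print(codepoints(s, fmt='\\u%04x', sep=''))   # \u0040\u0166\u00e6
```

Keeping the format string and separator as parameters means the same loop serves both notations.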

> (does this latter \xe6 have to do with the internal representation of
> unic. objects, maybe with this  UCS-2 encoding?)

|>>> u'\xe6' == u'\u00e6' == unichr(0xe6)
True
|>>> hex(ord(u'\u00e6'))
'0xe6'
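
In other words, \xe6 and \u00e6 are just two spellings of the same code point; the shorter \x form is a display choice for values below 0x100, not a different internal encoding. One way to see the same choice outside repr() is the unicode_escape codec, sketched here on the assumption that it is an acceptable stand-in for what repr() does with non-ASCII characters:

```python
s = u"\u0040\u0166\u00e6"

# unicode_escape leaves ASCII alone, writes code points below 0x100 as \xNN
# and larger ones (up to 0xFFFF) as \uNNNN.
escaped = s.encode('unicode_escape')
print(escaped)

# Whichever spelling appears in the source, the code point is the same.
print(u'\xe6' == u'\u00e6')        # True
print(ord(u'\xe6') == 0xE6)        # True
```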

U+nnnnnn is represented internally as the integer 0xnnnnnn -- except if 
it won't fit (on a narrow build, a code point above U+FFFF is stored as a 
surrogate pair), but you can pretend that surrogate pairs don't exist, for 
the moment :-)
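
For when the moment passes: a surrogate pair is only arithmetic on the code point. The sketch below follows the UTF-16 rule for code points above U+FFFF; the function names to_surrogates and from_surrogates are made up for illustration and are not part of Python.

```python
def to_surrogates(cp):
    """Split a code point above U+FFFF into (high, low) UTF-16 surrogates.

    Hypothetical helper, for illustration only.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError('code point does not need a surrogate pair')
    offset = cp - 0x10000                      # 20 bits remain
    high = 0xD800 + (offset >> 10)             # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)            # bottom 10 bits
    return high, low


def from_surrogates(high, low):
    """Recombine a (high, low) surrogate pair into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x1D11E)             # MUSICAL SYMBOL G CLEF
print('U+%04X U+%04X' % (high, low))           # U+D834 U+DD1E
print(hex(from_surrogates(high, low)))         # 0x1d11e
```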

Cheers,
John



