[Python-3000] Handling of wide Unicode characters

Alexandre Vassalotti alexandre at peadrop.com
Sat Jun 2 00:57:41 CEST 2007


Hi,

I was doing some testing on the new _string_io module, since I was
slightly skeptical of my handling of wide Unicode characters (those
outside the BMP, which take 32 bits in UTF-16, i.e. two 16-bit code
units instead of the usual one). So, I ran this little test:

   >>> s = _string_io.StringIO()
   >>> s.write(u'\U0002f8cd')
   >>> s.tell()
   2

As I expected, wide Unicode characters count as two. However, I was
surprised that Python treats them as two characters as well:

   >>> len(u'\U0002f8cd')
   2
   >>> u'\U0002f8cd'
   u'\ud87e\udccd'

Is it a bug, or only an implementation choice?
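
For reference, the two values in that repr look like the UTF-16
surrogate pair for a single code point. Here is a minimal sketch of the
arithmetic, assuming the character is U+2F8CD (which is what the pair
\ud87e\udccd decodes to); the helper names to_surrogates and
from_surrogates are mine, for illustration only:

```python
def to_surrogates(cp):
    """Split a code point above U+FFFF into a UTF-16 (high, low) pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = cp - 0x10000                 # 20-bit value
    high = 0xD800 + (offset >> 10)        # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)       # bottom 10 bits
    return high, low


def from_surrogates(high, low):
    """Recombine a (high, low) surrogate pair into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# Assumed code point, taken from the repr shown in the session.
high, low = to_surrogates(0x2F8CD)
print(hex(high), hex(low))                # 0xd87e 0xdccd
print(hex(from_surrogates(high, low)))    # 0x2f8cd
```

So a length of 2 would be consistent with the string being stored as
16-bit code units, with one logical character split across two of them.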

Cheers,
-- Alexandre

