[I18n-sig] Support for "wide" Unicode characters

M.-A. Lemburg mal@lemburg.com
Thu, 28 Jun 2001 15:49:28 +0200


"Machin, John" wrote:
> 
> [Guido van Rossum]
> > store Unicode strings using UTF-8.
> >
> > Does UTF-8 transfer isolated surrogates correctly?
> 
> [Marc-Andre Lemburg]
> It handles surrogates correctly, but rejects isolated ones on input
> (easy to fix though) and passes them through on output. As I said
> before, surrogate support is far from being complete.
> 
> Marc-Andre, there is a *bug* in 2.1 encoding isolated high surrogates.
> I reported it and you assigned it to yourself on 23 June. Lookee here:
> 
> Python 2.1 (#15, Apr 16 2001, 18:25:49) [MSC 32 bit (Intel)] on win32
> Type "copyright", "credits" or "license" for more information.
> >>> u'\ud800'.encode('utf-8')
> '\xa0\x80' # should be 3 bytes, not 2
> >>>
> 
> While the fix is trivial, IMO an appropriate answer to Guido's question
> would include this particular lack of correctness.

Thanks for the note. I was looking at the code rather than
actually trying an example -- guess the latter is faster and
gives better answers ;-)
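
For reference, here is a minimal sketch of the arithmetic behind the
"should be 3 bytes" remark. It assumes a current Python 3 interpreter
rather than the 2.1 build quoted above, and the helper name
utf8_encode_3byte is hypothetical, not anything in the codec source.
It only covers the three-byte range U+0800..U+FFFF:

```python
def utf8_encode_3byte(code_point):
    """Encode one code point in U+0800..U+FFFF as three UTF-8 bytes.

    Hypothetical helper for illustration only; it does not reject
    surrogates, so it shows what "passing them through" would produce.
    """
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("only the three-byte range is handled here")
    return bytes([
        0xE0 | (code_point >> 12),           # lead byte 1110xxxx: top 4 bits
        0x80 | ((code_point >> 6) & 0x3F),   # continuation 10xxxxxx: middle 6 bits
        0x80 | (code_point & 0x3F),          # continuation 10xxxxxx: low 6 bits
    ])


encoded = utf8_encode_3byte(0xD800)
print(encoded)  # b'\xed\xa0\x80'
```

The last two bytes match the '\xa0\x80' shown in the report, so the
buggy output appears to be missing only the lead byte 0xED.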

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/