[I18n-sig] Re: Unicode surrogates: just say no!

François Pinard pinard@iro.umontreal.ca
02 Jul 2001 15:05:35 -0400


[Guido van Rossum]

> When using UCS-4 mode, I was in favor of allowing unichr() and \U to
> specify any value in range(0x100000000L) 

I have not checked recently, but I would think Unicode and ISO 10646 are
defined on 31 bits, not 32.  If you represent a UCS-4 code in a signed
32-bit int, it will never be negative.  It might be useful to rely on this.
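
To illustrate the sign-bit point, here is a minimal Python sketch.  The
helper name as_signed_int32 is made up for this note, and it assumes the
usual two's-complement signed 32-bit int found in typical C implementations.

```python
import struct

# Largest value that fits in 31 bits.
MAX_31BIT = 0x7FFFFFFF


def as_signed_int32(code):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit int.

    Hypothetical helper: packs the value as unsigned, unpacks it as signed,
    which mimics storing the code in a C 'int' on a 32-bit platform.
    """
    return struct.unpack("<i", struct.pack("<I", code))[0]


# Any code that fits in 31 bits keeps its value and stays non-negative.
print(as_signed_int32(MAX_31BIT) >= 0)

# Setting the 32nd bit flips the sign once the value is read as signed.
print(as_signed_int32(0x80000000) < 0)
```

If codes are limited to 31 bits, a negative value can then serve as an
out-of-band marker, for example an error or end-of-input indicator.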

P.S. - Would not 32 bits also require one more byte in UTF-8?
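
A rough sketch of the arithmetic behind that question, assuming the UTF-8
scheme as described in RFC 2279 (sequences of one to six bytes).  The
function name utf8_length_rfc2279 is my own, not taken from any library.

```python
def utf8_length_rfc2279(code):
    """Return how many bytes the RFC 2279 form of UTF-8 needs for 'code'.

    Hypothetical helper for illustration.  Sequences of 1 to 6 bytes carry
    7, 11, 16, 21, 26 and 31 payload bits respectively.
    """
    if code < 0:
        raise ValueError("code must be non-negative")
    # Exclusive upper bound for each sequence length, from 1 to 6 bytes.
    limits = [0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000]
    for nbytes, limit in enumerate(limits, start=1):
        if code < limit:
            return nbytes
    # A value with the 32nd bit set does not fit in six bytes.
    raise ValueError("code needs more than 31 bits")


# The largest 31-bit value still fits in the six-byte form.
print(utf8_length_rfc2279(0x7FFFFFFF))
```

Under that assumption, six bytes hold exactly 31 payload bits (1 in the lead
byte plus 5 x 6 in the continuation bytes), so a full 32-bit value would
indeed need a seventh byte.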

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard