[I18n-sig] How does Python Unicode treat surrogates?

Fredrik Lundh fredrik@pythonware.com
Mon, 25 Jun 2001 20:41:48 +0200


Tom Emerson wrote:
> > To extract the n'th Unicode character you would have to loop over all
> > the preceding characters checking for surrogates.  This makes it O(n).
> 
> No. If the n'th character is a valid high-surrogate (U+D800 -- U+DBFF)
> then look at the n+1'th character for a valid low-surrogate. If the
> n'th character is a valid low-surrogate and the n-1'th character is a
> valid high-surrogate, then skip it.

bzzt.  try again.
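
A minimal sketch of why the local check isn't enough, assuming the string is stored as UTF-16 code units. Looking at unit n and its neighbours tells you whether unit n is part of a surrogate pair, but not which unit the n'th character starts at. For that you still have to count pairs from the beginning. The helper names (utf16_units, nth_char, local_check) are made up for illustration, and the code units are modelled as a plain list of ints so the sketch runs on a current Python 3.

```python
import struct


def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def is_high(u):
    return 0xD800 <= u <= 0xDBFF


def is_low(u):
    return 0xDC00 <= u <= 0xDFFF


def is_pair_at(units, i):
    """True if a valid surrogate pair starts at code unit i."""
    return is_high(units[i]) and i + 1 < len(units) and is_low(units[i + 1])


def decode_at(units, i):
    """Decode the code point that starts at code unit i."""
    if is_pair_at(units, i):
        return 0x10000 + ((units[i] - 0xD800) << 10) + (units[i + 1] - 0xDC00)
    return units[i]


def nth_char(units, n):
    """Return the n'th character (0-based) by scanning from the start: O(n)."""
    i = 0
    for _ in range(n):
        # A surrogate pair takes two code units but counts as one character.
        i += 2 if is_pair_at(units, i) else 1
    return decode_at(units, i)


def local_check(units, n):
    """The quoted approach: inspect only code unit n and its neighbours."""
    if is_low(units[n]) and n > 0 and is_high(units[n - 1]):
        return None  # second half of a pair: "skip it"
    return decode_at(units, n)


units = utf16_units("\U00010000ab")  # one non-BMP character, then "a", "b"
```

Here character 1 is "a", but code unit 1 is the low surrogate of the first character, so nth_char(units, 1) gives ord("a") while local_check(units, 1) has nothing to return. Every pair before position n shifts the offset by one more unit, and the only way to know how many there are is to look.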

</F>