[I18n-sig] Unicode surrogates: just say no!

Paul Prescod paulp@ActiveState.com
Wed, 27 Jun 2001 13:10:45 -0700


Guido van Rossum wrote:
> 
>..
> 
> Users can choose to write code that's portable between the two
> versions by using surrogates on the narrow platform but not on the
> wide platform.  (This would be a good idea for backward compatibility
> with Python 2.0 and 2.1 anyway.)  The proposed (and current!) behavior
> of \U makes it easy for them to do the right thing with string
> literals; everything else, they just have to write code that won't
> separate surrogate halves.

What is the virtue in making the literal syntax easy and making unichr()
easy when everything else is hard? Counting characters is hard.
Addressing characters reliably is hard. Slicing reliably is hard. Why
not simplify things? Surrogates are just characters. If you want to
handle wide characters, you need to build Python that way.
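
To make the "everything else is hard" point concrete, here is a rough
sketch. It is written for a current Python 3 rather than the 2.x builds
under discussion, and it only emulates a narrow build by looking at
UTF-16 code units. The helper names (utf16_units, is_high_surrogate,
is_low_surrogate, count_characters) are made up for illustration and
are not part of any Python API.

```python
import struct


def utf16_units(text):
    """Return the UTF-16 code units of text, roughly what a narrow build stores."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def is_high_surrogate(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    return 0xDC00 <= unit <= 0xDFFF


def count_characters(units):
    """Count code points, treating a high+low surrogate pair as one character."""
    count = 0
    i = 0
    while i < len(units):
        pair = (
            is_high_surrogate(units[i])
            and i + 1 < len(units)
            and is_low_surrogate(units[i + 1])
        )
        i += 2 if pair else 1
        count += 1
    return count


# One character outside the BMP, between two ASCII letters.
sample = "a\U00010000b"
units = utf16_units(sample)

naive_length = len(units)             # what len() would see on a narrow build
real_length = count_characters(units)  # what the user thinks of as the length
broken_slice = units[:2]               # a naive slice that cuts the pair in half
```

Here naive_length is 4 while real_length is 3, and broken_slice ends
in a lone high surrogate. Every piece of code that counts, indexes, or
slices would need the same pair-aware care, which is the cost being
weighed against the convenience of \U and unichr().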

I'm trying to imagine the use-case where you care about surrogates
enough to want them to be generated automatically, but not enough to
care about slicing, addressing, counting, and so on. Is that use-case
worth breaking the invariant that len(unichr(i)) == 1?

Surrogates: Just say no. :)
-- 
Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook