[I18n-sig] Unicode surrogates: just say no!

Paul Prescod paulp@ActiveState.com
Tue, 26 Jun 2001 16:40:23 -0700


Guido van Rossum wrote:
> 
>...
> 
> But you don't have to maintain it.  I say that this particular varying
> behavior is just as acceptable as the varying int size.

Aren't we trying to get rid of the maximum int size? And even if we keep it,
the rule for working with large integers is simple: a calculation works
only on a particular range of inputs. Period. 

If I understand correctly, the surrogates proposal will (for example)
change this from legal to illegal:

if unichr(0x10000) in somestring:
	...

Because sometimes unichr would return a single-character string and
sometimes it would actually return a two-character string: a surrogate
pair of 16-bit code units.
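To make the "sometimes one character, sometimes two" point concrete, here is
a minimal sketch in modern Python. It assumes a "narrow" build that stores
strings as 16-bit code units, and models unichr() on such a build with two
hypothetical helpers, to_surrogate_pair() and narrow_unichr(). Neither is a
real API; they only show the UTF-16 surrogate arithmetic.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into a
    UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = code_point - 0x10000          # 20 bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits -> D800..DBFF
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits -> DC00..DFFF
    return high, low


def narrow_unichr(code_point):
    """Hypothetical model of unichr() on a narrow (16-bit) build:
    BMP code points give one character, others give a surrogate pair."""
    if code_point < 0x10000:
        return chr(code_point)
    high, low = to_surrogate_pair(code_point)
    return chr(high) + chr(low)


ch = narrow_unichr(0x10000)
print(len(ch))                       # 2
print([hex(ord(c)) for c in ch])     # ['0xd800', '0xdc00']
```

Under this model the result for U+10000 is a two-character string, so any
code that assumes unichr() always yields exactly one character breaks.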

> Do you want to write the PEP?

If nobody pipes up to say that they've started it, then I'll do a first
draft tonight. I presume you mean write the PEP up as you described it
and not as I would like it.

So I guess I would want to cover:

 * what is the issue
 * what are surrogates
 * how Py_UNICODE affects literals and unichr
 * rationale for doing surrogate generation
 * description of the configure switches
 * description of why other options were rejected

-- 
Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook