[Python-Dev] New Py_UNICODE doc

Sun May 8 20:40:40 CEST 2005

On May 8, 2005, at 5:15 AM, Martin v. Löwis wrote:

> 'configure takes an option --enable-unicode, with the possible
> values "ucs2", "ucs4", "yes" (equivalent to no argument),
> and "no" (equivalent to --disable-unicode)'
>
> *THIS* documentation would break. This documentation is factually
> correct at the moment (configure does indeed take these options),
> and people rely on them in automatic build processes. Changing
> configure options should not be taken lightly, even if they
> may result from a "wrong mental model". By that rule, --with-suffix
> should be renamed to --enable-suffix, --with-doc-strings to
> --enable-doc-strings, and so on. However, the nitpicking that
> underlies the desire to rename the option should be ignored
> in favour of backwards compatibility.
>
> Changing the documentation that goes along with the option
> would be fine.

That is exactly what I proposed originally, which you shot down.  
Please actually read the contents of my messages.  What I said was 
"change the configure option and related documentation".

>> It provides more than minimum value - it provides the truth.
>
> No. It is just a command line option. It could be named
> --enable-quirk=(quork|quark), and would still select UTF-16.
> Command line options provide no truth - they don't even
> provide statements.

Wow, what an inane way of looking at it.  I don't know what world you 
live in, but in my world, users read the configure options and suppose 
that they mean something.  In fact, they *have* to go off on their own 
to assume something, because even the documentation you refer to above 
doesn't say what happens if they choose UCS-2 or UCS-4.  A logical 
assumption would be that Python would use those CEFs internally, and 
that would be incorrect.
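
Since the option name alone doesn't say what a build actually does,
here is a small sketch of how a user could check from inside the
interpreter. It relies only on sys.maxunicode; the helper name
describe_unicode_build is made up for illustration, and on Python 3.3
and later the value is always 0x10FFFF, so the distinction only shows
up on older interpreters.

```python
import sys


def describe_unicode_build(maxunicode=sys.maxunicode):
    """Report what the running interpreter does, whatever configure called it.

    Hypothetical helper for illustration; only sys.maxunicode is a real API.
    """
    if maxunicode == 0xFFFF:
        # Built with --enable-unicode=ucs2: Py_UNICODE is 16 bits wide and
        # code points above U+FFFF are stored as surrogate pairs.
        return "narrow: 16-bit code units, non-BMP via surrogate pairs"
    if maxunicode == 0x10FFFF:
        # Built with --enable-unicode=ucs4 (or any Python 3.3+): one unit
        # per code point.
        return "wide: one code unit per code point"
    raise ValueError("unexpected sys.maxunicode: %#x" % maxunicode)


if __name__ == "__main__":
    print(describe_unicode_build())
```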

>>> With --enable-unicode=ucs2, Python's Py_UNICODE does *not* start
>>> supporting the full Unicode CCS the same way it supports UCS-2.
>>
>> I can't understand what you mean by this.  My point is that if you
>> configure python to support UCS-2, then it SHOULD NOT support
>> surrogate pairs.  Supporting surrogate pairs is the purview of
>> variable width encodings, and UCS-2 is not among them.
>
> So you suggest renaming it to --enable-unicode=utf16, right?
> My point is that a Unicode type with UTF-16 would correctly
> support all assigned Unicode code points, which the current
> 2-byte implementation doesn't. So --enable-unicode=utf16 would
> *not* be the truth.

The current implementation supports the UTF-16 CEF.  That is, it 
supports a variable width encoding form capable of representing all 
of the Unicode code space using surrogate pairs.  Please point out a 
code point that the current 2-byte implementation does not support, 
either directly, or through the use of surrogate pairs.
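
To make the surrogate pair claim concrete, here is a rough sketch of
the UTF-16 arithmetic in Python. The function names are my own, not
anything in CPython, and it only illustrates the encoding form; it
says nothing about how len() or indexing behave on a 2-byte build.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into UTF-16 (high, low) code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %#x" % code_point)
    offset = code_point - 0x10000      # fits in 20 bits
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a (high, low) surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    # U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP.
    high, low = to_surrogate_pair(0x1D11E)
    print("%04X %04X" % (high, low))
```

Every code point from U+10000 through U+10FFFF maps to exactly one
such pair, and everything at or below U+FFFF (other than the surrogate
range itself) is a single 16-bit unit.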

--
Nick