[Python-Dev] Multilingual programming article on the Red Hat Developer blog

Jeff Allen ja.py at farowl.co.uk
Fri Sep 12 08:54:56 CEST 2014


On 12/09/2014 04:28, Stephen J. Turnbull wrote:
> Jeff Allen writes:
>
>   > A welcome article. One correction should be made, I believe: the area of
>   > code point space used for the smuggling of bytes under PEP-383 is not a
>   > "Unicode Private Use Area", but a portion of the trailing surrogate
>   > range.
>
> Nice catch.  Note that the surrogate range was originally part of the
> Private Use Area, but it was carved out with the adoption of UTF-16 in
> about 1993.  In practice, I doubt that there are any current
> implementations claiming compatibility with Unicode 1.0 (IIRC, UTF-16
> was made mandatory in Unicode 1.1).
That's a helpful bit of history, and it explains the uncharacteristic
inaccuracy. It is about all I can do to keep the current position clear
in my head.
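
For anyone who wants to check the current position concretely, here is a
minimal sketch in Python 3 of what the surrogateescape error handler from
PEP 383 does. The helper name smuggle and the sample bytes are mine, purely
for illustration. Each undecodable byte 0xNN should come back as the lone
trailing surrogate U+DCNN, which lies in U+DC80..U+DCFF and not in the
Private Use Area (U+E000..U+F8FF).

```python
def smuggle(raw: bytes) -> str:
    """Decode as UTF-8, escaping undecodable bytes as lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


raw = b"abc\xff"            # 0xFF can never occur in valid UTF-8
text = smuggle(raw)
escaped = ord(text[-1])     # code point standing in for the byte 0xFF

# The byte lands in the trailing (low) surrogate range, not the PUA.
in_trailing_surrogates = 0xDC80 <= escaped <= 0xDCFF
in_private_use_area = 0xE000 <= escaped <= 0xF8FF

# Encoding with the same handler recovers the original bytes.
round_trip = text.encode("utf-8", errors="surrogateescape")
```

Encoding the same string with the default strict handler should raise
UnicodeEncodeError, which is what stops the smuggled bytes from leaking
out unnoticed.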

> I've always thought that the "right" way to handle the private use
> area for "platforms" like Python and Emacs, which may need to use it
> for their own purposes (such as "undecodable bytes") but want to
> respect its use by applications, is to create an auxiliary table
> mapping the private use area to objects describing the characters
> represented by the private use code points.  These objects would have
> attributes such as external representation for text I/O, glyph (for
> GUI display), repr (for TTY display), various Unicode properties, etc.
Simply having a block "for private use" seems to create an unmanaged
space for conflict, reminiscent of the "other 128 characters" in
bilingual programming. I wondered, idly, whether the way to respect use
by applications might be to make it private to a particular subclass of
str.
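
To make the auxiliary-table idea a little more concrete, here is a rough
sketch of how I read it, not a proposal for how Python or Emacs does or
should do it. Every name in it (PrivateUseInfo, PrivateUseTable, register,
describe) is hypothetical, and only two of the suggested attributes are
modelled. The one design point it tries to show is that a code point gets
claimed once, so that a conflict is reported instead of passing silently.

```python
from dataclasses import dataclass

# Basic Multilingual Plane Private Use Area.
PUA_FIRST, PUA_LAST = 0xE000, 0xF8FF


@dataclass(frozen=True)
class PrivateUseInfo:
    """Hypothetical description of one private use character."""
    external: bytes   # external representation for text I/O
    display: str      # repr for TTY display


class PrivateUseTable:
    """Hypothetical registry mapping private use code points to info."""

    def __init__(self):
        self._entries = {}

    def register(self, code_point, info):
        # Refuse anything outside the PUA, and refuse double claims.
        if not PUA_FIRST <= code_point <= PUA_LAST:
            raise ValueError("not a private use code point")
        if code_point in self._entries:
            raise ValueError("code point already claimed")
        self._entries[code_point] = info

    def describe(self, char):
        # Return the registered info, or None for an unclaimed character.
        return self._entries.get(ord(char))


table = PrivateUseTable()
table.register(0xE000, PrivateUseInfo(external=b"\xff", display="<byte FF>"))
```

A table like this could be held per application, or per subclass of str,
which is roughly the kind of scoping I had in mind.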

Jeff Allen


