Blog "about python 3"

Sun Jan 5 17:48:43 EST 2014

On 1/5/2014 9:23 AM, wxjmfauth at gmail.com wrote:

> My examples are ONLY ILLUSTRATING, this FSR
> is wrong by design,

Let me answer you a different way. If FSR is 'wrong by design', so are 
the alternatives. Hence, the claim is, in itself, useless as a guide to 
choosing. The choices:

* Keep the previous complicated system of buggy narrow builds on some 
systems and space-wasting wide builds on other systems, with Python code 
potentially acting differently on the different builds. I am sure that 
you agree that this is a bad design.

* Improve the dual-build system by de-bugging narrow builds. I proposed 
to do this (and gave Python code proving the idea) by adding the 
complication of an auxiliary array of indexes of astral chars in a 
UTF-16 string. I suspect you would call this design 'wrong' also.

* Use the memory-wasting UTF-32 (wide) build on all systems. I know you 
do not consider this 'wrong', but come on. From an information theoretic 
and coding viewpoint, it clearly is. The top (4th) byte is *never* used. 
The 3rd byte is *almost never* used. The 2nd byte usage ranges from 
common to almost never for different users.

Memory waste is also time waste, as moving information-free 0 bytes 
takes the same time as moving informative bytes.
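
To put rough numbers on the unused bytes, here is a minimal sketch. The 
helper names `bytes_needed` and `utf32_padding` are made up for this 
illustration, and "needed" just means the smallest fixed width that can 
hold the code point's value, not any particular encoding.

```python
import sys

def bytes_needed(code_point):
    """Smallest number of bytes that can hold this code point's value."""
    if code_point < 0x100:
        return 1
    if code_point < 0x10000:
        return 2
    return 3  # the largest code point, 0x10FFFF, still fits in 3 bytes

def utf32_padding(text):
    """Return (padding_bytes, total_bytes) for text stored at 4 bytes per char."""
    total = 4 * len(text)
    needed = sum(bytes_needed(ord(ch)) for ch in text)
    return total - needed, total

# Even the highest code point never touches the 4th byte.
print(bytes_needed(sys.maxunicode))
print(utf32_padding("plain ASCII text"))
```

For pure ASCII text, three of every four bytes in a UTF-32 buffer are 
padding under this measure.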

Here is the beginning of the rationale for the FSR (from 
http://www.python.org/dev/peps/pep-0393/ -- have you ever read it?).

"There are two classes of complaints about the current implementation of 
the unicode type: on systems only supporting UTF-16, users complain that 
non-BMP characters are not properly supported. On systems using UCS-4 
internally (and also sometimes on systems using UCS-2), there is a 
complaint that Unicode strings take up too much memory - especially 
compared to Python 2.x, where the same code would often use ASCII 
strings...".

The memory waste was a reason to stick with 2.7: the extra memory use 
could break code that worked in 2.x. By removing the waste, the FSR 
makes switching to 
Python 3 more feasible for some people. It was a response to real 
problems encountered by real people using Python. It fixed both classes 
of complaint about the previous system.
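
The effect can be observed from Python itself. The following is a rough 
sketch that assumes CPython 3.3 or later, where `sys.getsizeof` reports 
the size of the string object; other implementations may report 
something different or nothing at all. The helper name `bytes_per_char` 
is made up for this illustration. It compares two freshly built strings 
of different lengths so that the fixed object header cancels out.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate storage per character for a string made of repeated ch.

    Assumes CPython 3.3+ (PEP 393). The difference between two sizes
    removes the constant per-object overhead.
    """
    short = ch * n
    long_ = ch * (2 * n)
    return (sys.getsizeof(long_) - sys.getsizeof(short)) // n

for sample in ("a", "\u00e9", "\u20ac", "\U0001F600"):
    print(hex(ord(sample)), bytes_per_char(sample))
```

On such a build, ASCII and Latin-1 text should come out at 1 byte per 
character, other BMP text at 2, and text containing astral characters 
at 4.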

* Switch to the time-wasting UTF-8 for text storage, as some have done. 
This is different from using UTF-8 for text transmission, which I hope 
becomes the norm soon.
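
One common reading of 'time-wasting' here is that UTF-8 is variable 
width, so finding the n-th character means scanning from the start 
rather than computing an offset. The sketch below illustrates that 
cost under this assumption; `utf8_index` is a made-up helper, not 
anything from CPython, and it trusts its input to be valid UTF-8.

```python
def utf8_index(data, n):
    """Return the n-th code point (0-based) of UTF-8 bytes via a linear scan."""
    if n < 0:
        raise IndexError("negative index not supported in this sketch")
    pos = 0
    count = 0
    while pos < len(data):
        first = data[pos]
        # The lead byte tells us how many bytes this character occupies.
        if first < 0x80:
            width = 1
        elif first >> 5 == 0b110:
            width = 2
        elif first >> 4 == 0b1110:
            width = 3
        elif first >> 3 == 0b11110:
            width = 4
        else:
            raise ValueError("invalid UTF-8 lead byte")
        if count == n:
            return data[pos:pos + width].decode("utf-8")
        pos += width
        count += 1
    raise IndexError("index out of range")

data = "a\u00e9\u20ac\U0001F600z".encode("utf-8")
print([utf8_index(data, i) for i in range(5)])
```

Each lookup walks every earlier character, so indexing is O(n) where a 
fixed-width representation is O(1).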

-- 
Terry Jan Reedy