Blog "about python 3"
Terry Reedy
tjreedy at udel.edu
Sun Jan 5 17:48:43 EST 2014
On 1/5/2014 9:23 AM, wxjmfauth at gmail.com wrote:
> My examples are ONLY ILLUSTRATING, this FSR
> is wrong by design,
Let me answer you a different way. If the FSR (the flexible string
representation of PEP 393) is 'wrong by design', so are the alternatives.
Hence the claim is, in itself, useless as a guide to choosing. The choices:
* Keep the previous complicated system of buggy narrow builds on some
systems and space-wasting wide builds on other systems, with Python code
potentially acting differently on the different builds. I am sure that
you agree that this is a bad design.
* Improve the dual-build system by de-bugging narrow builds. I proposed
to do this (and gave Python code proving the idea) by adding the
complication of an auxiliary array of indexes of astral chars in a
UTF-16 string. I suspect you would call this design 'wrong' also.
* Use the memory-wasting UTF-32 (wide) build on all systems. I know you
do not consider this 'wrong', but come on. From an information-theoretic
and coding viewpoint, it clearly is. The top (4th) byte is *never* used.
The 3rd byte is *almost never* used. How often the 2nd byte is used
ranges from common to almost never, depending on the user.
Memory waste is also time waste, as moving information-free 0 bytes
takes the same time as moving informative bytes.
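
To make the second option above concrete, here is a minimal sketch of
an auxiliary index of astral characters on top of UTF-16 code units. It
is not the code from the original proposal; the class name IndexedUTF16
and its attributes are made up for illustration.

```python
from bisect import bisect_left


class IndexedUTF16:
    """Hypothetical UTF-16 string that keeps an index of astral chars."""

    def __init__(self, text):
        self.units = []   # 16-bit code units, as a narrow build stores them
        self.astral = []  # character indexes of non-BMP chars, ascending
        for i, ch in enumerate(text):
            cp = ord(ch)
            if cp > 0xFFFF:
                # Astral char: record its position, store a surrogate pair.
                self.astral.append(i)
                cp -= 0x10000
                self.units.append(0xD800 + (cp >> 10))
                self.units.append(0xDC00 + (cp & 0x3FF))
            else:
                self.units.append(cp)

    def __len__(self):
        # Every astral char occupies one extra code unit.
        return len(self.units) - len(self.astral)

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        # Each astral char before position i shifts the unit offset by one.
        n = bisect_left(self.astral, i)
        k = i + n
        if n < len(self.astral) and self.astral[n] == i:
            high, low = self.units[k], self.units[k + 1]
            return chr(0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00))
        return chr(self.units[k])
```

Indexing costs one binary search over the astral positions, and a
string with no astral chars pays only for an empty list.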
Here is the beginning of the rationale for the FSR (from
http://www.python.org/dev/peps/pep-0393/ -- have you ever read it?).
"There are two classes of complaints about the current implementation of
the unicode type: on systems only supporting UTF-16, users complain that
non-BMP characters are not properly supported. On systems using UCS-4
internally (and also sometimes on systems using UCS-2), there is a
complaint that Unicode strings take up too much memory - especially
compared to Python 2.x, where the same code would often use ASCII
strings...".
The memory waste was a reason to stick with 2.7: the extra memory use
could break code that worked in 2.x. By removing the waste, the FSR makes switching to
Python 3 more feasible for some people. It was a response to real
problems encountered by real people using Python. It fixed both classes
of complaint about the previous system.
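
The waste and its removal can both be measured. The sketch below first
counts the zero bytes in a UTF-32 encoding, then estimates how many
bytes per character a str actually costs. The helper names are made up
for illustration, and the second measurement assumes CPython 3.3 or
later, since sys.getsizeof reports implementation-specific sizes.

```python
import sys


def zero_byte_fraction(text):
    """Fraction of bytes that are zero when text is stored as UTF-32."""
    data = text.encode("utf-32-le")  # explicit byte order, so no BOM
    return data.count(0) / len(data)


def bytes_per_char(ch, n=1000):
    """Marginal storage per character for a str made of copies of ch.

    Subtracting two sizes cancels the fixed object header.
    Assumes CPython 3.3+; other implementations may differ.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


for sample in ("a", "\u20ac", "\U0001F40D"):
    print(hex(ord(sample)), zero_byte_fraction(sample), bytes_per_char(sample))
```

With the FSR, the width is chosen per string from its largest code
point, so ASCII-only text pays one byte per character instead of four.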
* Switch to the time-wasting UTF-8 for text storage, as some have done.
This is different from using UTF-8 for text transmission, which I hope
becomes the norm soon.
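
To illustrate where the time goes with UTF-8 storage: characters have
variable width, so finding character number i means scanning from the
start unless some extra index is kept. A minimal sketch, with a made-up
function name, assuming the input is valid UTF-8:

```python
def utf8_char_at(data, index):
    """Return the character at code point position index in UTF-8 bytes.

    Linear scan: every byte before the target has to be inspected.
    """
    if index < 0:
        raise IndexError(index)
    seen = -1     # index of the most recent character start found
    start = None  # byte offset where the wanted character begins
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:  # not a continuation byte: new character
            seen += 1
            if seen == index:
                start = pos
            elif seen == index + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError(index)
    return data[start:].decode("utf-8")  # wanted character was the last one
```

For example, utf8_char_at("a\u20acb".encode("utf-8"), 2) has to step
over the three bytes of the euro sign before it reaches "b".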
--
Terry Jan Reedy