Python usage numbers

Terry Reedy tjreedy at udel.edu
Sun Feb 12 22:09:50 EST 2012


On 2/12/2012 5:14 PM, Chris Angelico wrote:
> On Mon, Feb 13, 2012 at 9:07 AM, Terry Reedy<tjreedy at udel.edu>  wrote:
>> The situation before ascii is like where we ended up *before* unicode.
>> Unicode aims to replace all those byte encoding and character sets with
>> *one* byte encoding for *one* character set, which will be a great
>> simplification. It is the idea of ascii applied on a global rather than
>> local basis.
>
> Unicode doesn't deal with byte encodings; UTF-8 is an encoding,

The Unicode Standard specifies 3 UTF storage formats* and 8 UTF 
byte-oriented transmission formats. UTF-8 is the most common of all 
encodings for web pages. (And ascii pages are utf-8 also.) It is the 
only one of the 8 that most of us need to bother with much. Look here for 
the list
http://www.unicode.org/glossary/#U
and for details look in various places in
http://www.unicode.org/versions/Unicode6.1.0/ch03.pdf
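
To make the point about ascii and utf-8 concrete, here is a minimal sketch using only the codecs built into Python 3. The sample strings and variable names are mine, chosen for illustration. It shows that pure-ascii text produces the same bytes under both codecs, while the explicit-endian UTF-16 and UTF-32 formats use fixed 2- and 4-byte units for these characters.

```python
# Illustrative sketch: the sample text and variable names are assumptions.
text = "abc"

# Pure-ascii text encodes to identical bytes under ascii and utf-8,
# which is why an ascii page is also a valid utf-8 page.
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
same_bytes = ascii_bytes == utf8_bytes

# The explicit-endian formats add no byte order mark.
utf16be_bytes = text.encode("utf-16-be")  # 2 bytes per character here
utf32le_bytes = text.encode("utf-32-le")  # 4 bytes per character

# A non-ascii character needs more than one byte in utf-8.
euro_utf8 = "\u20ac".encode("utf-8")      # EURO SIGN, 3 bytes

print(same_bytes, len(utf16be_bytes), len(utf32le_bytes), len(euro_utf8))
```

With these inputs the script should print True followed by the byte counts 6, 12 and 3.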

> but so are UTF-16, UTF-32.
> and as many more as you could hope for.

All the non-UTF 'as many more as you could hope for' encodings are not 
part of Unicode.

* The new internal unicode scheme for 3.3 is pretty much a mixture of 
the 3 storage formats (I am, of course, skipping some details) by using 
the widest one needed for each string. The advantage is avoiding 
problems with each of the three. The disadvantage is greater internal 
complexity, but that should be hidden from users. They will not need to 
care about the internals. They will be able to forget about 'narrow' 
versus 'wide' builds and the possible requirement to code differently 
for each. There will only be one scheme that works the same on all 
platforms. Most apps should require less space and about the same time.
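
One rough way to observe the 'widest one needed' behaviour is sketched below. It assumes CPython 3.3 or later; other implementations may store strings differently. The helper bytes_per_char is a hypothetical name of mine, not part of any library. It compares the sizes of two strings of different lengths so that the fixed object header cancels out.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate per-character storage for strings made only of `ch`.

    Hypothetical helper: subtracting the sizes of two strings of
    different lengths cancels the fixed per-object overhead.
    Assumes CPython 3.3+ with the flexible string representation.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


narrow = bytes_per_char("a")           # ascii character
medium = bytes_per_char("\u0100")      # needs more than 8 bits
wide = bytes_per_char("\U00010000")    # outside the Basic Multilingual Plane

print(narrow, medium, wide)

# With no narrow/wide build distinction, a non-BMP character is one
# item of length 1 rather than a surrogate pair of length 2.
astral_length = len("\U00010000")
print(astral_length)
```

On CPython 3.3 and later this should report 1, 2 and 4 bytes per character, and a length of 1 for the non-BMP character on every platform.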

-- 
Terry Jan Reedy



