Coding systems are political (was Extended ASCII and code pages)

Terry Reedy tjreedy at udel.edu
Sun May 29 14:46:29 EDT 2016


On 5/29/2016 2:12 AM, Rustom Mody wrote:

> In short that a € costs more than a $ is a combination of the factors
> - a natural cause -- there are a million chars to encode (let's assume that the
> million of Unicode is somehow God-given AS A SET)
> - an artificial political one -- out of the million-factorial permutations of
> that million, the one that the Unicode consortium chose is towards satisfying the
> equation: Keep ASCII users undisturbed and happy

From the Python developer viewpoint, Unicode might as well be a fact of 
nature.  I also note that in English text, a (phoneme) char conveys 
about 6 bits of information, while in Chinese text, a (word) char 
conveys perhaps 15 bits of information.  So I argue that Python 3.3+'s 
FSR (flexible string representation, PEP 393) is being fair in using 
1 byte per char for the first and most often 2 bytes per char for the 
other.
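
A rough way to check the per-char storage cost on CPython 3.3+ is 
sketched below.  The helper name bytes_per_char and the sample 
characters are my own choices for illustration, and sys.getsizeof 
reports a CPython implementation detail, so treat the numbers as 
indicative rather than guaranteed by the language.

```python
import sys


def bytes_per_char(ch: str, n: int = 1000) -> int:
    """Marginal storage cost of one more copy of `ch` in a CPython str.

    Hypothetical helper: comparing two lengths cancels the fixed
    object header, leaving only the per-char width chosen by the FSR.
    """
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


# Sample characters picked for illustration only.
samples = {
    "dollar sign": "$",          # ASCII
    "euro sign": "\u20ac",       # in the BMP, outside Latin-1
    "CJK ideograph": "\u4e2d",   # in the BMP
    "emoji": "\U0001f600",       # outside the BMP
}

for name, ch in samples.items():
    fsr = bytes_per_char(ch)
    utf8 = len(ch.encode("utf-8"))
    print(f"{name}: FSR {fsr} B/char, UTF-8 {utf8} B/char")
```

On CPython 3.3+ I would expect this to report 1, 2, 2 and 4 bytes per 
char in memory, against 1, 3, 3 and 4 bytes when encoded as UTF-8.  
Note that the FSR picks one width for the whole string, based on its 
widest char, which is why the helper measures strings made of a single 
repeated char.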

-- 
Terry Jan Reedy




