Python 3.2 has some deadly infection

Terry Reedy tjreedy at udel.edu
Tue Jun 3 14:22:45 EDT 2014


On 6/3/2014 10:18 AM, Robin Becker wrote:

> I think the idea that we only give meaning to binary data using
> encodings is a bit limiting.

On the contrary, it is liberating. The fact that bits have no meaning 
other than 'a choice between two alterntives' means
1. any binary choice - 0/1, -/+, false/true, no/yes, closed/open, 
male/female, sad/happy, evil/good, low/high, and so on ad infinitum, can 
be encoded into a bit. Since any such pair could have been reversed, the 
mapping between bit states and the pair is arbitrary, and constitutes an 
encoding.
2. any discret or digitized information that constitutes a choice 
between multiple alternative can be encoded into a sequence of bits.

This crucial discovery is the basis of Shannon's 1947 paper and of the 
information age that started about then.

> A zip or gif file has structure, but I don't think it's reasonable  to
 >to regard such a file as having an encoding in the python unicode sense.

I an not quite sure what you are denying. Color encodings are encodings 
as much as character encodings, even if they encode different 
information. Both encode sensory experience and conceptual correlates 
into a sequences of bits usually organized for convenience into a 
sequence of bytes or other chunks.

There is another similarity. Text files often have at least two levels 
of encoding. First is the character encoding; that is all unicode 
handles. Then there is the text structure encoding, which is sometimes 
called the 'file format'. Most text files are at least structured into 
'lines'. For this, they use encoded line endings, and there have been 
multiple choices for this and at least 2 still in common use (which is a 
nuisance).

Similarly, a pixel (bitmap!) image file must encode the color of each 
pixel and a higher-level structuring of pixels into a a 2D array of rows 
of lines. Just as with text, there have been and still are multiple 
encoding at both levels. Also, similarly, the receiver of an image must 
know what encoding the sender used.

Vector graphics is a different way of encoding certain types of images, 
and again there are multiple ways to encode the information into bits. 
The encoding hassle here is similar to that for text. One of the 
frustrations of tk is that it natively uses just one old dialect of 
postscript (.ps) to output screen images. One has to find and install an 
extension to a modern Scaled Vector Graphics (.svg) encoding.

Because Python is programed with lines of text, it must come with 
minimal text decoding. If Python were programmed with drawings, it would 
come with one or more drawing decoders and a drawing equivalent of a 
lexer. It might even have special 'rd' (read drawing) mode for open.

-- 
Terry Jan Reedy





More information about the Python-list mailing list