Flexible string representation, unicode, typography, ...

Chris Angelico rosuav at gmail.com
Wed Aug 29 08:34:36 EDT 2012


On Wed, Aug 29, 2012 at 9:40 PM,  <wxjmfauth at gmail.com> wrote:
> For a given coding scheme, all code points/characters are
> equivalent. Expecting to handle a sub-range in a coding
> scheme without shaking that coding scheme is impossible.

Not all codepoints are equally likely. That's the whole point behind
variable-length encodings like Huffman compression (eg deflation as
used in zip/gzip), UTF-8, quoted-printable, and Morse code. They
handle a sub-range efficiently and the rest of the range less
efficiently.

> If a coding scheme does not give satisfaction, the only
> valid solution is to create a new coding scheme, cp1252,
> mac-roman, EBCDIC, ... or the interesting "TeX" case, where
> the "internal" coding depends on the fonts!

http://xkcd.com/927/

> This "Flexible String Representation" fails. Not only
> it is unable to stick with a coding scheme, it is
> a mixing of coding schemes, the worst of all possible
> implementations.

I propose, then, that we abolish files. Who *knows* how many different
things might be represented in a file! We need a single coding scheme
that can handle everything, without changing representation. This
ridiculous state of affairs must not go on; the same representation
can be used for bitmapped images or raw audio data!

ChrisA



More information about the Python-list mailing list