Flexible string representation, unicode, typography, ...
Chris Angelico
rosuav at gmail.com
Wed Aug 29 08:34:36 EDT 2012
On Wed, Aug 29, 2012 at 9:40 PM, <wxjmfauth at gmail.com> wrote:
> For a given coding scheme, all code points/characters are
> equivalent. Expecting to handle a sub-range in a coding
> scheme without shaking that coding scheme is impossible.
Not all codepoints are equally likely. That's the whole point behind
variable-length encodings like Huffman coding (e.g. DEFLATE, as used
in zip/gzip), UTF-8, quoted-printable, and Morse code. They handle a
sub-range efficiently and the rest of the range less efficiently.
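To make the UTF-8 case concrete, here is a small sketch (the helper name
utf8_len and the sample characters are just illustrative choices, not
anything from the thread). It shows one coding scheme spending fewer
bytes on the common sub-range and more on the rest:

```python
def utf8_len(ch):
    """Return how many bytes UTF-8 uses to encode a single character."""
    return len(ch.encode("utf-8"))

# Sample code points picked for illustration: ASCII, Latin-1, BMP, astral.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    print(f"U+{ord(ch):04X} -> {utf8_len(ch)} byte(s)")

# U+0041 -> 1 byte(s)
# U+00E9 -> 2 byte(s)
# U+20AC -> 3 byte(s)
# U+1F600 -> 4 byte(s)
```

All four are valid characters in the same scheme; ASCII just gets the
short form.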
> If a coding scheme does not give satisfaction, the only
> valid solution is to create a new coding scheme, cp1252,
> mac-roman, EBCDIC, ... or the interesting "TeX" case, where
> the "internal" coding depends on the fonts!
http://xkcd.com/927/
> This "Flexible String Representation" fails. Not only
> it is unable to stick with a coding scheme, it is
> a mixing of coding schemes, the worst of all possible
> implementations.
I propose, then, that we abolish files. Who *knows* how many different
things might be represented in a file! We need a single coding scheme
that can handle everything, without changing representation. This
ridiculous state of affairs must not go on; the same representation
can be used for bitmapped images or raw audio data!
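For anyone who wants to see the representation under discussion, the
sketch below is one rough way to observe it. It assumes CPython 3.3 or
later (where PEP 393 applies); sys.getsizeof reports
implementation-specific sizes, so other interpreters may give different
numbers. The helper name bytes_per_char is made up for this example.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate per-character storage of a str made only of `ch`.

    Differencing two string sizes cancels the fixed object header,
    leaving only the cost of the n extra characters.
    Assumes CPython 3.3+; the result is implementation-specific.
    """
    short = ch * 1
    long_ = ch * (n + 1)
    return (sys.getsizeof(long_) - sys.getsizeof(short)) // n

# The widest code point in the string decides the width for all of it.
for ch in ["a", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}: {bytes_per_char(ch)} byte(s) per character")
```

On CPython this should report 1, 1, 2 and 4 bytes respectively. Indexing
and len() behave the same at every width; only the storage differs.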
ChrisA