Flexible string representation, unicode, typography, ...

Ben Finney ben+python at benfinney.id.au
Sat Aug 25 03:54:55 EDT 2012


wxjmfauth at gmail.com writes:

> Unicode design: a flat table of code points, where all code
> points are "equals".

Yes, Unicode's design entails a flat table of hundreds of thousands of
code points, with room to expand in future.

This is in direct conflict with the design of all significant computers
we need to write software for: data is stored and transported as 8-bit
bytes, each of which can only ever hold 256 different values, with no
room for expansion.
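
A minimal Python 3 sketch of that conflict, assuming a handful of sample
characters chosen purely for illustration (neither they nor the helper
name `fits_in_one_byte` come from the presentation or the thread):

```python
def fits_in_one_byte(ch):
    """Return True if the character's code point is in range(256)."""
    return ord(ch) < 256


# Illustrative samples: Latin letter, accented letter, euro sign,
# and a musical symbol from outside the Basic Multilingual Plane.
samples = ["A", "\u00e9", "\u20ac", "\U0001D11E"]

for ch in samples:
    print(f"U+{ord(ch):04X} fits in one byte: {fits_in_one_byte(ch)}")

# A single byte cannot hold a value above 255, so bytes() refuses it.
try:
    bytes([ord("\u20ac")])
    byte_error = None
except ValueError as exc:
    byte_error = exc
    print("bytes() refused the code point:", exc)
```

Any code point above 255 has to be spread across several bytes by some
encoding; it cannot simply be stored as-is.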

> As soon as one attempts to escape from this rule, one has to
> "pay" for it.

Yes, in either direction; the conflict means that trade-offs need to be
made.
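
One way to see the trade-off, as a rough sketch rather than anything
definitive: encode the same text with three standard encodings and
compare the byte counts. The helper name `encoded_sizes` and the sample
strings are my own, chosen for illustration.

```python
def encoded_sizes(text):
    """Return the number of bytes `text` occupies in each encoding."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {name: len(text.encode(name)) for name in encodings}


# ASCII text, the euro sign, and a character outside the BMP.
for sample in ("hello", "\u20ac", "\U0001D11E"):
    print(repr(sample), encoded_sizes(sample))
```

UTF-32 gives every code point the same width at the cost of space;
UTF-8 is compact for ASCII but variable-width. Neither choice is free.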

See this presentation by Ned Batchelder, “Pragmatic Unicode”
<URL:http://nedbatchelder.com/text/unipain.html>, which lays out the
fundamental conflict of representing human text in computer data, and
several practical approaches to dealing with it.

-- 
 \      “I busted a mirror and got seven years bad luck, but my lawyer |
  `\                        thinks he can get me five.” —Steven Wright |
_o__)                                                                  |
Ben Finney


