[Chicago] understanding unicode problems
Pete
pfein at pobox.com
Fri Nov 16 17:19:15 CET 2007
On Friday November 16 2007 10:57:10 am Feihong Hsu wrote:
> There's probably no good, complete answer that can be given in a short
> email post. Basically, there's supposed to be a standard encoding for
> unicode: UTF-8. However, go to google.cn for instance and you'll see that
If this isn't outright wrong, it's at least confusing. AFAIK, there is no
single official standard encoding, though I'd be happy to be corrected. UTF-8
has become the de facto standard because it can represent every Unicode
character and is sane, without using an absurd number of bytes per character.
There are other encodings that cover the same character set but see far less
use for documents and web pages: UTF-7, UTF-16.
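
To make the "bytes per character" point concrete, here is a minimal sketch in
Python 3 (an assumption; the thread predates it) that encodes one sample
string with several codecs from the standard library. The sample string and
the variable names are illustrative only.

```python
# Sample text: 4 Latin-1 range characters, a space, and 2 Chinese characters.
text = "caf\u00e9 \u4e2d\u6587"  # "café 中文", 7 characters in total

sizes = {}
# The "-le" variants are used so that no byte order mark is added to the count.
for name in ("utf-8", "utf-16-le", "utf-32-le", "utf-7"):
    encoded = text.encode(name)
    # Every one of these encodings can round-trip the full string.
    assert encoded.decode(name) == text
    sizes[name] = len(encoded)
    print(name, len(encoded), "bytes")
```

UTF-8 needs 12 bytes here, UTF-16 needs 14 and UTF-32 needs 28. The ASCII
characters cost one byte each in UTF-8, while the Chinese characters cost
three each.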
> So we have to encode/decode because there is no standard encoding yet.
> That's why GB2312 and all those other bizarro encodings are packed into the
> Python standard library.
As for the need for other encodings, we've got 50 years of legacy documents
that aren't going to magically transform themselves to UTF-8.
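
Converting such a legacy document is a decode followed by an encode. A minimal
sketch in Python 3, assuming the source encoding is already known to be
GB2312; the helper name transcode and the sample bytes are hypothetical, not
from any library:

```python
def transcode(data: bytes, source_encoding: str, target_encoding: str = "utf-8") -> bytes:
    """Decode legacy bytes to text, then re-encode the text (hypothetical helper)."""
    text = data.decode(source_encoding)   # bytes -> Unicode text
    return text.encode(target_encoding)   # Unicode text -> bytes

# Stand-in for the contents of a legacy file: "中文" stored as GB2312.
legacy_bytes = "\u4e2d\u6587".encode("gb2312")

utf8_bytes = transcode(legacy_bytes, "gb2312")
print(len(legacy_bytes), "bytes as GB2312,", len(utf8_bytes), "bytes as UTF-8")
```

Decoding with the wrong source encoding either raises UnicodeDecodeError or
silently produces garbage, which is why the legacy codecs have to ship with
the language.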
--
Peter Fein || 773-575-0694 || pfein at pobox.com
http://www.pobox.com/~pfein/ || PGP: 0xCCF6AE6B
irc: pfein at freenode.net || jabber: peter.fein at gmail.com