More Unicode Trouble

M.-A. Lemburg mal at lemburg.com
Mon Dec 4 19:22:39 EST 2000


Alex Martelli wrote:
> 
> > and I can't find any documentation on how to handle Unicode.
> 
> I think (and I *dearly hope* I'm wrong) that the best doc
> is still Lemburg's "proposal", at
> http://www.lemburg.com/files/python/unicode-proposal.txt
> Just read it 'imagining' it's not a proposal but an actual
> description of how things work, mentally translating all
> the 'should do so-and-so' into 'does so-and-so'.  Not sure
> why this rewording hasn't been actually performed (making
> this document, e.g., a chapter or appendix of the Python
> library reference, or whatever) -- as I say, I hope I'm wrong
> here and the current 'real official' docs already have all
> of this wealth of information, but I don't think they do.

The above file was indeed the proposal that was used as
the basis for the Unicode implementation in Python.

Many of the higher-level interfaces are already documented in the
standard documentation, but if you care about the internals
and all the gory details, then the proposal is still the number one
reference.

Note that most of the trouble users have with Unicode comes from
not understanding the difference between 8-bit strings (which carry
no encoding information) and 2-byte (single encoding) Unicode --
these are really two different worlds, and bringing them together
is hard work.
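
As a rough illustration of that difference, here is a minimal sketch
written in present-day Python 3 terms, where `bytes` stands in for the
8-bit string and `str` for the Unicode string. That mapping, the sample
word and the variable names are assumptions made for illustration; they
are not taken from the proposal.

```python
# Assumption: Python 3, with bytes ~ "8-bit string" and str ~ "Unicode string".

text = "Gr\u00fc\u00dfe"  # Unicode string: five characters, no byte layout implied

# Encoding turns characters into bytes. The result does not record
# which encoding was used to produce it.
raw_latin1 = text.encode("latin-1")  # one byte per character here
raw_utf8 = text.encode("utf-8")      # the two non-ASCII characters take two bytes each

# Decoding only recovers the text if the caller supplies the right encoding.
roundtrip = raw_utf8.decode("utf-8")

# Latin-1 maps every byte to some character, so a wrong guess does not
# fail; it silently produces different text.
misread = raw_utf8.decode("latin-1")

# Other wrong guesses fail outright.
try:
    raw_latin1.decode("utf-8")
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True
```

The bytes alone never say which interpretation is right; that knowledge
has to travel separately, which is what makes mixing the two worlds
error-prone.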

Other languages that seem to make Unicode easy typically don't make
this distinction at all: they simply use Unicode all the way. But
this is something we can't do just yet in Python since it would
break too much code.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/


