[Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects unicodeobject.c,2.21,2.22

M.-A. Lemburg mal@lemburg.com
Thu, 08 Jun 2000 17:20:23 +0200


Greg Stein wrote:
> 
> On Wed, Jun 07, 2000 at 02:13:24AM -0700, Marc-Andre Lemburg wrote:
> > Update of /cvsroot/python/python/dist/src/Objects
> > In directory slayer.i.sourceforge.net:/tmp/cvs-serv17917/Objects
> >
> > Modified Files:
> >       unicodeobject.c
> > Log Message:
> > Marc-Andre Lemburg <mal@lemburg.com>:
> > Change the default encoding to 'ascii' (it was previously
> > defined as UTF-8).
> >
> > Note: The implementation still uses UTF-8 to implement
> > the buffer protocol, so C APIs will still see UTF-8. This
> > is on purpose: rather than fixing the Unicode implementation,
> > the C APIs should be made Unicode aware.
> 
> I'm a little confused on where this gets applied. Is this when somebody says
> "str(unicode_ob)", they get back ASCII rather than UTF-8? Or is this when
> somebody says "unicode(str)" and we expect <str> to be ASCII?

The buffer protocol is used for "s", "t" and "s#" argument
parsing in C. Since these pass back pointers to internal
buffers of the object, which must be kept alive until the
object is garbage collected, there's no easy way to change the
encoding of that buffer. To play it safe, I left that buffer
encoded in UTF-8 (the hash value is also computed from the
UTF-8 encoding of the Unicode value to make it compatible with
ASCII 8-bit strings).
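
To illustrate the hash remark, here is a minimal sketch of why
working from the UTF-8 form stays compatible with ASCII 8-bit
strings: for pure-ASCII text the two encodings are byte-for-byte
identical. It is written in modern Python 3 purely for illustration
and is not the implementation discussed here; utf8_view is a
hypothetical helper name standing in for the internal buffer.

```python
# Illustration only, in modern Python 3; not the implementation in unicodeobject.c.

def utf8_view(text):
    """Hypothetical stand-in for the internal UTF-8 buffer that C code sees."""
    return text.encode("utf-8")

ascii_text = "spam"
non_ascii_text = "sp\u00e4m"  # contains U+00E4

# For pure-ASCII text the UTF-8 bytes equal the ASCII bytes, so anything
# derived from those bytes (such as a hash) can match the 8-bit string's.
same_bytes = utf8_view(ascii_text) == ascii_text.encode("ascii")

# Non-ASCII text still has a UTF-8 form, but U+00E4 takes two bytes,
# so the buffer is longer than the text it represents.
utf8_len = len(utf8_view(non_ascii_text))
```

The second case is why C code reading the buffer needs to be
Unicode aware: the bytes it sees are no longer one per character.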

Note that both str(unicode) and unicode(str) will use
the default encoding. %-formatting and comparisons also
use the default encoding.
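
As a rough sketch of what an 'ascii' default means for both
directions of conversion: modern Python 3 has no implicit
conversions of this kind, so the example below spells them out
explicitly. DEFAULT_ENCODING, to_8bit, to_unicode and
converts_cleanly are hypothetical names chosen for illustration,
not part of any real API.

```python
# Illustration only, in modern Python 3, where conversions are explicit.
DEFAULT_ENCODING = "ascii"  # hypothetical name for the default described above

def to_8bit(text):
    """Roughly the str(unicode_ob) direction: encode with the default."""
    return text.encode(DEFAULT_ENCODING)

def to_unicode(data):
    """Roughly the unicode(str) direction: decode with the default."""
    return data.decode(DEFAULT_ENCODING)

def converts_cleanly(func, value):
    """Return True if the conversion succeeds, False on an encoding error."""
    try:
        func(value)
    except UnicodeError:  # base class of both encode and decode errors
        return False
    return True

# Pure-ASCII data converts in both directions.
ascii_ok = converts_cleanly(to_8bit, "abc") and converts_cleanly(to_unicode, b"abc")

# Non-ASCII data is rejected in both directions under an 'ascii' default.
non_ascii_text_ok = converts_cleanly(to_8bit, "\u00e4")
non_ascii_bytes_ok = converts_cleanly(to_unicode, b"\xc3\xa4")
```

With UTF-8 as the default, the last two conversions would have
succeeded; with 'ascii' they fail instead of guessing.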

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/