[I18n-sig] Re: Unicode debate

Guido van Rossum guido@python.org
Mon, 01 May 2000 16:13:24 -0400


> MAL & GvR wrote:
> >> * cPickle.loads() doesn't like Unicode as data storage
> >
> >Hm, hard to fix.  Again, it really should use the buffer API, but it doesn't.
> 
> Why should it be fixed? Unicode as data storage??? The least we can do
> about the character string vs. data buffer discrepancy is discourage the
> use of Unicode strings as data storage, no?

Good point.  I was getting carried away by the idea that the -U option
implements (all strings are Unicode).  This is what JPython does, and
there strings *are* being used as data storage -- at a 100% overhead
cost.  We shouldn't copy this mistake though, and there are limits to
how far we can take -U.

Perhaps there should be an explicit prefix to force 8-bit strings?  I
think that a notation for 8-bit data is still useful, and string
literals with octal escapes are the most compact form I know!
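
An illustrative sketch, not part of the original message: in present-day Python 3 the explicit prefix for 8-bit data is b"...", and octal escapes work inside it. The sample values below are arbitrary. The UTF-16 encoding is an assumption that stands in for a two-bytes-per-character string representation such as Java's, to show where a 100% overhead would come from.

```python
# 8-bit data written as a bytes literal with octal escapes (Python 3 syntax).
data = b"\000\001\377"

# Iterating over a bytes object yields integers in the range 0..255.
values = list(data)

# The same three code points held as text and encoded at two bytes per
# character (UTF-16, little-endian, no byte-order mark) take twice the space.
text = "\000\001\377"
utf16 = text.encode("utf-16-le")

print(values, len(data), len(utf16))
```

Here the bytes literal occupies three bytes while the two-byte-per-character form occupies six.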

--Guido van Rossum (home page: http://www.python.org/~guido/)