[I18n-sig] Re: Unicode debate

Guido van Rossum guido@python.org
Mon, 01 May 2000 13:49:47 -0400


> Here's a list of what I've found by running some of the
> regression tests:
> 
> * import string fails due to the way _idtable is constructed

Hm, I don't see this -- string.py imports just fine, and there's no
_idtable in my copy of string.py at all.

> * getattr() doesn't like Unicode as second argument, same for
>   delattr() and hasattr()
> * eval() expects a string object

These should all be fixed.
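
For illustration only, here is roughly what the wanted behaviour looks
like. This is a sketch run against a current Python, where u'...'
literals are ordinary strings; the Point class and its attribute are
made up for the example and are not from the regression tests.

```python
# Hypothetical class, used only so there is an attribute to look up.
class Point:
    def __init__(self):
        self.x = 1

p = Point()
name = u"x"  # a Unicode attribute name

has_x = hasattr(p, name)     # attribute lookup by Unicode name
value = getattr(p, name)
delattr(p, name)
gone = not hasattr(p, name)  # the attribute was removed by delattr()

# eval() given a Unicode expression
result = eval(u"1 + 2")
```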

> * there still are some string exceptions around in the regr.
>   tests which cause a failure (Unicode exceptions don't work)

Interesting.  One more reason to drop string exceptions sometime in
the future.

> * struct.pack('s') doesn't like Unicode as argument

Fix it.
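
Until that happens, a caller can sidestep the problem by encoding the
text before packing. A minimal sketch, assuming UTF-8 is the encoding
wanted; pack_text is a made-up helper, not part of the struct module.

```python
import struct

def pack_text(text, size, encoding="utf-8"):
    """Encode text explicitly, then pack it into a fixed-size 's' field."""
    data = text.encode(encoding)
    # The 's' format pads with NUL bytes (or truncates) to `size` bytes.
    return struct.pack("%ds" % size, data)

packed = pack_text(u"abc", 5)
```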

> * re doesn't work: pcre_expand() needs a string object

Fix it, but with low priority (the expectation is that sre will replace
pcre in 1.6a3).

> * regex doesn't work either because string objects are hard-coded

Don't fix (regex is obsolete, only kept around because it used to be
very common).

> * mmap doesn't like Unicode: "mmap assignment must be
>   single-character string"

Yes, this has 8-bit string written all over it.  It really should be
using the buffer API rather than requiring strings!
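
To make the buffer idea concrete: an API that goes through the buffer
interface accepts any object exposing raw bytes, not just one string
type. The sketch below runs against a current Python, where mmap slice
assignment works this way; the sizes and contents are arbitrary.

```python
import mmap

m = mmap.mmap(-1, 16)  # anonymous 16-byte mapping

# Slice assignment from several different buffer-providing objects.
m[0:5] = b"hello"
m[5:10] = bytearray(b"world")
m[10:13] = memoryview(b"abc")

# Unicode text is not raw data; it has to be encoded explicitly first.
m[13:16] = u"xyz".encode("ascii")

contents = m[:]
m.close()
```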

> * cPickle.loads() doesn't like Unicode as data storage

Hm, hard to fix.  Again, it really should use the buffer API, but it doesn't.

> * keywords must be strings (f(1, 2, 3, **{'a':4, 'b':5}) doesn't work)

How hard would this be to fix?
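
The case in question, spelled out with Unicode keys. This is a sketch
of the behaviour being asked for, run against a current Python; the
function f and its parameter names are invented for the example.

```python
# Hypothetical function; only the keyword names 'a' and 'b' matter here.
def f(x, y, z, a=None, b=None):
    return (x, y, z, a, b)

kwargs = {u"a": 4, u"b": 5}  # keyword names given as Unicode strings
result = f(1, 2, 3, **kwargs)
```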

> * rotor doesn't work

Not very important.

> Some of these could be fixed by putting a str() call around
> the '...' constants. Others need fixes in C code. Yet others
> would be better off if they used the buffer interfaces (basically
> all APIs which work on raw data like cPickle or rotor).

What I said. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)