[Numpy-discussion] Bytes vs. Unicode in Python3

Pauli Virtanen pav at iki.fi
Fri Nov 27 07:41:54 EST 2009


On Fri, 2009-11-27 at 13:23 +0100, René Dudfield wrote:
[clip]
> I imagine dtype 'S' and 'U' need more clarification.  As it misses the
> concept of encodings it seems?  Currently, S appears to mean 8bit
> characters no encoding, and U appears to mean 16bit characters no
> encoding?  Or are some sort of default encodings assumed?

Currently, in Numpy on Python 2, 'S' is the same as Python 3 bytes, and
'U' is the same as Python 3 unicode (str), probably with the same
internal representation (need to check). Neither carries any encoding
information.
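
A minimal sketch of the distinction, assuming a current NumPy release on
Python 3 (so it shows how the two dtypes behave today, not necessarily
how they behaved in the 2009 tree under discussion). The variable names
are illustrative only:

```python
import numpy as np

# 'S' stores raw bytes, one byte per element character; no encoding is attached.
s = np.array([b"abc"], dtype="S3")

# 'U' stores Unicode code points; current NumPy uses 4 bytes per character.
u = np.array(["abc"], dtype="U3")

# The scalar types subclass the corresponding Python 3 built-ins.
s_item_is_bytes = isinstance(s[0], bytes)
u_item_is_str = isinstance(u[0], str)

# Bytes per element for each dtype.
s_itemsize = s.dtype.itemsize
u_itemsize = u.dtype.itemsize
```

Going from an 'S' element to text needs an explicit codec, for example
s[0].decode("ascii"), since the dtype itself records none.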

We probably need to change the meaning of 'S', as Francesc noted, and
add a separate bytes dtype.

> 2to3/3to2 fixers will probably have to be written for users' code
> here... whatever is decided.  At least warnings should be generated,
> I'm guessing.

Possibly. Does 2to3 support plugins? If yes, it could be possible to
write one.

> btw, in my numpy tree there is a unicode_() alias to str in py3, and
> to unicode in py2 (inside the compat.py file).  This helped us in many
> cases with compatible string code in the pygame port.  This allows you
> to create unicode strings on both platforms with the same code.

Yes, I saw that. The name unicode_ is, however, already taken by the
Numpy scalar type, so we need to think of a different name for it.
Maybe asstring.
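
For illustration, such a compatibility alias could look roughly like the
sketch below. This is not the code from René's compat.py; the names
text_type and as_text are hypothetical, chosen only to avoid the clash
with the unicode_ scalar type:

```python
import sys

# Hypothetical alias: the native text type on each major Python version.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (this name exists only on Python 2)


def as_text(value, encoding="ascii"):
    """Return value as text, decoding byte strings with the given encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)
```

With this, as_text(b"abc") and as_text("abc") both give a text string on
either version, so calling code does not need its own version checks.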

Btw, do you want to rebase your distutils changes on top of my tree? I
tried yours out quickly, but there were some issues there that prevented
distutils from working. (Also, you can use absolute imports both for
Python 2 and 3 -- there's probably no need to use relative imports.)

	Pauli




