[Python-Dev] PEP: Adding data-type objects to Python
Travis E. Oliphant
oliphant.travis at ieee.org
Sun Oct 29 02:18:04 CEST 2006
Martin v. Löwis wrote:
> Travis E. Oliphant wrote:
>> In this case, the 'kind' does not specify how large the data-type is.
>> You can have 'u1', 'u2', 'u4', etc.
>>
>> The same is true with Unicode. You can have 10-character unicode
>> elements, 20-character, etc. But, we have to be clear about what a
>> "character" is in the data-format.
>
> That is certainly confusing. In u1, u2, u4, the digit seems to indicate
> the size of a single value (1 byte, 2 bytes, 4 bytes). Right? Yet,
> in U20, it does *not* indicate the size of a single value but of an
> array? And then, it's not the size, but the number of elements?
>
Good point. In NumPy, unicode support was added "in parallel" with
string arrays, where there is no such ambiguity (one character is one
byte, so the count and the size agree). So, yes, it's true that the
unicode case is a special case.
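
To make the ambiguity concrete, here is a small sketch against present-day NumPy (an assumption: the itemsize of 'U' types has varied across NumPy versions and Python builds, so the snippet derives the per-character size rather than hard-coding it):

```python
import numpy as np

# Integer type codes: the digit is the size in bytes of a single value.
int_sizes = {code: np.dtype(code).itemsize for code in ("u1", "u2", "u4")}

# Unicode type code: the number is a count of characters, not a byte size.
u20 = np.dtype("U20")
bytes_per_char = np.dtype("U1").itemsize
chars = u20.itemsize // bytes_per_char

print(int_sizes)
print(u20.itemsize, bytes_per_char, chars)
```

So the digit in 'u4' gives the element size in bytes, while the number in 'U20' gives the element's length in characters.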
The other way to handle it would be to describe the code-point size
(i.e. 'U1', 'U2', 'U4' for UCS-1, UCS-2, UCS-4) and then have the length
be encoded as an "array" of those types.
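
As a rough illustration of that alternative, NumPy's existing sub-array data-types can stand in for it. This is only a sketch: 'U4' as a code-point-size code does not exist, so an unsigned 4-byte integer ('u4') is used here as a hypothetical placeholder for one UCS-4 code point.

```python
import numpy as np

# Hypothetical stand-in for a 4-byte code-point type (no such 'U4' code exists).
code_point = np.dtype("u4")

# A 20-character element expressed as an "array" of 20 code points.
text20 = np.dtype((code_point, (20,)))

print(text20.base, text20.shape, text20.itemsize)
```

Here the element size and the length are described separately, so the number attached to the type code means the same thing as it does for 'u1', 'u2', and 'u4'.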
This was not the direction we took with NumPy (which is what I'm using
as a reference) because I wanted Unicode and string arrays to look the
same and thought of strings differently.
How to handle unicode data-formats could definitely be improved.
Suggestions are welcome.
-Travis