[Python-Dev] PEP: Adding data-type objects to Python

Sun Oct 29 02:18:04 CEST 2006

Martin v. Löwis wrote:
> Travis E. Oliphant wrote:
>> In this case, the 'kind' does not specify how large the data-type is. 
>> You can have 'u1', 'u2', 'u4', etc.
>>
>> The same is true with Unicode.  You can have 10-character unicode 
>> elements, 20-character, etc.  But, we have to be clear about what a 
>> "character" is in the data-format.
> 
> That is certainly confusing. In u1, u2, u4, the digit seems to indicate
> the size of a single value (1 byte, 2 bytes, 4 bytes). Right? Yet,
> in U20, it does *not* indicate the size of a single value but of an
> array? And then, it's not the size, but the number of elements?
> 

Good point.  In NumPy, unicode support was added "in parallel" with 
string arrays, where the ambiguity does not arise (one byte per 
character, so the count and the size coincide).  So, yes, it's true 
that the unicode case is a special case.
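
To make the two readings of the digit concrete, here is a rough sketch in 
plain Python.  It is not NumPy's parser: element_size is a made-up helper, 
and char_width=4 is an assumption (UCS-4 storage; a UCS-2 build would 
use 2).

```python
def element_size(typestr, char_width=4):
    """Bytes per element for a NumPy-style type string.

    Hypothetical helper for illustration only. char_width is the
    assumed number of bytes used to store one unicode character.
    """
    kind, count = typestr[0], int(typestr[1:])
    if kind in "iu":
        # Integer kinds: the digit is the size of one value in bytes.
        return count
    if kind == "U":
        # Unicode kind: the digit is a character count, not a byte size.
        return count * char_width
    raise ValueError(f"unsupported kind: {kind!r}")


for typestr in ("u1", "u2", "u4", "U20"):
    print(typestr, element_size(typestr))
```

With these assumptions 'u4' comes out as 4 bytes while 'U20' comes out as 
80 bytes, which is exactly the inconsistency being pointed out.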

The other way to handle it would be to describe the code-point size 
(i.e. 'U1', 'U2', 'U4' for UCS-1, UCS-2, UCS-4) and then encode the 
length as an "array" of those types.
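
A minimal sketch of what that alternative might look like, again in plain 
Python.  The (code, length) pairing and the name describe_unicode are 
assumptions made up for illustration, not an existing or proposed API.

```python
def describe_unicode(code, length):
    """Interpret ('U4', 20) as 20 code points of 4 bytes each.

    Hypothetical helper: here the digit means bytes per code point,
    matching 'u1'/'u2'/'u4', and the length is carried separately.
    """
    if code[0] != "U":
        raise ValueError(f"expected a 'U' code, got {code!r}")
    unit_bytes = int(code[1:])
    if unit_bytes not in (1, 2, 4):
        raise ValueError(f"unsupported code-point size: {unit_bytes}")
    return {
        "unit_bytes": unit_bytes,            # size of one code point
        "length": length,                    # number of code points
        "total_bytes": unit_bytes * length,  # size of the whole element
    }


print(describe_unicode("U4", 20))
```

Under this scheme the digit always means "bytes per value", and the 
number of characters is ordinary array shape information.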

This was not the direction we took with NumPy (which is what I'm using 
as a reference) because I wanted Unicode and string arrays to look the 
same and thought of strings differently.

How to handle unicode data-formats could definitely be improved. 
Suggestions are welcome.

-Travis