[Python-Dev] idea for data-type (data-format) PEP

Wed Nov 1 22:41:38 CET 2006

Martin v. Löwis wrote:

>Travis Oliphant wrote:
>
>>>>r_field = PyDict_GetItemString(dtype,'r');
>>>>
>>Actually it should read PyDict_GetItemString(dtype->fields, "r").  The
>>r_field is a tuple (data-type object, offset).  The fields attribute is
>>(currently) a Python dictionary.
>>
>
>Ok. This seems to be missing in the PEP. 
>
Yeah, actually quite a bit is missing, because I wanted to float the 
idea for discussion before "getting the details perfect" (which of 
course they wouldn't be if it was just my input producing them).

>In this code, where is PyArray_GetField coming from?
>
This is a NumPy-specific C-API function.  That's why I was confused 
about why you wanted me to show how I would do it. 

But what you are actually asking is how another application would use 
the data-type information to do the same thing, using the data-type 
object and a pointer to memory.  Is that correct?

This is a reasonable thing to request.  And your example is a good one.  
I will use the PEP to explain it.

Ultimately, the code you are asking for will need some kind of 
dispatch table that selects different binary code depending on the 
actual data-types being shared (unless all that is needed is a copy, in 
which case just the size of the element area can be used).  In my 
experience, the dispatch table must be present for at least the 
"simple" data-types.  The data-types built up from there can depend on 
those.

In NumPy, the data-type objects have function pointers to accomplish all 
the things NumPy does quickly.  So, each data-type object in NumPy 
points to a function-pointer table and the NumPy code defers to it to 
actually accomplish the task (much like Python really).

Not all libraries will support working with all data-types.  If they 
don't support it, they just raise an error indicating that it's not 
possible to share that kind of data. 
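
As a rough sketch of what such a dispatch table might look like in 
plain C: the type codes, the function names, and the single "convert to 
double" operation below are all made up for illustration.  None of this 
is NumPy's actual function-pointer table or anything specified in the 
PEP; it only shows the pattern of looking up a function by data-type 
and refusing the types the consumer does not support.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type codes; not taken from NumPy or the PEP. */
typedef enum { DT_INT32, DT_FLOAT64, DT_COMPLEX128, DT_COUNT } dt_kind;

/* One operation per "simple" data-type: read an element as a double. */
typedef int (*to_double_fn)(const void *src, double *out);

static int int32_to_double(const void *src, double *out)
{
    int32_t v;
    memcpy(&v, src, sizeof v);   /* memcpy avoids alignment assumptions */
    *out = (double)v;
    return 0;
}

static int float64_to_double(const void *src, double *out)
{
    memcpy(out, src, sizeof *out);
    return 0;
}

/* The dispatch table.  DT_COMPLEX128 is deliberately left NULL to stand
   for a data-type this consumer does not support. */
static const to_double_fn dispatch[DT_COUNT] = {
    [DT_INT32]   = int32_to_double,
    [DT_FLOAT64] = float64_to_double,
};

/* Returns 0 on success, -1 if this kind of data cannot be shared. */
int element_to_double(dt_kind kind, const void *src, double *out)
{
    assert(src != NULL && out != NULL);
    if ((unsigned)kind >= (unsigned)DT_COUNT || dispatch[kind] == NULL)
        return -1;
    return dispatch[kind](src, out);
}
```

A consumer would call element_to_double(kind, ptr, &value) and treat a 
-1 return as "it's not possible to share that kind of data".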

> What does
>it do? If I wanted to write this code from scratch, what
>should I write instead? Since this is all about a flat
>memory block, I'm surprised I need "true" Python objects
>for the field values in there.
>
Well, actually, the block could be "strided" as well. 

So, you would write something that gets the pointer to the memory and 
then gets the extended information (dimensionality, shape, and strides, 
and data-format object).  Then, you would get the offset of the field 
you are interested in from the start of the element (it's stored in the 
data-format representation).

Then do a memory copy from the right place.  (Using the array iterator 
in NumPy you can actually do it without getting the shape and strides 
information first, but I'm holding off on that PEP until an N-d array 
is proposed for Python.)  I'll write something like that as an example 
and put it in the PEP for the extended buffer protocol.  
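
In the meantime, here is a minimal sketch of the idea for the 1-d 
strided case, in plain C with no Python or NumPy headers.  The 
field_desc struct is a hypothetical flattened stand-in for the 
(data-type object, offset) information in the fields dictionary, and 
rgb_pixel, find_field and copy_field_1d are names invented for this 
example only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened field description: name, byte offset from the
   start of the element, and size of the field in bytes. */
typedef struct {
    const char *name;
    size_t offset;
    size_t size;
} field_desc;

/* Hypothetical element type, used only to have something to copy from. */
typedef struct { unsigned char r, g, b; } rgb_pixel;

const field_desc rgb_fields[3] = {
    {"r", offsetof(rgb_pixel, r), sizeof(unsigned char)},
    {"g", offsetof(rgb_pixel, g), sizeof(unsigned char)},
    {"b", offsetof(rgb_pixel, b), sizeof(unsigned char)},
};

/* Look a field up by name; returns NULL if there is no such field. */
const field_desc *find_field(const field_desc *fields, size_t nfields,
                             const char *name)
{
    for (size_t i = 0; i < nfields; i++) {
        if (strcmp(fields[i].name, name) == 0)
            return &fields[i];
    }
    return NULL;
}

/* Copy one field out of each of n elements into a contiguous buffer.
   stride is the distance in bytes between consecutive elements, which
   may be larger than the element size for a strided block. */
void copy_field_1d(const void *base, size_t n, ptrdiff_t stride,
                   const field_desc *field, void *dst)
{
    const char *src = (const char *)base;
    char *out = (char *)dst;

    assert(base != NULL && field != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        memcpy(out + i * field->size,
               src + (ptrdiff_t)i * stride + field->offset,
               field->size);
    }
}
```

For an N-d block the same copy would sit inside a loop over the shape, 
adding the stride of each dimension to the source pointer.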

-Travis

>
>>But, the other option (especially for code already written) would be to
>>just convert the data-format specification into its own internal
>>representation.
>>
>
>Ok, so your assumption is that consumers already have their own
>machinery, in which case the ease-of-use question would be how
>difficult it is to convert datatype objects into the internal
>representation.
>
>Regards,
>Martin
>