[Python-Dev] idea for data-type (data-format) PEP

"Martin v. Löwis" martin at v.loewis.de
Wed Nov 1 20:49:44 CET 2006


Travis E. Oliphant wrote:
>> Or, if it does have uses independent of the buffer extension: what
>> are those uses?
> 
> So that NumPy and ctypes and audio libraries and video libraries and 
> database libraries and image-file format libraries can communicate about 
> data-formats using the same expressions (in Python).

I find that puzzling. In what way can the specification of a data type
enable communication? Don't you need some kind of protocol for it
(i.e. operations to be invoked)? Also, do you mean that these libraries
can communicate with each other? Or with somebody else? If so, with
whom?

> What problem do you have in defining a standard way to communicate about 
> binary data-formats (not just images)?  I still can't figure out why you 
> are so resistant to the idea.  MPI had to do it.

I'm afraid of "dead" specifications, things whose only motivation is
that they look nice. They are just clutter. There are a few examples
of this already in Python, like the character buffer interface or
the multi-segment buffers.

As for MPI: It didn't just independently define a data type system.
Instead, it did that, *and* specified the usage of the data types
in operations such as MPI_SEND. It is very clear what the scope of
this data description is, and what the intended usage is.

Without specifying an intended usage, it is impossible to evaluate
whether the specification meets its goals.

> Absolutely --- if something is to be made useful across packages and 
> from Python.   This is where the discussion should take place.  The 
> struct module and array modules would both be consumers also so that in 
> the struct module you could specify your structure in terms of the 
> standard data-representation and in the array module you could specify 
> your array in terms of the standard representation instead of using 
> "character codes".

Ok, that would be a new usage: I expected that datatype instances
always come in pairs with memory allocated and filled according to
the description. If you are proposing to modify/extend the API
of the struct and array modules, you should say so somewhere (in
a PEP).
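
For reference, a minimal sketch of the "character codes" the struct and array modules accept today. It uses only the standard library; the three-byte RGB pixel is an assumed example and is not taken from the proposal.

```python
import struct
from array import array

# struct: the layout is a format string of character codes.
# "<" selects little-endian with no padding; "3B" is three unsigned bytes.
pixel_format = struct.Struct("<3B")
packed = pixel_format.pack(10, 200, 30)

# array: the element type is a single typecode, here "B" (unsigned byte).
# It has no notion of a record with three named fields.
flat = array("B", [10, 200, 30])

# Both objects hold the same three bytes, described in two unrelated notations.
same_bytes = packed == flat.tobytes()
```

Neither notation names the fields, so a consumer of these bytes still has to know by convention which byte is R, G, or B.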

>> Suppose I wanted to change all RGB values to a gray value (i.e. R=G=B),
>> what would the C code look like that does that? (it seems now that the
>> primary purpose of this machinery is image manipulation)
>>
> 
> For me it is definitely not image manipulation that is the only purpose 
> (or even the primary purpose).  It's just an easy one to explain --- 
> most people understand images.   But, I think this question is actually 
> irrelevant (IMHO).  To me, how you change all RGB values to gray would 
> depend on the library you are using not on how data-formats are expressed.
> 
> Maybe we are still misunderstanding each other.

I expect that the primary readers/users of the PEP would be people who
have to write libraries: i.e. people implementing NumPy, struct, array,
and people who implement algorithms that operate on data. So usability
of the specification is a matter of how easy it is to *write* a library
that does perform the image manipulation.

> If you really want to know.  In NumPy it might look like this:
> 
> Python code:
> 
> img['r'] = img['g']
> img['b'] = img['g']

That's not what I'm asking. Instead, what does the NumPy code look
like that gets invoked on these read-and-write operations? Does it
only use the void* pointing to the start of the data, and the
datatype object? If not, what would C code look like that only has
the void* and the datatype object?
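
To make the question concrete, here is a rough Python analogue, not NumPy's implementation and not the API of the proposed PEP. It assumes the "datatype object" is reduced to a struct format string plus a tuple of field names, and that a bytearray stands in for the void*. The function name and its parameters are hypothetical.

```python
import struct

def set_gray_from_green(buf, record_format, field_names):
    """Copy the 'g' field into 'r' and 'b' for every record in buf.

    Only the raw buffer and its description are used; nothing here
    depends on the library that produced the data.
    """
    record = struct.Struct(record_format)
    r = field_names.index("r")
    g = field_names.index("g")
    b = field_names.index("b")
    # Walk the buffer one complete record at a time.
    for offset in range(0, len(buf) - record.size + 1, record.size):
        values = list(record.unpack_from(buf, offset))
        values[r] = values[g]
        values[b] = values[g]
        record.pack_into(buf, offset, *values)

# Two RGB pixels, one unsigned byte per channel (assumed layout).
pixels = bytearray([10, 200, 30, 1, 2, 3])
set_gray_from_green(pixels, "<3B", ("r", "g", "b"))
```

The point of the sketch is that the routine works only because the description carries everything needed to locate each field: record size, field order, and field type.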

> dtype = img->descr;

In this code, is descr a datatype object? ...

> r_field = PyDict_GetItemString(dtype,"r");

... I guess not, because apparently, it is a dictionary, not
a datatype object.

> But, I still don't see how that is relevant to the question of how to 
> represent the data-format to share that information across two extensions.

Well, if NumPy gets the data from a different module, it can't assume
there is a descr object that is a dictionary. Instead, it must
perform these operations just by using the datatype object. What
else is the purpose of sharing the information, if not to use it
to access the data?

Regards,
Martin


