[Python-Dev] PEP: Adding data-type objects to Python

Michael Chermside mcherm at mcherm.com
Tue Oct 31 14:26:35 CET 2006


In this email I'm responding to a series of emails from Travis
pretty much in the order I read them:

Travis Oliphant writes:
> I'm saying we should introduce a single-object mechanism for  
> describing binary data so that the many-object approach of c-types  
> does not become some kind of de-facto standard.  C-types can  
> "translate" this object-instance to its internals if and when it  
> needs to.
>
> In the mean-time, how are other packages supposed to communicate  
> binary information about data with each other?

Here we disagree.

I haven't used C-types. I have no idea whether it is well-designed or
horribly unusable. So if someone wanted to argue that C-types is a
mistake and should be thrown out, I'd be willing to listen. Until
someone tries to make that argument, I'm presuming it's good enough to
be part of the standard library for Python.

Given that, I think that it *SHOULD* become a de-facto standard. I
think that the way different packages should communicate binary information
about data with each other is using C-types. Not because it's wonderful
(remember, I've never used it), but because it's STANDARD. There should
be one obvious way to do things! When there is, it makes interoperability
WAY easier, and interoperability is the main objective when dealing with
things like binary data formats.

Propose using C-types. Or propose *improving* C-types. But don't propose
ignoring it.

In a different message, he writes:
> It also bothers me that so many ways to describe binary data are  
> being used out there.  This is a problem that deserves being solved.  
>  And, no, ctypes hasn't solved it (we can't directly use the ctypes  
> solution).

Really? Why? Is this a failing in C-types? Can C-types be "fixed"?

Later he explains:
> Remember the buffer protocol is in compiled code.  So, as a result,
>
> 1) It's harder to construct a class to pass through the protocol  
> using the multiple-types approach of ctypes.
>
> 2) It's harder to interpret the object received through the buffer protocol.
>
> Sure, it would be *possible* to use ctypes, but I think it would be  
> very difficult.  Think about how you would write the get_data_format  
> C function in the extended buffer protocol for NumPy if you had to  
> import ctypes and then build a class just to describe your data.   
> How would you interpret what you get back?

Aha! So what you REALLY ought to be asking for is a C interface to the
ctypes module. That seems like a very sensible and reasonable request.

> I don't think we should just *use ctypes because it's there* when  
> the way it describes binary data was not constructed with the  
> extended buffer protocol in mind.

I just disagree. (1) I *DO* think we should "just use ctypes because it's
there". After all, the problem we're trying to solve is one of
COMPATIBILITY - you don't solve those by introducing competing standards.
(2) From what I understand of it, I think ctypes is quite capable of
describing data to be accessed via the buffer protocol.
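
As a rough illustration of point (2), here is a minimal sketch of ctypes
describing a record layout and then being interrogated for it. The Point
record and the describe() helper are hypothetical names made up for this
example, and the memoryview/bytes calls assume a present-day Python 3
interpreter rather than anything available when this thread was written.

```python
import ctypes


class Point(ctypes.Structure):
    # Hypothetical record: a 32-bit integer id followed by two doubles.
    _fields_ = [
        ("id", ctypes.c_int32),
        ("x", ctypes.c_double),
        ("y", ctypes.c_double),
    ]


def describe(struct_type):
    """Return (name, offset, size) for each field of a ctypes Structure."""
    layout = []
    for name, _field_type in struct_type._fields_:
        field = getattr(struct_type, name)  # field descriptor on the class
        layout.append((name, field.offset, field.size))
    return layout


point = Point(7, 1.5, -2.0)

# ctypes instances expose their memory through the buffer protocol,
# so generic consumers can read the raw bytes without knowing the type.
view = memoryview(point)
raw = bytes(point)

for name, offset, size in describe(Point):
    print(name, offset, size)
```

The exact offsets depend on the platform's alignment rules, so the sketch
prints them rather than assuming particular values.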

In another email:
> In order to make sense of the data-format object that I'm proposing  
> you have to see the need to share information about data-format  
> through an extended buffer protocol (which I will be proposing  
> soon).  I'm not going to try to argue that right now because there  
> are a lot of people who can do that.

Actually, no need to convince me... I am already convinced of the
wisdom of this approach.

> My view is that it is un-necessary to use a different type object to  
> describe each different data-type.
      [...]
> So, the big difference is that I think data-formats should be  
> *instances* of a single type.

Why? Who cares? Seriously, if we were proposing to describe the layouts
with a collection of rubber bands and potato chips, I'd say it was a
crazy idea. But we're proposing using data structures in a computer
memory. Why does it matter whether those data structures are of the same
"python type" or different "python types"? I care whether the structure
can be created, passed around, and interrogated. I don't care what
Python type they are.

> I'm saying that I don't like the idea of forcing this approach on  
> everybody else who wants to describe arbitrary binary data just  
> because ctypes is included.

And I'm saying that I *do*. Hey, if someone proposed getting rid of
the current syntax for the array module (for Py3K) and replacing it with
use of ctypes, I'd give it serious consideration. There should be only
one way to describe binary structures. It should be powerful enough to
describe almost any structure, easy to use, and most of all it should be
used consistently everywhere.
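
To make the array-module comparison concrete, here is a small sketch that
describes the same three C doubles once with an array typecode and once with
a ctypes array type. It assumes a present-day Python 3 interpreter; the
variable names are illustrative only, and nothing here is a proposed
replacement API.

```python
import array
import ctypes

values = [1.0, 2.5, -4.0]

# array module: the element type is spelled as a one-letter typecode.
from_array = array.array("d", values)

# ctypes: the element type is an object, and the array type is built from it.
DoubleArray3 = ctypes.c_double * len(values)
from_ctypes = DoubleArray3(*values)

# Both describe the same native layout, so the raw bytes agree.
same_item_size = from_array.itemsize == ctypes.sizeof(ctypes.c_double)
same_bytes = bytes(from_array) == bytes(from_ctypes)
print(same_item_size, same_bytes)
```

The two spellings carry the same information; the argument above is only
that one of them should be used everywhere.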

> I need some encouragement in order to continue to invest energy in  
> pushing this forward.

Please keep up the good work! Some day I'd like to see NumPy built into
the standard Python distribution. The incremental, PEP-by-PEP approach
you are taking is the best route to getting there. But there may be
some changes along the way -- convergence with ctypes may be one of
those.

-------------

Look, my advice is to try to make ctypes work for you. Not having any
easy way to construct or to interrogate ctypes objects from C is a
legitimate complaint... and if you can define your requirements, it
should be relatively easy to add a C interface to meet those needs.

-- Michael Chermside


