[Python-Dev] PEP: Adding data-type objects to Python

Alexander Belopolsky alexander.belopolsky at gmail.com
Thu Nov 2 00:42:25 CET 2006


Travis Oliphant <oliphant.travis <at> ieee.org> writes:
> Frankly, I'd be happy enough to start with 
> "typecodes" in the extended buffer protocol (that's where the array 
> module is now) and then move up to something more complete later.
> 

Let's just start with that.  The way I see it, the problem is that the buffer
protocol is fine as long as your data is an array of bytes, but if it is an
array of doubles, you are out of luck. So, while I can do

>>> from array import array
>>> b = buffer(array('d', [1, 2, 3]))

there is not much that I can do with b.  For example, if I want to pass it to
numpy, I will have to provide the type and shape information myself:

>>> import numpy
>>> numpy.ndarray(shape=(3,), dtype=float, buffer=b)
array([ 1.,  2.,  3.])

With the extended buffer protocol, I should be able to do

>>> numpy.array(b)

So let's start by solving this problem and limit it to data that can be found
in a standard library array.  This way we can postpone the discussion of shapes,
strides and nested structs.

I propose a simple bf_gettypeinfo(PyObject *obj, int *type, int *bitsize)
slot that would return a type code and the item size in bits.
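
To make this concrete, here is a rough C sketch of what the slot and an
implementation for an array of doubles could look like.  Everything below is
hypothetical: the constant names and their values are placeholders, not an
existing CPython API.

    #include <Python.h>

    /* Hypothetical, size-free type codes; the item size is reported
       separately through *bitsize. */
    #define PYBUF_TYPE_SIGNED_INT    1
    #define PYBUF_TYPE_UNSIGNED_INT  2
    #define PYBUF_TYPE_FLOAT         3

    /* The proposed slot signature. */
    typedef int (*gettypeinfoproc)(PyObject *obj, int *type, int *bitsize);

    /* What an implementation for array('d', ...) might look like:
       report a floating-point item and its size in bits. */
    static int
    array_gettypeinfo(PyObject *obj, int *type, int *bitsize)
    {
        *type = PYBUF_TYPE_FLOAT;
        *bitsize = sizeof(double) * 8;  /* 64 on common platforms */
        return 0;                       /* 0 on success, -1 on error */
    }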

I believe it is better to have type codes free from size information for
several reasons:

1. Generic code can use the size information directly without having to know
that an int is 32 bits and a double is 64 bits.

2. Odd sizes can be easily described without having to add a new type code.

3. I assume that the existing bf_ functions would still return the size in
bytes, so having the item size available separately makes it easy to compute
the number of items (see the sketch after this list).
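
To illustrate point 3, here is roughly how a consumer could combine the
existing buffer slots with the proposed one to recover the item count.  This
assumes bf_gettypeinfo has been added to PyBufferProcs as proposed; everything
else is the buffer API as it exists today:

    #include <Python.h>

    /* Hypothetical consumer: how many items does this buffer hold? */
    static Py_ssize_t
    buffer_item_count(PyObject *obj)
    {
        PyBufferProcs *bp = obj->ob_type->tp_as_buffer;
        void *ptr;
        int type, bitsize;

        /* Existing slot: total size of segment 0 in bytes. */
        Py_ssize_t nbytes = bp->bf_getreadbuffer(obj, 0, &ptr);
        if (nbytes < 0)
            return -1;

        /* Proposed slot: type code and item size in bits. */
        if (bp->bf_gettypeinfo(obj, &type, &bitsize) < 0)
            return -1;

        return nbytes / (bitsize / 8);
    }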

If we manage to agree on a standard way to pass primitive type information,
it will be a big achievement and immediately useful, because simple arrays
are already in the standard library.
