[Numpy-discussion] Latest Array-Interface PEP
Colin J. Williams
cjw at sympatico.ca
Thu Jan 4 18:04:21 EST 2007
Travis Oliphant wrote:
>
> I'm attaching my latest extended buffer-protocol PEP that is trying to
> get the array interface into Python. Basically, it is a translation of
> the numpy header files into something as simple as possible that can
> still be used to describe a complicated block of memory to another user.
>
> My purpose is to get feedback and criticisms from this community before
> presenting it to the larger Python community.
>
> -Travis
>
>
It would help me to understand the proposal if it could be explained in
terms of the methods of the existing buffer class/type:
['__add__', '__class__', '__cmp__', '__delattr__', '__delitem__',
'__delslice__', '__doc__', '__getattribute__', '__getitem__',
'__getslice__', '__hash__', '__init__', '__len__', '__mul__', '__new__',
'__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__',
'__setitem__', '__setslice__', '__str__']
Numpy extends numarray's type/dtype object. This proposal appears to
revert to the old letter codes.
I have had very limited experience with C.
Colin W.
>
> ------------------------------------------------------------------------
>
> PEP: <unassigned>
> Title: Extending the buffer protocol to include the array interface
> Version: $Revision: $
> Last-Modified: $Date: $
> Author: Travis Oliphant <oliphant at ee.byu.edu>
> Status: Draft
> Type: Standards Track
> Created: 28-Aug-2006
> Python-Version: 2.6
>
> Abstract
>
> This PEP proposes extending the tp_as_buffer structure to include
> function pointers that incorporate information about the intended
> shape and data-format of the provided buffer. In essence this will
> place an array interface directly into Python.
>
> Rationale
>
> Several extensions to Python utilize the buffer protocol to share
> the location of a data-buffer that is really an N-dimensional
> array. However, there is no standard way to exchange the
> additional N-dimensional array information so that the data-buffer
> is interpreted correctly. The NumPy project introduced an array
> interface (http://numpy.scipy.org/array_interface.shtml) through a
> set of attributes on the object itself. While this approach
> works, it requires attribute lookups which can be expensive when
> sharing many small arrays.
>
> One of the key reasons users often ask for something like NumPy to
> be placed in the standard library is so that it can serve as a
> standard for other packages that deal with arrays. This PEP
> provides a mechanism for extending the buffer protocol (which
> already allows data sharing) to add the additional information
> needed to understand the data. This should be of benefit to all
> third-party modules that want to share memory through the buffer
> protocol such as GUI toolkits, PIL, PyGame, CVXOPT, PyVoxel,
> PyMedia, audio libraries, video libraries etc.
>
>
> Proposal
>
> Add bf_getarrayview and bf_relarrayview function pointers to the
> buffer protocol to allow objects to share a view on a memory
> pointer including information about accessing it as an
> N-dimensional array. Add the TP_HAS_ARRAY_BUFFER flag to types
> that define this extended buffer protocol.
>
> A few additional C-API calls should perhaps also be added to Python
> to facilitate creating new PyArrayViewObjects.
>
> Specification:
>
> static PyObject* bf_getarrayview (PyObject *obj)
>
> This function must return a new reference to a PyArrayViewObject
> which contains the details of the array information exposed by the
> object. If failure occurs, then NULL is returned and an exception
> set.
>
> static int bf_relarrayview(PyObject *obj)
>
> If not NULL, this will be called when the object returned by
> bf_getarrayview is destroyed, so that the underlying object can be
> aware when acquired "views" are released.
>
> The object that defines bf_getarrayview should not re-allocate memory
> (re-size itself) while views are extant. The function returns 0 on
> success; on failure it returns -1 with an error condition set.
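
To make the get/release pairing concrete in Python terms, here is a minimal sketch of an object that counts outstanding views and refuses to resize while any exist. The names ResizableBlock, View, get_view, release and resize are hypothetical stand-ins for the proposed C slots; none of them appear in the PEP.

```python
class View:
    """Stand-in for the proposed array view object (hypothetical)."""

    def __init__(self, base):
        self.base = base          # the object this is a "view" of
        self._released = False

    def release(self):
        # Plays the role of bf_relarrayview: notify the base exactly once.
        if not self._released:
            self._released = True
            self.base._release_view()


class ResizableBlock:
    """Stand-in for an object exposing the extended buffer protocol."""

    def __init__(self, size):
        self._data = bytearray(size)
        self._views = 0           # number of views currently extant

    def get_view(self):
        # Plays the role of bf_getarrayview: hand out a new view.
        self._views += 1
        return View(self)

    def _release_view(self):
        self._views -= 1

    def resize(self, size):
        # Re-allocation is refused while any view is still alive.
        if self._views:
            raise BufferError("cannot resize while views are extant")
        if size < len(self._data):
            del self._data[size:]
        else:
            self._data.extend(bytes(size - len(self._data)))
```

An explicit release() is used here instead of a destructor so that the point at which the base object is notified is deterministic.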
>
> The PyArrayViewObject has the structure
>
> typedef struct {
>     PyObject_HEAD
>     void *data;            /* pointer to the beginning of data */
>     int nd;                /* the number of dimensions */
>     Py_ssize_t *shape;     /* c-array of size nd giving shape */
>     Py_ssize_t *strides;   /* SEE BELOW */
>     PyObject *base;        /* the object this is a "view" of */
>     PyObject *format;      /* SEE BELOW */
>     int flags;             /* SEE BELOW */
> } PyArrayViewObject;
>
>
> strides -- a c-array of size nd providing the striding information
> which is the number of bytes to skip to get to the next element
> in that dimension.
>
> format -- a Python data-format object (PyDataFormatObject) which
> contains information about how each item in the array
> should be interpreted.
>
> flags -- an integer of flags. PYARR_WRITEABLE is the only flag
> that must be set appropriately by types.
> Other flags: PYARR_ALIGNED, PYARR_C_CONTIGUOUS,
> PYARR_F_CONTIGUOUS, and PYARR_NOTSWAPPED can all be determined
> from the rest of the PyArrayViewObject using the UpdateFlags C-API.
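
The role of shape and strides, and how the contiguity flags could follow from them, can be sketched in plain Python. This is only an illustration of the arithmetic; the helper names below are made up, and the PEP does not say how PyArrayView_UpdateFlags would actually compute PYARR_C_CONTIGUOUS or PYARR_F_CONTIGUOUS.

```python
def c_order_strides(shape, itemsize):
    """Strides in bytes for a C-ordered block (last index varies fastest)."""
    strides = []
    step = itemsize
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def byte_offset(index, strides):
    """Byte offset of the element at `index` from the data pointer."""
    if len(index) != len(strides):
        raise ValueError("index and strides must both have length nd")
    return sum(i * s for i, s in zip(index, strides))


def _contiguous(dims_and_strides, itemsize):
    # Walk from the fastest-varying dimension to the slowest one.
    expected = itemsize
    for dim, stride in dims_and_strides:
        if dim != 1 and stride != expected:
            return False
        expected *= dim
    return True


def is_c_contiguous(shape, strides, itemsize):
    if 0 in shape:
        return True  # an empty block is trivially contiguous
    return _contiguous(zip(reversed(shape), reversed(strides)), itemsize)


def is_f_contiguous(shape, strides, itemsize):
    if 0 in shape:
        return True
    return _contiguous(zip(shape, strides), itemsize)


# A 3x4 block of 8-byte items stored in C order.
shape = (3, 4)
strides = c_order_strides(shape, 8)      # (32, 8)
offset = byte_offset((2, 1), strides)    # 2*32 + 1*8 = 72
```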
>
> The PyDataFormatObject has the structure
>
> typedef struct {
>     PyObject_HEAD
>     Pysimpleformat primitive;  /* basic primitive type */
>     int flags;                 /* byte-order, isaligned */
>     int itemsize;              /* SEE BELOW */
>     int alignment;             /* SEE BELOW */
>     PyObject *extended;        /* SEE BELOW */
> } PyDataFormatObject;
>
> enum Pysimpleformat {PY_BIT='1', PY_BOOL='?', PY_BYTE='b', PY_SHORT='h', PY_INT='i',
> PY_LONG='l', PY_LONGLONG='q', PY_UBYTE='B', PY_USHORT='H', PY_UINT='I',
> PY_ULONG='L', PY_ULONGLONG='Q', PY_FLOAT='f', PY_DOUBLE='d', PY_LONGDOUBLE='g',
> PY_CFLOAT='F', PY_CDOUBLE='D', PY_CLONGDOUBLE='G', PY_OBJECT='O',
> PY_CHAR='c', PY_UCS2='u', PY_UCS4='w', PY_FUNCPTR='X', PY_VOIDPTR='V'};
>
> Each of these simple formats has a special character code which can be used to
> identify this primitive in a nested python list.
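
Several of these letter codes coincide with the format characters of Python's struct module, so for that subset the native size of each primitive can be looked up with struct.calcsize. This is only an illustration; the proposed Py_GetPrimitiveSize is not specified in enough detail to say it would agree in every case, and the remaining codes are not covered here.

```python
import struct

# Subset of the proposed codes that are also struct format characters.
STRUCT_CODES = "?bhilqBHILQfdc"

# Native sizes in bytes on the machine running this script.
native_sizes = {code: struct.calcsize(code) for code in STRUCT_CODES}

for code in STRUCT_CODES:
    print(code, native_sizes[code])
```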
>
>
> flags -- flags for the data-format object. Specified masks are
> PY_NATIVEORDER
> PY_BIGENDIAN
> PY_LITTLEENDIAN
> PY_IGNORE
>
> itemsize -- the total size represented by this data-format in bytes unless the
> primitive is PY_BIT in which case it is the size in bits.
> For data-formats that are simple 1-d arrays of the underlying primitive,
> this total size can represent more than one primitive (with extended
> still NULL).
>
> alignment -- For the primitive types this is offsetof(struct {char c; type v;},v)
>
> extended -- NULL if this is a primitive data-type or no additional information is
> available.
>
> If primitive is PY_FUNCPTR, then this can be a tuple with >=1 element:
> (args, {dim0, dim1, dim2, ...}).
>
> args -- A list (of at least length 2) of data-format objects
> specifying the input argument formats with the last
> argument specifying the output argument data-format
> (use None for void inputs and/or outputs).
>
> For other primitives, this can be a tuple with >=2 elements:
> (names, fields, {dim0, dim1, dim2, ...})
> Use None for both names and fields if they should be ignored.
>
> names -- An ordered list of string or unicode objects giving the names
> of the fields for a structure data-format.
> fields -- a Python dictionary with ordered-keys given by the list
> in names. Each entry in the dictionary is
> a 3-tuple containing (data-format-object, offset,
> meta-information) where meta-information is Py_None if there
> is no meta-information. Offset is given in bytes from the
> start of the record or in bits if PY_BIT is the primitive.
>
> Any additional entries in the extended tuple (dim0,
> dim1, etc.) are interpreted as integers which specify
> that this data-format is an array of the given shape
> of the fundamental data-format specified by the
> remainder of the DataFormat Object. The dimensions
> are specified so that the last-index is always assumed
> to vary the fastest (C-order).
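
The alignment rule and the (names, fields, dim0, dim1, ...) layout can be mimicked with ctypes. This is a rough sketch under stated assumptions: ctypes types stand in for data-format objects, and Record, alignment_of and the chosen dimensions are invented for the example.

```python
import ctypes


def alignment_of(ctype):
    """Mimic offsetof(struct {char c; type v;}, v) for a ctypes type."""
    class Probe(ctypes.Structure):
        _fields_ = [("c", ctypes.c_char), ("v", ctype)]
    return Probe.v.offset


class Record(ctypes.Structure):
    # A hypothetical two-field record.
    _fields_ = [("id", ctypes.c_int32), ("value", ctypes.c_double)]


# Ordered field names, as in the "names" entry.
names = [name for name, _ in Record._fields_]

# name -> (format, offset in bytes, meta-information or None)
fields = {
    name: (ctype, getattr(Record, name).offset, None)
    for name, ctype in Record._fields_
}

# Trailing integers say this format is a 2x3 C-ordered array of Record.
extended = (names, fields, 2, 3)

print(alignment_of(ctypes.c_double), fields["value"][1])
```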
>
>
> The constructor of a PyArrayViewObject allocates the memory for shape and strides
> and the destructor frees that memory.
>
> The constructor of a PyDataFormatObject allocates the objects it needs for fields,
> names, and shape.
>
> C-API
>
> void PyArrayView_UpdateFlags(PyObject *view, int flags)
> /* update the flags on the array view object provided */
>
> PyDataFormatObject *Py_NewSimpleFormat(Pysimpleformat primitive)
> /* return a new primitive data-format object */
>
> PyDataFormatObject *Py_DataFormatFromCType(PyObject *ctype)
> /* return a new data-format object from a ctype */
>
> int Py_GetPrimitiveSize(Pysimpleformat primitive)
> /* return the size (in bytes) of the provided primitive */
>
> PyDataFormatObject *Py_AlignDataFormat(PyObject *format)
> /* take a data-format object and construct an aligned data-format
> object where all fields are aligned on appropriate boundaries
> for the compiler */
>
>
> Discussion
>
> The information provided in the array view object is patterned
> after the way a multi-dimensional array is defined in NumPy -- including
> the data-format object which allows a variety of descriptions of memory
> depending on the need.
>
> Reference Implementation
>
> Supplied when the PEP is accepted.
>
> Copyright
>
> This document is placed in the public domain.
>
>
> ------------------------------------------------------------------------