[Numpy-discussion] Latest Array-Interface PEP

Colin J. Williams cjw at sympatico.ca
Thu Jan 4 18:04:21 EST 2007


Travis Oliphant wrote:
> 
> I'm attaching my latest extended buffer-protocol PEP that is trying to 
> get the array interface into Python.  Basically, it is a translation of 
> the numpy header files into something as simple as possible that can 
> still be used to describe a complicated block of memory to another user.
> 
> My purpose is to get feedback and criticism from this community before 
> presenting it to the larger Python community.
> 
> -Travis
> 
> 
It would help me to understand the proposal if it could be explained in 
terms of the methods of the existing buffer class/type:
['__add__', '__class__', '__cmp__', '__delattr__', '__delitem__', 
'__delslice__', '__doc__', '__getattribute__', '__getitem__', 
'__getslice__', '__hash__', '__init__', '__len__', '__mul__', '__new__', 
'__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', 
'__setitem__', '__setslice__', '__str__']

Numpy extends numarray's type/dtype object.  This proposal appears to 
revert to the old letter codes.

I have had very limited experience with C.

Colin W.
> 
> ------------------------------------------------------------------------
> 
> PEP: <unassigned>
> Title: Extending the buffer protocol to include the array interface
> Version: $Revision: $
> Last-Modified: $Date:  $
> Author: Travis Oliphant <oliphant at ee.byu.edu>
> Status: Draft
> Type: Standards Track
> Created: 28-Aug-2006
> Python-Version: 2.6
> 
> Abstract
> 
>     This PEP proposes extending the tp_as_buffer structure to include 
>     function pointers that incorporate information about the intended
>     shape and data-format of the provided buffer.  In essence this will
>     place an array interface directly into Python. 
> 
> Rationale
> 
>     Several extensions to Python utilize the buffer protocol to share
>     the location of a data-buffer that is really an N-dimensional
>     array.  However, there is no standard way to exchange the
>     additional N-dimensional array information so that the data-buffer
>     is interpreted correctly.  The NumPy project introduced an array
>     interface (http://numpy.scipy.org/array_interface.shtml) through a
>     set of attributes on the object itself.  While this approach
>     works, it requires attribute lookups which can be expensive when
>     sharing many small arrays.  
> 
>     One of the key reasons that users often request to place something
>     like NumPy into the standard library is so that it can be used as
>     standard for other packages that deal with arrays.  This PEP
>     provides a mechanism for extending the buffer protocol (which
>     already allows data sharing) to add the additional information
>     needed to understand the data.  This should be of benefit to all
>     third-party modules that want to share memory through the buffer
>     protocol such as GUI toolkits, PIL, PyGame, CVXOPT, PyVoxel,
>     PyMedia, audio libraries, video libraries etc.
> 
> 
> Proposal
>  
>     Add bf_getarrayview and bf_relarrayview function pointers to the
>     buffer protocol to allow objects to share a view on a memory
>     pointer including information about accessing it as an
>     N-dimensional array. Add the TP_HAS_ARRAY_BUFFER flag to types
>     that define this extended buffer protocol.
> 
>     Also, a few additional C-API calls should perhaps be added to Python
>     to facilitate creating new PyArrayViewObjects. 
> 
> Specification
>     
>     static PyObject* bf_getarrayview (PyObject *obj)
> 
>     This function must return a new reference to a PyArrayViewObject
>     which contains the details of the array information exposed by the
>     object.  On failure, NULL is returned and an exception is set.
> 
>     static int bf_relarrayview(PyObject *obj)
> 
>     If not NULL then this will be called when the object returned by 
>     bf_getarrayview is destroyed so that the underlying object can be
>     aware when acquired "views" are released.  
>     
>     The object that defines bf_getarrayview should not re-allocate memory
>     (re-size itself) while views are extant.  bf_relarrayview returns 0 
>     on success; on failure it returns -1 and sets an exception.
> 
>     The PyArrayViewObject has the structure 
> 
>     typedef struct {
>          PyObject_HEAD
>          void *data;             /* pointer to the beginning of data */
>          int nd;                 /* the number of dimensions */
>          Py_ssize_t *shape;      /* c-array of size nd giving shape */
>          Py_ssize_t *strides;    /* SEE BELOW */
>          PyObject *base;         /* the object this is a "view" of */
>          PyObject *format;       /* SEE BELOW */
>          int flags;              /* SEE BELOW */
>     } PyArrayViewObject;
> 
>     
>     strides -- a c-array of size nd providing the striding information
>        which is the number of bytes to skip to get to the next element 
>        in that dimension. 
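
For readers more at home in Python than C, the strides idea can be sketched in a few lines. This is only an illustration under assumptions, not part of the proposal: the helper names c_order_strides and byte_offset are hypothetical, and the array is assumed to be C-contiguous with 8-byte items.

```python
def c_order_strides(shape, itemsize):
    """Byte strides for a C-contiguous array (last index varies fastest)."""
    strides = []
    step = itemsize
    # Walk the dimensions from last to first; each stride is the number
    # of bytes spanned by one step along that dimension.
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def byte_offset(index, strides):
    """Byte offset of the element at `index` from the start of the data."""
    return sum(i * s for i, s in zip(index, strides))


shape = (3, 4)                                  # 3 rows, 4 columns
strides = c_order_strides(shape, itemsize=8)    # e.g. 8-byte doubles
print(strides)                                  # (32, 8)
print(byte_offset((2, 1), strides))             # 72
```

Moving one step along the first dimension skips a whole row of 4 items (32 bytes), while moving along the last dimension skips a single item (8 bytes).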
> 
>     format -- a Python data-format object (PyDataFormatObject) which
>               contains information about how each item in the array 
>               should be interpreted.
> 
>     flags   -- an integer of flags.  PYARR_WRITEABLE is the only flag 
>                   that must be set appropriately by types. 
>                   Other flags: PYARR_ALIGNED, PYARR_C_CONTIGUOUS,
>                   PYARR_F_CONTIGUOUS, and PYARR_NOTSWAPPED can all be determined
>                   from the rest of the PyArrayViewObject using the UpdateFlags C-API.
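
As a rough idea of how the contiguity flags could be derived from shape and strides, here is a sketch in Python. It is an assumption about what such a check might look like, not the proposed UpdateFlags implementation; the function names are hypothetical and arrays with a zero-length dimension are not handled.

```python
def is_c_contiguous(shape, strides, itemsize):
    """True if the strides describe a gap-free C-order (row-major) layout."""
    expected = itemsize
    for dim, stride in zip(reversed(shape), reversed(strides)):
        # A dimension of length 1 is never stepped over, so its stride
        # does not matter.
        if dim != 1 and stride != expected:
            return False
        expected *= dim
    return True


def is_f_contiguous(shape, strides, itemsize):
    """True if the strides describe a gap-free Fortran-order layout."""
    expected = itemsize
    for dim, stride in zip(shape, strides):
        if dim != 1 and stride != expected:
            return False
        expected *= dim
    return True


print(is_c_contiguous((3, 4), (32, 8), 8))   # True
print(is_f_contiguous((3, 4), (32, 8), 8))   # False
print(is_f_contiguous((3, 4), (8, 24), 8))   # True
```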
> 
>     The PyDataFormatObject has the structure
> 
>     typedef struct {
>          PyObject_HEAD
>          Pysimpleformat primitive;  /* basic primitive type */
>          int flags;                 /* byte-order, isaligned */
>          int itemsize;              /* SEE BELOW */
>          int alignment;             /* SEE BELOW */
>          PyObject *extended;        /* SEE BELOW */
>     } PyDataFormatObject;
> 
>     enum Pysimpleformat {PY_BIT='1', PY_BOOL='?', PY_BYTE='b', PY_SHORT='h', PY_INT='i',
>      PY_LONG='l', PY_LONGLONG='q', PY_UBYTE='B', PY_USHORT='H', PY_UINT='I', 
>      PY_ULONG='L', PY_ULONGLONG='Q', PY_FLOAT='f', PY_DOUBLE='d', PY_LONGDOUBLE='g',
>      PY_CFLOAT='F', PY_CDOUBLE='D', PY_CLONGDOUBLE='G', PY_OBJECT='O', 
>      PY_CHAR='c', PY_UCS2='u', PY_UCS4='w', PY_FUNCPTR='X', PY_VOIDPTR='V'};
> 
>      Each of these simple formats has a special character code which can be used to
>      identify this primitive in a nested python list.
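
Several of these character codes coincide with the format codes of Python's standard struct module, which gives a quick way to get a feel for them from Python. The sketch below assumes the shared codes carry the same meaning in both places, which the draft does not state explicitly; it uses struct's "standard size" mode (the "=" prefix) so the sizes do not depend on the platform.

```python
import struct

# Codes that appear both in the enum above and in the struct module.
shared_codes = "?bhiqBHIQfd"

# "=" selects native byte order with standard (platform-independent) sizes.
sizes = {code: struct.calcsize("=" + code) for code in shared_codes}

for code in shared_codes:
    print(code, sizes[code])
```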
> 
> 
>     flags -- flags for the data-format object.  Specified masks are
>                 PY_NATIVEORDER
>                 PY_BIGENDIAN
>                 PY_LITTLEENDIAN
>                 PY_IGNORE
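
The effect of the byte-order masks can be seen with the struct module, whose "<", ">" and "=" prefixes select little-endian, big-endian and native order. Treating those prefixes as analogues of PY_LITTLEENDIAN, PY_BIGENDIAN and PY_NATIVEORDER is an assumption made for illustration only.

```python
import struct
import sys

little = struct.pack("<i", 1)   # least significant byte first
big = struct.pack(">i", 1)      # most significant byte first
native = struct.pack("=i", 1)   # whatever this machine uses

print(little)   # b'\x01\x00\x00\x00'
print(big)      # b'\x00\x00\x00\x01'

# Native order matches one of the two explicit orders.
expected_native = little if sys.byteorder == "little" else big
print(native == expected_native)   # True
```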
> 
>     itemsize -- the total size represented by this data-format in bytes unless the 
>                 primitive is PY_BIT in which case it is the size in bits.  
>                 For data-formats that are simple 1-d arrays of the underlying primitive, 
>                 this total size can represent more than one primitive (with extended
>                 still NULL).
> 
>     alignment -- For the primitive types this is offsetof(struct {char c; type v;},v)
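
The offsetof expression can be reproduced from Python with ctypes, which lays out structures the way the platform's C compiler would. This is a sketch only; alignment_of and Probe are hypothetical names, and the printed values depend on the platform.

```python
import ctypes


def alignment_of(ctype):
    """Mimic offsetof(struct {char c; type v;}, v) for a ctypes type."""
    class Probe(ctypes.Structure):
        _fields_ = [("c", ctypes.c_char), ("v", ctype)]
    # The padding inserted after the one-byte `c` field pushes `v` to
    # the first offset that satisfies its alignment requirement.
    return Probe.v.offset


for ctype in (ctypes.c_char, ctypes.c_short, ctypes.c_int, ctypes.c_double):
    print(ctype.__name__, alignment_of(ctype))
```

The result should agree with ctypes.alignment(ctype) for the same type.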
> 
>     extended -- NULL if this is a primitive data-type or no additional information is 
>                 available.
> 
>                 If primitive is PY_FUNCPTR, then this can be a tuple with >=1 element:
>                 (args, {dim0, dim1, dim2, ...}). 
>                 
>                   args -- A list (of at least length 2) of data-format objects
>                           specifying the input argument formats with the last
>                           argument specifying the output argument data-format
>                           (use None for void inputs and/or outputs).
> 
>                 For other primitives, this can be a tuple with >=2 elements: 
>                 (names, fields, {dim0, dim1, dim2, ...})
>                 Use None for both names and fields if they should be ignored.
> 
>                   names -- An ordered list of string or unicode objects giving the names
>                            of the fields for a structure data-format.
>                   fields -- a Python dictionary with ordered-keys given by the list 
>                             in names. Each entry in the dictionary is  
>                             a 3-tuple containing (data-format-object, offset, 
>                             meta-information) where meta-information is Py_None if there 
>                             is no meta-information. Offset is given in bytes from the 
>                             start of the record or in bits if PY_BIT is the primitive.
> 
>                 Any additional entries in the extended tuple (dim0,
>                 dim1, etc.) are interpreted as integers which specify
>                 that this data-format is an array of the given shape
>                 of the fundamental data-format specified by the
>                 remainder of the DataFormat Object.  The dimensions
>                 are specified so that the last-index is always assumed
>                 to vary the fastest (C-order).
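
To make the names/fields description more concrete, here is a Python sketch of a two-field record. It is an illustration under assumptions: struct format codes stand in for data-format objects, the field names "id" and "value" are made up, and the record is packed with no padding between fields.

```python
import struct

names = ["id", "value"]            # ordered field names
codes = ["i", "d"]                 # stand-ins for data-format objects

# fields maps each name to (data-format, offset in bytes, meta-information).
fields = {}
offset = 0
for name, code in zip(names, codes):
    fields[name] = (code, offset, None)
    offset += struct.calcsize("=" + code)
itemsize = offset                  # total size of one record in bytes

# One packed record: a 4-byte int followed by an 8-byte double.
record = struct.pack("=id", 7, 2.5)


def read_field(buf, name):
    """Extract one field from a record using the offset stored in fields."""
    code, off, _meta = fields[name]
    return struct.unpack_from("=" + code, buf, off)[0]


print(fields)                      # {'id': ('i', 0, None), 'value': ('d', 4, None)}
print(itemsize)                    # 12
print(read_field(record, "id"), read_field(record, "value"))   # 7 2.5
```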
> 
>                     
>      The constructor of a PyArrayViewObject allocates the memory for shape and strides
>          and the destructor frees that memory.
> 
>      The constructor of a PyDataFormatObject allocates the objects it needs for fields, 
>          names, and shape.
> 
> C-API 
> 
>     void PyArrayView_UpdateFlags(PyObject *view, int flags)
>          /* update the flags on the array view object provided */
> 
>     PyDataFormatObject *Py_NewSimpleFormat(Pysimpleformat primitive)
>          /* return a new primitive data-format object */
> 
>     PyDataFormatObject *Py_DataFormatFromCType(PyObject *ctype)
>          /* return a new data-format object from a ctype */
> 
>     int Py_GetPrimitiveSize(Pysimpleformat primitive)
>          /* return the size (in bytes) of the provided primitive */
> 
>     PyDataFormatObject *Py_AlignDataFormat(PyObject *format)
>          /* take a data-format object and construct an aligned data-format
>             object where all fields are aligned on appropriate boundaries 
>             for the compiler */
>             
> 
> Discussion
> 
>     The information provided in the array view object is patterned
>     after the way a multi-dimensional array is defined in NumPy -- including
>     the data-format object which allows a variety of descriptions of memory
>     depending on the need. 
> 
> Reference Implementation
> 
>     Supplied when the PEP is accepted. 
> 
> Copyright
> 
>     This document is placed in the public domain.
> 
> 
> ------------------------------------------------------------------------