[Numpy-svn] r3588 - trunk/numpy/doc
numpy-svn at scipy.org
Tue Mar 20 14:43:32 EDT 2007
Author: cookedm
Date: 2007-03-20 13:43:28 -0500 (Tue, 20 Mar 2007)
New Revision: 3588
Modified:
trunk/numpy/doc/pep_buffer.txt
Log:
ReSTify pep_buffer.txt
(hope that's ok, Travis)
Modified: trunk/numpy/doc/pep_buffer.txt
===================================================================
--- trunk/numpy/doc/pep_buffer.txt 2007-03-20 10:34:04 UTC (rev 3587)
+++ trunk/numpy/doc/pep_buffer.txt 2007-03-20 18:43:28 UTC (rev 3588)
@@ -1,332 +1,365 @@
-PEP: <unassigned>
-Title: Revising the buffer protocol
-Version: $Revision: $
-Last-Modified: $Date: $
-Author: Travis Oliphant <oliphant at ee.byu.edu>
-Status: Draft
-Type: Standards Track
-Created: 28-Aug-2006
-Python-Version: 3000
+:PEP: XXX
+:Title: Revising the buffer protocol
+:Version: $Revision: $
+:Last-Modified: $Date: $
+:Author: Travis Oliphant <oliphant at ee.byu.edu>
+:Status: Draft
+:Type: Standards Track
+:Content-Type: text/x-rst
+:Created: 28-Aug-2006
+:Python-Version: 3000
Abstract
+========
- This PEP proposes re-designing the buffer API (PyBufferProcs
- function pointers) to improve the way Python allows memory sharing
- in Python 3.0
+This PEP proposes re-designing the buffer API (PyBufferProcs
+function pointers) to improve the way Python allows memory sharing
+in Python 3.0.
- In particular, it is proposed that the multiple-segment and
- character buffer portions of the buffer API are eliminated and
- additional function pointers are provided to allow sharing any
- multi-dimensional nature of the memory and what data-format the
- memory contains.
+In particular, it is proposed that the multiple-segment and
+character buffer portions of the buffer API are eliminated and
+additional function pointers are provided to allow sharing any
+multi-dimensional nature of the memory and what data-format the
+memory contains.
Rationale
+=========
- The buffer protocol allows different Python types to exchange a
- pointer to a sequence of internal buffers. This functionality is
- *extremely* useful for sharing large segments of memory between
- different high-level objects, but it's too limited and has issues.
+The buffer protocol allows different Python types to exchange a
+pointer to a sequence of internal buffers. This functionality is
+*extremely* useful for sharing large segments of memory between
+different high-level objects, but it's too limited and has issues.
- 1. There is the little (never?) used "sequence-of-segments" option
- (bf_getsegcount)
+1. There is the little-used (possibly never used) "sequence-of-segments" option
+ (bf_getsegcount)
- 2. There is the apparently redundant character-buffer option
- (bf_getcharbuffer)
+2. There is the apparently redundant character-buffer option
+ (bf_getcharbuffer)
- 3. There is no way for a consumer to tell the buffer-API-exporting
- object it is "finished" with its view of the memory and
- therefore no way for the exporting object to be sure that it is
- safe to reallocate the pointer to the memory that it owns (for
- example, the array object reallocating its memory after sharing
- it with the buffer object which held the original pointer led
- to the infamous buffer-object problem).
+3. There is no way for a consumer to tell the buffer-API-exporting
+ object it is "finished" with its view of the memory and
+ therefore no way for the exporting object to be sure that it is
+ safe to reallocate the pointer to the memory that it owns (for
+ example, the array object reallocating its memory after sharing
+ it with the buffer object which held the original pointer led
+ to the infamous buffer-object problem).
- 4. Memory is just a pointer with a length. There is no way to
- describe what is "in" the memory (float, int, C-structure, etc.)
+4. Memory is just a pointer with a length. There is no way to
+ describe what is "in" the memory (float, int, C-structure, etc.)
- 5. There is no shape information provided for the memory. But,
- several array-like Python types could make use of a standard
- way to describe the shape-interpretation of the memory
- (wxPython, GTK, pyQT, CVXOPT, PyVox, Audio and Video
- Libraries, ctypes, NumPy, data-base interfaces, etc.)
+5. There is no shape information provided for the memory. However,
+ several array-like Python types could make use of a standard
+ way to describe the shape-interpretation of the memory
+ (wxPython, GTK, PyQt, CVXOPT, PyVox, audio and video
+ libraries, ctypes, NumPy, database interfaces, etc.)
- There are two widely used libraries that use the concept of
- discontiguous memory: PIL and NumPy. Their view of discontiguous
- arrays is a bit different, though. NumPy uses the notion of
- constant striding in each dimension as its basic concept of an
- array. In this way a simple sub-region of a larger array can be
- described without copying the data. Strided memory is also a common
- way to describe data in many computing libraries (such as the BLAS
- and LAPACK).
+There are two widely used libraries that use the concept of
+discontiguous memory: PIL and NumPy. Their view of discontiguous
+arrays is a bit different, though. NumPy uses the notion of
+constant striding in each dimension as its basic concept of an
+array. In this way a simple sub-region of a larger array can be
+described without copying the data. Strided memory is also a common
+way to describe data in many computing libraries (such as the BLAS
+and LAPACK).
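Constant striding means the byte offset of an element is the dot product of its index with the per-dimension strides, so a sub-region is just a new base pointer (plus a smaller shape) that reuses the parent's strides. A minimal C sketch of that idea, assuming `ptrdiff_t` as a stand-in for `Py_ssize_t` and the hypothetical helper names `strided_offset` and `strided_get_double` (neither is part of the proposed API):

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of the element at `index` in a strided N-D view:
   index[0]*strides[0] + ... + index[ndims-1]*strides[ndims-1].
   ptrdiff_t stands in for Py_ssize_t in this sketch. */
static ptrdiff_t strided_offset(int ndims, const ptrdiff_t *index,
                                const ptrdiff_t *strides)
{
    ptrdiff_t offset = 0;
    for (int k = 0; k < ndims; k++)
        offset += index[k] * strides[k];
    return offset;
}

/* Read one double through a strided view; nothing is copied. */
static double strided_get_double(const char *base, int ndims,
                                 const ptrdiff_t *index,
                                 const ptrdiff_t *strides)
{
    assert(base != NULL);
    return *(const double *)(base + strided_offset(ndims, index, strides));
}
```

For a 4x5 array of 8-byte doubles (strides {40, 8}), the 2x3 block starting at row 1, column 2 is described by the address of element (1, 2), shape {2, 3} and the same strides {40, 8}; swapping the strides to {8, 40} gives a transposed view, again without copying.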
- The PIL uses a more opaque memory representation. Sometimes an
- image is contained in a contiguous segment of memory, but
- sometimes it is contained in an array of pointers to the
- contiguous segments (usually lines) of the image. This allows the
- image to not be loaded entirely into memory but still managed
- abstractly as if it were. I believe, the PIL is where the idea of
- multiple buffer segments in the original buffer interface came
- from, I believe.
+The PIL uses a more opaque memory representation. Sometimes an
+image is contained in a contiguous segment of memory, but
+sometimes it is contained in an array of pointers to the
+contiguous segments (usually lines) of the image. This allows the
+image to not be loaded entirely into memory but still managed
+abstractly as if it were. I believe the PIL is where the idea of
+multiple buffer segments in the original buffer interface came
+from.
- The buffer interface should allow discontiguous memory areas to
- share standard striding information. However, consumers that do
- not want to deal with strided memory should also be able to
- request a contiguous segment easily.
+The buffer interface should allow discontiguous memory areas to
+share standard striding information. However, consumers that do
+not want to deal with strided memory should also be able to
+request a contiguous segment easily.
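Handing a contiguous segment to a consumer that asked for one means gathering the strided elements into a fresh buffer in C order. A minimal sketch, assuming the hypothetical helper name `copy_to_contiguous`, `ptrdiff_t` in place of `Py_ssize_t`, and at most 32 dimensions. It walks the index space like an odometer so that the last dimension varies the fastest in the output:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: gather an N-D strided region starting at `src`
   into `dst`, which must hold shape[0]*...*shape[ndims-1]*itemsize bytes.
   The result is C-contiguous (last dimension varies fastest). */
static void copy_to_contiguous(char *dst, const char *src, int ndims,
                               const ptrdiff_t *shape,
                               const ptrdiff_t *strides, size_t itemsize)
{
    ptrdiff_t index[32] = {0};
    assert(ndims >= 0 && ndims <= 32);
    for (int k = 0; k < ndims; k++)
        if (shape[k] == 0)
            return;                     /* empty array: nothing to copy */

    for (;;) {
        /* Copy the element at the current index. */
        ptrdiff_t offset = 0;
        for (int k = 0; k < ndims; k++)
            offset += index[k] * strides[k];
        memcpy(dst, src + offset, itemsize);
        dst += itemsize;

        /* Advance the index, last dimension first, carrying leftwards. */
        int d = ndims - 1;
        while (d >= 0 && ++index[d] == shape[d]) {
            index[d] = 0;
            d--;
        }
        if (d < 0)
            break;                      /* every element has been copied */
    }
}
```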
-
Proposal Overview
+=================
- * Eliminate the char-buffer and multiple-segment sections of the
- buffer-protocol.
+* Eliminate the char-buffer and multiple-segment sections of the
+ buffer-protocol.
- * Unify the read/write versions of getting the buffer.
+* Unify the read/write versions of getting the buffer.
- * Add a new function to the interface that should be called when
- the consumer object is "done" with the view.
+* Add a new function to the interface that should be called when
+ the consumer object is "done" with the view.
- * Add a new memory_view object that is returned from the
- buffer interface getbuffer call. This memory_view object
- contains
- * Add a new function to allow the interface to describe what is in
- memory (unifying what is currently done now in struct and
- array)
+* Add a new memory_view object that is returned from the
+ buffer interface getbuffer call. This memory_view object
+ contains
+* Add a new function to allow the interface to describe what is in
+ memory (unifying what is currently done in struct and
+ array)
- * Add a new function to allow the protocol to share shape and
- stride information
+* Add a new function to allow the protocol to share shape and
+ stride information
- * Fix all objects in the core and the standard library to conform
- to the new interface
+* Fix all objects in the core and the standard library to conform
+ to the new interface
- * Extend the struct module to handle more format specifiers
+* Extend the struct module to handle more format specifiers
Specification
+=============
- Change the PyBufferProcs structure to
+Change the PyBufferProcs structure to
+::
+
typedef struct {
getbufferproc bf_getbuffer;
releasebufferproc bf_releasebuffer;
} PyBufferProcs;
+::
+
typedef PyObject *(*getbufferproc)(PyObject *obj, void **buf,
                                   Py_ssize_t *len, int *writeable,
                                   char **format, int *ndims,
                                   Py_ssize_t **shape,
                                   Py_ssize_t **strides);
-
- Return a pointer to memory in *buf and the length of that memory
- buffer (in bytes) in *len. The next arguments are optional.
- NULL is returned on failure. On success an oject-specific
- view is returned (which may just be a borrowed reference to obj).
- This view should be passed to bf_releasebuffer when the consumer
- is done with the view.
- writeable -- address of an integer variable to hold
- whether or not the memory is writeable.
- If this is NULL, then you must assume the memory
- is read-only.
- format -- address of a format-string (following extended struct
- syntax) indicating what is in each element of
- of memory. The number of elements is len / itemsize,
- where itemsize is the number of bytes implied by the format.
- NULL if not needed in which case format == "B" for
- unsigned bytes. The memory for this string must not
- be freed by the consumer --- it is managed by the exporter.
- ndims -- address of a variable storing the number of dimensions
- or NULL if not needed. If shape and/or strides are given
- then this must be non NULL. If this variable is
- not provided then it is assumed that *ndims == 1
- shape -- address of a Py_ssize_t* variable that will be filled
- with a pointer to an array of Py_ssize_t of length *ndims
- indicating the shape of the memory as an N-D array.
- Ignored if this is NULL. Note that
- ((*shape)[0] * ... * (*shape)[ndims-1])*itemsize = len
- If this variable is not provided then it is assumed that
- (*shape[0]) == len / itemsize.
- stride -- address of a Py_ssize_t* variable that will be filled
- with a pointer to an array of Py_ssize_t of length *ndims
- indicating the number of bytes to skip to get to the next
- element in each dimension. If this is NULL, then
- the memory is assumed to be C-style contigous with
- the last dimension varying the fastest. An
- error should be raised if this is not accurate and
- strides are not requested. This variable may be
- set to NULL when called if memory is C-style
- contiguous.
-
- This view object should be used in the other API call and
- does not need to be decref'd. It should be "released" if the
- interface exporter provides the bf_releasebuffer function.
+Return a pointer to memory in ``*buf`` and the length of that memory
+buffer (in bytes) in ``*len``. The remaining arguments are optional.
+NULL is returned on failure. On success an object-specific
+view is returned (which may just be a borrowed reference to obj).
+This view should be passed to bf_releasebuffer when the consumer
+is done with the view.
- typedef int (*releasebufferproc)(PyObject *view)
+writeable
+ address of an integer variable to hold whether or not the memory
+ is writeable. If this is NULL, then you must assume the memory
+ is read-only.
- This function is called (if defined by the exporting object)
- when a view of memory previously acquired from the object is no
- longer needed. It is up to the exporter of the API to make sure
- all views have been released before re-allocating any previously
- shared memory. It is up to consumers of the API to call this
- function on the object whose view is obtained when it is no
- longer needed. Any format string, shape array or strides array
- returned through the interface should also not be referenced after
- the releasebuffer call is made.
- A -1 is returned on error and 0 on success.
+format
+ address of a format-string (following extended struct
+ syntax) indicating what is in each element of
+ memory. The number of elements is len / itemsize,
+ where itemsize is the number of bytes implied by the format.
+ NULL if not needed, in which case the format is assumed to be "B"
+ (unsigned bytes). The memory for this string must not
+ be freed by the consumer --- it is managed by the exporter.
- Both of these routines are optional for a type object
+ndims
+ address of a variable storing the number of dimensions
+ or NULL if not needed. If shape and/or strides are given
+ then this must be non-NULL. If this variable is
+ not provided then it is assumed that ``*ndims == 1``.
+shape
+ address of a ``Py_ssize_t*`` variable that will be filled
+ with a pointer to an array of ``Py_ssize_t`` of length ``*ndims``
+ indicating the shape of the memory as an N-D array.
+ Ignored if this is NULL. Note that
+ ``((*shape)[0] * ... * (*shape)[ndims-1])*itemsize = len``.
+ If this variable is not provided then it is assumed that
+ ``(*shape)[0] == len / itemsize``.
+strides
+ address of a ``Py_ssize_t*`` variable that will be filled
+ with a pointer to an array of ``Py_ssize_t`` of length ``*ndims``
+ indicating the number of bytes to skip to get to the next
+ element in each dimension. If this is NULL, then
+ the memory is assumed to be C-style contiguous with
+ the last dimension varying the fastest. If the memory is not
+ C-style contiguous and strides are not requested, an error
+ should be raised. The exporter may set ``*strides`` to NULL
+ when the memory is C-style contiguous.
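When strides are not supplied the consumer must assume C-style contiguous memory, and the shape invariant says the product of the dimensions times itemsize equals len. A minimal sketch of both rules, assuming `ptrdiff_t` for `Py_ssize_t` and the hypothetical helper names `c_contiguous_strides` and `is_c_contiguous`: the implied strides are built from the last dimension backwards, and the running product is exactly the implied len. An exporter could use the second helper to decide whether it may report NULL strides.

```c
#include <assert.h>
#include <stddef.h>

/* Fill `strides` with the C-contiguous strides implied by `shape`
   (last dimension varies fastest) and return the implied byte length
   shape[0] * ... * shape[ndims-1] * itemsize. */
static ptrdiff_t c_contiguous_strides(int ndims, const ptrdiff_t *shape,
                                      ptrdiff_t itemsize, ptrdiff_t *strides)
{
    ptrdiff_t step = itemsize;
    assert(ndims >= 0 && itemsize > 0);
    for (int k = ndims - 1; k >= 0; k--) {
        strides[k] = step;
        step *= shape[k];
    }
    return step;
}

/* Nonzero if `strides` describe C-contiguous memory for `shape`. */
static int is_c_contiguous(int ndims, const ptrdiff_t *shape,
                           const ptrdiff_t *strides, ptrdiff_t itemsize)
{
    ptrdiff_t step = itemsize;
    for (int k = 0; k < ndims; k++)
        if (shape[k] == 0)
            return 1;               /* an empty array is trivially contiguous */
    for (int k = ndims - 1; k >= 0; k--) {
        /* A length-1 dimension is never stepped over, so its stride is irrelevant. */
        if (shape[k] > 1 && strides[k] != step)
            return 0;
        step *= shape[k];
    }
    return 1;
}
```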
+
+This view object is what should be passed to the other API call
+(bf_releasebuffer); it does not need to be decref'd. It should be
+"released" if the interface exporter provides the bf_releasebuffer function.
+
+``typedef int (*releasebufferproc)(PyObject *view)``
+ This function is called (if defined by the exporting object)
+ when a view of memory previously acquired from the object is no
+ longer needed. It is up to the exporter of the API to make sure
+ all views have been released before re-allocating any previously
+ shared memory. It is up to consumers of the API to call this
+ function on the object whose view is obtained when it is no
+ longer needed. Any format string, shape array or strides array
+ returned through the interface should also not be referenced after
+ the releasebuffer call is made.
+ A -1 is returned on error and 0 on success.
+
+Both of these routines are optional for a type object.
+
+
New C-API calls are proposed
+============================
- int
- PyObject_CheckBuffer(PyObject *obj)
+::
- return 1 if the getbuffer function is available otherwise 0
+ int PyObject_CheckBuffer(PyObject *obj)
- PyObject *
- PyObject_GetBuffer(PyObject *obj, void **buf, Py_ssize_t *len,
- int *writeable, char **format, int *ndims,
- Py_ssize_t **shape, Py_ssize_t **strides)
+Return 1 if the getbuffer function is available, otherwise 0.
- Get the buffer and optional information variables about the buffer.
- Return an object-specific view object (which may be simply a
- borrowed reference to the object itself).
-
- int
- PyObject_ReleaseBuffer(PyObject *view)
-
- call this function to tell obj that you are done with your "view"
- This doesn't do anything if the object doesn't implement a release function.
- Only call this after a previous PyObject_GetBuffer has succeeded and when
- you will not be needing or referring to the memory (or the format, shape,
- and strides memory used in the view -- if you will use these for a longer
- period of time make copies).
- Returns -1 on error.
-
- int PyObject_SizeFromFormat(char *)
- Return the implied itemsize of the data-format area from a struct-style
- description.
+::
+
+ PyObject * PyObject_GetBuffer(PyObject *obj, void **buf,
+                               Py_ssize_t *len, int *writeable,
+                               char **format, int *ndims,
+                               Py_ssize_t **shape, Py_ssize_t **strides)
+
+Get the buffer and optional information variables about the buffer.
+Return an object-specific view object (which may be simply a
+borrowed reference to the object itself).
+
+::
+
+ int PyObject_ReleaseBuffer(PyObject *view)
+
+Call this function to tell obj that you are done with your "view".
+This doesn't do anything if the object doesn't implement a release function.
+Only call this after a previous PyObject_GetBuffer has succeeded and when
+you will not be needing or referring to the memory (or the format, shape,
+and strides memory used in the view -- if you will use these for a longer
+period of time make copies).
+Returns -1 on error.
+
+::
+
+ int PyObject_SizeFromFormat(char *)
+
+Return the implied itemsize of the data-format area from a struct-style
+description.
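The proposal does not spell out how PyObject_SizeFromFormat computes the itemsize. A minimal sketch for flat formats only, assuming the struct module's standard sizes (no native alignment padding) and a hypothetical function name `size_from_format`. It honours repeat counts and skips spaces, the endian characters '=', '<', '>' and ':name:' labels; it returns -1 for anything it does not understand, including the proposed 'T{}', 'Z' and '(k1,...)' extensions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Standard (unpadded) size of one struct code, or -1 if unsupported. */
static ptrdiff_t code_size(char c)
{
    switch (c) {
    case 'x': case 'c': case 'b': case 'B': case '?': return 1;
    case 'h': case 'H': return 2;
    case 'i': case 'I': case 'l': case 'L': case 'f': return 4;
    case 'q': case 'Q': case 'd': return 8;
    default: return -1;
    }
}

/* Hypothetical stand-in for PyObject_SizeFromFormat (flat formats only). */
static ptrdiff_t size_from_format(const char *fmt)
{
    ptrdiff_t total = 0;
    assert(fmt != NULL);
    while (*fmt) {
        if (*fmt == ' ' || *fmt == '=' || *fmt == '<' || *fmt == '>') {
            fmt++;                          /* ignored for size purposes */
            continue;
        }
        if (*fmt == ':') {                  /* skip an optional ':name:' */
            const char *end = strchr(fmt + 1, ':');
            if (end == NULL)
                return -1;
            fmt = end + 1;
            continue;
        }
        ptrdiff_t count = 1;                /* optional repeat count */
        if (isdigit((unsigned char)*fmt)) {
            count = 0;
            while (isdigit((unsigned char)*fmt))
                count = count * 10 + (*fmt++ - '0');
        }
        ptrdiff_t size = code_size(*fmt);
        if (size < 0)
            return -1;
        total += count * size;
        fmt++;
    }
    return total;
}
```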
+
+
Additions to the struct string-syntax
+=====================================
- The struct string-syntax is missing some characters to fully
- implement data-format descriptions already available elsewhere (in
- ctypes and NumPy for example). Here are the proposed additions:
+The struct string-syntax is missing some characters to fully
+implement data-format descriptions already available elsewhere (in
+ctypes and NumPy for example). Here are the proposed additions:
- Character Description
- =============================================================
- 't' bit (number before states how many bits)
- '?' platform _Bool type
- 'g' long double
- 'Z' complex (whatever the next specifier is)
- 'c' ucs-1 (latin-1) encoding
- 'u' ucs-2
- 'w' ucs-4
- 'O' pointer to Python Object
- 'T{}' structure (detailed layout inside {})
- '(k1,k2,...,kn)' multi-dimensional array of whatever follows
- ':name:' optional name of the preceeding element
- '&' specific pointer (prefix before another charater)
- 'X{}' pointer to a function (optional function
- signature inside {})
- ' ' ignored (allow better readability)
- The struct module will be changed to understand these as well and
- return appropriate Python objects on unpacking. Un-packing a
- long-double will return a c-types long_double. Unpacking 'u' or
- 'w' will return Python unicode. Unpacking a multi-dimensional
- array will return a list of lists. Un-packing a pointer will
- return a ctypes pointer object. Un-packing a bit will return a
- Python Bool. Spaces in the struct-string syntax will be ignored.
+================ ===========
+Character Description
+================ ===========
+'t' bit (number before states how many bits)
+'?' platform _Bool type
+'g' long double
+'Z' complex (whatever the next specifier is)
+'c' ucs-1 (latin-1) encoding
+'u' ucs-2
+'w' ucs-4
+'O' pointer to Python Object
+'T{}' structure (detailed layout inside {})
+'(k1,k2,...,kn)' multi-dimensional array of whatever follows
+':name:'         optional name of the preceding element
+'&'              specific pointer (prefix before another character)
+'X{}'            pointer to a function (optional function
+                 signature inside {})
+' ' ignored (allow better readability)
+================ ===========
- Endian-specification ('=','>','<') is also allowed inside the
- string so that it can change if needed. The previously-specified
- endian string is enforce until changed. The default endian is '='.
+The struct module will be changed to understand these as well and
+return appropriate Python objects on unpacking. Unpacking a
+long double will return a ctypes long double. Unpacking 'u' or
+'w' will return Python unicode. Unpacking a multi-dimensional
+array will return a list of lists. Unpacking a pointer will
+return a ctypes pointer object. Unpacking a bit will return a
+Python Bool. Spaces in the struct-string syntax will be ignored.
- According to the struct-module, a number can preceed a character
- code to specify how many of that type there are. The
- (k1,k2,...,kn) extension also allows specifying if the data is
- supposed to be viewed as a (C-style contiguous, last-dimension
- varies the fastest) multi-dimensional array of a particular format.
+Endian-specification ('=','>','<') is also allowed inside the
+string so that it can change if needed. The previously-specified
+endian string is enforced until changed. The default endian is '='.
- Functions should be added to ctypes to create a ctypes object from
- a struct description, and add long-double, and ucs-2 to ctypes.
+According to the struct module, a number can precede a character
+code to specify how many of that type there are. The
+(k1,k2,...,kn) extension also allows specifying that the data is
+supposed to be viewed as a (C-style contiguous, last-dimension
+varies the fastest) multi-dimensional array of a particular format.
+
+Functions should be added to ctypes to create a ctypes object from
+a struct description, and long-double and ucs-2 types should be
+added to ctypes.
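The '(k1,k2,...,kn)' prefix turns the following code into a C-contiguous array whose element count is the product of the dimensions. A minimal parsing sketch, assuming the hypothetical function name `parse_array_prefix` and `ptrdiff_t` for `Py_ssize_t`; it only reads the prefix and leaves the rest of the format (for instance 'd:data:') to a separate parser:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical parser for the proposed '(k1,k2,...,kn)' prefix.
   On success stores the dimensions in shape[0..*ndims-1], points *rest
   just past ')' and returns k1*k2*...*kn; returns -1 on a malformed
   prefix or more than maxdims dimensions. */
static ptrdiff_t parse_array_prefix(const char *fmt, ptrdiff_t *shape,
                                    int maxdims, int *ndims,
                                    const char **rest)
{
    ptrdiff_t count = 1;
    int n = 0;
    assert(fmt != NULL && shape != NULL && ndims != NULL && rest != NULL);
    if (*fmt++ != '(')
        return -1;
    for (;;) {
        if (!isdigit((unsigned char)*fmt) || n == maxdims)
            return -1;
        ptrdiff_t k = 0;
        while (isdigit((unsigned char)*fmt))
            k = k * 10 + (*fmt++ - '0');
        shape[n++] = k;
        count *= k;
        if (*fmt == ')')
            break;
        if (*fmt++ != ',')              /* dimensions are comma-separated */
            return -1;
    }
    *ndims = n;
    *rest = fmt + 1;                    /* skip ')' */
    return count;
}
```

For example, '(16,4)d' yields shape {16, 4} and 64 elements, i.e. 512 bytes with 8-byte doubles.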
+
Examples of Data-Format Descriptions
+====================================
- Here are some examples of C-structures and how they would be
- represented using the struct-style syntax:
+Here are some examples of C-structures and how they would be
+represented using the struct-style syntax:
- float
- 'f'
- complex double
- 'Zd'
- RGB Pixel data
- 'BBB' or 'B:r: B:g: B:b:'
- Mixed endian (weird but possible)
- '>i:big: <i:little:'
- Nested structure
- struct {
+float
+ 'f'
+complex double
+ 'Zd'
+RGB Pixel data
+ 'BBB' or 'B:r: B:g: B:b:'
+Mixed endian (weird but possible)
+ '>i:big: <i:little:'
+Nested structure
+ ::
+
+ struct {
int ival;
struct {
unsigned short sval;
unsigned char bval;
unsigned char cval;
} sub;
- }
- 'i:ival: T{H:sval: B:bval: B:cval:}:sub:'
- Nested array
- struct {
+ }
+ 'i:ival: T{H:sval: B:bval: B:cval:}:sub:'
+Nested array
+ ::
+
+ struct {
int ival;
double data[16*4];
- }
- 'i:ival: (16,4)d:data:'
+ }
+ 'i:ival: (16,4)d:data:'
Code to be affected
+===================
- All objects and modules in Python that export or consume the old
- buffer interface will be modified. Here is a partial list.
-
- * buffer object
- * bytes object
- * string object
- * array module
- * struct module
- * mmap module
- * ctypes module
+All objects and modules in Python that export or consume the old
+buffer interface will be modified. Here is a partial list.
- anything else using the buffer API
+* buffer object
+* bytes object
+* string object
+* array module
+* struct module
+* mmap module
+* ctypes module
+* anything else using the buffer API
Issues and Details
+==================
+The proposed locking mechanism relies entirely on the objects
+implementing the buffer interface to do their own thing. Ideally
+an object that implements the buffer interface should keep at least
+a count of how many views are extant (acquired but not yet released).
+If there are views of a memory location, then any subsequent
+reallocation should fail and raise an error.
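A minimal sketch of that bookkeeping, using a hypothetical `Exporter` type and function names chosen for illustration (they mirror the roles of bf_getbuffer and bf_releasebuffer but are not the proposed signatures). Each successful get increments the count of extant views, each release decrements it, and reallocation is refused while the count is non-zero:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical exporter owning a resizable block of memory. */
typedef struct {
    char *data;
    size_t len;
    int views;              /* number of extant (unreleased) views */
} Exporter;

/* getbuffer-like call: hand out the pointer and record the view. */
static char *exporter_getbuffer(Exporter *e, size_t *len)
{
    e->views++;
    *len = e->len;
    return e->data;
}

/* releasebuffer-like call: -1 on error, 0 on success. */
static int exporter_releasebuffer(Exporter *e)
{
    if (e->views <= 0)
        return -1;          /* release without a matching get */
    e->views--;
    return 0;
}

/* Reallocation fails while any view is extant. */
static int exporter_resize(Exporter *e, size_t newlen)
{
    char *p;
    assert(newlen > 0);
    if (e->views > 0)
        return -1;          /* consumers may still hold the old pointer */
    p = realloc(e->data, newlen);
    if (p == NULL)
        return -1;
    e->data = p;
    e->len = newlen;
    return 0;
}
```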
- The proposed locking mechanism relies entirely on the objects
- implementing the buffer interface to do their own thing. Ideally
- an object that implements the buffer interface should keep at least
- a number indicating how many releases are extant. If there are views
- to a memory location, then any subsequent reallocation should fail and raise
- an error.
+The sharing of strided memory is new and can be seen as a
+modification of the multiple-segment interface. It is motivated by
+NumPy. NumPy objects should be able to share their strided memory
+with code that understands how to manage strided memory because
+strided memory is very common when interfacing with compute libraries.
- The sharing of strided memory is new and can be seen as a
- modification of the multiple-segment interface. It is motivated by
- NumPy. NumPy objects should be able to share their strided memory
- with code that understands how to manage strided memory because
- strided memory is very common when interfacing with compute libraries.
+Currently the struct module does not allow specification of nested
+structures. It seems like nested structures should be specifiable,
+as several ways of viewing memory areas (e.g. ctypes and
+NumPy) already allow this.
- Currently the struct module does not allow specification of nested
- structures. It seems like specifying a nested structure should be
- specified as several ways of viewing memory areas (e.g. ctypes and
- NumPy) already allow this.
+Memory management of the format string and the shape and strides
+array is always the responsibility of the exporting object and can
+be shared between different views. If the consuming object needs to
+keep these memory areas longer than the view is held, then it must
+copy them to its own memory.
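A minimal consumer-side sketch of such a copy, assuming a hypothetical `LayoutCopy` record and helper names, with `ptrdiff_t` for `Py_ssize_t`. It duplicates the exporter-owned format string, shape and strides arrays so they remain valid after the view is released, and treats a NULL format as "B" as described for the format argument:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical consumer-owned copy of a view's layout information. */
typedef struct {
    int ndims;
    ptrdiff_t *shape;       /* owned copy */
    ptrdiff_t *strides;     /* owned copy, or NULL for C-contiguous */
    char *format;           /* owned copy */
} LayoutCopy;

static void layout_free(LayoutCopy *c)
{
    free(c->shape);
    free(c->strides);
    free(c->format);
    c->shape = NULL;
    c->strides = NULL;
    c->format = NULL;
}

/* Copy exporter-owned layout data; 0 on success, -1 on allocation failure. */
static int layout_copy(LayoutCopy *out, int ndims, const ptrdiff_t *shape,
                       const ptrdiff_t *strides, const char *format)
{
    size_t nbytes = (size_t)ndims * sizeof(ptrdiff_t);
    if (format == NULL)
        format = "B";                     /* NULL format means unsigned bytes */
    out->ndims = ndims;
    out->shape = malloc(nbytes + 1);      /* +1 avoids malloc(0) */
    out->strides = strides ? malloc(nbytes + 1) : NULL;
    out->format = malloc(strlen(format) + 1);
    if (out->shape == NULL || out->format == NULL ||
        (strides != NULL && out->strides == NULL)) {
        layout_free(out);
        return -1;
    }
    if (nbytes > 0) {
        memcpy(out->shape, shape, nbytes);
        if (strides != NULL)
            memcpy(out->strides, strides, nbytes);
    }
    strcpy(out->format, format);
    return 0;
}
```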
- Memory management of the format string and the shape and strides
- array is always the responsibility of the exporting object and can
- be shared between different views. If the consuming object needs to
- keep these memory areas longer than the view is held, then it must
- copy them to its own memory.
-
Copyright
+=========
- This PEP is placed in the public domain
+This PEP is placed in the public domain.