Python's buffer interface

Robin Boerdijk robin.boerdijk at nl.origin-it.com
Sat Oct 16 07:52:02 EDT 1999


Randall Hopper <aa8vb at yahoo.com> wrote in message
news:19991015113444.A592708 at vislab.epa.gov...
>      From a commentary I read, it looks very useful.  I'd like to learn
> more about it -- in particular, how to write conforming C interfaces.
>
>      However, I didn't find any mention of this in the docs.  Did I miss
> something?  Or have the docs not caught up with it?

From previous discussions on this newsgroup, it appears that the buffer
interface is rather controversial, which may be why it hasn't been properly
documented yet. My impression is that part of the controversy comes from
confusion between the buffer *interface* and buffer *objects*.

Let me explain what I have understood of the buffer interface so far and then
show its relation to buffer objects. After reading this, you will see that
with the buffer interface and an alternative implementation of buffer
objects, you could move data around in a very fast and memory-friendly
manner. For example, if the buffer object were implemented as a "writable
string" (as I will explain later), you could implement part of a Web server
like this:

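buf = buffer(1024)                  # 1K buffer (the proposed writable string)
n = HtmlFile.readinto(buf)          # fill buf in place; n = bytes read
while n:
    if n == 1024:
        HttpSocket.write(buf)       # full buffer: send it as is
    else:
        HttpSocket.write(buf[0:n])  # last, partial chunk
    n = HtmlFile.readinto(buf)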

Regardless of the size of the HtmlFile, it will typically take only two
small memory allocations to copy the entire HtmlFile to the HttpSocket: one
for the 1K buffer itself and one for the slice holding the final, partial
chunk.
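For comparison, the same loop can be sketched in plain C with the standard fread() and fwrite() calls; the file methods readinto() and write() shown further down are thin wrappers around exactly these. This is only an analogy, and copy_stream is a hypothetical name: one fixed 1K buffer is reused for every chunk, so the number of allocations does not depend on the file size.

```c
#include <assert.h>
#include <stdio.h>

/* Copy everything from src to dst through one fixed 1K buffer.
 * The buffer is allocated once (on the stack) and reused for every
 * chunk. Returns the number of bytes copied, or -1 if a write fails. */
long copy_stream(FILE *src, FILE *dst)
{
    char buf[1024];              /* the single, reused buffer */
    long total = 0;
    size_t n;

    assert(src != NULL && dst != NULL);

    /* fread fills buf in place, much like readinto() */
    while ((n = fread(buf, 1, sizeof buf, src)) > 0) {
        /* write only the n bytes actually read; no slice copy needed */
        if (fwrite(buf, 1, n, dst) != n)
            return -1;
        total += (long)n;
    }
    return total;
}
```

In C the "partial chunk" case costs nothing extra, because fwrite() takes the length directly; the Python version needs the slice only because write() takes a whole object.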

The buffer interface
--------------------

The buffer interface is a C-level interface. Every type object
(PyTypeObject) has a tp_as_buffer entry for it, which is NULL for types that
do not support the interface. As an example, here is the PyTypeObject for
strings:

PyTypeObject PyString_Type = {
  PyObject_HEAD_INIT(&PyType_Type)
  0,
  "string",
  sizeof(PyStringObject),
  sizeof(char),
  (destructor)string_dealloc, /*tp_dealloc*/
  (printfunc)string_print, /*tp_print*/
  0,  /*tp_getattr*/
  0,  /*tp_setattr*/
  (cmpfunc)string_compare, /*tp_compare*/
  (reprfunc)string_repr, /*tp_repr*/
  0,  /*tp_as_number*/
  &string_as_sequence, /*tp_as_sequence*/
  0,  /*tp_as_mapping*/
  (hashfunc)string_hash, /*tp_hash*/
  0,  /*tp_call*/
  0,  /*tp_str*/
  0,  /*tp_getattro*/
  0,  /*tp_setattro*/

/** Here is the buffer interface entry **/

  &string_as_buffer, /*tp_as_buffer*/

  Py_TPFLAGS_DEFAULT, /*tp_flags*/
  0,  /*tp_doc*/
};

The buffer interface entry is a pointer to a structure of four function pointers:

static PyBufferProcs string_as_buffer = {
  (getreadbufferproc)string_buffer_getreadbuf,
  (getwritebufferproc)string_buffer_getwritebuf,
  (getsegcountproc)string_buffer_getsegcount,
  (getcharbufferproc)string_buffer_getcharbuf,
};

These functions can be implemented to provide direct but controlled access
to the internal memory buffer(s) of the object.

* The getsegcountproc function should return the number of internal buffer
segments of the object. In the case of a string object (and probably most
other objects as well), this is just 1.

* The getreadbufferproc function should return a pointer to, and the size of,
a particular buffer segment from which data can be read. In the case of a
string object, this function returns a pointer to the string's characters
and its length.

* The getwritebufferproc function should return a pointer to, and the size
of, a particular buffer segment into which data can be written. In the case
of a string object, this function raises an exception because string objects
are supposed to be immutable.

* I am not sure in which cases the getcharbufferproc should be implemented
and what it should do then (maybe someone else can explain).
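To make the roles of these slots concrete, here is a small self-contained C model of them. It is a sketch under stated assumptions: the toy_* names and the exact signatures are made up for illustration (loosely shaped after PyBufferProcs), it does not use Python.h, and getcharbufferproc is left out. A string-like object reports one segment, hands out its memory for reading, and refuses write access, as described above.

```c
#include <assert.h>
#include <string.h>

/* Toy model of a type's buffer slots; hypothetical, not the Python C API. */
typedef struct toy_object toy_object;

typedef struct {
    /* return the size of segment seg and set *ptr to its start; -1 on error */
    int (*getreadbuf)(toy_object *self, int seg, void **ptr);
    int (*getwritebuf)(toy_object *self, int seg, void **ptr);
    /* return the number of segments; *total (if non-NULL) gets the length */
    int (*getsegcount)(toy_object *self, int *total);
} toy_bufferprocs;

struct toy_object {
    const toy_bufferprocs *as_buffer;  /* like tp_as_buffer; may be NULL */
    const char *data;
    int size;
};

static int str_getsegcount(toy_object *self, int *total)
{
    if (total)
        *total = self->size;
    return 1;                          /* a string is one segment */
}

static int str_getreadbuf(toy_object *self, int seg, void **ptr)
{
    if (seg != 0)
        return -1;                     /* only segment 0 exists */
    *ptr = (void *)self->data;
    return self->size;
}

static int str_getwritebuf(toy_object *self, int seg, void **ptr)
{
    (void)self; (void)seg; (void)ptr;
    return -1;                         /* immutable: refuse write access */
}

static const toy_bufferprocs string_as_buffer = {
    str_getreadbuf, str_getwritebuf, str_getsegcount
};

/* Wrap a C string in a toy string object (no copy is made). */
toy_object make_string(const char *s)
{
    toy_object o;
    assert(s != NULL);
    o.as_buffer = &string_as_buffer;
    o.data = s;
    o.size = (int)strlen(s);
    return o;
}
```

The -1 returned by str_getwritebuf stands in for the exception the real string type raises.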

The purpose of the buffer interface is to let other code access the internal
data buffer of an object directly. For example, the write method of a file
object can write the data of any object that has a buffer interface straight
to the file.

static PyObject* file_write(f, args)
  PyFileObject *f;
  PyObject *args;
{
  char *s;
  int n;
  if (!PyArg_Parse(args, "s#", &s, &n))
   return NULL;
  fwrite(s, 1, n, f->f_fp);
  Py_INCREF(Py_None);
  return Py_None;
}

Although it seems like the argument must be a string, PyArg_Parse(args,
"s#", &s, &n) will actually check the tp_as_buffer entry of the argument. If
this entry is not NULL, then s and n are set through a call to the
getreadbufferproc of this entry. So, not only strings, but any object that
has a "readable" buffer interface can be written to a file using this single
write method.
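The effect can be sketched with the same kind of toy model (hypothetical names, not the real getargs.c code): parse_readable does roughly what "s#" does for a non-string argument, and write_any then works for any object whose type fills in the readable slot. The sketch also insists on exactly one segment, since a single pointer/length pair cannot describe more than one.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical toy model: each object points at its type's buffer slots. */
typedef struct toy_object toy_object;

typedef struct {
    int (*getreadbuf)(toy_object *self, int seg, void **ptr);
    int (*getsegcount)(toy_object *self, int *total);
} toy_bufferprocs;

struct toy_object {
    const toy_bufferprocs *as_buffer;  /* NULL: no buffer interface */
    void *data;
    int size;                          /* size of data in bytes */
};

/* A single-segment implementation, usable by string- or array-like types. */
static int flat_getreadbuf(toy_object *self, int seg, void **ptr)
{
    if (seg != 0)
        return -1;
    *ptr = self->data;
    return self->size;
}

static int flat_getsegcount(toy_object *self, int *total)
{
    if (total)
        *total = self->size;
    return 1;
}

static const toy_bufferprocs flat_procs = { flat_getreadbuf, flat_getsegcount };

/* Roughly the job of "s#": get pointer and length via the read slot. */
static int parse_readable(toy_object *o, const char **s, int *n)
{
    void *p;
    int len;
    assert(o != NULL);
    if (o->as_buffer == NULL || o->as_buffer->getreadbuf == NULL ||
        o->as_buffer->getsegcount == NULL)
        return -1;                     /* no readable buffer: a type error */
    if (o->as_buffer->getsegcount(o, NULL) != 1)
        return -1;                     /* one pointer/length = one segment */
    len = o->as_buffer->getreadbuf(o, 0, &p);
    if (len < 0)
        return -1;
    *s = p;
    *n = len;
    return 0;
}

/* One write routine for every readable object, like file_write above. */
int write_any(FILE *fp, toy_object *o)
{
    const char *s;
    int n;
    if (parse_readable(o, &s, &n) != 0)
        return -1;
    return fwrite(s, 1, (size_t)n, fp) == (size_t)n ? n : -1;
}
```

Here a char array and a short array go through the same write_any call, while an object with no buffer slots is rejected.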

Similarly, the readinto() method of a file can directly read data from a
file into an object that has a buffer interface.

static PyObject *file_readinto(f, args)
 PyFileObject *f;
 PyObject *args;
{
 char *ptr;
 int ntodo, ndone;
 if (!PyArg_Parse(args, "w#", &ptr, &ntodo))
   return NULL;
 ndone = fread(ptr, 1, ntodo, f->f_fp);
 return PyInt_FromLong(ndone);
}

In this case, PyArg_Parse(args, "w#", &ptr, &ntodo) checks the tp_as_buffer
entry of the argument. If this entry is not NULL, then ptr and ntodo are set
through a call to the getwritebufferproc of this entry. So any object that
has a "writable" buffer interface can be filled with data from a file using
this single readinto method.
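A matching toy sketch for "w#" (again hypothetical names, not Python's own code): readinto_any asks the object for a writable buffer and lets fread() fill it in place. A "writable string" of the kind mentioned at the start of this post can therefore be refilled again and again without any new allocation, while an immutable string is refused.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical toy model, not the Python C API. */
typedef struct toy_object toy_object;

typedef struct {
    int (*getwritebuf)(toy_object *self, int seg, void **ptr);
} toy_bufferprocs;

struct toy_object {
    const toy_bufferprocs *as_buffer;
    char *data;
    int size;
};

/* A "writable string": fixed size, contents may be overwritten in place. */
static int wstr_getwritebuf(toy_object *self, int seg, void **ptr)
{
    if (seg != 0)
        return -1;
    *ptr = self->data;
    return self->size;
}

/* An immutable string: refuses to hand out writable memory. */
static int str_getwritebuf(toy_object *self, int seg, void **ptr)
{
    (void)self; (void)seg; (void)ptr;
    return -1;
}

static const toy_bufferprocs wstr_procs = { wstr_getwritebuf };
static const toy_bufferprocs str_procs = { str_getwritebuf };

/* Like file_readinto: read from fp straight into the object's memory.
 * Returns the number of bytes read, or -1 if the object is not writable. */
long readinto_any(FILE *fp, toy_object *o)
{
    void *ptr;
    int size;
    assert(o != NULL);
    if (o->as_buffer == NULL || o->as_buffer->getwritebuf == NULL)
        return -1;
    size = o->as_buffer->getwritebuf(o, 0, &ptr);
    if (size < 0)
        return -1;                     /* e.g. an immutable string */
    return (long)fread(ptr, 1, (size_t)size, fp);
}
```

As in the Web server loop, the caller uses the returned count to know how much of the buffer holds fresh data after a short, final read.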

A concrete example of an object that exposes both a "readable" and a
"writable" buffer interface can be found here:
http://www.sis.nl/python/xstruct/xstruct.html.

The buffer object
-----------------

If I understand correctly, the buffer object is not tightly related to the
buffer interface.  Let me explain the buffer object by means of your own
example:

>      Also, a bit of puzzlement after trying a brief test (adapted from a
> code snippet in the list archives):
>
>      Do buffer objects not have any methods?
>
>      ----------------------------------------------------------------
>      >>> import array
>      >>> a=array.array("b", range(ord('A'), ord('G')))
>      >>> print a
>      array('b', [65, 66, 67, 68, 69, 70])
>      >>> b = buffer(a)
>      >>> print b
>      ABCDEF
>      >>> dir(b)
>      []
>      ----------------------------------------------------------------

Do you have any idea what is going on here? Well, it beats me. If you look
at the C sources, a buffer object is an object that maintains a pointer to
part of the memory buffer of another object. I don't know exactly what this
is meant to be used for, but I do know that it is very dangerous, because
that other object may be destroyed without the buffer object knowing it.

I would like to see another implementation of a buffer object that acts as a
writable string. In essence, it would be the same as a regular Python string,
except that it is mutable. It would also expose a writable buffer interface.
I think this kind of object is what people would typically mean by a
"buffer" anyway. See the example at the top of this post for how I would use
it.

Robin.

More information about the Python-list mailing list