[Python-Dev] buffer interface considered harmful

David Ascher da@ski.org
Mon, 16 Aug 1999 09:45:47 -0700 (Pacific Daylight Time)


On Mon, 16 Aug 1999, Jim Fulton wrote:

> > Second, there's the extension to the buffer interface as of 1.5.2. This is
> > again only available in C, and it allows C programmers to get an object _as an
> > ASCII string_. This is meant for things like regexp modules, to access any
> > "textual" object as an ASCII string. This is the getcharbuffer interface, and
> > bound to the "t#" specifier in PyArg_ParseTuple.
> 
> Hm. So this is making a little more sense. So, there is a notion that
> there are "textual" objects that want to provide a method for getting
> their "text". How does this text differ from what you get from __str__
> or __repr__?  

I'll let others give a well-thought-out rationale.  Here are some examples
of use which I think are worthwhile:

* Consider an mmap()'ed file, twelve gigabytes long.  Making mmapfile
  objects fit this aspect of the buffer interface allows you to do regexp
  searches on it w/o ever building a twelve gigabyte PyString.

* Consider a non-contiguous NumPy array.  If the array type supported the
  multi-segment buffer interface, extension module writers could
  manipulate the data within this array w/o having to worry about the
  non-contiguous nature of the data.  They'd still have to worry about
  the multi-byte nature of the data, but it's still a win.  In other
  words, I think that the buffer interface could be useful even w/
  non-textual data.  

* If NumPy were modified to have arrays with data stored in buffer objects
  as opposed to the current "char *", and if PIL were modified to have
  images stored in buffer objects as opposed to whatever it uses, one
  could have arrays and images which shared data.  
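
To make the first example concrete, here is a minimal sketch of a regexp
search over a memory-mapped file.  It assumes a modern Python, where the
standard re module accepts an mmap object directly as a bytes-like
subject; the helper names search_mapped_file and demo are made up for
illustration, and the temporary file is tiny rather than twelve gigabytes.

```python
import mmap
import os
import re
import tempfile


def search_mapped_file(path, pattern):
    """Return (start, end) of the first match of a bytes pattern, or None.

    The file is mapped read-only and handed to re.search as-is, so the
    file contents are never copied into a bytes object.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            match = re.search(pattern, mapped)
            return match.span() if match else None


def demo():
    """Write a small test file, search it through mmap, then clean up."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 100000 + b"needle-42" + b"y" * 100000)
        path = tmp.name
    try:
        return search_mapped_file(path, rb"needle-\d+")
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The pattern has to be a bytes pattern, since the mapped file is exposed
as raw bytes rather than as text.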

I think all of these provide examples of motivations which are appealing
to at least some Python users. I make no claim that they motivate the
specific interface.  In all the cases I can think of, one or both of two
features are the key asset:

  - access to subsets of huge data regions w/o creation of huge temporary
    variables.

  - sharing of data space.
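
Both features can be sketched in a few lines.  This is only an
illustration, assuming a modern Python in which memoryview (which
postdates this message) plays the role of the buffer object; the function
names zero_region and demo_shared are hypothetical.

```python
def zero_region(buf, start, stop):
    """Overwrite buf[start:stop] with zero bytes, in place.

    The memoryview slice is a window onto buf, not a copy, so only the
    replacement bytes are allocated, never a copy of the region itself.
    """
    view = memoryview(buf)[start:stop]
    view[:] = bytes(len(view))


def demo_shared():
    """Show that a write through a view is visible in the owning object."""
    data = bytearray(b"abcdefgh")
    window = memoryview(data)[2:6]   # shares storage with data
    window[0] = ord("X")             # modifies data[2] as well
    return bytes(data), bytes(window)


if __name__ == "__main__":
    print(demo_shared())
    scratch = bytearray(b"12345678")
    zero_region(scratch, 2, 5)
    print(scratch)
```

The slice in zero_region is the "subset without a huge temporary" case,
and demo_shared is the "shared data space" case: two objects, one block
of memory.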

Yes, it's a power tool, and as such should come with safety goggles.
But then again, the same is true for ExtensionClasses =).

leaving-out-the-regexp-on-NumPy-arrays-example, 

   --david

PS: I take back the implicit suggestion that buffer() return read-write
    buffers when possible.