[Python-Dev] buffer interface considered harmful
David Ascher
da@ski.org
Mon, 16 Aug 1999 09:45:47 -0700 (Pacific Daylight Time)
On Mon, 16 Aug 1999, Jim Fulton wrote:
> > Second, there's the extension to the buffer interface as of 1.5.2. This is
> > again only available in C, and it allows C programmers to get an object _as an
> > ASCII string_. This is meant for things like regexp modules, to access any
> > "textual" object as an ASCII string. This is the getcharbuffer interface, and
> > bound to the "t#" specifier in PyArg_ParseTuple.
>
> Hm. So this is making a little more sense. So, there is a notion that
> there are "textual" objects that want to provide a method for getting
> their "text". How does this text differ from what you get from __str__
> or __repr__?
I'll let others give a well-thought-out rationale. Here are some examples
of use which I think are worthwhile:
* Consider an mmap()'ed file, twelve gigabytes long. Making mmapfile
objects fit this aspect of the buffer interface allows you to do regexp
searches on it w/o ever building a twelve gigabyte PyString.
* Consider a non-contiguous NumPy array. If the array type supported the
multi-segment buffer interface, extension module writers could
manipulate the data within this array w/o having to worry about the
non-contiguous nature of the data. They'd still have to worry about
the multi-byte nature of the data, but it's still a win. In other
words, I think that the buffer interface could be useful even w/
non-textual data.
* If NumPy were modified to have arrays with data stored in buffer objects
as opposed to the current "char *", and if PIL were modified to have
images stored in buffer objects as opposed to whatever it uses, one
could have arrays and images which shared data.
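The mmap case above can be sketched in present-day Python, where mmap
objects expose their bytes through the buffer protocol and the re module
accepts bytes-like objects. This is only an illustration under those
assumptions, not the 1.5.2 getcharbuffer interface itself. The helper name
find_in_file and the small temporary file standing in for a huge one are
made up for the example.

```python
import mmap
import os
import re
import tempfile


def find_in_file(path, pattern):
    """Return the byte offset of the first match in the file, or -1.

    Hypothetical helper: the regexp engine reads the mapped pages
    directly, so the file is never copied into one big bytes object.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            match = re.search(pattern, mm)
            return match.start() if match else -1


# Small stand-in for a very large file.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"x" * 100000 + b"needle" + b"y" * 100000)

offset = find_in_file(demo_path, rb"needle")
os.remove(demo_path)
```

The pattern has to be a bytes pattern here, since the mapped file is raw
bytes rather than text.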
I think all of these provide examples of motivations which are appealing
to at least some Python users. I make no claim that they motivate the
specific interface. In all the cases I can think of, one or both of two
features are the key asset:
- access to subsets of huge data regions w/o creation of huge temporary
variables.
- sharing of data space.
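Both features can be illustrated with memoryview, the modern descendant of
the buffer() object discussed in this thread. This is a rough sketch, not
the interface being debated: the bytearray merely stands in for a large
data region, and the names data, view and window are invented for the
example.

```python
# Stand-in for a large data region owned by some object.
data = bytearray(b"0123456789" * 1000)

# A view onto the whole region; no bytes are copied.
view = memoryview(data)

# Slicing the view gives access to a subset, again without copying.
window = view[20:30]

# The window shares its data space with `data`, so a write through
# the window is visible in the original object.
window[0:3] = b"abc"

shared = bytes(data[20:30])
```

By contrast, slicing the bytearray itself (data[20:30]) builds a new
object holding a copy of those bytes, which is exactly the temporary one
wants to avoid for huge regions.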
Yes, it's a power tool, and as such should come with safety goggles.
But then again, the same is true for ExtensionClasses =).
leaving-out-the-regexp-on-NumPy-arrays-example,
--david
PS: I take back the implicit suggestion that buffer() return read-write
buffers when possible.