[Python-Dev] buffer interface considered harmful

David Ascher da@ski.org
Mon, 16 Aug 1999 10:18:46 -0700 (Pacific Daylight Time)


On Mon, 16 Aug 1999, Jim Fulton wrote:

>> [regexps on gigabyte files]
>
> This seems reasonable, if a bit exotic. :)

In the bioinformatics world, I think it's everyday stuff.

> Why is this a good thing? Why should extension module writers worry
> about the non-contiguous nature of the data now?  Does the NumPy C API
> somehow expose this now?  Will multi-segment buffers make it go away
> somehow?

A NumPy extension module writer needs to create and modify NumPy arrays.
These arrays may be non-contiguous (if e.g. they are the result of
slicing).  The NumPy C API exposes the non-contiguous nature, but it is
hard enough to deal with that I suspect most extension writers simply
require contiguous arrays, which forces unnecessary copies.
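
A rough illustration of the contiguity problem, written in present-day
Python. This is only a sketch: it uses the standard-library memoryview
type, which postdates this thread, as a stand-in for a sliced array, and
it is not the NumPy C API. The variable names are made up for the example.

```python
# A flat block of bytes stands in for an array's storage.
data = bytearray(range(10))
view = memoryview(data)

# A strided slice shares memory with `data` but is not contiguous.
every_other = view[::2]
print(every_other.contiguous)

# Code that insists on contiguous data has to pack it into a new
# block first, which is the unnecessary copy described above.
packed = every_other.tobytes()

# The view still tracks the original storage; the packed copy does not.
data[0] = 99
print(every_other[0], packed[0])
```

The slice costs nothing to create, but any consumer that cannot walk the
strides has to pay for the copy.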

Multi-segment buffers won't necessarily make the API go away (backwards
compatibility and all that), but they could make it unnecessary for many
extension writers.

> > * If NumPy was modified to have arrays with data stored in buffer objects
> >   as opposed to the current "char *", and if PIL was modified to have
> >   images stored in buffer objects as opposed to whatever it uses, one
> >   could have arrays and images which shared data.
> 
> Uh, and this would be a good thing? Maybe PIL should just be modified
> to use NumPy arrays.

Why?  PIL was designed for image processing, and made design decisions
appropriate to that domain.  NumPy was designed for multidimensional
numeric array processing, and made design decisions appropriate to that
domain. The intersection of interests exists (e.g. in the medical imaging
world), and I know people who spend a lot of their CPU time moving data
between images and arrays with "stupid" tostring/fromstring operations.
Given the size of the images, it's a prodigious waste of time, and it kills
the use of Python in many a project.
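
To make the copy-versus-share difference concrete, here is a minimal
sketch in present-day Python using only the standard library. It does not
use PIL or NumPy: the bytearray is a hypothetical stand-in for an image's
pixel storage, array.frombytes plays the role of the tostring/fromstring
round trip, and memoryview plays the role of a shared buffer.

```python
from array import array

# Hypothetical stand-in for an image's pixel storage.
image_bytes = bytearray([10, 20, 30, 40])

# Copy route, analogous to tostring/fromstring: the data is
# serialized to a bytes object and then copied again into the array.
copied = array("B")
copied.frombytes(bytes(image_bytes))

# Shared route: a view over the same memory, with no copy made.
shared = memoryview(image_bytes)

# A change to the "image" shows up in the shared view only.
image_bytes[0] = 255
print(copied[0], shared[0])
```

With a few bytes the difference is invisible; with images of the size
mentioned above, the copy route touches every pixel at least twice.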

> Perhaps, although Guido knows how they'd find out about them. ;)

Uh?  These issues have been discussed in the NumPy/PIL world for a while,
with no solution in sight.  Recently, I and others saw mentions of buffers
in the source, and they seemed like a reasonable approach, one which could
be implemented w/o a rewrite of either PIL or NumPy.

Don't get me wrong -- I'm all for better documentation of the buffer
stuff, design guidelines, warnings and protocols.  I stated as much on
June 15:

  http://www.python.org/pipermail/python-dev/1999-June/000338.html


--david