[Python-Dev] buffer interface considered harmful

Jim Fulton jim@digicool.com
Mon, 16 Aug 1999 13:38:22 -0400


David Ascher wrote:
> 
> On Mon, 16 Aug 1999, Jim Fulton wrote:
> 
> >> [regexps on gigabyte files]
> >
> > This seems reasonable, if a bit exotic. :)
> 
> In the bioinformatics world, I think it's everyday stuff.

Right, in some (exotic ;) domains it's not exotic at all. 

> > Why is this a good thing? Why should extension module writers worry
> > about the non-contiguous nature of the data now?  Does the NumPy C API
> > somehow expose this now?  Will multi-segment buffers make it go away
> > somehow?
> 
> A NumPy extension module writer needs to create and modify NumPy arrays.
> These arrays may be non-contiguous (if e.g. they are the result of
> slicing).  The NumPy C API exposes the non-contiguous nature, but it's
> hard enough to deal with it that I suspect most extension writers require
> contiguous arrays, which means unnecessary copies.

Hm. This sounds like an API problem to me.

> Multi-segment buffers won't make the API go away necessarily (backwards
> compatibility and all that), but it could make it unnecessary for many
> extension writers.

Multi-segment buffers don't make the multi-segmented nature of the
memory go away. Do they really simplify the API that much?

They seem to strip away an awful lot of information hiding.
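To make the contiguity issue above concrete, here is a minimal sketch. It uses Python 3's `memoryview`, which postdates this 1999 exchange and stands in here, as an assumption, for the buffer interface under discussion. A strided slice (like a sliced NumPy array) shares the original memory but is not contiguous. A consumer that insists on contiguous data, modelled by the hypothetical `as_contiguous` helper, therefore has to make a copy.

```python
from array import array


def as_contiguous(view: memoryview) -> memoryview:
    """Hypothetical helper: return `view` unchanged if it is C-contiguous,
    otherwise return a view over a fresh contiguous copy of its elements."""
    if view.c_contiguous:
        return view
    # tobytes() gathers the strided elements into a new bytes object (a copy),
    # and cast() reinterprets those bytes with the original element format.
    return memoryview(view.tobytes()).cast(view.format)


data = array('d', range(8))          # 8 doubles: 0.0 .. 7.0
full = memoryview(data)
every_other = full[::2]              # strided slice: shares memory, no copy

print(full.c_contiguous)             # True
print(every_other.c_contiguous)      # False

packed = as_contiguous(every_other)  # forces the "unnecessary copy"
print(packed.c_contiguous, packed.tolist())  # True [0.0, 2.0, 4.0, 6.0]
```

The copy is independent of the original data. Later writes to `data` show up in `every_other` but not in `packed`.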
 
> > > * If NumPy was modified to have arrays with data stored in buffer objects
> > >   as opposed to the current "char *", and if PIL was modified to have
> > >   images stored in buffer objects as opposed to whatever it uses, one
> > >   could have arrays and images which shared data.
> >
> > Uh, and this would be a good thing? Maybe PIL should just be modified
> > to use NumPy arrays.
> 
> Why?  PIL was designed for image processing, and made design decisions
> appropriate to that domain.  NumPy was designed for multidimensional
> numeric array processing, and made design decisions appropriate to that
> domain. The intersection of interests exists (e.g. in the medical imaging
> world), and I know people who spend a lot of their CPU time moving data
> between images and arrays with "stupid" tostring/fromstring operations.
> Given the size of the images, it's a prodigious waste of time, and kills
> the use of Python in many a project.

It seems to me that NumPy is sufficiently broad to encompass
image processing.

My main concern is having two systems rely on some low-level "shared
memory" mechanism to achieve efficient communication.
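The trade-off in this exchange can be sketched with a minimal example. As before, it uses Python 3's `memoryview` as an assumed, modern stand-in for the 1999 buffer interface. The names `pixels`, `grid` and `snapshot` are illustrative only and are not PIL or NumPy APIs. An "image" and an "array" view share one buffer with no copying, whereas a tostring/fromstring-style exchange copies every byte.

```python
# A hypothetical 4x3 8-bit grayscale "image": one byte per pixel, row-major.
width, height = 4, 3
pixels = bytearray(width * height)

# Zero-copy sharing: a 2-D "array" view onto the very same bytes.
grid = memoryview(pixels).cast('B', (height, width))
grid[1, 2] = 255                      # write through the array view...
print(pixels[1 * width + 2])          # ...and the image sees it: 255

# tostring/fromstring-style exchange: an independent copy of every byte.
snapshot = bytes(pixels)
grid[0, 0] = 7
print(snapshot[0], pixels[0])         # 0 7 (the copy no longer tracks the original)
```

The same aliasing that avoids the copy is also what Jim is wary of: a write through either object is immediately visible to the other.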
 
> > Perhaps, although Guido knows how they'd find out about them. ;)
> 
> Uh?  These issues have been discussed in the NumPy/PIL world for a while,
> with no solution in sight.  Recently, I and others saw mentions of buffers
> in the source, and they seemed like a reasonable approach, which could be
> done w/o a rewrite of either PIL or NumPy.

My point was that people would be lucky to find out about buffers or
about how to use them as things stand.

> Don't get me wrong -- I'm all for better documentation of the buffer
> stuff, design guidelines, warnings and protocols.  I stated as much on
> June 15:
> 
>   http://www.python.org/pipermail/python-dev/1999-June/000338.html

Yes, that was quite a jihad you launched. ;)

Jim

--
Jim Fulton           mailto:jim@digicool.com   Python Powered!        
Technical Director   (888) 344-4332            http://www.python.org  
Digital Creations    http://www.digicool.com   http://www.zope.org    

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email
address may not be added to any commercial mail list without my
permission.  Violation of my privacy with advertising or SPAM will
result in a suit for a MINIMUM of $500 damages/incident, $1500 for
repeats.