[Python-Dev] strop vs. string

Tim Peters tim.one@home.com
Tue, 5 Jun 2001 01:18:50 -0400


[Paul Barrett]
> From the discussion so far, it appears that the buffer object is
> intended solely to support string-like objects.

Unsure where that impression came from.  Since buffers wrap a slice "of
memory", they don't make much sense except where raw memory makes sense.
That includes the guts of strings, but also (in the core distribution)
memory-mapped files (the mmap module) and arrays (the array module), which
also support the buffer interface.
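For readers on modern Python: the buffer object was removed in Python 3, and memoryview plays a similar role. The following is a minimal sketch, assuming Python 3. It shows that the three kinds of exporters named above all hand out raw memory through the buffer interface: string data (bytes in Python 3), arrays, and memory-mapped files. The dictionary keys are only labels for this illustration.

```python
import array
import mmap

# One exporter of each kind mentioned above.
exporters = {
    "str/bytes": b"spam",
    "array": array.array('i', [2, 3]),
    "mmap": mmap.mmap(-1, 16),        # 16-byte anonymous mapping
}

sizes = {}
for name, obj in exporters.items():
    # memoryview() works only on objects that export a buffer.
    with memoryview(obj) as view:
        sizes[name] = view.nbytes     # size of the raw memory slice, in bytes

print(sizes)
exporters["mmap"].close()             # safe: the view was released above
```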

> I've seen no mention of their use for binary data objects,

I mentioned two above.  The use of buffers with mutable objects is
dangerous, though, because of the dangling-pointer problem, and Python
itself never uses buffers except for strings.  Even arrays are stretching
it; e.g.,

>>> import array
>>> a = array.array('i')
>>> a.append(2)
>>> a.append(3)
>>> a
array('i', [2, 3])
>>> b = buffer(a)
>>> len(b)
8
>>> [b[i] for i in range(len(b))]
['\x02', '\x00', '\x00', '\x00', '\x03', '\x00', '\x00', '\x00']
>>>

While of *some* conceivable use, that's not exactly destined to become
wildly popular <wink>.
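Below is a rough Python 3 counterpart of the session above. It assumes CPython 3.x and uses memoryview in place of the removed buffer() builtin. It also shows how CPython later dealt with the dangling-pointer problem for arrays. While a view is exported, resizing the array raises BufferError instead of leaving the view pointing at memory that may have been reallocated.

```python
import array

a = array.array('i', [2, 3])

with memoryview(a) as view:           # export the array's memory, like buffer(a)
    nbytes = view.nbytes              # a.itemsize * len(a)
    raw = view.tobytes()              # the raw bytes, in native byte order
    try:
        a.append(4)                   # would reallocate the memory the view points into
        resize_refused = False
    except BufferError:
        resize_refused = True         # CPython refuses to resize while the view exists

a.append(4)                           # fine once the view has been released
print(nbytes, list(raw), resize_refused, a.tolist())
```

The bytes come out in native byte order. On a little-endian box with 4-byte C ints, they match the list printed by the Python 2 session.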

> such as multidimensional arrays and matrices.

Since core Python has no such things, of course it doesn't use buffers for
those either.

> Will the buffer object also support these objects?

In what sense?  If you have an implementation of such things, and believe
that getting at raw memory slices is useful, sure -- fill in its
tp_as_buffer slot.

> ...
> On the other hand, if yes, then I think the buffer C/API needs to be
> reimplemented,

Or do you mean redesigned?

> because the current design/implementation falls far short of what I
> would expect for a buffer object.  First, it is overly complex: the
> support for multiple buffers does not appear necessary.

AFAICT it's entirely unused; everything in the core that supports the buffer
interface returns a segment count of 1, and the buffer object itself appears
to raise exceptions whenever it sees a reference to a segment other than
"the first".  I don't know why it's there.
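The following toy Python model makes the segment machinery concrete. It mirrors the Python 2 C-level slots bf_getsegcount and bf_getreadbuffer, assuming their C shape: the segment count is returned and the total length is stored through a pointer, and the per-segment length is returned and the data pointer is stored through a pointer. The class, method, and function names are hypothetical, as are the exception type and message. The model only reproduces the single-segment behavior described above.

```python
class SingleSegmentExporter:
    """Hypothetical Python model of an object filling the Python 2 buffer slots."""

    def __init__(self, data: bytes):
        self._data = data

    def getsegcount(self):
        # Models bf_getsegcount: (segment count, total length in bytes).
        # Every exporter in the core reports exactly one segment.
        return 1, len(self._data)

    def getreadbuffer(self, segment: int):
        # Models bf_getreadbuffer: (segment length, segment data).
        if segment != 0:
            # Only "the first" segment exists; exception type/message are illustrative.
            raise SystemError("accessing non-existent buffer segment")
        return len(self._data), self._data


def read_all(obj):
    """Hypothetical client: a general one must loop over N segments."""
    count, total = obj.getsegcount()
    data = b"".join(obj.getreadbuffer(i)[1] for i in range(count))
    assert len(data) == total
    return data


print(read_all(SingleSegmentExporter(b"spam")))
```

The read_all loop shows what the generality costs. Every client has to be written for N segments, even though N is always 1 in practice.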

> Second, the dangling pointer issue has not been resolved.

I expect Greg will fix that now.

> I suggest the addition of a lock flag which indicates that the data is
> currently inaccessible, i.e., that the data and/or data pointer is in the
> process of being modified.

To sell that (but please save it for the PEP <wink>) I expect you have to
provide some compelling uses for it.  The current uses have no need of it.
In the absence of specific good uses, I'm afraid it just sounds like another
variant of "I can't prove segments *won't* be useful, so let's toss them in
too!".

> I would suggest the following structure, which would be much more useful
> for char and binary data:
>
> typedef struct {
>     char* rf_pointer;
>     int   rf_length;
>     int   rf_access;  /* read, write, etc.  */
>     int   rf_lock;    /* data is in use  */
>     int   rf_flags;   /* type of data; char, binary, unicode, etc.  */
> } PyBufferProcs;
>
> But I'm guessing my proposal is way off base.

Depends on what you want to do.  You've only mentioned multidimensional
arrays, and the need for umpteen flavors of access control there, beyond the
current object's b_readonly flag, is simply unclear.

Also unclear why you've dropped the current object's b_base pointer:
without it, the buffer has no way to get back to the object from which the
memory is borrowed, nor even a guarantee that the object won't die while the
buffer is still active.
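The role of b_base is visible in its Python 3 descendant. The following is a minimal sketch, assuming CPython 3.3 or later. In it, memoryview.obj is the back-pointer to the exporting object, and the view holds a reference that keeps the exporter alive after every other name for it is gone. The weak reference is there only to observe the array's lifetime.

```python
import array
import gc
import weakref

a = array.array('i', [2, 3])
watch = weakref.ref(a)               # observe a's lifetime without keeping it alive
view = memoryview(a)

del a                                # drop our only strong name for the array
gc.collect()
alive_while_viewed = watch() is not None   # the view still references its base
back_pointer_ok = view.obj is watch()      # and can hand it back, like b_base

del view                             # dropping the last view releases the base
gc.collect()
freed_after_view = watch() is None
print(alive_while_viewed, back_pointer_ok, freed_after_view)
```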

If you do pursue this, please please please boost the rf_length field!  An
int is too small to hold real-life sizes anymore, and "large files" are
becoming common even on 32-bit boxes.  Python needs to grow a wholly
supported way to pass 8-byte ints around (and it looks like I'll be adding
that to the struct module, possibly to the array module and marshal too).
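A file of about 3 GiB already overflows a signed 32-bit length. The following minimal sketch illustrates the int-size point using Python 3's struct module, whose standard-size 'q' code packs an 8-byte signed integer. The 3 GiB figure is only an example size.

```python
import struct

big = 3 * 2**30                      # about 3 GiB, a plausible "large file" size

# A standard-size 'i' is 4 bytes and cannot represent it.
try:
    struct.pack('<i', big)
    fits_in_int = True
except struct.error:
    fits_in_int = False

# An 8-byte 'q' (C long long) round-trips it.
packed = struct.pack('<q', big)
(unpacked,) = struct.unpack('<q', packed)

print(struct.calcsize('<i'), struct.calcsize('<q'), fits_in_int, unpacked == big)
# → 4 8 False True
```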

> If I find some time, I'll prepare a PEP to air these issues, since
> they are very important to those of us working on and with
> multidimensional arrays. We find the current buffer API lacking.

A PEP is always a good idea.