[Python-Dev] strop vs. string
Paul Barrett
Barrett@stsci.edu
Mon, 04 Jun 2001 09:22:14 -0400
"M.-A. Lemburg" wrote:
>
> Tim Peters wrote:
> >
> > [Tim]
> > > About combining strop and buffers and strings, don't forget
> > > unicodeobject.c: that's got oodles of basically duplicate code too.
> > > /F suggested dealing with the minor differences via maintaining one
> > > code file that gets compiled multiple times w/ appropriate #defines.
> >
> > [MAL]
> > > Hmm, that only saves us a few kB in source, but certainly not
> > > in the object files.
> >
> > That's not the point. Manually duplicated code blocks always get out of
> > synch, as people fix bugs in, or enhance, one of them but don't even know
> > about the others. /F brought this up after I pissed away a few hours trying
> > to repair one of these in all places, and he noted that strop.replace() and
> > string.replace() are woefully inefficient anyway.
>
> Ok, so what we'd need is a bunch of generic low-level string
> operations: one set for 8-bit and one for 16-bit code.
>
> Looking at unicodeobject.c it seems that the section "Helpers" would
> be a good start, plus perhaps a few bits from the method implementations
> refactored to form a low-level string template library.
>
> Perhaps we should move this code into
> a file stringhelpers.h which then gets included by stringobject.c
> and unicodeobject.c with appropriate #defines set up for
> 8-bit strings and for Unicode.
>
> > > The better idea would be making the types subclass from a generic
> > > abstract string object -- I just don't know how this will be
> > > possible with Guido's type patches. We'll just have to wait,
> > > I guess.
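As a rough illustration of the single-source idea quoted above, here is
a minimal sketch. It is not taken from stringobject.c or
unicodeobject.c: the macro DEFINE_COUNT_CHAR and the generated function
names are hypothetical, and it uses a function-generating macro in one
file rather than a separately included stringhelpers.h, purely so that
the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical template: expands to a "count occurrences of ch" helper
   for one character type.  The body is written once and compiled once
   per character width. */
#define DEFINE_COUNT_CHAR(NAME, CHAR_T)                              \
    static size_t NAME(const CHAR_T *s, size_t len, CHAR_T ch)       \
    {                                                                \
        size_t i, n = 0;                                             \
        assert(s != NULL || len == 0);                               \
        for (i = 0; i < len; i++)                                    \
            if (s[i] == ch)                                          \
                n++;                                                 \
        return n;                                                    \
    }

/* One instantiation for 8-bit strings, one for 16-bit code units. */
DEFINE_COUNT_CHAR(count_char_8bit, unsigned char)
DEFINE_COUNT_CHAR(count_char_16bit, unsigned short)
```

A bug fixed in the macro body is then fixed for both widths at once,
which is the maintenance benefit Tim describes; as MAL notes, the
object code is still generated twice.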
From the discussion so far, it appears that the buffer object is
intended solely to support string-like objects. I've seen no mention
of its use for binary data objects, such as multidimensional arrays
and matrices. Will the buffer object also support these objects? If
not, then I suggest it be renamed to something less generic and more
descriptive.
On the other hand, if yes, then I think the buffer C/API needs to be
reimplemented, because the current design/implementation falls far
short of what I would expect for a buffer object. First, it is overly
complex: the support for multiple buffers does not appear necessary.
Second, the dangling pointer issue has not been resolved. I suggest
the addition of a lock flag which indicates that the data is currently
inaccessible, i.e., that the data and/or data pointer is in the process
of being modified.
I would suggest that the following structure is much more useful for
char and binary data:
typedef struct {
    char* rf_pointer;
    int rf_length;
    int rf_access; /* read, write, etc. */
    int rf_lock; /* data is in use */
    int rf_flags; /* type of data; char, binary, unicode, etc. */
} PyBufferProcs;
But I'm guessing my proposal is way off base.
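To make the intent of the lock flag concrete, here is a minimal sketch
of how such a structure might be used. Everything beyond the five
fields is an assumption for illustration: the type name RawBuffer (used
instead of PyBufferProcs to avoid clashing with the existing type), the
RF_READ/RF_WRITE bits, and the three helper functions are hypothetical
and are not part of any existing Python API. The flag is a plain int,
so this sketch is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>

enum { RF_READ = 1, RF_WRITE = 2 };   /* hypothetical access bits */

typedef struct {
    char *rf_pointer;
    int   rf_length;
    int   rf_access;   /* read, write, etc. */
    int   rf_lock;     /* data is in use */
    int   rf_flags;    /* type of data; char, binary, unicode, etc. */
} RawBuffer;

/* Hand out the data pointer and mark the buffer as in use.  Returns
   NULL if the buffer is already locked or does not permit the
   requested access. */
static char *rawbuffer_acquire(RawBuffer *b, int access)
{
    assert(b != NULL);
    if (b->rf_lock || (b->rf_access & access) != access)
        return NULL;
    b->rf_lock = 1;
    return b->rf_pointer;
}

/* Signal that the caller no longer holds the pointer. */
static void rawbuffer_release(RawBuffer *b)
{
    assert(b != NULL);
    b->rf_lock = 0;
}

/* Owner side: swap in new storage only while nobody holds the old
   pointer.  Returns 0 on success, -1 if the buffer is locked. */
static int rawbuffer_replace(RawBuffer *b, char *data, int length)
{
    assert(b != NULL);
    if (b->rf_lock)
        return -1;
    b->rf_pointer = data;
    b->rf_length = length;
    return 0;
}
```

The point of the sketch is that the owner cannot move or resize the
data while a consumer still holds the pointer, which is one way of
avoiding the dangling pointer problem mentioned above.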
If I find some time, I'll prepare a PEP to air these issues, since
they are very important to those of us working on and with
multidimensional arrays. We find the current buffer API lacking.
--
Paul Barrett, PhD Space Telescope Science Institute
Phone: 410-338-4475 ESS/Science Software Group
FAX: 410-338-4767 Baltimore, MD 21218