[Python-3000] Making more effective use of slice objects in Py3k

Tue Aug 29 23:27:19 CEST 2006

"Guido van Rossum" <guido at python.org> wrote:
> On 8/29/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> > "Guido van Rossum" <guido at python.org> wrote:
> > > For operations that may be forced to return a new string (e.g.
> > > concatenation) I think the return value should always be a new string,
> > > even if it could be optimized. So for example if v is a view and s is
> > > a string, v+s should always return a new string, even if s is empty.
> >
> > I'm on the fence about this.  On the one hand, I understand the
> > desirability of being able to get the underlying string object without
> > difficulty.  On the other hand, its performance characteristics could be
> > confusing to users of Python who may have come to expect that "st+''" is
> > a constant time operation, regardless of the length of st.
> 
> Well, views aren't strings. And s+t (for s and t strings) normally
> takes O(len(s)+len(t)) time.

Right, but my hope is for users who want to use views to start using
them and be able to not be surprised by what they get back.  You have
previously stated that changing return types based on a flag variable is
a horrible idea.  I agree, as providing a flag variable to change return
types is surprising.  This is changing return types based on variable
type, which could be argued as an implicit flag variable, and perhaps
subject to the same surprising behavior == bad criterion that has stopped
other such suggestions in the past.

> The type consistency and predictability is more important to me.

Is view + <anything that supports the buffer protocol> -> view not
consistent or predictable?
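
To make the two options concrete, here is a minimal sketch. It assumes a
hypothetical `View` class (the name and its methods are illustrative, not
an existing API) built on a bytes-like object. Slicing never copies, and
`__add__` follows the "view + buffer -> view" rule; under the alternative
rule it would simply return the new bytes object instead of wrapping it.

```python
class View:
    """Hypothetical read-only view over a bytes-like object (illustration only)."""

    def __init__(self, base, start=0, stop=None):
        # memoryview slicing shares the underlying data; nothing is copied.
        self._mv = memoryview(base)[start:stop]

    def __len__(self):
        return len(self._mv)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return View(self._mv[index])  # slicing stays a view
        return self._mv[index]

    def tobytes(self):
        return self._mv.tobytes()

    __bytes__ = tobytes  # bytes(view) is the explicit way back to real data

    def __add__(self, other):
        # Concatenation has to copy either way; only the result type is
        # in question. This variant wraps the new data in a View, even
        # when `other` is empty.
        return View(self.tobytes() + bytes(other))


v = View(b"hello world", 0, 5)
joined = v + b""
```

With this variant `joined` is always a `View`, and `bytes(joined)` gives
back the concrete data when the caller really wants it.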

> > The non-null string addition case, I agree that it could make some sense
> > to return the string (considering you will need to copy it anyways), but
> > if one returned a view on that string, it would be more consistent with
> > other methods, and getting the string back via str(view) would offer
> > equivalent functionality.  It would also require the user to be explicit
> > about what they really want; though there is the argument that if I'm
> > passing a string as an operand to addition with a view, I actually want
> > a string, so give me one.
> 
> I strongly believe you're mistaken here. I don't think users will have
> any trouble with the concept "operations that don't (necessarily)
> return a substring will return a new string".

I could certainly be, but offering both isn't difficult.

> > I'm going to implement it as returning a view, but leave commented
> > sections for some of them to return a string.
> >
> > > BTW beware that in py3k, strings (which will always be unicode
> > > strings) won't support the buffer API -- bytes objects will. Would you
> > > want views on strings or on bytes or on both?
> >
> > That's tricky.  Views on bytes will come for free, like array, mmap, and
> > anything else that supports the buffer protocol. It requires the removal
> > of the __hash__ method for mutables, but that is certainly expected.
> 
> The question is, how useful is the buffer protocol going to be? We
> don't know yet.

Pretty useful, apparently: bytes supports decoding to unicode through the
use of its own buffer interface, or really, it uses the decode machinery
that takes a char* and length.
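
As a rough illustration of decoding through the buffer interface, the
sketch below uses the `memoryview` and `str(object, encoding)` spellings
from later Python 3 releases. These are an assumption for the example's
sake and are not the p3yk-branch code discussed in this thread.

```python
data = bytearray(b"caf\xc3\xa9 au lait")

# A window onto the first five bytes; the bytearray is not copied here.
window = memoryview(data)[0:5]

# str() accepts any object exporting a buffer and hands its bytes to the
# decoder, so the view does not need to be converted by hand first.
text = str(window, "utf-8")
```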

On the other hand, CharBuffer (as opposed to ReadBuffer and
WriteBuffer[1]) isn't really usable, as the reader has no idea about the
*size* and *type* of the characters it is getting back (8, 16, or 32 bit
integers or characters, even 16, 32, or 64 bit floats, etc.). Maybe
fixing CharBuffer, or creating a different interface (deprecating
CharBuffer) would make sense, and would offer the numarray folks their
'array interface'.
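
For comparison only, here is what "knowing the size and type" can look
like. The sketch assumes the `memoryview` object of later Python 3
releases, which reports a format code and an item size for the buffer it
wraps; nothing like this was available through CharBuffer.

```python
from array import array

described = []
for typecode in ("B", "H", "d"):
    mv = memoryview(array(typecode, [1, 2, 3]))
    # format is a struct-style type code, itemsize the width in bytes,
    # so a consumer can tell 8-bit integers apart from doubles.
    described.append((mv.format, mv.itemsize, mv.tolist()))
```

Each entry in `described` carries enough information for a reader of the
buffer to interpret the raw memory correctly.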

> > Right now, a large portion of standard library code uses strings and
> > string methods to handle parsing, etc.  Removing immutable byte strings
> > from 3.x seems likely to result in a huge amount of rewriting necessary
> > to utilize either bytes or text (something I have mentioned before).  I
> > believe that with views on bytes (and/or sufficient bytes methods), the
> > vast majority would likely result in the use of bytes.
> 
> Um, unless you consider decoding a GIF file "parsing", parsing would
> seem to naturally fall in the realm of text (characters), not bytes.

I'm using my own definition of parsing again, I apologize.  What I meant
by parsing is anything that currently performs processing of Python 2.x
strings to determine what it is supposed to do.  From HTTP header
processing (sending and receiving), email processing, socket protocols
in smtplib, poplib, asynchat, etc.  All currently use Python 2.x strings.
They will need to be transitioned to 3.x if 2.x byte strings are removed,
and that transition will be quite a bit of work, regardless of whether
bytes get some string methods, or we wrap bytes to provide string
methods, but significantly more if neither is done.
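
A small sketch of the kind of code in question, assuming bytes objects
carry the usual string methods (`split`, `partition`, `strip`, `lower`).
The helper name `parse_headers` is made up for this example and is not
taken from any of the modules listed above.

```python
def parse_headers(raw):
    """Map lower-cased header names to values, all kept as bytes."""
    headers = {}
    for line in raw.split(b"\r\n"):
        if not line:
            break  # a blank line ends the header block
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()
    return headers


raw = b"Host: example.org\r\nContent-Length: 42\r\n\r\nbody"
headers = parse_headers(raw)
```

Without such methods on bytes (or on a wrapper around bytes), every line
of a loop like this would need a decode step or a hand-written
replacement.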

> > Having a text view for such situations that works with the same kinds of
> > semantics as the bytes view would be nice from a purity/convenience
> > standpoint, and only needing to handle a single data type (text) could
> > make its implementation easier.  I don't have any short-term plans of
> > writing text views, but it may be somewhat easier to do after I'm done
> > with string/byte views.
> 
> Unifying the semantics between byte views and text views will be
> difficult since bytes are mutable.

The only significant nit is that the location of the underlying buffer
pointer changes with byte views.  This is already handled in a generally
satisfactory way in 2.x buffers.
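
One way to cope with a moving buffer, sketched under the assumption that
the view stores a reference plus offsets and looks the data up again on
every access instead of caching a pointer. `LazyView` is a hypothetical
name; a C implementation would re-fetch the pointer rather than slice.

```python
class LazyView:
    """Hypothetical view that keeps offsets, not a raw pointer."""

    def __init__(self, base, start, stop):
        self.base, self.start, self.stop = base, start, stop

    def tobytes(self):
        # Go back to the underlying object on each call, so a resize of
        # `base` in between cannot leave a stale pointer behind.
        return bytes(self.base[self.start:self.stop])


buf = bytearray(b"GET /index.html")
view = LazyView(buf, 0, 3)
before = view.tobytes()
buf.extend(b" HTTP/1.1\r\n" * 1000)  # growth may move the storage
after = view.tobytes()
```

Here `before` and `after` are equal even though the bytearray grew
between the two reads.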

> I recommend that you have a good look at the bytes implementation in
> the p3yk branch.

It is implemented the way I would have expected.

 - Josiah

[1] http://www.python.org/doc/current/api/abstract-buffer.html