[Python-3000] Making more effective use of slice objects in Py3k

Tue Aug 29 21:55:21 CEST 2006

On 8/29/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> "Guido van Rossum" <guido at python.org> wrote:
> > For operations that may be forced to return a new string (e.g.
> > concatenation) I think the return value should always be a new string,
> > even if it could be optimized. So for example if v is a view and s is
> > a string, v+s should always return a new string, even if s is empty.
>
> I'm on the fence about this.  On the one hand, I understand the
> desirability of being able to get the underlying string object without
> difficulty.  On the other hand, its performance characteristics could be
> confusing to users of Python who may have come to expect that "st+''" is
> a constant time operation, regardless of the length of st.

Well, views aren't strings. And s+t (for s and t strings) normally
takes O(len(s)+len(t)) time.

Type consistency and predictability are more important to me.

I didn't mean to recommend v+"" as the best way to turn a view v into
a string; that would be str(v).
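
To make the rule concrete, here is a minimal sketch. StrView is a
hypothetical class invented for illustration; it is not the view
implementation under discussion. Slicing a view yields another view,
v+s always yields a str (even when s is empty), and str(v) is the
explicit way to get a string.

```python
class StrView:
    """Hypothetical read-only view onto a slice of a str (illustration only)."""

    def __init__(self, base, start=0, stop=None):
        self._base = base
        # Normalize the bounds against the length of the underlying string.
        self._start, self._stop, _ = slice(start, stop).indices(len(base))

    def __len__(self):
        return max(0, self._stop - self._start)

    def __getitem__(self, index):
        if isinstance(index, slice):
            start, stop, step = index.indices(len(self))
            if step != 1:
                # Keep the sketch simple: non-unit steps fall back to a copy.
                return str(self)[index]
            # A substring operation: return another view on the same base.
            return StrView(self._base, self._start + start, self._start + stop)
        return str(self)[index]

    def __str__(self):
        # The explicit way to turn a view into a string.
        return self._base[self._start:self._stop]

    def __add__(self, other):
        # Concatenation may have to build a new string, so the result
        # type is always str, even when `other` is empty.
        return str(self) + str(other)

    def __radd__(self, other):
        return str(other) + str(self)


v = StrView("hello world", 0, 5)
sub = v[1:3]           # still a view
joined = v + ""        # a str, not a view
as_text = str(v)       # "hello"
```

The result type depends only on the operation, never on the operand
values, which is the predictability argued for above.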

> The non-null string addition case, I agree that it could make some sense
> to return the string (considering you will need to copy it anyways), but
> if one returned a view on that string, it would be more consistent with
> other methods, and getting the string back via str(view) would offer
> equivalent functionality.  It would also require the user to be explicit
> about what they really want; though there is the argument that if I'm
> passing a string as an operand to addition with a view, I actually want
> a string, so give me one.

I strongly believe you're mistaken here. I don't think users will have
any trouble with the concept "operations that don't (necessarily)
return a substring will return a new string."

> I'm going to implement it as returning a view, but leave commented
> sections for some of them to return a string.
>
> > BTW beware that in py3k, strings (which will always be unicode
> > strings) won't support the buffer API -- bytes objects will. Would you
> > want views on strings or on bytes or on both?
>
> That's tricky.  Views on bytes will come for free, like array, mmap, and
> anything else that supports the buffer protocol. It requires the removal
> of the __hash__ method for mutables, but that is certainly expected.

The question is, how useful is the buffer protocol going to be? We
don't know yet.

> Right now, a large portion of standard library code uses strings and
> string methods to handle parsing, etc.  Removing immutable byte strings
> from 3.x seems likely to result in a huge amount of rewriting necessary
> to utilize either bytes or text (something I have mentioned before).  I
> believe that with views on bytes (and/or sufficient bytes methods), the
> vast majority would likely result in the use of bytes.

Um, unless you consider decoding a GIF file "parsing", parsing would
seem to naturally fall in the realm of text (characters), not bytes.

> Having a text view for such situations that works with the same kinds of
> semantics as the bytes view would be nice from a purity/convenience
> standpoint, and only needing to handle a single data type (text) could
> make its implementation easier.  I don't have any short-term plans of
> writing text views, but it may be somewhat easier to do after I'm done
> with string/byte views.

Unifying the semantics between byte views and text views will be
difficult since bytes are mutable.
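
A small sketch of why mutability matters. It uses bytearray and
memoryview from current Python as stand-ins; that choice is an
assumption made for illustration, not the design discussed in this
thread. A view on a mutable buffer sees later writes and cannot be
hashed, while text cannot be changed in place at all.

```python
buf = bytearray(b"hello world")   # mutable byte buffer
view = memoryview(buf)[0:5]       # view on the first five bytes, no copy

before = bytes(view)              # snapshot of what the view shows now
buf[0] = ord("J")                 # mutate the underlying buffer in place
after = bytes(view)               # the view reflects the change

# A view on a mutable buffer cannot safely be hashed.
try:
    hash(view)
    view_hashable = True
except (ValueError, TypeError):
    view_hashable = False

# Text is immutable, so a text view never sees its contents change.
text = "hello world"
try:
    text[0] = "J"
    text_mutable = True
except TypeError:
    text_mutable = False
```

Here before and after differ even though the view object itself was
never touched, which has no counterpart for immutable text.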

I recommend that you have a good look at the bytes implementation in
the p3yk branch.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)