[Python-3000] Future of slices

Alexander Belopolsky alexander.belopolsky at gmail.com
Mon May 1 19:47:22 CEST 2006


Nick Coghlan <ncoghlan <at> gmail.com> writes:

> 
> Short version: +1 for the first point (but for different reasons), -1 for the 
> rest. Use cases for advanced slicing operations are not provided by the 
> standard library, but by Numpy's sophisticated data manipulation capabilities.
> 
I am glad you mentioned Numpy, because my post was mostly motivated by my Numpy
experiences. Numpy's integration into the standard Python library was on the
table for many years (PEP 209), but for reasons that I don't completely
understand, that proposal was never accepted.  I don't know the real
reasons for rejection, but in my view the problem with adding Numpy to the
standard library is that in many respects Numpy is not a package, but a different
language (now complete with its own scalars and arithmetic rules that make
1/0 = 0!).  Numpy is a perfect language for scientific computing, borrowing more
from APL than from Python, but I would rather see Py3K provide ways to
implement scientific libraries without becoming an APL-like language.

> Alexander Belopolsky wrote:
> <get rid of ...>> 1. l[:] syntax for shallow copy.
> 
> I kind of agree with this one, mainly because I'd like standard library data 
> types to return views for slicing operations. Making a copy based on a view is 
> as easy as wrapping the view in a call to the appropriate constructor. 
> Avoiding the memory impact of multiple slicing operations that copy data 
> around is much harder.
> 
> Returning views rather than copies would also eliminate some of the use cases 
> for islice().
> 

I understand that Numpy's implementation of views was not acceptable because
Python lists often relocate their storage.  Maybe Py3K views will provide the
way to solve that problem.
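
A minimal sketch of the copy-versus-view distinction, written in present-day
Python 3 rather than the Python of this thread.  The standard memoryview type
is used here only as a stand-in for a Numpy-style view; that substitution is
an assumption made for illustration.  The sketch also hints at the relocation
problem: while a view is outstanding, the underlying bytearray refuses to
resize.

```python
# Sketch: copy-on-slice vs. view-on-slice, using only the standard library.
data = bytearray(b"abcdef")

copied = data[1:4]            # slicing a bytearray copies the bytes
base = memoryview(data)       # memoryview stands in for a Numpy-style view
view = base[1:4]              # slicing a memoryview shares the storage

data[1] = ord("X")            # mutate the original in place

copy_sees_change = copied[0] == ord("X")   # the copy keeps the old byte
view_sees_change = view[0] == ord("X")     # the view reflects the new byte

# A resize could relocate the storage, so it is refused while views exist.
try:
    data.append(0)
    resize_refused = False
except BufferError:
    resize_refused = True

# Once every view is released, resizing is allowed again.
view.release()
base.release()
data.append(0)
```

Making a copy from the view stays cheap to spell, e.g. bytes(view), which is
the "wrap the view in a constructor" idea from the quoted text.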


> > 2. Overloading of [] syntax. The primary meaning of c[i] is: get the  
> > i-th item from the collection. This meaning is consistent between  
> > lists/tuples and dicts.  The only difference is that  i may not be an  
> > integer in the case of dict.
> 
> The c[x] syntax isn't really overloaded - it always means "ask the container c 
> for the item corresponding to subscript x"
> 

I disagree. If type(x) is slice, then in c[x] notation (1) x is not a
subscript and (2) the value of c[x] is not an item.

(1) Depending on what you mean by "subscript," your statement is either a
tautology or is incorrect.  If subscript == whatever appears between square
brackets, then x is a subscript by definition, but in my view subscript is
a typographical term referring in the present context to the tradition of
denoting vector components by adding a subscript to the name of the vector.
I am not familiar with any scientific notation for slices.

(2) c[x] is not an item:

>>> c = range(10)
>>> x = slice(3,6)
>>> c[x] in c
False

This is another case where strings differ from other containers:

>>> c = 'abcbdefghij'
>>> c[x] in c
True

> For a dict, x must be hashable, but otherwise both x and the item returned are 
> unconstrained. Other mappings may remove the requirement for hashability.
> 
Yes, and Numpy is the prime example. However, in my view Numpy takes []
overloading a little bit too far. In Numpy, c[x] can have the following meanings
(the list is probably incomplete):

1. Traditional subscripting. Multidimensional arrays are indexed by tuples. c[x]
is a scalar.

2.  Projection. Happens when len(x) < rank(c) or if ellipsis is present (to 
allow obtaining rank-0 arrays).  If you view multidimensional arrays as
functions (think of "()" replacing "[]"), then projection is similar to
functional.partial. c[x] is an array view of rank <= rank(c).

3. Slicing. Nominally rank-preserving, but can be combined with projection in
the same expression. c[x] is an array view of rank = rank(c).

4. Special kind of reshape.  As in c[newaxis]. c[x] is an array view of
rank > rank(c).

5. Selection. AKA "fancy indexing": c[x] is a copy.
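
To make the list above concrete, here is a small sketch in present-day
Python 3 of what actually arrives in __getitem__ for each spelling.  Probe is
a hypothetical class invented for this illustration, not part of Numpy; it
does no indexing at all and simply hands back the object found between the
brackets.

```python
class Probe:
    """Hypothetical helper: returns whatever lands between the brackets."""

    def __getitem__(self, key):
        return key


c = Probe()

plain = c[2]            # an int
multi = c[1, 2]         # a tuple of ints (traditional subscripting)
sliced = c[1:5]         # a slice object (slicing)
mixed = c[0, 1:5:2]     # a tuple mixing an int and a slice
dots = c[..., 0]        # a tuple containing Ellipsis (projection)
added = c[None]         # None, which Numpy also spells `newaxis`
fancy = c[[0, 2, 3]]    # a list (selection, "fancy indexing")
```

One method receives all of these, so the container has to dispatch on the
type of the key to decide which operation was meant.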

I probably missed a few, but you get the picture.  All that functionality could
be implemented without asking python to add new syntax.  Slicing can be a
function or a method.  Tuple-based indexing did not require any additional
syntax, but it could easily be implemented by overloading __call__ instead of
__getitem__ with an additional benefit of a natural way to support named
dimensions.  
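
A rough sketch of the __call__ idea, assuming a toy two-dimensional
container.  Grid and its row/col parameter names are hypothetical and chosen
only for illustration; the projections here return copies rather than views
to keep the example short.

```python
class Grid:
    """Hypothetical 2-d container indexed with () instead of []."""

    def __init__(self, rows):
        self._rows = [list(r) for r in rows]

    def __call__(self, row=None, col=None):
        # Both coordinates given: traditional subscripting, returns a scalar.
        if row is not None and col is not None:
            return self._rows[row][col]
        # One coordinate given: projection onto the remaining dimension.
        if row is not None:
            return list(self._rows[row])
        if col is not None:
            return [r[col] for r in self._rows]
        raise TypeError("at least one of row, col is required")


g = Grid([[1, 2, 3], [4, 5, 6]])

scalar = g(1, 2)           # positional, in the style of Fortran's g(i, j)
same = g(row=1, col=2)     # named dimensions
second_row = g(row=1)      # projection by name
third_col = g(col=2)
```

Keyword arguments give the named dimensions for free, which the [] syntax
cannot express.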


> Sequences use the rule that x must be either an integer (object with an 
> __index__ method), or a slice object. The key characteristic that 
> distinguishes a sequence from a general mapping is that c[0:0] == type(c)().
> 

... and c[x] in c may be False.


> Multi-dimensional arrays then loosen the restrictions on x imposed by 
> sequences slightly to also permit tuples. The key characteristic to 
> distinguish Numpy-style arrays from other sequences is that c[0:0] == c[0:0,].
> 
As I explained above, that loosening was not entirely necessary.  Numpy could
easily use "()" syntax for that and make itself more familiar to Fortran and
C++ programmers.

> These behaviours aren't fundamental rules of programming that need to be 
> embedded in the underlying language implementation. The kinds of subscript 
> that makes sense may vary from container to container. Python's current 
> approach avoids embedding particular interpretations in the language allowing 
> each data structure designer to make their own decisions (hopefully guided by 
> the conventions used for existing data structures).
> 

That's true, but using similar notation to perform different operations
depending on the type of the operands often leads to confusion, particularly 
when operands are otherwise similar.  This type of confusion is real: I've seen
a bug report for Numeric filed by a user who realized that he cannot resize an
array using slice assignment. 
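
The kind of surprise described above can be sketched in present-day Python 3
without Numeric.  A fixed-size memoryview is used as a stand-in for a
fixed-size array; that substitution is an assumption for illustration and
says nothing about Numeric's actual error behaviour.

```python
# A list can grow or shrink through slice assignment.
items = [0, 1, 2, 3, 4]
items[1:4] = [9]              # three elements replaced by one
shrunk = list(items)

# A fixed-size buffer view cannot: both sides must have the same length.
buf = bytearray(b"abcde")
with memoryview(buf) as view:
    try:
        view[1:4] = b"9"      # same notation, different rules
        resize_rejected = False
    except ValueError:
        resize_rejected = True
```

The notation is identical in both cases; only the type of the operand decides
whether the assignment resizes the container or fails.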

> >   Slicing is specific  to lists, tuples  
> > and strings (I am ignoring non-built-in types for now).
> 
> Ignoring external types when discussing slicing is a mistake. Much of Python's 
> slicing design was driven by the Numpy folks, rather than the needs of the 
> standard library.
> 

Isn't it a sign of a weakness in the language design when an external library
dictates changes to the syntax?  Recognizing ':' and '...' only inside '[]' 
feels a little odd.  There are probably some parsing issues with making a:b
a shortcut for range(a,b) and allowing it anywhere, but I don't see a problem
with making ... a keyword. (Parsing problems with ':' could be solved by 
spelling it '..' or 'to', but I know that's not going to happen:-). 

> > 3.  Overloading of []= syntax. Similarly to #2, this is the case when  
> > the same notation is used to do conceptually different operations.   
> > In addition it provides alternative ways to do the same thing (e.g. l  
> > += a vs. l[len(l):] = a).
> 
> The OOW in TOOWTDI stands for "One Obvious Way" not "Only One Way" :)
> 
> As Josiah said, for manipulating data structures, that obvious way is 
> typically the appropriate methods of the collection being used.
> 

I probably picked a poor example. My main gripe about slice assignment is that
it makes a slice feel like a view when it is not.
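
A short sketch of that asymmetry with a plain list (variable names are just
for illustration): assigning through a slice changes the list in place, while
reading the same slice yields an independent copy.

```python
l = [0, 1, 2, 3, 4]

# Assigning through a slice mutates l in place, as if l[1:3] were a view...
l[1:3] = [10, 20]
after_assign = list(l)

# ...but reading the same slice yields an independent copy.
s = l[1:3]
s[0] = 99
after_copy_edit = list(l)     # l is unaffected by the change to s

# The two spellings of "extend" from the quoted example do the same thing.
a, b = [1, 2], [1, 2]
a += [3, 4]
b[len(b):] = [3, 4]
```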

> > 4. Extended slicing.  I believe the most common use case l[::-1] was  
> > eliminated with the introduction of "reversed".  The remaining  
> > functionality in case of a tuple c can be expressed as tuple(c[i] for  
> > i in range(start,stop, stride)).  The later is more verbose than c 
> > [start:stop:stride], but also more flexible.
> 
> Extended slicing was added to provide syntactic support for various operations 
> on Numpy's multi-dimensional arrays. As I understand it, the later addition of 
> support to the types in the standard library was more due to consistency 
> reasons than really compelling uses cases.

In Numpy the need for extended slicing is often a sign of an inappropriate choice
of dimension.  For example if you often use ::2 slices of 1-d arrays, you can
probably represent your data by an Nx2 matrix and refer to columns instead of
::2 slices.  However, I am not against the functionality, I am only against
the syntax.  I would prefer writing c.slice(0, 10, by=2) instead of c[0:10:2].
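
A minimal sketch of the method-style spelling, written as a free function
since built-in types cannot be given new methods.  slice_by is a hypothetical
name, and it only handles non-negative, in-range bounds.  The second half
illustrates the pairs-instead-of-::2 layout with plain lists in place of an
Nx2 matrix.

```python
def slice_by(c, start, stop, by=1):
    """Hypothetical stand-in for the proposed c.slice(start, stop, by=...)."""
    # Picks the same elements as c[start:stop:by] for non-negative,
    # in-range bounds, using only single-item indexing.
    return type(c)(c[i] for i in range(start, stop, by))


c = tuple(range(20))
evens = slice_by(c, 0, 10, by=2)

# The Nx2 alternative: store pairs and pick a column instead of using ::2.
flat = [1, 10, 2, 20, 3, 30]
pairs = [[flat[i], flat[i + 1]] for i in range(0, len(flat), 2)]
first_column = [row[0] for row in pairs]
```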

-- sasha 




