Negative array indices and slice()

Thu Nov 1 22:14:55 EDT 2012

On Thu, 01 Nov 2012 15:25:51 -0700, Andrew Robinson wrote:

> On 11/01/2012 12:07 PM, Ian Kelly wrote:

>>> Pep 357 merely added cruft with index(), but really solved nothing. 
>>> Everything index() does could be implemented in __getitem__ and
>>> usually is.
>>
>> No.  There is a significant difference between implementing this on the
>> container versus implementing it on the indexes.  Ethan implemented his
>> string-based slicing on the container, because the behavior he wanted
>> was specific to the container type, not the index type.  Custom index
>> types like numpy integers on the other hand implement __index__ on the
>> index type, because they apply to all sequences, not specific
>> containers.
> 
> Hmmm...
> D'Aprano didn't like the monkey patch; and sub-classing was his fix-all.

I pointed out that monkey-patching is a bad idea, even if it worked. But 
it doesn't work -- you simply cannot monkey-patch built-ins in Python. 
Regardless of whether "I like" the m-p or not, *you can't use it* because 
you cannot patch built-in list methods.
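
A minimal sketch of what happens when you try, assuming CPython (the 
exact wording of the error message varies between versions, and 
my_getitem is a made-up name used only for illustration):

```python
def my_getitem(self, index):
    # Hypothetical replacement method, for illustration only.
    return "patched"

try:
    # Attempt to replace a method on the built-in list type.
    list.__getitem__ = my_getitem
except TypeError as err:
    patch_error = err
    print("cannot patch:", err)
else:
    patch_error = None

# The built-in behaviour is untouched.
print([1, 2, 3][0])
```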

The best you could do is subclass list, then shadow the built-in name 
"list" with your subclass. But that gives all sorts of problems too, in 
some ways even worse than monkey-patching.
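
One of those problems, sketched under the assumption of a do-nothing 
subclass called MyList (a hypothetical name): shadowing only affects 
code that looks up the name "list". Literals and other built-ins never 
consult that name, so they keep producing real lists.

```python
import builtins

class MyList(list):
    """Hypothetical subclass standing in for the 'improved' list."""

list = MyList  # shadows the built-in name in this module only

shadow_results = (
    type(list("abc")) is MyList,     # explicit call goes through the shadow
    type([1, 2, 3]) is MyList,       # list display bypasses the name
    type(sorted("cba")) is MyList,   # built-in functions return real lists
    type("a b".split()) is MyList,   # so do methods of other built-in types
)
print(shadow_results)   # (True, False, False, False)

list = builtins.list  # undo the shadowing
```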

You started this thread with a question about slicing. You believe that 
one particular use-case for slicing, which involves interpreting lists as 
circular rather than linear, is the use-case that built-in list slicing 
should have supported.

Fine, you're entitled to your opinion. But that boat sailed about 20 
years ago. Python didn't make that choice, and it won't change now. If 
you write up a PEP, you could aim to have the built-in behaviour changed 
for Python 4 in perhaps another 10-15 years or so. But for the time 
being, that's not what lists, tuples, strings, etc. do. If you want that 
behaviour, if you want a circular list, then you have to implement it 
yourself, and the easiest way to do so is with a subclass.
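
A rough sketch of such a subclass follows. CircularList is a hypothetical 
name, and the wrap-around semantics shown (indexes taken modulo the 
length, slices walked position by position) are only one assumption 
about what "circular" should mean, not a finished design:

```python
class CircularList(list):
    """Hypothetical sketch: a list whose indexes wrap around the ends."""

    def __getitem__(self, index):
        n = len(self)
        if isinstance(index, slice):
            # Assumed semantics: missing start/stop default to 0 and len(self),
            # and every position visited is wrapped modulo the length.
            start = 0 if index.start is None else index.start
            stop = n if index.stop is None else index.stop
            step = 1 if index.step is None else index.step
            return [list.__getitem__(self, i % n)
                    for i in range(start, stop, step)]
        if n == 0:
            raise IndexError("index into empty CircularList")
        return list.__getitem__(self, index % n)


c = CircularList("abcde")
print(c[7])      # c
print(c[3:7])    # ['d', 'e', 'a', 'b']
print(c[-2:2])   # ['d', 'e', 'a', 'b']
```

Everything else (append, sort, len, iteration and so on) is inherited 
unchanged from list, which is the point of subclassing here.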

That's not a "fix-all". I certainly don't believe that subclassing is the 
*only* way to fix this, nor that it will fix "all" things. But it might 
fix *some* things, such as you wanting a data type that is like a 
circular list rather than a linear list.

If you prefer to create a circular-list class from scratch, re-
implementing all the list-like behaviour, instead of inheriting from an 
existing class, then by all means go right ahead. If you have a good 
reason to spend days or weeks writing, testing, debugging and fine-tuning 
your new class, instead of about 15 minutes with a subclass, then I'm 
certainly not going to tell you not to.

> Part of my summary is based on that conversation with him, and you
> touched on one of the unfinished points; I responded to him that I
> thought __getitem__ was under-developed.  The object slice() has no
> knowledge of the size of the sequence; nor can it get that size on its
> own, but must passively wait for it to be given to it.

That's because the slice object is independent of the sequence. As I 
demonstrated, you can pass a slice object to multiple sequences. This is 
a feature, not a bug.
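
For anyone who missed the earlier demonstration, a short sketch along 
the same lines (the variable names are illustrative only):

```python
# One slice object, built with no sequence in sight.
every_other = slice(1, None, 2)

text = "abcdefgh"
numbers = [10, 20, 30, 40, 50]
pair = (True, False)

# The same object works on sequences of different types and lengths.
print(text[every_other])      # bdfh
print(numbers[every_other])   # [20, 40]
print(pair[every_other])      # (False,)

# All the slice stores is start, stop and step.
print(every_other.start, every_other.stop, every_other.step)   # 1 None 2
```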

> The bottom line is:  __getitem__ must always *PASS* len( seq ) to
> slice() each *time* the slice() object is used.

The bottom line is: even if you are right, so what?

The slice object doesn't know what the length of the sequence is. What 
makes you think that __getitem__ passes the length to slice()? Why would 
it need to recreate a slice object that already exists?

It is the *sequence*, not the slice object, that is responsible for 
extracting the appropriate items when __getitem__ is called. __getitem__ 
gets a slice object as argument, it doesn't create one. It no more 
creates the slice object than mylist[5] creates the int 5.
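
A small sketch makes the division of labour visible. Echo and MySeq are 
hypothetical classes written for this post; slice.indices() is the real 
method a container can call, supplying its own length, to turn a slice 
into concrete start/stop/step values:

```python
class Echo:
    """Hypothetical class that returns whatever __getitem__ receives."""

    def __getitem__(self, index):
        return index


class MySeq:
    """Hypothetical container that resolves slices against its own length."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # The container, not the slice, knows the length.
            start, stop, step = index.indices(len(self))
            return [self._items[i] for i in range(start, stop, step)]
        return self._items[index]


e = Echo()
print(e[5])       # 5
print(e[1:2])     # slice(1, 2, None)

s = MySeq("abcdef")
print(s[-3:])     # ['d', 'e', 'f']
print(s[::-2])    # ['f', 'd', 'b']
```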

> Since this is the case,

But it isn't.

> it would have been better to have list, itself, have a default member
> which takes the raw slice indices and does the conversion itself.  The
> size would not need to be duplicated or passed -- memory savings, &
> speed savings...

We have already demonstrated that slice objects are smaller than (x)range 
objects and three-item tuples. In Python 3.3:

py> import sys
py> sys.getsizeof(range(1, 10, 2))  # xrange was renamed to range in Python 3
24
py> sys.getsizeof((1, 10, 2))
36
py> sys.getsizeof(slice(1, 10, 2))
20

It might help you to be taken seriously if you base your reasoning on 
Python as it actually is, rather than counter-factual assumptions.

> I'm just clay pigeoning an idea out here.... Let's apply D'Aprano's
> logic to numpy; Numpy could just have subclassed *list*; 

Sure they could have, if numpy arrays were intended to be a small 
variation on Python lists. But they weren't, so they didn't.

> so let's ignore
> pure python as a reason to do anything on behalf of Numpy:
> 
> Then, let's consider all third-party classes;  These are where
> subclassing becomes a pain -- BUT: I think those could all have been
> injected.
> 
>  >>> class ThirdParty( list ):  # Pretend this is someone else's...
> ...     def __init__(self): return
> ...     def __getitem__(self,aSlice): return aSlice 
> ...

Strange and bizarre semantics for slicing, but okay.

> We know it will default work like this:
>  >>> a=ThirdParty()
>  >>> a[1:2]
> slice(1, 2, None)
> 
> # So, here's an injection...
>  >>> ThirdParty.superOnlyOfNumpy__getitem__ = MyClass.__getitem__
>  >>> ThirdParty.__getitem__ = lambda self,aSlice: ( 1, 3,
> self.superOnlyOfNumpy__getitem__(aSlice ).step )
>  >>> a[5:6]
> (1, 3, None)
> 
> Numpy could have exported a (workable) function that would modify other
> list functions to affect ONLY numpy data types (eg: a filter).  This
> allows users creating their own classes to inject them with Numpy's
> filter only when they desire;

Sure, the numpy people could have done this, if they were smoking crack.

Have you actually programmed before? Judging from the techniques you seem 
to prefer for everyday use (monkey-patching other classes) and techniques 
you seem to hate (subclassing), I'm getting the impression you've read 
about bleeding edge programming hacks but never actually written code. 
Sort of like somebody who has never driven a car, but fantasises about 
doing the sort of extreme stunt driving that kills people in real life 
and occasionally even stunt drivers. And now you are *insisting* that 
everyone should drive like that, *all the time*, because stopping at 
traffic lights is so inefficient.

Of course, I could be wrong. Maybe you've been programming for years and 
know exactly what you are doing. But if so, you are coming across as 
exactly the kind of cowboy coder that I pray to all the gods I never have 
to deal with in real life.

[...]
> Don't consider the present API legacy for a moment, I'm asking
> hypothetical design questions:
> 
> How many users actually keep slice() around from every instance of [::]
> they use?

Does it matter? It is supported behaviour, so even *one* user is enough.

> If it is rare, why create the slice() object in the first place and
> constantly be allocating and de-allocating memory, twice over? (once for
> the original, and once for the repetitive method which computes dynamic
> values?)

Huh? As opposed to what? Creating an xrange() object, and constantly 
allocating and de-allocating memory? Or a tuple? Same again. Some sort of 
object has to be created.

And I have no idea what you are talking about with "twice over". What 
"repetitive method which computes dynamic values"?

In any case, I return to my comment earlier in this thread: if you have 
profiled your application and have hard evidence that creating slice 
objects is a bottleneck, then we can talk about optimizing the slice 
objects. Until then, you are wasting your time and ours by prematurely 
optimizing the wrong parts of your code.
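
If you want a starting point for gathering that evidence, here is a 
rough sketch using the standard timeit module. The list size and 
repetition count are arbitrary assumptions, the numbers will differ from 
machine to machine, and a micro-benchmark like this is no substitute for 
profiling the real application (with cProfile, say):

```python
import timeit

data = list(range(1000))
repeats = 100000  # arbitrary; raise it for steadier numbers

# Time a small slice against a plain integer index on the same list.
slice_time = timeit.timeit("data[10:20]", globals={"data": data},
                           number=repeats)
index_time = timeit.timeit("data[10]", globals={"data": data},
                           number=repeats)

print("slice: %.4f s for %d runs" % (slice_time, repeats))
print("index: %.4f s for %d runs" % (index_time, repeats))
```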

> Would a single mutable have less overhead, since it is
> destroyed anyway?

What? This question makes no sense. Why do you think that mutable objects 
have "less overhead" than immutable ones?

-- 
Steven