Negative array indices and slice()

Andrew Robinson andrew3 at r3dsolutions.com
Thu Nov 1 18:25:51 EDT 2012


On 11/01/2012 12:07 PM, Ian Kelly wrote:
> On Thu, Nov 1, 2012 at 5:32 AM, Andrew Robinson
> <andrew3 at r3dsolutions.com>  wrote:
>> Hmmmm.... was that PEP the active state of Python, when Tim rejected the bug report?
> Yes. The PEP was accepted and committed in March 2006 for release in
> Python 2.5.  The bug report is from June 2006 and has a version
> classification of Python 2.5, although 2.5 was not actually released
> until September 2006.
That explains Peter's remark.  Thank you.  He looks *much* smarter now.

>
>> PEP 357 merely added cruft with __index__(), but really solved nothing.  Everything __index__() does could be implemented in __getitem__ and usually is.
> No.  There is a significant difference between implementing this on
> the container versus implementing it on the indexes.  Ethan
> implemented his string-based slicing on the container, because the
> behavior he wanted was specific to the container type, not the index
> type.  Custom index types like numpy integers on the other hand
> implement __index__ on the index type, because they apply to all
> sequences, not specific containers.
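
To make the container-versus-index distinction concrete, here is a
minimal sketch. Both class names (Ordinal, ByName) are hypothetical and
invented for illustration; they are not Ethan's code or anything from
numpy. Ordinal puts the behaviour on the index, so every built-in
sequence accepts it; ByName puts the behaviour on the container, so only
that container understands string keys.

```python
class Ordinal:
    """Hypothetical index type: exposes an integer through __index__."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        # Any sequence that asks for an integer index will call this.
        return self.value


class ByName(list):
    """Hypothetical container: accepts digit strings as keys."""

    def __getitem__(self, key):
        # Container-specific rule: only ByName knows what a str key means.
        if isinstance(key, str):
            return super().__getitem__(int(key))
        return super().__getitem__(key)


two = Ordinal(2)

# The index type works on unrelated sequence types, including in slices.
results = ([10, 20, 30][two], "abc"[two], (7, 8, 9)[:two])

# The container-level rule works only on the container that defines it.
by_name = ByName([10, 20, 30])["1"]

try:
    [10, 20, 30]["1"]
    plain_list_accepts_str = True
except TypeError:
    plain_list_accepts_str = False
```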

Hmmm...
D'Aprano didn't like the monkey patch, and sub-classing was his fix-all.

Part of my summary is based on that conversation with him, and you
touched on one of the unfinished points; I responded to him that I
thought __getitem__ was under-developed.  The slice() object has no
knowledge of the size of the sequence; nor can it get that size on its
own, but must passively wait for it to be given to it.

The bottom line is:  __getitem__ must always *PASS* len( seq ) to the
slice() object (through its indices() method) each *time* the slice()
object is used.  Since this is the case, it would have been better to
have list itself have a default member which takes the raw slice
indices and does the conversion itself.  The size would not need to be
duplicated or passed -- memory savings, & speed savings...
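
For reference, the hand-off described above looks roughly like this in a
container that resolves slices itself. Window is a hypothetical class
made up for this sketch; slice.indices() is the standard method, and it
only knows the length because __getitem__ passes it in.

```python
class Window:
    """Hypothetical sequence wrapper that resolves slices by hand."""

    def __init__(self, data):
        self._data = list(data)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # The slice object does not know our length, so we supply it.
            start, stop, step = key.indices(len(self))
            return [self._data[i] for i in range(start, stop, step)]
        return self._data[key]


w = Window("abcdef")

raw = slice(-3, None)            # what the parser builds for w[-3:]
resolved = raw.indices(len(w))   # negative/None values made concrete

tail = w[-3:]
every_other_reversed = w[::-2]
```

The raw slice carries only start, stop and step; every call to
__getitem__ has to hand the length over again before anything can be
looked up.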

I'm just clay-pigeoning an idea out here....
Let's apply D'Aprano's logic to numpy; Numpy could just have subclassed
*list*; so let's ignore pure Python as a reason to do anything on
behalf of Numpy:

Then, let's consider all third-party classes.  These are where
subclassing becomes a pain -- BUT: I think those could all have been
injected.

 >>> class ThirdParty( list ):  # Pretend this is someone else's...
...     def __init__(self): return
...     def __getitem__(self,aSlice): return aSlice
...

We know that, by default, it will work like this:
 >>> a=ThirdParty()
 >>> a[1:2]
slice(1, 2, None)

# So, here's an injection...
 >>> ThirdParty.superOnlyOfNumpy__getitem__ = ThirdParty.__getitem__
 >>> ThirdParty.__getitem__ = lambda self,aSlice: ( 1, 3,
...     self.superOnlyOfNumpy__getitem__(aSlice ).step )
 >>> a[5:6]
(1, 3, None)

Numpy could have exported a (workable) function that would modify other
list functions to affect ONLY numpy data types (e.g. a filter).  This
allows users creating their own classes to inject them with Numpy's
filter only when they desire.

Recall Tim Peters's "explicit is better than implicit" Zen?

Most importantly, normal programs not using Numpy wouldn't have had to
carry around an extra API check for __index__() *every* single time the
heavily used [::] happened.  Memory & speed both.

__index__() is also a monkey patch of sorts, in that it allows
*conflicting* assumptions -- exactly the unexpected-interaction worry
that was raised against monkey patching.

e.g. Numpy *CAN* release an __index__() method on their floats -- at
which point a basic no-touch class (list itself) will now accept a float
as an index, in direct contradiction of PEP 357's comment on floats... see?
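
A small sketch of that scenario, run against Python 3. IndexableFloat is
a hypothetical float subclass written only for this illustration; it is
not something Numpy ships. It shows that list does no type check beyond
asking for __index__().

```python
class IndexableFloat(float):
    """Hypothetical float subclass that opts in to being an index."""

    def __index__(self):
        # Truncation is an arbitrary choice made for this sketch.
        return int(self)


items = [10, 20, 30]

# list itself is untouched, yet it now accepts these "floats".
picked = items[IndexableFloat(1.0)]
truncated = items[IndexableFloat(1.9)]
sliced = items[IndexableFloat(0.0):IndexableFloat(2.0)]

# A plain float is still rejected, since float defines no __index__.
try:
    items[1.0]
    plain_float_accepted = True
except TypeError:
    plain_float_accepted = False
```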

My point isn't that this particular implementation I have shown is the 
best (or even really safe, I'd have to think about that for a while).  
Go ahead and shoot it down...

My point is that the methods found in slice() and __index__() have now
moved all the code regarding a sequence *out* of the object which has
information on that sequence.  It smacks of legacy.

The Python parser takes values from many other syntactical constructions 
and passes them directly to their respective objects -- but in the case 
of list(), we have a complicated relationship; and not for any reason 
that can't be handled in a simpler way.

Don't consider the present API legacy for a moment; I'm asking
hypothetical design questions:

How many users actually keep the slice() object around from every
instance of [::] they use?
If it is rare, why create the slice() object in the first place and
constantly allocate and de-allocate memory, twice over (once for the
original, and once for the repeated method call which computes the
dynamic values)?  Would a single mutable have less overhead, since it is
destroyed anyway?
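
For what it's worth, this is what keeping a slice() object around looks
like; how often anyone does it is exactly the open question. The sketch
uses only built-in types, and the name last_two is made up for
illustration. Because the object stores nothing but start, stop and
step, one instance can be applied to sequences of different lengths.

```python
# One slice object, built once and reused.
last_two = slice(-2, None)

from_list = [1, 2, 3, 4, 5][last_two]
from_str = "python"[last_two]

# The concrete bounds differ per sequence, so they are computed per use.
bounds = (last_two.indices(5), last_two.indices(6))
```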



