UserList.__getslice__(): copy.copy(self.data) vs. self.__class__(self.data).

Tue Mar 14 16:36:33 EST 2000

In the Python Reference Manual, Section 3.3.5, "Additional methods for 
emulation of sequence types", we find the following entry:

  ...

  __getslice__ (self, i, j) 

  Called to implement evaluation of self[i:j]. The returned 
  object should be of the same type as self. Note that missing 
  i or j in the slice expression are replaced by zero or 
  sys.maxint, respectively, and no further transformations on 
  the indices is performed. The interpretation of negative 
  indices and indices larger than the length of the sequence 
  is up to the method. 

  ...

The current implementation of UserList.__getslice__() looks like this:

    def __getslice__(self, i, j):
        i = max(i, 0); j = max(j, 0)
        userlist = self.__class__()
        userlist.data[:] = self.data[i:j]
        return userlist

Though this follows the guidelines outlined in the reference manual, it 
has an interesting side effect: the new object is of the same class, but 
it is built by calling the constructor with no arguments, so the current 
values of all instance attributes are lost.  
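
To make the side effect concrete, here is a minimal sketch. MiniUserList 
is a stand-in written for this example (it is not the real UserList.py), 
and LabeledList and its label attribute are hypothetical. The method is 
called directly so the sketch does not depend on the interpreter routing 
self[i:j] to __getslice__.

```python
# Stand-in for UserList, just enough to show the effect.
class MiniUserList:
    def __init__(self, data=None):
        self.data = list(data) if data is not None else []

    def __getslice__(self, i, j):
        # Same logic as the current UserList.__getslice__ quoted above.
        i = max(i, 0); j = max(j, 0)
        userlist = self.__class__()
        userlist.data[:] = self.data[i:j]
        return userlist


# Hypothetical subclass carrying one extra attribute.
class LabeledList(MiniUserList):
    def __init__(self, data=None):
        MiniUserList.__init__(self, data)
        self.label = "unnamed"


original = LabeledList([0, 1, 2, 3, 4])
original.label = "readings"

piece = original.__getslice__(1, 3)

print(piece.data)    # [1, 2]
print(piece.label)   # unnamed  (the constructor default, not "readings")
```

The slice has the right class and the right elements, but label has 
fallen back to whatever the constructor sets.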

Is this desirable behavior?   Personally, I don't believe that it is.  
My thinking is that the current internal state of the instance should 
pass to the newly instantiated object.  I think a better implementation 
would be:  

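    def __getslice__(self, i, j):
        i = max(i, 0); j = max(j, 0)
        userlist = copy.copy(self)       # needs "import copy" at module top
        # Rebind data rather than assigning to userlist.data[:]; the shallow
        # copy shares self.data, so slice assignment would alter the original.
        userlist.data = self.data[i:j]
        return userlist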
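
A rough sketch of the copy-based approach on a stand-in class follows. 
MiniUserList and the label attribute are assumptions made for the 
example, not part of UserList.py. One caveat worth noting: copy.copy() 
is shallow, so the copy starts out sharing the same data list as the 
original. The sketch therefore rebinds data on the copy instead of 
writing into the shared list.

```python
import copy


# Stand-in for UserList, not the real module.
class MiniUserList:
    def __init__(self, data=None):
        self.data = list(data) if data is not None else []

    def __getslice__(self, i, j):
        i = max(i, 0); j = max(j, 0)
        userlist = copy.copy(self)       # shallow copy: attributes come along
        userlist.data = self.data[i:j]   # rebind, leaving self.data untouched
        return userlist


original = MiniUserList([0, 1, 2, 3, 4])
original.label = "readings"              # hypothetical instance attribute

piece = original.__getslice__(1, 3)
print(piece.data)      # [1, 2]
print(piece.label)     # readings
print(original.data)   # [0, 1, 2, 3, 4]

# The caveat in isolation: a shallow copy shares the underlying list.
shared = copy.copy(original)
print(shared.data is original.data)   # True
```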

Also, should the i and j arguments be "adjusted" before being used to 
access the list in self.data?  Again, in the Python Reference Manual, 
section 5.3.3 "Slicings," we find:

  The lower and upper bound expressions, if present, must evaluate 
  to plain integers; defaults are zero and the sequence's length,
  respectively. If either bound is negative, the sequence's length 
  is added to it. 
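
The rule quoted above can be sketched as a small helper. adjust_bound is 
a hypothetical name used only for illustration; it is not something the 
interpreter or UserList exposes. The last line applies the max(i, 0) 
clamp from the UserList.__getslice__ shown earlier.

```python
def adjust_bound(bound, length):
    """Hypothetical helper: add the sequence length to a negative bound."""
    if bound < 0:
        bound = bound + length
    return bound


length = 5
print(adjust_bound(-1, length))            # 4
print(adjust_bound(-3, length))            # 2
print(adjust_bound(-10, length))           # -5, still negative
print(max(adjust_bound(-10, length), 0))   # 0 once clamped with max(i, 0)
```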

So, the runtime normalizes negative numbers so that small-enough (large 
enough??<g>) negative numbers start counting from the end of the 
sequence; e.g., aList[-1] returns the last element of aList.  As 
currently written, UserList converts to zero any negative numbers that 
would otherwise raise an IndexError.  The resulting behavior is that a 
slice is returned rather than an IndexError being raised, thus:

  >>> ul=UserList.UserList([0,1,2,3,4])
  >>> ul[-1] # last element
  4
  >>> ul[-3:-1] # 3rd- and 2nd-to-the-last elements
  [2, 3]
  >>> ul[-10] # calls UserList.__getitem__(self, i)
  Traceback (innermost last):
    File "<interactive input>", line 1, in ?
    File "UserList.py", line 29, in __getitem__
      def __delitem__(self, i): del self.data[i]
  IndexError: list index out of range
  >>> ul[-10:-1] # should raise IndexError
  [0, 1, 2, 3]
  >>> 

Again, at least to me, this behavior seems to be inconsistent with 
a "real" list object.
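
For comparison, the same four expressions can be run against a built-in 
list. As far as I know, a plain list clamps out-of-range slice bounds 
instead of raising, so only the single-index lookup raises IndexError. 
That is worth checking before deciding which behavior UserList should 
match.

```python
plain = [0, 1, 2, 3, 4]

print(plain[-1])       # 4
print(plain[-3:-1])    # [2, 3]
print(plain[-10:-1])   # [0, 1, 2, 3]  (slice bounds are clamped)

# Indexing, unlike slicing, does raise for an out-of-range position.
raised = False
try:
    plain[-10]
except IndexError:
    raised = True
print(raised)          # True
```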

I think this is a better implementation:

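    def __getslice__(self, i, j):
        userlist = copy.copy(self)       # needs "import copy" at module top
        # Rebind data; the shallow copy shares self.data, so assigning to
        # userlist.data[:] would overwrite the original list as well.
        userlist.data = self.data[i:j]
        return userlist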

I'd be more than happy to implement these changes (there are a couple of 
places where self.__class__() is called and where method arguments are 
normalized to zero) and submit the context diffs -- unless I'm 
overwhelmed with arguments to the contrary.  Since UserList.py is part of 
the standard distribution, and could break existing code if changes are 
made, I wanted to bounce this off of the community before proceeding.

I'm using UserList in a current project and I've already implemented 
these changes in a NewUserList class.  Submitting context diffs would 
be a piece of cake.

Any feedback?  Should I proceed?

-- 
-=< tom >=-
Thomas D. Funk (tdfunk at asd-web.com)      |        "Software is the lever
Software Engineering Consultant          | Archimedes was searching for"
Advanced Systems Design, Tallahassee FL. |
