[Python-ideas] Add a datatype collections.ListView

David Mertz mertz at gnosis.cx
Thu Apr 17 17:49:45 CEST 2014


Following Brandon Rhodes very nice PyCon 2014 talk on datatypes, I was
struck by increased guilt over a program I am developing.  I spoke with him
in the hallway about a good approach to improving performance while keeping
code nice looking.

The basic idea in my program is that I have a large(ish) list of tokens
that I generate in parsing a special language, and I frequently take slices
from this list--and often enough, slices out of those slices.  In some
cases, the most straightforward approach in code is a slice-to-end, e.g.:

    start = find_thing(tokens)
    construct_in = tokens[start:]

Obviously, this winds up doing a lot of copying (of object references, not
of actual data, but still).  Often the above step is followed by something
like:

    end = find_end(construct_in)
    construct = construct_in[:end]

This second step isn't bad in my case since the actual construct will be
dozens of tokens, not thousands or millions, and once I find it I want to
keep it around and process it further.

I realize, of course, that I could program my 'find_end()' differently so
that it took a signature more like 'find_end(tokens, start=start)'.  But
with recursion and some other things I do, this becomes inelegant.

What I'd really like is a "ListView" that acts something like NumPy's
non-copying slices.  However numpy, of course, only deals with arrays and
matrices of uniform numeric types.  I want a non-copying "slice" of a list
of generic Python objects.  Moreover, I believe that such a type is useful
enough to be worth including in the collections module generally.

As an initial implementation, I created the below.  In the module
self-tests, the performance increase is about 100x, but the particular
ad-hoc benchmark I wrote isn't necessarily well-representative of all
use-cases.

% python3 ListView.py
         A bunch of slices from list: 1.29 seconds
A bunch of slices from DummyListView: 1.19 seconds
     A bunch of slices from ListView: 0.01 seconds
-------------------------------------------------------------------
### ListView.py ###
import sys
from time import time
from random import randint, random
from collections import Sequence

class DummyListView(Sequence):
    def __init__(self, l):
        self.list = l
    def __len__(self):
        return len(self.list)
    def __getitem__(self, i):
        return self.list[i]

class ListView(Sequence):
    def __init__(self, seq, start=0, stop=None):
        if hasattr(seq, '__getitem__'):
            self.list = seq
        else:
            self.list = list(seq)
        self.start = start
        self.stop = len(self.list) if stop is None else stop

    def __len__(self):
        return self.stop - self.start

    def __getitem__(self, i):
        if isinstance(i, slice):
            start = self.start if i.start is None else self.start+i.start
            if i.stop is None:
                stop = self.stop
            else:
                stop = self.start + i.stop
            return ListView(self.list, start, stop)
        else:
            val = self.list[i+self.start]
            if i < 0:
                return val
            elif not self.start <= i+self.start < self.stop:
                raise IndexError("View of sequence [%d:%d], index %d" % (
                                             self.start, self.stop, i))
            return val

    def __str__(self):
        return "ListView of %d item list, slice [%d:%d]" % (
                            len(self.list), self.start, self.stop)

    def __repr__(self):
        return "ListView(%s)" % self.list[self.start:self.stop]

    def to_list(self):
        return list(self.list[self.start:self.stop])

class Thing(object):
    def __init__(self, x):
        self.x = x
    def __repr__(self):
        return "Thing(%f)" % self.x

if __name__ == '__main__':
    NUM = 100000
    things = [Thing(random()) for _ in range(NUM)]
    slices = [sorted((randint(0, NUM-1), randint(0, NUM-1)))
              for _ in range(100)]
    offset = randint(0, 100)
    for name, cls in (("list", list),
                      ("DummyListView", DummyListView),
                      ("ListView",ListView)):
        begin = time()
        s = "A bunch of slices from %s: " % name
        print(s.rjust(38), end='')
        sys.stdout.flush()
        l = cls(things)
        for i in range(8):
            for start, stop in slices:
                sl = l[start:stop]
                size = stop-start
                for i in range(3):
                    subslice1 = sl[:offset]
                    subslice2 = sl[offset:]
        print("%0.2f seconds" % (time()-begin))


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140417/ee9462e1/attachment.html>


More information about the Python-ideas mailing list