[Python-ideas] Add a datatype collections.ListView

David Mertz mertz at gnosis.cx
Sun Apr 20 16:04:31 CEST 2014


On Apr 20, 2014 3:02 AM, "Tal Einat" <taleinat at gmail.com> wrote:
>
> FYI your handling of negative indexes is buggy.
>

It's horrible! I'm not quite sure it can even be called "handling." Take my
example only as a prototype of the concept that lets me run the benchmark.
That said, doing the negative indices right should just be some arithmetic
on self.start and self.stop, not anything too complicated or differently
performing.

>
> On Thu, Apr 17, 2014 at 6:49 PM, David Mertz <mertz at gnosis.cx> wrote:
>>
>> Following Brandon Rhodes very nice PyCon 2014 talk on datatypes, I was
struck by increased guilt over a program I am developing.  I spoke with him
in the hallway about a good approach to improving performance while keeping
code nice looking.
>>
>> The basic idea in my program is that I have a large(ish) list of tokens
that I generate in parsing a special language, and I frequently take slices
from this list--and often enough, slices out of those slices.  In some
cases, the most straightforward approach in code is a slice-to-end, e.g.:
>>
>>     start = find_thing(tokens)
>>     construct_in = tokens[start:]
>>
>> Obviously, this winds up doing a lot of copying (of object references,
not of actual data, but still).  Often the above step is followed by
something like:
>>
>>     end = find_end(construct_in)
>>     construct = construct_in[:end]
>>
>> This second step isn't bad in my case since the actual construct will be
dozens of tokens, not thousands or millions, and once I find it I want to
keep it around and process it further.
>>
>> I realize, of course, that I could program my 'find_end()' differently
so that it took a signature more like 'find_end(tokens, start=start)'.  But
with recursion and some other things I do, this becomes inelegant.
>>
>> What I'd really like is a "ListView" that acts something like NumPy's
non-copying slices.  However numpy, of course, only deals with arrays and
matrices of uniform numeric types.  I want a non-copying "slice" of a list
of generic Python objects.  Moreover, I believe that such a type is useful
enough to be worth including in the collections module generally.
>>
>> As an initial implementation, I created the below.  In the module
self-tests, the performance increase is about 100x, but the particular
ad-hoc benchmark I wrote isn't necessarily well-representative of all
use-cases.
>>
>> % python3 ListView.py
>>          A bunch of slices from list: 1.29 seconds
>> A bunch of slices from DummyListView: 1.19 seconds
>>      A bunch of slices from ListView: 0.01 seconds
>> -------------------------------------------------------------------
>> ### ListView.py ###
>> import sys
>> from time import time
>> from random import randint, random
>> from collections import Sequence
>>
>> class DummyListView(Sequence):
>>     def __init__(self, l):
>>         self.list = l
>>     def __len__(self):
>>         return len(self.list)
>>     def __getitem__(self, i):
>>         return self.list[i]
>>
>> class ListView(Sequence):
>>     def __init__(self, seq, start=0, stop=None):
>>         if hasattr(seq, '__getitem__'):
>>             self.list = seq
>>         else:
>>             self.list = list(seq)
>>         self.start = start
>>         self.stop = len(self.list) if stop is None else stop
>>
>>     def __len__(self):
>>         return self.stop - self.start
>>
>>     def __getitem__(self, i):
>>         if isinstance(i, slice):
>>             start = self.start if i.start is None else self.start+i.start
>>             if i.stop is None:
>>                 stop = self.stop
>>             else:
>>                 stop = self.start + i.stop
>>             return ListView(self.list, start, stop)
>>         else:
>>             val = self.list[i+self.start]
>>             if i < 0:
>>                 return val
>>             elif not self.start <= i+self.start < self.stop:
>>                 raise IndexError("View of sequence [%d:%d], index %d" % (
>>                                              self.start, self.stop, i))
>>             return val
>>
>>     def __str__(self):
>>         return "ListView of %d item list, slice [%d:%d]" % (
>>                             len(self.list), self.start, self.stop)
>>
>>     def __repr__(self):
>>         return "ListView(%s)" % self.list[self.start:self.stop]
>>
>>     def to_list(self):
>>         return list(self.list[self.start:self.stop])
>>
>> class Thing(object):
>>     def __init__(self, x):
>>         self.x = x
>>     def __repr__(self):
>>         return "Thing(%f)" % self.x
>>
>> if __name__ == '__main__':
>>     NUM = 100000
>>     things = [Thing(random()) for _ in range(NUM)]
>>     slices = [sorted((randint(0, NUM-1), randint(0, NUM-1)))
>>               for _ in range(100)]
>>     offset = randint(0, 100)
>>     for name, cls in (("list", list),
>>                       ("DummyListView", DummyListView),
>>                       ("ListView",ListView)):
>>         begin = time()
>>         s = "A bunch of slices from %s: " % name
>>         print(s.rjust(38), end='')
>>         sys.stdout.flush()
>>         l = cls(things)
>>         for i in range(8):
>>             for start, stop in slices:
>>                 sl = l[start:stop]
>>                 size = stop-start
>>                 for i in range(3):
>>                     subslice1 = sl[:offset]
>>                     subslice2 = sl[offset:]
>>         print("%0.2f seconds" % (time()-begin))
>>
>>
>> --
>> Keeping medicines from the bloodstreams of the sick; food
>> from the bellies of the hungry; books from the hands of the
>> uneducated; technology from the underdeveloped; and putting
>> advocates of freedom in prisons.  Intellectual property is
>> to the 21st century what the slave trade was to the 16th.
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140420/cff5af9c/attachment.html>


More information about the Python-ideas mailing list