Optimisation problem - in C?

Remco Gerlich scarblac at pino.selwerd.nl
Wed Feb 21 08:02:34 EST 2001


Tim Churches <tchur at optushome.com.au> wrote in comp.lang.python:
> After a bit of profiling I have worked out exactly which part of a data
> crunching class on which I am working needs to be optimised - maybe in
> C? The abstracted function which needs to run faster is as follows - not
> surprisingly it is itself a loop which lives within two other loops -
> hence it is called rather frequently:
> 
> ############
> def makevector(sourcelist,elementlist):
> 	resultvector = []
> 	for element in elementlist:
> 		resultvector.append(sourcelist[element])
> 	return resultvector
> ###########
> 
> My test data has about 300,000 elements in sourcelist and between about
> 10,000 and 70,000 elements in elementlist, but larger lists are
> envisaged, hence the earnest desire for maximum speed.

Yes, I can see that this would be slow.

It's too bad that lists don't have a .get() method (so x.get(3) would be
equal to x[3]), because resultvector=map(sourcelist.get, elementlist)
would be fastest (all done in C). As it is, you have to fake that with
resultvector=map(lambda i, l=sourcelist: l[i], elementlist), and the
lambda calls mean that this is much slower than your version

resultvector = [sourcelist[element] for element in elementlist]
is Python 2.0, and it at least is somewhat faster than your version (0.290s
for your version, 0.250 for the list comprehension on my machine with
len(sourcelist)=300000 and len(elementlist)=70000.

Wait, one of the main problems of your first version is that the list has to
grow. What if we create a list of the same length as elementlist first, say
by copying it?

def makevector(sourcelist, elementlist):
   resultvector = elementlist[:]
   for i in range(len(elementlist)):
      resultvector[i] = sourcelist[elementlist[i]]
   return resultvector
   
This one runs in 0.170 seconds on my computer, on average.

I'm out of further ideas. Too bad there's no syntax construct for 'for i in
range(len(...))' yet.

> Can this function (which essentially does the fairly generic operation
> of "fetch the elements of sequence a using the element indexes contained
> in sequence b") be optimised without resorting to a C module?

By about 40%, but for more you'll need to do it in C. I don't know how much
of a gain you can expect there.

How do you use this list? If you only use part of the data later you might
choose not to compute the whole thing at all but put a lazy wrapper around it.

class Wrapper:
   def __init__(self, sourcelist, elementlist):
      self.sl = sourcelist
      self.el = elementlist
   def __getitem__(self, i):
      return self.sl[self.el[i]]

resultlist = Wrapper(sourcelist, elementlist)

This would be faster if you use only a small part of the whole list.

> If not, is anyone willing to hold my hand (it's clean, but rather
> sweaty) while I fumble with trying to replace the function with a C
> module? (Note: after years of unremitting effort, I have finally
> mastered all aspects of hello_world.c)

I'll pass this one on, haven't needed to do it myself yet...

-- 
Remco Gerlich



More information about the Python-list mailing list