Extract all words that begin with x

Wed May 12 04:29:23 EDT 2010

Bryan, 12.05.2010 08:55:
> Now back to the arguably-interesting issue of speed in the particular
> problem here: 'Superpollo' had suggested another variant, which I
> appended to my timeit targets, resulting in:
>
> [s for s in strs if s.startswith('a')]  took:  5.68393977159
> [s for s in strs if s[:1] == 'a']  took:  3.31676491502
> [s for s in strs if s and s[0] == 'a']  took:  2.29392950076
>
> Superpollo's condition -- s and s[0] == 'a' -- is the fastest of the
> three.

Just out of curiosity, I ran the same code in the latest Cython pre-0.13 
and added some optimised Cython implementations. Here's the code:

# Untyped baseline: the same code as plain Python, just compiled.
def cython_way0(l):
    return [s for s in l if s.startswith(u'a')]

# From here on, the list and the loop variable are statically typed.
def cython_way1(list l):
    cdef unicode s
    return [s for s in l if s.startswith(u'a')]

def cython_way2(list l):
    cdef unicode s
    return [s for s in l if s[:1] == u'a']

# No emptiness guard here, unlike way4.
def cython_way3(list l):
    cdef unicode s
    return [s for s in l if s[0] == u'a']

def cython_way4(list l):
    cdef unicode s
    return [s for s in l if s and s[0] == u'a']

# Explicit cast to Py_UNICODE; no emptiness guard, unlike way6.
def cython_way5(list l):
    cdef unicode s
    return [s for s in l if (<Py_UNICODE>s[0]) == u'a']

def cython_way6(list l):
    cdef unicode s
    return [s for s in l if s and (<Py_UNICODE>s[0]) == u'a']
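
The variants above are not all equivalent once the list contains an empty
string: way3 and way5 index s[0] without the "s and" guard. A minimal plain
Python sketch of the difference (the function names and the sample list are
made up for illustration, they are not part of the benchmark):

```python
def first_char_slice(strings):
    # s[:1] is u'' for an empty string, so no guard is needed.
    return [s for s in strings if s[:1] == u'a']

def first_char_guarded(strings):
    # "s and ..." skips empty strings before s[0] is evaluated.
    return [s for s in strings if s and s[0] == u'a']

def first_char_unguarded(strings):
    # s[0] raises IndexError as soon as an empty string is reached.
    return [s for s in strings if s[0] == u'a']

# Hypothetical sample data containing one empty string.
sample = [u'apple', u'', u'banana', u'avocado']

slice_result = first_char_slice(sample)
guarded_result = first_char_guarded(sample)

try:
    first_char_unguarded(sample)
except IndexError:
    unguarded_failed = True
else:
    unguarded_failed = False
```

So the unguarded timings are only comparable to the others if the input is
known to contain no empty strings.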

And here are the numbers (plain Python 2.6.5 first):

[s for s in strs if s.startswith(u'a')] took: 1.04618620872
[s for s in strs if s[:1] == u'a'] took: 0.518909931183
[s for s in strs if s and s[0] == u'a'] took: 0.617404937744

cython_way0(strs) took: 0.769457817078
cython_way1(strs) took: 0.0861849784851
cython_way2(strs) took: 0.208586931229
cython_way3(strs) took: 0.18615603447
cython_way4(strs) took: 0.190477132797
cython_way5(strs) took: 0.0366449356079
cython_way6(strs) took: 0.0368368625641
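
For anyone who wants to rerun the plain Python part, something along these
lines should do. This is only a sketch, not the script used for the numbers
above: the test data, the list size and the repeat count are assumptions
(the thread does not show them), and it uses the globals argument of
timeit.timeit(), which needs Python 3.5 or later.

```python
import random
import string
import timeit

def make_strings(count, max_len=8, seed=0):
    """Build `count` random lowercase strings, some of them empty (assumed data)."""
    rng = random.Random(seed)
    return [
        u''.join(rng.choice(string.ascii_lowercase)
                 for _ in range(rng.randint(0, max_len)))
        for _ in range(count)
    ]

# The three plain Python expressions from the thread.
STATEMENTS = [
    "[s for s in strs if s.startswith(u'a')]",
    "[s for s in strs if s[:1] == u'a']",
    "[s for s in strs if s and s[0] == u'a']",
]

def time_statements(strs, number=20):
    """Return (statement, elapsed seconds) pairs, each run `number` times."""
    return [
        (stmt, timeit.timeit(stmt, globals={'strs': strs}, number=number))
        for stmt in STATEMENTS
    ]

if __name__ == '__main__':
    for stmt, seconds in time_statements(make_strings(100000)):
        print('%s took: %s' % (stmt, seconds))
```

Absolute numbers will differ by machine and Python version, so only the
ordering within a single run is meaningful.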

Personally, I think the cast to Py_UNICODE in the last two implementations
shouldn't be required. It should happen automatically, so that way3/4 run
as fast as way5/6. I'll add that when I get to it.

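Note that unicode.startswith() is optimised in Cython, so it's a pretty
fast option, too: cython_way1 is about 12 times faster than the same
expression in plain Python. Also note that the best speed-up here is only
a factor of 14 (cython_way5 at 0.037 against 0.519 for the fastest plain
Python variant), so plain Python is quite competitive unless the list is
huge and this is really a bottleneck in an application.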
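
The factor of 14 can be checked directly from the timings posted above. The
figures below are copied from this message; the short dictionary keys are
just labels chosen for this sketch.

```python
# Plain Python 2.6.5 timings from this message.
plain = {
    'startswith': 1.04618620872,
    'slice': 0.518909931183,
    'guarded_index': 0.617404937744,
}

# Cython timings from this message.
cython = {
    'cython_way0': 0.769457817078,
    'cython_way1': 0.0861849784851,
    'cython_way2': 0.208586931229,
    'cython_way3': 0.18615603447,
    'cython_way4': 0.190477132797,
    'cython_way5': 0.0366449356079,
    'cython_way6': 0.0368368625641,
}

# Fastest plain Python variant against the fastest Cython variant.
best_speedup = min(plain.values()) / min(cython.values())

# Same expression (startswith) in plain Python and in typed Cython.
startswith_speedup = plain['startswith'] / cython['cython_way1']

print('best speed-up: %.1f' % best_speedup)
print('startswith speed-up: %.1f' % startswith_speedup)
```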

Stefan