best way to determine sequence ordering?

Sat Apr 29 19:01:44 EDT 2006

nikie wrote:
> Steven Bethard wrote:
> 
>> nikie wrote:
>>> Steven Bethard wrote:
>>>
>>>> John Salerno wrote:
>>>>> If I want to make a list of four items, e.g. L = ['C', 'A', 'D', 'B'],
>>>>> and then figure out if a certain element precedes another element, what
>>>>> would be the best way to do that?
>>>>>
>>>>> Looking at the built-in list functions, I thought I could do something
>>>>> like:
>>>>>
>>>>> if L.index('A') < L.index('D'):
>>>>>     # do some stuff
>>>> This is probably a pretty reasonable approach as long as you're not too
>>>> worried about efficiency.  It scans the list twice though, so if you
>>>> start doing this with really big lists, it might be better to do
>>>> something like:
>>> On average, it'll only have go through half the list twice, as it can
>>> break as soon as it finds the item it searches for. Don't think you can
>>> get any more efficient than that.
>> Sure you can:
>>
>>      a_index = None
>>      d_index = None
>>      for i, item in enumerate(L):
>>          if item == 'A':
>>              a_index = i
>>          elif item == 'D':
>>              d_index = i
>>          if a_index is not None and d_index is not None:
>>              break
>>
>>      if a_index < d_index:
>>          # do some stuff
>>
>> That goes through exactly the minimum number of elements.  But it's
>> probably slower than two .index() calls since .index() is coded in C.
>> But timeit and see if you're interested.
> 
> On my PC, calling index twice is 5 times faster.
> 
> But let's just pretend for a moment Python code was compiled and
> optimized as well as C code.

Ok, lets get comparable functions by writing them both in Python:

----- temp.py -----
def index(L, item):
     for i, x in enumerate(L):
         if x == item:
             return i
     return -1

def a_d_index(L):
     a_index = None
     d_index = None
     for i, item in enumerate(L):
         if item == 'A':
             a_index = i
         elif item == 'D':
             d_index = i
         if a_index is not None and d_index is not None:
             break
     return a_index, d_index
-------------------

$ python -m timeit -s "import temp; L = ['C', 'A', 'D', 'B']" "a = 
temp.index(L, 'A'); d = temp.index(L, 'D'); a < d"
100000 loops, best of 3: 5.73 usec per loop

$ python -m timeit -s "import temp; L = ['C', 'A', 'D', 'B']" "a, d = 
temp.a_d_index(L); a < d"
100000 loops, best of 3: 3.75 usec per loop

The a_d_index() function is definitely faster.

I understand your point about comparison time, but in this case we're 
just comparing one element strings, so it's not so horrible.  Sure, if 
you used strings that are more complicated to compare (or worse yet, you 
used your own custom objects with __cmp__ methods) you could provoke the 
kind of behavior you're looking for.

But in this particular case, there really is some substantial overhead 
to Python's iterator protocol, and you can't just ignore that.

Of course the moral of the story (as is the moral of all such threads) 
is that if you're really interested in speeding things up you need to 
start timing things.

STeVe