getting an item's list index

Sat Apr 19 11:36:04 EDT 2003

<posted & mailed>

Michael Anckaert wrote:

> Hello all,
> 
> I have a list called contents. If one of the elements has the value
> "item1" for example, how can I get that elements index?

This is one question, and I see it has been answered repeatedly and
correctly -- contents.index(item) will return the index of the FIRST
occurrence of value item in list contents (or raise an exception if
the value never occurs).

> for item in contents:
> # here I need to get that items index

This is a very, VERY different question, as I see some respondents
have mentioned, but it's probably worth clarifying.  There is NO
way you can retrieve the exact index of an item's value once you
choose to loop with "for item in contents:" -- you'll only get the
index of the FIRST occurrence of each item, each time you meet
that item.  For example:

>>> contents=list('paraply')
>>> contents
['p', 'a', 'r', 'a', 'p', 'l', 'y']
>>> for item in contents:
...     idx = contents.index(item)
...     print (idx, item),
...
(0, 'p') (1, 'a') (2, 'r') (1, 'a') (0, 'p') (5, 'l') (6, 'y')
>>>

See?  Each 'p' is always associated with index 0, that of the
first occurrence of 'p', and similarly each 'a' with index 1.

Even if you KNOW that the list contents has no duplicates, it's
still UN-advisable to use a loop such as this -- each call to
index takes time proportional to the resulting index, so the
overall time of the loop should end up taking time proportional
to the SQUARE of the list length's, so-called "O(N squared)".
For example:

[alex at lancelot python2.3]$ python timeit.py -s'c=range(1000)' 'for it in c: 
i = c.index(it)'
10 loops, best of 3: 4.02e+04 usec per loop

[alex at lancelot python2.3]$ python timeit.py -s'c=range(2000)' 'for it in c: 
i = c.index(it)'
10 loops, best of 3: 1.6e+05 usec per loop
[alex at lancelot python2.3]$

See?  A list twice as long takes up FOUR times as long for
the loop -- and:

[alex at lancelot python2.3]$ python timeit.py -s'c=range(3000)' 'for it in c: 
i = c.index(it)'
10 loops, best of 3: 3.6e+05 usec per loop
[alex at lancelot python2.3]$

list three times as long, takes NINE times as long for the loop -- etc.
Time for the loop grows like the square of the list's length, in fact.

You can do MUCH better, in such a loop, by looping on indices
directly -- easiest in the coming Python 2.3:

for idx, item in enumerate(contents):
    print (idx, item),

but not TOO terrible even in good old Python 2.2:

for idx in xrange(len(contents)):
    item = contents[idx]
    print (idx, item),

See what happens to the resulting time...:

[alex at lancelot python2.3]$ python timeit.py -s'c=range(3000)' 'for i, it in 
enumerate(c): pass'
1000 loops, best of 3: 1.12e+03 usec per loop
[alex at lancelot python2.3]$

these 1.12 milliseconds per loop correspond to the 360 milliseconds
per loop withe the "for it in c: i = c.index(it)" approach -- a
speedup of over THREE HUNDRED TIMES even for a list of reasonably
moderate length (3000 items), growing without bounds as the list gets
longer and longer.

An excessive focus on performance is a widespread disease in programmers,
but SOME modicum of attention to performance, when you can get speedups
of hundreds and hundreds of times without actually complicating your
code (in fact making it simpler and more elegant), is probably worthwhile.

So, summarizing: if you're looping over a whole list and may need to
know, within the body of the loop, up to what index in the list you've
reached -- forget the list's method .index, and ensure your LOOP itself
tells you the current index -- with enumerate if you can upgrade to
Python 2.3 (a bit too early for that, it's still in alpha -- but the
first beta will be out VERY soon!!!) -- or in Python 2.2 or earlier by
looping directly with the index, in xrange(len(contents)), and using
indexing, contents[i], to get the current item, rather than doing things
the other way 'round.  The key here is that indexing, contents[i], is a
VERY fast operation, taking a short constant time for any list and index
in it, while "looking for the index for an item" is a lengthy operation,
taking time proportional to the index of the item's first occurrence.

Alex