Test if list contains another list

Mon Sep 29 02:15:19 EDT 2008

On Fri, Sep 26, 2008 at 01:39:16PM -0700, bearophileHUGS at lycos.com wrote:
>     # building prefix-function
>     m = 0
>     for i in xrange(1, len_sub):
>         while m > 0 and sub[m] != sub[i]:
>             m = table[m - 1]
>         if sub[m] == sub[i]:
>             m += 1
>         table[i] = m
> 
>     # searching
>     m, i = 0, 0
>     for x in items:
>         while m > 0 and sub[m] != x:
>             m = table[m - 1]
>         if sub[m] == x:
>             m += 1
>             if m == len_sub:
>                 return True
>         i += 1
> 
>     return False

Quite a lot faster than mine... even without using psyco.  Which is
interesting, especially because if I change my search loop to work
like yours, but leave out all your other optimizations, mine runs
roughly as fast as yours (i.e. execution time is negligible to a user
running the program in both cases, even with the large example data
you gave).  This leads me to point out two caveats with your version:

1. Guavat posted a version which returns a list of all the indexes
where a match is found... which is what I mimiced.  Yours returns only
true or false indicating whether or not it found a match.  The main
difference in performance seems to come due to your iteration over the
list items, versus my comparing a sublist-sized slice of the whole to
the sublist.  But this alone is an invalid optimization, because it
doesn't produce the same results...  If you only want to know if a
match exists, it's great; but if you want to know *where*, or *how
many times*, you lose.  That could be fixed though, by counting the
elements as you loop through them...  I didn't attempt to determine
the performance hit for doing that, but I assume it's negligible.
I also imagine that that was your original purpose for the unused
variable i...

2. Your secondary optimizations add a great deal of complexity to your
code, making the algorithm much harder to understand.  However they
don't appear to buy you much, given that the cases they optimize would
probably be rare, and the difference in execution time gained by the
optimization is not noticable to the user.  Unless you're doing lots
and lots of these in your application, or maybe if you know in advance
that your data will contain many instances of the cases you optimized,
I think you're better off leaving the optimizations out, for the sake
of code clarity.  

At the very least, if you're going to write complicated optimizations,
you ought to have explained what you were doing in comments...  :)

-- 
Derek D. Martin
http://www.pizzashack.org/
GPG Key ID: 0x81CFE75D

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 196 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-list/attachments/20080929/b2f6cbe8/attachment-0001.sig>