finding all sublists of list2 that are identical to list1

Dang Griffith noemail at noemail4u.com
Mon Feb 9 11:32:57 EST 2004


On 9 Feb 2004 06:59:35 -0800, klaus_neuner82 at yahoo.de (Klaus Neuner)
wrote:

>Hello, 
>
>the function given below returns all indexes of list2 where a sublist
>of list2 that is identical to list1 begins.
>
>As I will need this function quite often, I would like to know if more
>experienced programmers would agree with the way I defined the
>function:
>
>- Is there a more efficient way to do it? (Apart from building
>automata.)
>- Is there a more elegant/Pythonic way to write the function?
>
>Klaus
>
>
>def sublist(list1, list2):
>    if list1 == list2:
>        return [0]
>    elif len(list1) > len(list2):
>        return []
>    else:
>        result = []
>        shift = 0
>        i = 0
>        while shift <= len(list2)-len(list1):
>            if list1[i] == list2[shift+i]:
>                if i == len(list1)-1:
>                    result.append(shift)
>                    i = 0
>                    shift += 1
>                else:
>                    i += 1
>            else:
>                i = 0
>                shift += 1
>        return result

I don't know about it being Pythonic, but list comprehensions are
always fun.  This one's 10-20% faster (based on timeit results) than
your code:

def sublist(list1, list2):
    len1 = len(list1)
    rnge = xrange(len(list2))
    return [i for i in rnge if list2[i:len1+i] == list1]

If you need this for a specific purpose, you might be able to
transform the data and try a different technique which would be
executed internally at the "C" level.  For example, if your list is
always integers, you might be able to join them into a tab-delimited
string and scan the string with a regular expression (unless you
consider that building automata).

The list doesn't have to be integers for the regex idea to be
used--you just have to be sure the values in list1 contain no regex
metacharacters, and that list1 and list2 don't contain your delimiter
(tab in this example).  If list1 contains regex characters, you need
to escape them with a backslash first.

It's possible that if you know something about the size of list1, you
could use this information also, but without more details, I'm
shooting in the dark.

    --dang



More information about the Python-list mailing list