Detecting repeated subsequences of identical items

Steven D'Aprano steve at pearwood.info
Thu Apr 21 08:15:16 EDT 2016


On Thu, 21 Apr 2016 06:53 pm, Oscar Benjamin wrote:

> On 21 April 2016 at 04:07, Steven D'Aprano <steve at pearwood.info> wrote:
>> I want to group repeated items in a sequence. For example, I can group
>> repeated sequences of a single item at a time using groupby:
>>
>>
>> from itertools import groupby
>> for key, group in groupby("AAAABBCDDEEEFFFF"):
>>     group = list(group)
>>     print(key, "count =", len(group))
>>
>>
>> outputs:
>>
>> A count = 4
>> B count = 2
>> C count = 1
>> D count = 2
>> E count = 3
>> F count = 4
>>
>>
>> Now I want to group subsequences. For example, I have:
>>
>> "ABCABCABCDEABCDEFABCABCABCB"
>>
>> and I want to group it into repeating subsequences. I can see two ways to
>> group it:
>>
>> ABC ABC ABCDE ABCDE F ABC ABC ABC B
> 
> There are some algorithms (helpfully shown in Python) here:
> 
> https://en.wikipedia.org/wiki/Cycle_detection

It's not necessarily a cycle though. Consider a sequence of function calls:

def f(x):
    return g(x)

def g(x):
    if x < 7:
        return h(x)
    elif x < 50:
        return g(x//2)
    else:
        return x+f(x-1)

def h(x):
    raise ValueError  # oops, a bug


and a function call f(54). That will result in the chain of calls:

f(54) -> g(54) -> f(53) -> g(53) -> f(52) -> g(52) -> f(51) -> g(51) ->
f(50) -> g(50) -> g(25) -> g(12) -> g(6) -> h(6) raises

So you have that almost-cycle f calls g calls f, but it isn't periodic
because you break out of it. I'd still like to detect the repeated f->g
calls.




-- 
Steven




More information about the Python-list mailing list