Detecting repeated subsequences of identical items

Oscar Benjamin oscar.j.benjamin at gmail.com
Thu Apr 21 10:01:27 EDT 2016


On 21 April 2016 at 13:15, Steven D'Aprano <steve at pearwood.info> wrote:
> On Thu, 21 Apr 2016 06:53 pm, Oscar Benjamin wrote:
>
>> On 21 April 2016 at 04:07, Steven D'Aprano <steve at pearwood.info> wrote:
>>> I want to group repeated items in a sequence. For example, I can group
>>> repeated sequences of a single item at a time using groupby:
>>>
>>>
>>> from itertools import groupby
>>> for key, group in groupby("AAAABBCDDEEEFFFF"):
>>>     group = list(group)
>>>     print(key, "count =", len(group))
>>>
>>>
>>> outputs:
>>>
>>> A count = 4
>>> B count = 2
>>> C count = 1
>>> D count = 2
>>> E count = 3
>>> F count = 4
>>>
>>>
>>> Now I want to group subsequences. For example, I have:
>>>
>>> "ABCABCABCDEABCDEFABCABCABCB"
>>>
>>> and I want to group it into repeating subsequences. I can see two ways to
>>> group it:
>>>
>>> ABC ABC ABCDE ABCDE F ABC ABC ABC B
>>
>> There are some algorithms (helpfully shown in Python) here:
>>
>> https://en.wikipedia.org/wiki/Cycle_detection
>
> It's not necessarily a cycle though. Consider a sequence of function calls:
>
> def f(x):
>     return g(x)
>
> def g(x):
>     if x < 7:
>         return h(x)
>     elif x < 50:
>         return g(x//2)
>     else:
>         return x+f(x-1)
>
> def h(x):
>     raise ValueError  # oops, a bug
>
>
> and a function call f(54). That will result in the chain of calls:
>
> f(54) -> g(54) -> f(53) -> g(53) -> f(52) -> g(52) -> f(51) -> g(51) ->
> f(50) -> g(50) -> g(25) -> g(12) -> g(6) -> h(6) raises
>
> So you have that almost-cycle f calls g calls f, but it isn't periodic
> because you break out of it. I'd still like to detect the repeated f->g
> calls.

It doesn't matter that you break out of it. It's periodic for some
part and the algorithms listed on that page can find the cycle. In the
recursive stack overflow case what you'll usually have is

1) A few frames leading up to the start of recursion
2) A long repetitive sequence of frames
3) A few frames at the end showing how the exception was ultimately triggered.

You just need to find the cycle that makes that big long sequence.

--
Oscar



More information about the Python-list mailing list