Detecting repeated subsequences of identical items

Alain Ketterlin alain at universite-de-strasbourg.fr.invalid
Thu Apr 21 03:25:16 EDT 2016


Steven D'Aprano <steve at pearwood.info> writes:

> I want to group repeated items in a sequence. For example, I can group
> repeated sequences of a single item at a time using groupby:
[...]
> Now I want to group subsequences. For example, I have:
>
> "ABCABCABCDEABCDEFABCABCABCB"
>
> and I want to group it into repeating subsequences. I can see two ways to
> group it:
>
> ABC ABC ABCDE ABCDE F ABC ABC ABC B
[...]
> or:
>
> ABC ABC ABC D E A B C D E F ABC ABC ABC B
[...]
> How can I do this? Does this problem have a standard name and/or solution?

Looks like a tough one. I don't think it has a name (I'm not even sure
to be able to formally define it). Lets say it is an instance of
"longest repeating substring"
(https://en.wikipedia.org/wiki/Longest_repeated_substring_problem --
which really does not say much).

Anyway, it looks like a job for a suffix trees.

Depending on what you are after, you may also be interested in the
sequitur algorithm (http://www.sequitur.info/).

-- Alain.



More information about the Python-list mailing list