Detecting repeated subsequences of identical items

Steven D'Aprano steve at pearwood.info
Wed Apr 20 23:07:36 EDT 2016


I want to group repeated items in a sequence. For example, I can group
repeated sequences of a single item at a time using groupby:


from itertools import groupby
for key, group in groupby("AAAABBCDDEEEFFFF"):
    group = list(group)
    print(key, "count =", len(group))


outputs:

A count = 4
B count = 2
C count = 1
D count = 2
E count = 3
F count = 4


Now I want to group subsequences. For example, I have:

"ABCABCABCDEABCDEFABCABCABCB"

and I want to group it into repeating subsequences. I can see two ways to
group it:

ABC ABC ABCDE ABCDE F ABC ABC ABC B

giving counts:

(ABC) count = 2
(ABCDE) count = 2
F count = 1
(ABC) count = 3
B repeats 1 time


or:

ABC ABC ABC D E A B C D E F ABC ABC ABC B

giving counts:

(ABC) count = 3
D count = 1
E count = 1
A count = 1
B count = 1
C count = 1
D count = 1
E count = 1
F count = 1
(ABC) count = 3
B count = 1



How can I do this? Does this problem have a standard name and/or solution?




-- 
Steven




More information about the Python-list mailing list