[issue26150] SequenceMatcher's algorithm is not correct

Tue Jan 19 06:06:38 EST 2016

New submission from Владислав Александрович:

For the strings 'aaaaaa' and 'aabaaa', SequenceMatcher's algorithm finds only the common substring 'aaa', while the well-known classic LCS algorithm (http://www.geeksforgeeks.org/printing-longest-common-subsequence/) finds both 'aa' and 'aaa'.
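A minimal sketch that reproduces the difference, assuming the default SequenceMatcher settings (no junk function). The helpers `lcs_length` and `matched_size` are illustrative names, not part of difflib; `lcs_length` is a plain dynamic-programming LCS and only returns the length, not the subsequence itself.

```python
from difflib import SequenceMatcher


def lcs_length(a, b):
    """Length of the longest common subsequence (classic O(len(a)*len(b)) DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def matched_size(a, b):
    """Total number of elements covered by SequenceMatcher's matching blocks."""
    blocks = SequenceMatcher(None, a, b).get_matching_blocks()
    return sum(block.size for block in blocks)


a, b = 'aaaaaa', 'aabaaa'
print(SequenceMatcher(None, a, b).get_matching_blocks())
print(matched_size(a, b))  # elements matched by SequenceMatcher
print(lcs_length(a, b))    # elements matched by the classic LCS
```

With these inputs SequenceMatcher matches 3 elements in total, while the LCS has length 5.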

Is this the price for "best case time is linear", as mentioned in difflib's documentation? Are there any other reasons not to implement the classic LCS algorithm (e.g. memory limits)? If not, maybe it would be useful to create a subclass, StrictSequenceMatcher?

----------
messages: 258582
nosy: Владислав Александрович
priority: normal
severity: normal
status: open
title: SequenceMatcher's algorithm is not correct
type: behavior
versions: Python 3.5

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue26150>
_______________________________________