[issue1528074] difflib.SequenceMatcher.find_longest_match() wrong result

Jan report at bugs.python.org
Sat Feb 7 11:50:39 CET 2009


Jan <pfjan at yahoo.com.br> added the comment:

hi all,

just got bitten by this, so i took the time to reiterate the issue.

according to the docs:

http://docs.python.org/library/difflib.html

find_longest_match() should return the longest matching string:

"If isjunk was omitted or None, find_longest_match() returns (i, j, k)
such that a[i:i+k] is equal to b[j:j+k], where alo <= i <= i+k <= ahi
and blo <= j <= j+k <= bhi. For all (i', j', k') meeting those
conditions, the additional conditions k >= k', i <= i', and if i == i',
j <= j' are also met. In other words, of all maximal matching blocks,
return one that starts earliest in a, and of all those maximal matching
blocks that start earliest in a, return the one that starts earliest in b."

but after a couple of hours of debugging i finally convinced myself that
the bug was in the library ... and i ended up here :)
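
for reference, the documented behaviour is easy to see on short inputs. a
minimal sketch using the example pair from the difflib docs; the explicit
(alo, ahi, blo, bhi) arguments are passed so it does not depend on newer
default-argument support:

```python
from difflib import SequenceMatcher

a = " abcd"
b = "abcd abcd"

sm = SequenceMatcher(None, a, b)  # isjunk=None: nothing is treated as junk
i, j, k = sm.find_longest_match(0, len(a), 0, len(b))

# The returned block always satisfies a[i:i+k] == b[j:j+k].
longest = a[i:i + k]
print((i, j, k), repr(longest))  # prints (0, 4, 5) ' abcd'
```

both sequences here are far below 200 elements, so the result is the true
longest match.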

any ideas on how to work around this bug/feature, and just get the
longest matching string? (from a normal/newbie user perspective, that
is, without patching difflib.py itself?)
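
one possible workaround that needs no changes to the library is to compute
the longest common substring directly. a minimal sketch, assuming plain
dynamic programming is fast enough for the inputs (it is quadratic, so only
suitable for short strings); longest_common_substring is a hypothetical
helper name, not part of difflib:

```python
def longest_common_substring(a, b):
    """Return (i, j, k) with a[i:i+k] == b[j:j+k] and k as large as possible.

    Plain O(len(a) * len(b)) dynamic programming with no junk or
    popularity heuristics. Hypothetical helper, not part of difflib.
    """
    best_i, best_j, best_k = 0, 0, 0
    # prev[j + 1] holds the length of the common run ending at a[i-1], b[j].
    prev = [0] * (len(b) + 1)
    for i in range(len(a)):
        cur = [0] * (len(b) + 1)
        for j in range(len(b)):
            if a[i] == b[j]:
                k = prev[j] + 1  # extend the run ending at a[i-1], b[j-1]
                cur[j + 1] = k
                if k > best_k:
                    best_i, best_j, best_k = i - k + 1, j - k + 1, k
        prev = cur
    return best_i, best_j, best_k


print(longest_common_substring("xxabcdxx", "yyabcdyy"))  # prints (2, 2, 4)
```

the return value uses the same (i, j, k) layout as find_longest_match(), so
a[i:i+k] gives the matched text.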

from the comments (which i couldn't follow entirely), does it use some
concept of popularity that is not exposed by the API? How is
"popularity" defined?
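
as far as the difflib documentation describes it, an element of the second
sequence counts as "popular" when that sequence has at least 200 elements
and the element accounts for more than roughly 1% of them; such elements
are left out of the index used for matching. a rough sketch of that rule,
with the exact threshold modelled on recent CPython sources (an assumption,
it may differ between versions); popular_elements is a hypothetical helper,
not part of difflib:

```python
from collections import Counter
from difflib import SequenceMatcher


def popular_elements(b):
    """Approximate the elements of b that SequenceMatcher treats as popular.

    Hypothetical helper. Assumed threshold (recent CPython): applies only
    when len(b) >= 200, and the count must exceed len(b) // 100 + 1.
    """
    n = len(b)
    if n < 200:
        return set()  # the heuristic is skipped for short sequences
    threshold = n // 100 + 1
    return set(elt for elt, count in Counter(b).items() if count > threshold)


b = "ab" * 100 + "c"  # 201 elements: 'a' and 'b' dominate, 'c' is rare
print(sorted(popular_elements(b)))  # prints ['a', 'b']

# On Python 3.2 and later the library exposes its own set as bpopular.
print(sorted(SequenceMatcher(None, "", b).bpopular))
```

note that only the second sequence is examined, so swapping the two
arguments can change the result.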

many thanks!
- jan


ps.: using ubuntu's python 2.5.2

ps2.: an example of a string pair where the issue shows up:

s1='Floor Box SystemsFBS Floor Box Systems - Manufacturer &amp; supplier
of FBS floor boxes, electrical ... experience, FBS Floor Box Systems
continue ... raceways, floor box. ...www.floorboxsystems.com'

s2='FBS Floor Box SystemsFBS Floor Box Systems - Manufacturer &amp;
supplier of FBS floor boxes, electrical floor boxes, wood floor box,
concrete floor box, surface mount floor box, raised floor
...www.floorboxsystems.com'
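
to check whether a pair like this is affected, the match can be compared
with and without the popularity heuristic. a sketch assuming Python 2.7.1
or 3.2 and later, where SequenceMatcher accepts an autojunk flag (it is not
available in 2.5.2); the wrapped lines of the strings above are assumed to
join with a single space:

```python
from difflib import SequenceMatcher

# Assumption: the line wraps in the mail are single spaces in the originals.
s1 = ("Floor Box SystemsFBS Floor Box Systems - Manufacturer &amp; supplier "
      "of FBS floor boxes, electrical ... experience, FBS Floor Box Systems "
      "continue ... raceways, floor box. ...www.floorboxsystems.com")
s2 = ("FBS Floor Box SystemsFBS Floor Box Systems - Manufacturer &amp; "
      "supplier of FBS floor boxes, electrical floor boxes, wood floor box, "
      "concrete floor box, surface mount floor box, raised floor "
      "...www.floorboxsystems.com")


def longest_match(a, b, autojunk):
    """Return the text of the block reported by find_longest_match()."""
    sm = SequenceMatcher(None, a, b, autojunk=autojunk)
    i, j, k = sm.find_longest_match(0, len(a), 0, len(b))
    return a[i:i + k]


with_heuristic = longest_match(s1, s2, autojunk=True)    # default behaviour
without_heuristic = longest_match(s1, s2, autojunk=False)

print(repr(with_heuristic))
print(repr(without_heuristic))
```

with autojunk=False and no isjunk function the reported block is a true
longest match, so it can never be shorter than the default result.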

----------
nosy: +janpf

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue1528074>
_______________________________________

