diff lists

Tim Peters tim.one at home.com
Thu Mar 29 18:28:52 EST 2001


[Michael Hudson]
> Tangentially, does anyone know of any good algorithms for "edit
> distance" between two sequences?  E.g. if I have
>
> "abcdef"
>
> and want to get to
>
> "abQUACKcde"
>
> I want to get the answer back "insert 'QUACK' at position 3 and delete
> a character at position 11".

See std library module difflib.py (new in 2.1).  This packages the
SequenceMatcher class from the heart of the Tools/scripts/ndiff.py tool for
easy reuse.

Python 2.1b2 (#12, Mar 23 2001, 14:01:30) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
IDLE 0.8 -- press F1 for help
>>> import difflib
>>> a = "abcdef"
>>> b = "abQUACKcde"
>>> m = difflib.SequenceMatcher(None, a, b)
>>> for tag, i1, i2, j1, j2 in m.get_opcodes():
	print ("%7s a[%d:%d] (%s) b[%d:%d] (%s)" %
                   (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))

  equal a[0:2] (ab) b[0:2] (ab)
 insert a[2:2] () b[2:7] (QUACK)
  equal a[2:5] (cde) b[7:10] (cde)
 delete a[5:6] (f) b[10:10] ()
>>>

It does not attempt to produce minimal edit sequences, but does attempt to
produce edit sequences that "look right" to people (read the comments in
ndiff.py for more on that).

> "Good" means "pretty quick", here.

For suitably large values of "pretty", sure <wink>.

not-for-all-though-ly y'rs  - tim





More information about the Python-list mailing list