difflib

Mikkel Rasmussen footech at get2net.dk
Mon Apr 23 05:13:19 EDT 2001


Tim Peters <tim.one at home.com> wrote in message
news:mailman.987979892.8018.python-list at python.org...
> [Mikkel Rasmussen]
> > Has anybody got any references for the algorithm used in difflib.
>
> Nope.  It's an algorithm I dreamed up in the early 80's, while at Cray
> Research, writing a file-differencing tool.  I've carried it with me ever
> since, reimplementing it in at least a dozen languages along the way.  The
> basic idea popped up when staring at some incomprehensible "diff" output,
and
> wondering "hmmmm -- what if diff restricted itself to *contiguous*
> subsequences?  and then what if it didn't synch up on junk either?".
That's
> about it.  You can find more discussion in the comments in module ndiff.py
> (difflib's SequenceMatcher class started its life in ndiff.py years ago;
it
> simply got split out into its own module for 2.1).
>
> When Eric Raymond mentioned the Ratcliff-Obershelp algorithm on
Python-Dev,
> it was news to me, and a Google search quickly turned up a C
implementation
> of that.  It was obviously the same basic algorithm, except without the
"skip
> junk" gimmick, and at least the implementations I found were much less
> efficient than SequenceMatcher (this is one of those happy instances where
my
> Python code runs *faster* than previous C, Fortran and Pascal
> implementations:  I had the *idea* of chaining together hash lists to
speed
> the search a long time ago, but never had the patience to *implement* that
> before recoding in Python -- it's easy in Python).
>

I have now tested the difflib algorithm and my own algorithm. The difflib is
about twice as fast as my own (which is not optimised in any way). They
return near identical results. It will be interesting to look closer at the
difflib algorithm.


Mikkel Rasmussen

NB: My first impression of IDLE 0.8 isn't good. The nice possibility to
delete previously output text has gone, and some of the specified short cuts
does not work on a Danish keyboard. The old short cuts work though.





More information about the Python-list mailing list