Efficiently determine where documents differ

Richard richardbp at gmail.com
Tue Jan 5 04:43:12 EST 2010


On Jan 5, 9:46 am, "Gabriel Genellina" <gagsl-... at yahoo.com.ar> wrote:
> On Mon, 04 Jan 2010 19:04:12 -0300, Richard <richar... at gmail.com> wrote:
>
> > I have been using the difflib library to find where two large HTML
> > documents differ. The Differ().compare() method does this, but it is
> > very slow - at least 100x slower than the Unix diff command.
>
> Differ compares sequences of lines *and* lines as sequences of characters
> to provide intra-line differences. The diff command only processes lines.
> If you aren't interested in intra-line differences, use a SequenceMatcher
> instead. Or, invoke the diff command using subprocess.Popen +
> communicate.
>
> --
> Gabriel Genellina
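
A rough sketch of the second suggestion (calling diff through
subprocess.Popen and communicate), in case it is useful to someone reading
the archive. The helper name unix_diff and the temporary-file handling are
illustrative choices, not something from Gabriel's message, and the code
assumes a diff executable is available on the PATH.

```python
import os
import subprocess
import tempfile


def unix_diff(old_text, new_text):
    """Return the output of the external diff command for two strings.

    Hypothetical helper: the name and the temp-file approach are assumptions.
    An empty string means the two documents are identical.
    """
    paths = []
    try:
        # diff works on files, so write each document to a temporary file.
        for text in (old_text, new_text):
            with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as f:
                paths.append(f.name)
                f.write(text)
        proc = subprocess.Popen(
            ["diff", paths[0], paths[1]],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        out, err = proc.communicate()
        # diff exits with 0 (no differences), 1 (differences) or 2 (trouble).
        if proc.returncode > 1:
            raise RuntimeError(err)
        return out
    finally:
        # Clean up the temporary files whether or not diff succeeded.
        for path in paths:
            os.remove(path)


if __name__ == "__main__":
    print(unix_diff("<p>Hello</p>\n", "<p>Goodbye</p>\n"))
```

The output is the normal diff format, so it still has to be parsed if line
numbers are needed inside the program.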


Thank you very much, Gabriel! Passing a list of the document lines
makes the performance comparable to that of the diff command.
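
For the archive, a minimal sketch of a line-based comparison with
SequenceMatcher, assuming that only the positions of the differing lines
are wanted and not the intra-line details. The function name
changed_line_ranges and the sample documents are made up for illustration.

```python
from difflib import SequenceMatcher


def changed_line_ranges(old_text, new_text):
    """Return the non-equal opcodes between two documents, compared line by line.

    Each item is (tag, i1, i2, j1, j2): lines old[i1:i2] correspond to
    lines new[j1:j2], and tag is 'replace', 'delete' or 'insert'.
    """
    # Split into lists of lines so that whole lines are the units compared;
    # passing the raw strings would compare character by character instead.
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = SequenceMatcher(None, old_lines, new_lines)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


# Made-up sample documents, only to show the shape of the result.
old_doc = "<html>\n<body>\n<p>Hello</p>\n</body>\n</html>\n"
new_doc = "<html>\n<body>\n<p>Goodbye</p>\n<p>Extra</p>\n</body>\n</html>\n"

if __name__ == "__main__":
    for tag, i1, i2, j1, j2 in changed_line_ranges(old_doc, new_doc):
        print(tag, i1, i2, j1, j2)
```

For the sample above this reports a single 'replace' block: line index 2 of
the old document against line indexes 2-3 of the new one.
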
Richard


