comparing two lists, ndiff performance

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Wed Jan 30 09:01:13 EST 2008


On 29 Jan, 22:47, Zbigniew Braniecki <zbigniew.branie... at gmail.com>
wrote:

> The new one is of course much better and cleaner (the old one is
> bloated), but I'm wondering if there is a faster way to compare two
> lists and find out what was added, what was removed, what was changed.
> I can't simply iterate through the two lists because I need to keep the
> order (so it's important that the removed line comes after the 3rd line,
> which was not changed, etc.)
>
> ndiff plays well here, but it seems to be extremely slow (1000
> iterations of diffToObject takes 10 sec, 7sec of this is in ndiff).

ndiff does roughly quadratic work: it first finds the matching lines
using a SequenceMatcher, then looks for near-matching lines, and compares
each candidate pair using another SequenceMatcher.
You don't appear to be interested in what changed inside a line, just
that it changed, so a plain SequenceMatcher should be enough.
Its output is quite different from ndiff's, but you'd only have to
reimplement the _compareLists method.
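
Something along these lines might work as a starting point. It is only a
sketch: compare_lists is a made-up name (I don't know what your
_compareLists has to return), and it assumes the list items are hashable,
e.g. strings. get_opcodes() reports the differences in order, tagged as
'equal', 'replace', 'delete' or 'insert':

```python
from difflib import SequenceMatcher

def compare_lists(old, new):
    """Return (tag, old_items, new_items) tuples in sequence order.

    Hypothetical helper: tag is one of 'equal', 'replace', 'delete'
    or 'insert', exactly as reported by SequenceMatcher.get_opcodes().
    """
    matcher = SequenceMatcher(None, old, new)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # old[i1:i2] is the affected slice of the first list,
        # new[j1:j2] the corresponding slice of the second one.
        result.append((tag, old[i1:i2], new[j1:j2]))
    return result

if __name__ == "__main__":
    old = ["a", "b", "c", "d"]
    new = ["a", "x", "c", "d", "e"]
    for tag, removed, added in compare_lists(old, new):
        print(tag, removed, added)
```

Since the opcodes come out in order, an unchanged run always appears
before or after the removed/added items around it, which is the ordering
you said you need. No per-line comparison is done, so a changed line just
shows up as a 'replace' of the old item by the new one.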

--
Gabriel Genellina


