Minimal diffs in difflib?

Magnus Lie Hetland mlh at vier.idi.ntnu.no
Thu Jan 3 13:05:02 EST 2002


When asking about using difflib to implement version control, I was
pointed to the get_opcodes method of SequenceMatcher by Tim Peters,
and that was indeed helpful. And after looking a bit at difflib, I
think the "intuitive" deltas computed are very nice for showing users
the difference between one revision and the previous one.
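
For reference, here is a minimal sketch of how get_opcodes could be used to store one revision as a delta against the previous one. The helper names make_delta and apply_delta are hypothetical and not part of difflib; only SequenceMatcher and get_opcodes come from the library.

```python
from difflib import SequenceMatcher

def make_delta(a, b):
    """Hypothetical helper: keep only the non-equal opcodes, plus the new lines."""
    matcher = SequenceMatcher(None, a, b)
    return [(tag, i1, i2, b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

def apply_delta(a, delta):
    """Hypothetical helper: rebuild the new revision from the old one and a delta."""
    result = list(a)
    # Apply from the end so that the indices of earlier opcodes stay valid.
    for tag, i1, i2, lines in reversed(delta):
        result[i1:i2] = lines
    return result

old = ["one\n", "two\n", "three\n"]
new = ["one\n", "2\n", "three\n", "four\n"]
delta = make_delta(old, new)
```

Only the changed lines end up in the delta, and apply_delta(old, delta) gives back the new revision.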

However, I think it would be nice to be able to compute minimal deltas
too, to increase the compression of the repository, and so I started
wondering... Would it be interesting to include the Levenshtein
algorithm in difflib as well? (Or possibly add a separate module or
something?) The basic algorithm is very small/simple[1], so adding it
wouldn't be much of a burden to the standard lib, would it? (And PHP
has it, so why shouldn't we? <wink>)
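
To give an idea of how small the basic algorithm is, here is a sketch of the standard dynamic-programming formulation, keeping only one row of the table at a time. It is written for illustration and is not necessarily identical to the code at [1]; the function name levenshtein is an assumption.

```python
def levenshtein(a, b):
    """Edit distance between sequences a and b.

    Insertion, deletion and substitution each cost 1.
    """
    # previous[j] is the distance between the prefix of a seen so far and b[:j].
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(min(previous[j] + 1,              # delete x
                               current[j - 1] + 1,           # insert y
                               previous[j - 1] + (x != y)))  # substitute or match
        previous = current
    return previous[-1]
```

It works on any pair of sequences whose items can be compared, so it can be applied to lists of lines as well as to strings. Running time is proportional to len(a) * len(b), which may matter for large files.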

On the other hand, even though this might be a useful/natural
algorithm to have in a diff lib, I may not save that much space with
it... I guess I had better do some experimenting. Perhaps using zlib
or gzip is better <wink>
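
One way such an experiment could look is sketched below. The revisions are made-up sample data and the delta encoding (delta_bytes) is a hypothetical format chosen only for the comparison, so the numbers say nothing about real repositories.

```python
import zlib
from difflib import SequenceMatcher

def delta_bytes(old_lines, new_lines):
    """Hypothetical encoding: one header line per non-equal opcode, then the new lines."""
    parts = []
    matcher = SequenceMatcher(None, old_lines, new_lines)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            parts.append("%s %d %d %d\n" % (tag, i1, i2, j2 - j1))
            parts.extend(new_lines[j1:j2])
    return "".join(parts).encode("utf-8")

# Made-up revisions: 200 lines, one changed and one deleted.
old = ["line %d\n" % i for i in range(200)]
new = list(old)
new[50] = "changed\n"
del new[120]

full = "".join(new).encode("utf-8")
delta = delta_bytes(old, new)
sizes = {
    "full": len(full),
    "full+zlib": len(zlib.compress(full)),
    "delta": len(delta),
    "delta+zlib": len(zlib.compress(delta)),
}
```

Comparing the entries of sizes for real revision histories would show whether a smaller delta actually buys anything once compression is applied.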

[1] http://www.hetland.org/python/distance.py

Note: This implementation only computes the distance, not the delta,
but the change needed is minimal.
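
A sketch of what that change might look like: keep the whole table instead of a single row, then walk back from the last cell to recover the operations. The names edit_script and apply_script and the (op, index, item) tuple format are assumptions made for this example.

```python
def edit_script(a, b):
    """Return (distance, ops); each op is (name, index_in_a, item)."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    # Walk back from the bottom-right corner, recording each non-match step.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1] and d[i][j] == d[i - 1][j - 1]:
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + 1:
            ops.append(("replace", i - 1, b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("delete", i - 1, None))
            i -= 1
        else:
            ops.append(("insert", i, b[j - 1]))
            j -= 1
    ops.reverse()
    return d[n][m], ops

def apply_script(a, ops):
    """Rebuild b (as a list) from a and the ops produced by edit_script."""
    out, pos = [], 0
    for name, i, item in ops:
        out.extend(a[pos:i])   # copy the unchanged items before this op
        pos = i
        if name != "delete":
            out.append(item)   # insert and replace both emit the new item
        if name != "insert":
            pos = i + 1        # delete and replace both consume a[i]
    out.extend(a[pos:])
    return out
```

The number of operations returned equals the distance, so the delta is minimal in the Levenshtein sense. The price is a table of len(a) * len(b) entries instead of a single row.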

-- 
Magnus Lie Hetland                                  The Anygui Project
http://hetland.org                                  http://anygui.org


