Minimal diffs in difflib?
Magnus Lie Hetland
mlh at vier.idi.ntnu.no
Thu Jan 3 13:05:02 EST 2002
When asking about using difflib to implement version control, I was
pointed to the get_opcodes method of SequenceMatcher by Tim Peters,
and that was indeed helpful. And after looking a bit at difflib, I
think the "intuitive" deltas computed are very nice for showing users
the difference between one revision and the previous one.
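For anyone following along, here is a minimal sketch of how the opcodes could be turned into a stored delta and applied again. The helper names opcode_delta and apply_delta are made up for illustration and are not part of difflib; only SequenceMatcher and get_opcodes are.

```python
from difflib import SequenceMatcher

def opcode_delta(a, b):
    """Keep only the non-equal opcodes, plus the lines needed from b.

    Hypothetical helper: each entry is (tag, i1, i2, replacement_lines),
    where a[i1:i2] is the span in the old revision to be replaced.
    """
    matcher = SequenceMatcher(None, a, b)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            delta.append((tag, i1, i2, b[j1:j2]))
    return delta

def apply_delta(a, delta):
    """Rebuild the new revision from the old one and a stored delta."""
    result = list(a)
    # Opcodes come in increasing order of i1, so apply them back to front;
    # that way the indices of the earlier entries stay valid.
    for tag, i1, i2, lines in reversed(delta):
        result[i1:i2] = lines
    return result

old = ["one\n", "two\n", "three\n"]
new = ["one\n", "2\n", "three\n", "four\n"]
delta = opcode_delta(old, new)
rebuilt = apply_delta(old, delta)
```

The slice assignment covers all three cases: "replace" swaps a span, "delete" assigns an empty list, and "insert" assigns into an empty slice.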
However, I think it would be nice to be able to compute minimal deltas
too, to increase the compression of the repository, and so I started
wondering... Would it be interesting to include the Levenshtein
(edit distance) algorithm in difflib as well? (Or possibly add a
separate module or something?) The basic algorithm is very small and
simple[1], so adding it
wouldn't be much of a burden to the standard lib, would it? (And PHP
has it, so why shouldn't we? <wink>)
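To show how small it is, here is a sketch of the textbook dynamic-programming version with unit costs for insert, delete and substitute. This is not necessarily identical to the code at [1]; it is just the standard formulation, and it works on any pair of sequences (strings or lists of lines).

```python
def levenshtein(a, b):
    """Edit distance between sequences a and b, with unit costs."""
    # previous[j] is the distance between the prefix of a seen so far
    # and the first j items of b.
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]  # turning a[:i] into an empty sequence takes i deletions
        for j, y in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute, or free match
            ))
        previous = current
    return previous[-1]
```

Only two rows are kept at a time, so memory use is proportional to len(b); running time is proportional to len(a) * len(b).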
On the other hand, even though this might be a useful/natural
algorithm to have in a diff lib, I may not save that much space with
it... I guess I had better do some experimenting. Perhaps using zlib
or gzip is better <wink>
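One way such an experiment could look, as a rough sketch: build a delta, then compare raw and zlib-compressed sizes of the delta and of the full revision. The sample data is invented, and using unified_diff as the delta format is just an assumption made to keep the example short.

```python
import zlib
from difflib import unified_diff

def compressed_size(text):
    """Number of bytes after zlib compression at the highest level."""
    return len(zlib.compress(text.encode("utf-8"), 9))

# Made-up revisions: 200 short lines, with one line changed.
old = "".join("line %d\n" % i for i in range(200))
new = old.replace("line 100\n", "line one hundred\n")

# A line-based delta in unified diff format (illustrative choice).
delta = "".join(unified_diff(old.splitlines(True), new.splitlines(True)))

sizes = {
    "full revision, raw": len(new.encode("utf-8")),
    "full revision, zlib": compressed_size(new),
    "delta, raw": len(delta.encode("utf-8")),
    "delta, zlib": compressed_size(delta),
}

for label, size in sizes.items():
    print("%-20s %6d bytes" % (label, size))
```

The numbers will depend heavily on the data, so this says nothing general about which approach wins for a real repository.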
[1] http://www.hetland.org/python/distance.py
Note: This implementation only computes the distance, not the delta,
but the change needed is minimal.
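A sketch of what that change might look like: keep the whole distance table and walk back from the last cell to recover the operations. The names edit_script and apply_script and the tuple format are invented for this example, and this is one possible approach rather than the code at [1].

```python
def edit_script(a, b):
    """Return a minimal list of edit operations turning a into b.

    Operations are ("delete", i), ("insert", i, item) and
    ("replace", i, item), where i is an index into a. They are listed
    from the end of a towards the start.
    """
    n, m = len(a), len(b)
    # d[i][j] is the edit distance between a[:i] and b[:j].
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    # Walk back from the bottom-right cell, recording each step taken.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            if a[i - 1] != b[j - 1]:
                ops.append(("replace", i - 1, b[j - 1]))
            i -= 1
            j -= 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("delete", i - 1))
            i -= 1
        else:
            ops.append(("insert", i, b[j - 1]))
            j -= 1
    return ops

def apply_script(a, ops):
    """Apply the operations in the order edit_script returned them."""
    result = list(a)
    for op in ops:
        if op[0] == "replace":
            result[op[1]] = op[2]
        elif op[0] == "delete":
            del result[op[1]]
        else:
            result.insert(op[1], op[2])
    return result
```

Because the operations run from the end of the sequence towards the start, each index still refers to the right position when its operation is applied. The price is the full table, so memory grows with len(a) * len(b).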
--
Magnus Lie Hetland                                  The Anygui Project
http://hetland.org                                  http://anygui.org