How to compute a delta: the difference between lists of strings

J. Mwebaze jmwebaze at gmail.com
Sat May 5 09:13:23 EDT 2012


thank Chris..

On Sat, May 5, 2012 at 2:39 PM, Chris Angelico <rosuav at gmail.com> wrote:k

> On Sat, May 5, 2012 at 10:12 PM, J. Mwebaze <jmwebaze at gmail.com> wrote:
> > This is out of curiosity, i know this can be done with python diffllib
> > module, but been figuring out how to compute the delta, Consider two
> lists
> > below.
> >
> > s1 = ['e', 'f', 'g', 'A', 'B', 'C', 'D', 'C']
> > s2 =['e', 'A', 'B', 'f', 'g', 'C', 'D', 'z']
> >
> > This is the result should be
> >
> > ['  e', '+ A', '+ B', '  f', '  g', '- A', '- B', '  C', '  D', '- C', '+
> > z']
> >
> > ideas on how to approach this.. ?
>
> Here's a simple algorithm that produces sorta-mostly-reasonable
> results most of the time:
>
> Set your current-position-index to the beginning of each lists. (Call
> them 'pos1' and 'pos2'.)
> If s1[pos1] and s2[pos2] are identical, match - increment and iterate.
> Otherwise, increment pos2 until either it matches pos1 or you reach
> the end of s2.
> If you find a match, report the insertion into s2, increment both
> pointers past the match, and carry on.
> If you hit the end of s2, increment pos1 once and report an insertion
> into s1, then go back to looking for a match.
>
> It helps to append a sentinel to each list; that way, you don't need
> to separately check for additional content at the end of either list
> (as the sentinel won't match any of the strings).
>
> This is the algorithm I used for writing a simple file diff tool ages
> ago. It's not as good as diff(1), but it was fun to do the exercise.
> It's quite inefficient at handling large insertions into s1, and needs
> to be modified for most file handling (for instance, requiring a
> 2-line or 3-line rematch after a difference, to avoid matching on
> blank lines), but it's a basis.
>
> Producing beautiful or minimal diffs is a more complex task. :)
>
> ChrisA
> --
> http://mail.python.org/mailman/listinfo/python-list
>



-- 
*Mob UG: +256 (0) 70 1735800 | NL +31 (0) 6 852 841 38 | Gtalk: jmwebaze |
skype: mwebazej | URL: www.astro.rug.nl/~jmwebaze

/* Life runs on code */*
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-list/attachments/20120505/6e7e9a04/attachment-0001.html>


More information about the Python-list mailing list