DiffLib Question

whitewave fruels at gmail.com
Fri May 4 05:46:44 EDT 2007


> Usually, Differ receives two sequences of lines, each line being a
> sequence of characters (a string). It uses a SequenceMatcher to compare
> lines; the linejunk argument is used to ignore certain lines. For each
> pair of similar lines, it uses another SequenceMatcher to compare
> the characters inside the lines; the charjunk argument is used to ignore
> certain characters.
> Because you are feeding Differ a single string (not a list of text lines),
> the "lines" it sees are just characters. To ignore whitespace and
> newlines in this case, one should use the linejunk argument:
>
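> import difflib
>
> def ignore_ws_nl(c):
>     # Treat whitespace and newline characters as junk.
>     return c in " \t\n\r"
>
> # d1 and d2 are the two strings being compared.
> a = difflib.Differ(linejunk=ignore_ws_nl).compare(d1, d2)
> dif = list(a)
> print(''.join(dif))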
>
>    I  n     a  d  d  i  t  i  o  n  ,     t  h  e     c  o  n  s  i  d  e  r  e  d     p  r  o  b  l  e  m     d  o  e  s     n  o  t     h  a  v  e     a     m  e  a  n  i  n  g  f  u  l     t  r  a  d  i  t  i  o  n  a  l     t  y  p  e     o  f-  +
>    a  d  j  o  i  n  t-
> +    p  r  o  b  l  e  m     e  v  e  n     f  o  r     t  h  e     s  i  m  p  l  e     f  o  r  m  s     o  f     t  h  e     d  i  f  f  e  r  e  n  t  i  a  l     e  q  u  a  t  i  o  n     a  n  d     t  h  e     n  o  n  l  o  c  a  l     c  o  n  d  i  t  i  o  n  s  .     D  u  e-  +
>    t  o     t  h  e  s  e     f  a  c  t  s  ,     s  o  m  e     s  e  r  i  o  u  s     d  i  f  f  i  c  u  l  t  i  e  s     a  r  i  s  e     i  n     t  h  e     a  p  p  l  i  c  a  t  i  o  n     o  f     t  h  e     c  l  a  s  s  i  c  a  l     m  e  t  h  o  d  s     t  o     s  u  c  h     a-  +
>    p  r  o  b  l  e  m  .+
>
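
To make the point in the quoted explanation concrete, here is a small
sketch of what Differ does with plain strings versus lists of lines. The
sample strings `old` and `new` are made up for illustration; they are not
the documents discussed in this thread.

```python
import difflib

def ignore_ws_nl(c):
    # Junk predicate: whitespace and newline characters.
    return c in " \t\n\r"

# Hypothetical sample texts: same words, different line break position.
old = "of adjoint\nproblem"
new = "of\nadjoint problem"

# Strings are sequences of characters, so each "line" Differ sees here
# is a single character, and linejunk is applied to those characters.
char_diff = list(difflib.Differ(linejunk=ignore_ws_nl).compare(old, new))

# Lists of lines are the usual input; each element is one whole line.
line_diff = list(difflib.Differ().compare(old.splitlines(True),
                                          new.splitlines(True)))

print(''.join(char_diff))
print(''.join(line_diff))
```

In the character-wise result every entry is a two-character prefix
("  ", "- " or "+ ") followed by one character, which is why the joined
output looks so spread out.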

Thanks! It works fine, but I was wondering why the result isn't
consistent. I am comparing two huge documents with several paragraphs
in them. Some parts of a paragraph are diffed perfectly, but
others aren't. I am confused.

Thanks.
Jen
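
One possible contributor to inconsistent results on large inputs (an
assumption; nothing in this thread confirms it) is SequenceMatcher's
"popular element" heuristic: in sequences of 200 or more items, an item
whose duplicates make up more than 1% of the sequence is treated as junk.
When the items are single characters, most of them qualify. The sketch
below compares words instead of characters and turns the heuristic off.
The helper name `word_opcodes` and the sample texts are hypothetical, and
the `autojunk` argument requires Python 2.7.1+ or 3.2+.

```python
import difflib

def word_opcodes(text_a, text_b):
    """Return the non-equal opcodes of a word-by-word comparison."""
    # split() with no arguments splits on any run of whitespace, so
    # differences in line wrapping never reach the matcher.
    words_a = text_a.split()
    words_b = text_b.split()
    # autojunk=False disables the "popular element" heuristic.
    matcher = difflib.SequenceMatcher(None, words_a, words_b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, words_a[i1:i2], words_b[j1:j2]))
    return changes

# Hypothetical sample texts: rewrapped, with one word changed.
old = "type of adjoint\nproblem even for the simple forms"
new = "type of\nadjoint problem even for the simplest forms"

for change in word_opcodes(old, new):
    print(change)
```

Comparing words keeps the sequences much shorter than comparing
characters, and the rewrapped line break no longer shows up as a change;
only the altered word is reported.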



