DiffLib Question
whitewave
fruels at gmail.com
Fri May 4 05:46:44 EDT 2007
> Usually, Differ receives two sequences of lines, each line being a
> sequence of characters (a string). It uses a SequenceMatcher to compare
> lines; the linejunk argument is used to ignore certain lines. For each
> pair of similar lines, it uses another SequenceMatcher to compare
> characters inside the lines; the charjunk argument is used to ignore
> certain characters.
> As you are feeding Differ a single string (not a list of text lines),
> the "lines" it sees are just characters. To ignore whitespace and
> newlines in this case, one should use the linejunk argument:
>
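> import difflib
>
> def ignore_ws_nl(c):
>     # Treat whitespace and newline characters as junk.
>     return c in " \t\n\r"
>
> # d1 and d2 are plain strings, so each "line" Differ sees is one character.
> a = difflib.Differ(linejunk=ignore_ws_nl).compare(d1, d2)
> dif = list(a)
> print(''.join(dif))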
>
> I n a d d i t i o n , t h e c o n s i d e r e
> d p r o b l e m d o e s n o t h a v e a m
> e a n i n g f u l t r a d i t i o n a l t y p e
> o f- +
> a d j o i n t-
> + p r o b l e m e v e n f o r t h e s i m p
> l e f o r m s o f t h e d i f f e r e n t i a
> l e q u a t i o n a n d t h e n o n l o c a l
> c o n d i t i o n s . D u e- +
> t o t h e s e f a c t s , s o m e s e r i o
> u s d i f f i c u l t i e s a r i s e i n t h
> e a p p l i c a t i o n o f t h e c l a s s i
> c a l m e t h o d s t o s u c h a- +
> p r o b l e m .+
>
Thanks! It works fine, but I was wondering why the result isn't
consistent. I am comparing two huge documents, each with several
paragraphs. Some parts of a paragraph are diffed perfectly, but
others are not. I am confused.
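
To make the two modes described in the quoted reply concrete, here is a
minimal sketch. It does not use the actual documents from this thread: the
sample strings old and new are made up, and splitting them with
splitlines(True) is only an assumption about how one might hand Differ a
list of lines instead of a single string.

```python
import difflib

# Made-up sample texts; the real documents are not shown in the thread.
old = "The quick brown fox\njumps over the lazy dog.\n"
new = "The quick brown fox\njumps over the lazy cat.\n"

# Character-level: a plain string is a sequence of characters,
# so every "line" Differ sees is a single character.
char_diff = list(difflib.Differ().compare(old, new))

# Line-level: split into lines first (keeping the newlines), so Differ
# matches whole lines and then marks the changed characters inside them.
line_diff = list(
    difflib.Differ().compare(old.splitlines(True), new.splitlines(True))
)

print(''.join(line_diff))
```

In the character-level result every entry is a two-character marker followed
by one character, which is why the printed output looks spaced out. In the
line-level result each entry is a whole line with its marker.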
Thanks.
Jen