DiffLib Question

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Sun May 6 02:24:31 EDT 2007


On Fri, 04 May 2007 06:46:44 -0300, whitewave <fruels at gmail.com> wrote:

> Thanks! It works fine but I was wondering why the result isn't
> consistent. I am comparing two huge documents with several paragraphs
> in them. Some parts of the paragraphs are diffed perfectly but
> others aren't. I am confused.

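Differ objects do a two-level diff: first between the elements of the two
sequences, then character by character inside elements that are similar.
Depending on what kind of differences you are interested in, you feed them
different kinds of sequences.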
If the "line" concept is important to you (that is, you want to see which
"lines" were added, removed or modified), then feed the Differ with a
sequence of lines (file.readlines() would be fine).
This way, if someone inserts a few words inside a paragraph and the
remaining lines have to be reflowed, you'll see many changes, because words
that were at the end of one line have moved to the start of the next.
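
A minimal sketch of the line-based case, to show the reflow effect. The
sample text is made up, and the lists stand in for what file.readlines()
would return; only difflib.Differ itself comes from the discussion above.

```python
import difflib

# Hypothetical sample: the same paragraph before and after inserting the
# word "very" and re-wrapping the lines by hand.
old_lines = [
    "The quick brown fox jumps\n",
    "over the lazy dog and then\n",
    "runs away into the woods.\n",
]
new_lines = [
    "The very quick brown fox\n",
    "jumps over the lazy dog and\n",
    "then runs away into the\n",
    "woods.\n",
]

# Each element is one line, as file.readlines() would return.
line_diff = list(difflib.Differ().compare(old_lines, new_lines))

# Entries starting with two spaces are lines common to both inputs.
unchanged = [entry for entry in line_diff if entry.startswith("  ")]

print("".join(line_diff), end="")
print("unchanged lines:", len(unchanged))
```

A single inserted word leaves no line identical, so every line is reported
as removed or added.
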
If you are more concerned about "paragraphs" and words, feed the Differ  
with a sequence of "paragraphs". Maybe your editor can handle it; assuming  
a paragraph ends with two linefeeds, you can get a list of paragraphs in  
Python using file.read().split("\n\n").
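
The paragraph-based variant might look like the sketch below. The sample
strings stand in for file.read(), and the helper name paragraphs() is made
up for illustration. Collapsing the whitespace inside each paragraph is an
extra step of mine, not something Differ requires; without it a re-wrapped
paragraph would still compare as different.

```python
import difflib

# Hypothetical sample texts standing in for file.read(); the first
# paragraph is only re-wrapped, the second gains the word "extra".
old_text = (
    "First paragraph stays\nthe same.\n\n"
    "Second paragraph has a few\nwords in it.\n"
)
new_text = (
    "First paragraph\nstays the same.\n\n"
    "Second paragraph has a few\nextra words in it.\n"
)

def paragraphs(text):
    """Split on blank lines and collapse whitespace inside each paragraph."""
    return [" ".join(p.split()) for p in text.split("\n\n") if p.strip()]

# Each element is now one whole paragraph instead of one line.
para_diff = list(difflib.Differ().compare(paragraphs(old_text),
                                          paragraphs(new_text)))

for entry in para_diff:
    print(entry.rstrip("\n"))
```

Here the re-wrapped first paragraph is reported as unchanged, and only the
second one shows up as modified.
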
A third alternative would be to treat the text as one plain string and
just feed Differ with file.read(), as mentioned in an earlier post; the
comparison is then done character by character.
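
A small sketch of that last case, with two short made-up strings in place
of file.read(). A string is a sequence of characters, so Differ treats
every character as one element:

```python
import difflib

# Hypothetical inputs; with real files these would come from file.read().
old_text = "color"
new_text = "colour"

# Passing plain strings makes each character a separate element.
char_diff = list(difflib.Differ().compare(old_text, new_text))

print(char_diff)
# ['  c', '  o', '  l', '  o', '+ u', '  r']
```

The output has one entry per character, so for large documents it gets
long quickly.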

-- 
Gabriel Genellina


