How to ignore white space changes using difflib?

Grant Edwards invalid at invalid
Wed Apr 8 11:56:03 EDT 2009


I'm trying to use difflib to compare strings ignoring changes
to white-space (space/tab).  According to the doc page, you can
do this by specifying a "charjunk" parameter to filter out
characters:

   charjunk: A function that accepts a character (a string of
   length 1), and returns true if the character is junk, or false
   if not. The default is module-level function
   IS_CHARACTER_JUNK(), which filters out whitespace characters
   (a blank or tab; note: bad idea to include newline in
   this!).
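
As a quick check of that description, here is a minimal sketch that only probes the predicate itself, not the diff output. It assumes nothing beyond the standard difflib module; the `__defaults__` lookup is just one way to inspect the default argument and relies on ndiff() being a plain Python function.

```python
import difflib

# The default predicate named in the doc text: only blank and tab are junk.
for ch in [" ", "\t", "\n", "x"]:
    print("%r -> %r" % (ch, difflib.IS_CHARACTER_JUNK(ch)))

# Any callable with the same contract (one character in, truth value out)
# can be passed as charjunk.
def iswhite(c):
    return c in " \t"

same = all(iswhite(ch) == difflib.IS_CHARACTER_JUNK(ch) for ch in " \t\nx")
print(same)

# ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK): the last default
# should be the module-level predicate quoted above.
default_is_junk_filter = difflib.ndiff.__defaults__[-1] is difflib.IS_CHARACTER_JUNK
print(default_is_junk_filter)
```

If the last line prints True, calling ndiff() without a charjunk argument already applies IS_CHARACTER_JUNK.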

But I simply can't get it to work: I get exactly the same
results with or without white-space filtering.

Here's my test program:

   #!/usr/bin/python
   import difflib
   
   d1 = ["this is string one","this is string   two","this is string three"]
   d2 = ["this is string one","this  is  string two","this is string three"]
   
   def iswhite(c):
       return c in " \t"
   
   print "--------------------no filtering--------------------"
   delta = difflib.ndiff(d1,d2)
   
   for line in delta:
       print line
   print "----------------------------------------------------"
   print
   print "--------------------IS_CHARACTER_JUNK--------------------"
   delta = difflib.ndiff(d1,d2,charjunk=difflib.IS_CHARACTER_JUNK)
   
   for line in delta:
       print line
   print "----------------------------------------------------"
   print
   print "--------------------iswhite--------------------"
   delta = difflib.ndiff(d1,d2,charjunk=iswhite)
   
   for line in delta:
       print line
   print "----------------------------------------------------"
   

And here's the output:

   --------------------no filtering--------------------
     this is string one
   - this is string   two
   ?                --
   
   + this  is  string two
   ?      +  +
   
     this is string three
   ----------------------------------------------------
   
   --------------------IS_CHARACTER_JUNK--------------------
     this is string one
   - this is string   two
   ?                --
   
   + this  is  string two
   ?      +  +
   
     this is string three
   ----------------------------------------------------
   
   --------------------iswhite--------------------
     this is string one
   - this is string   two
   ?                --
   
   + this  is  string two
   ?      +  +
   
     this is string three
   ----------------------------------------------------
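
For comparison, this is roughly the result I'm after. It is only a sketch of a workaround, assuming it is acceptable to collapse runs of spaces/tabs before the comparison instead of relying on charjunk. The helper name squeeze_white is made up for illustration and is not part of difflib.

```python
import difflib
import re

d1 = ["this is string one", "this is string   two", "this is string three"]
d2 = ["this is string one", "this  is  string two", "this is string three"]

def squeeze_white(s):
    # Hypothetical helper: collapse each run of spaces/tabs to a single space.
    return re.sub(r"[ \t]+", " ", s)

# Normalize both inputs first, then diff the normalized lines.
n1 = [squeeze_white(s) for s in d1]
n2 = [squeeze_white(s) for s in d2]

delta = list(difflib.ndiff(n1, n2))
for line in delta:
    print(line)
```

With the inputs normalized this way, all three lines come back with the two-space "unchanged" prefix, at the cost of the diff showing the squeezed text rather than the original spacing.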

What am I doing wrong?   

-- 
Grant Edwards                   grante             Yow! I'll show you MY
                                  at               telex number if you show me
                               visi.com            YOURS ...


