ignoring or replacing white lines in a diff

Adriaan Renting renting at astron.nl
Fri Jan 15 04:44:32 EST 2016


Thanks to everyone who provided help.

Peter Otten provided me with a working solution:

I had to split the "-I '^[[:space:]]*$'" into two separate arguments.

      cmd   = ["diff", "-w", "-I", r"^[[:space:]]*$",
               "./xml/%s.xml" % name, "test.xml"]
      p     = subprocess.Popen(cmd, stdin=open('/dev/null'),
                               stdout=subprocess.PIPE, stderr=subprocess.PIPE)
      logs  = p.communicate()
      diffs = logs[0].splitlines() #stdout
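
A likely explanation for why the split is needed: without shell=True, each
list element reaches diff as exactly one argument, quotes included, whereas
a shell would have separated the option from its pattern and removed the
quotes. A minimal sketch of that tokenization, assuming Python 3 and the
standard shlex module (the file names are simply the ones used in this
thread):

```python
import shlex

command_line = "diff -w -I '^[[:space:]]*$' ./xml/Ticket_6923.xml test.xml"

# shlex.split follows POSIX shell quoting rules: the single quotes are
# removed and "-I" and the pattern come out as two separate arguments.
argv = shlex.split(command_line)
print(argv)
```

The resulting list has the same shape as the working cmd above, with "-I"
and the pattern as separate elements.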

This also works:

      cmd   = ["diff -w -I '^[[:space:]]*$' ./xml/%s.xml test.xml" % name]
      p     = subprocess.Popen(cmd, stdin=open('/dev/null'),
                               stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                               shell=True)
      logs  = p.communicate()
      diffs = logs[0].splitlines() #stdout
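
Both variants give the child process /dev/null as stdin. On Python 3.3 and
later the same effect should be available through subprocess.DEVNULL, which
avoids opening the file by hand. A sketch under that assumption; the helper
name run_headless is hypothetical, and the child process here is a stand-in
so the example runs even where diff is not installed:

```python
import subprocess
import sys

def run_headless(cmd):
    """Run cmd with stdin tied to the null device; return (exit status, stdout lines)."""
    p = subprocess.Popen(cmd,
                         stdin=subprocess.DEVNULL,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE,
                         universal_newlines=True)  # decode output to str
    out, _err = p.communicate()
    return p.returncode, out.splitlines()

# Stand-in child process; in the test script cmd would be the diff
# argument list shown above.
status, lines = run_headless([sys.executable, "-c", "print('a'); print('b')"])
print(status, lines)
```

When cmd is a diff invocation, the exit status is also worth checking: GNU
diff exits with 0 when no differences were found, 1 when there were
differences, and 2 on trouble.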

As to other comments:

- I've found that stdin=open('/dev/null') is essential in
subprocess.Popen to make it work from automated (headless) scripts.
- Using "print line," (with the trailing comma) did remove the extra
newlines, but didn't get rid of the blank lines.
- making it a raw string with r"-I '^[[:space:]]*$'" made no difference
(also tried r"-I ^[[:space:]]*$")
- I didn't investigate difflib further but will keep it in mind for the
future.
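
For reference, a rough idea of what the difflib route could look like. This
is only a sketch, assuming Python 3; the helper names significant_lines and
xml_diff are made up for illustration. It drops blank lines and collapses
runs of whitespace before comparing, which is closer to "diff -b -B" than to
"diff -w":

```python
import difflib

def significant_lines(text):
    """Drop blank lines and collapse runs of whitespace within each line."""
    result = []
    for line in text.splitlines():
        collapsed = " ".join(line.split())
        if collapsed:  # skip lines that were empty or whitespace only
            result.append(collapsed)
    return result

def xml_diff(old_text, new_text):
    """Return unified-diff lines for the two texts; empty list if equivalent."""
    old = significant_lines(old_text)
    new = significant_lines(new_text)
    return list(difflib.unified_diff(old, new, fromfile="old", tofile="new",
                                     lineterm=""))

reference = "<a>\n  <v>1</v>\n\n</a>\n"
same      = "<a>\n<v>1</v>\n</a>\n"
changed   = "<a>\n<v>2</v>\n</a>\n"

print(len(xml_diff(reference, same)))   # differences in blank lines only
for line in xml_diff(reference, changed):
    print(line)
```

Because the blank lines are filtered out before the comparison, this would
also make the separate fileinput step unnecessary.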

Thank you for your help,

Adriaan.


Adriaan Renting        | Email: renting at astron.nl
Software Engineer Radio Observatory
ASTRON                 | Phone: +31 521 595 100 (797 direct)
P.O. Box 2             | GSM:   +31 6 24 25 17 28
NL-7990 AA Dwingeloo   | FAX:   +31 521 595 101
The Netherlands        | Web: http://www.astron.nl/~renting/



>>> On 14-1-2016 at 22:05, Peter Otten <__peter__ at web.de> wrote: 
> Adriaan Renting wrote:
> 
>> 
>> Maybe someone here has a clue what is going wrong here? Any help is
>> appreciated.
>> 
>> I'm writing a regression test for a module that generates XML.
>> 
>> I'm using diff to compare the results with a pregenerated one from an
>> earlier version.
>> 
>> I'm running into two problems:
>> 
>> The diff doesn't seem to behave properly with the -B option. (diff (GNU
>> diffutils) 2.8.1 on OSX 10.9)
>> 
>> Replacing -B with -I '^[[:space:]]*$' fixes it on the command line,
>> which should be exactly the same according to:
>> 
>> http://www.gnu.org/software/diffutils/manual/html_node/Blank-Lines.html#Blank-Lines
>> 
>> (for Python problem continue below)
>> 
>> MacRenting 21:00-159> diff -w -B test.xml xml/Ticket_6923.xml
>> 3,5c3,5
>> <   <version>2.15.0</version>
>> <   <template version="2.15.0" author="Alwin de Jong,Adriaan Renting" changedBy="Adriaan Renting">
>> <   <description>XML Template generator version 2.15.0</description>
>> ---
>>>           <version>2.6.0</version>
>>>           <template version="2.6.0" author="Alwin de Jong" changedBy="Alwin de Jong">
>>>           <description>XML Template generator version 2.6.0</description>
>> 113d112
>> <
>> 163d161
>> <
>> 213d210
>> <
>> 258d254
>> <
>> 369d364
>> <
>> 419d413
>> <
>> 469d462
>> <
>> 514d506
>> <
>> 625d616
>> <
>> 675d665
>> <
>> 725d714
>> <
>> 770d758
>> <
>> 881d868
>> <
>> 931d917
>> <
>> 981d966
>> <
>> 1026d1010
>> <
>> 1137d1120
>> <
>> 1187d1169
>> <
>> 1237d1218
>> <
>> 1282d1262
>> <
>> 
>> /Users/renting/src/CEP4-DevelopClusterModel-Story-Task8432-SAS/XML_generator/test
>> MacRenting 21:00-160> diff -w -I '^[[:space:]]*$' test.xml xml/Ticket_6923.xml
>> 3,5c3,5
>> <   <version>2.15.0</version>
>> <   <template version="2.15.0" author="Alwin de Jong,Adriaan Renting" changedBy="Adriaan Renting">
>> <   <description>XML Template generator version 2.15.0</description>
>> ---
>>>           <version>2.6.0</version>
>>>           <template version="2.6.0" author="Alwin de Jong" changedBy="Alwin de Jong">
>>>           <description>XML Template generator version 2.6.0</description>
>> 
>> 
>> Now I try to use this in Python:
>> 
>>       cmd   = ["diff", "-w", "-I '^[[:space:]]*$'",
>>                "./xml/%s.xml" % name, "test.xml"]
> 
> Instead of 
> 
> ..., "-I '^[[:space:]]*$'", ...
> 
> try two separate arguments
> 
> ..., "-I", "^[[:space:]]*$", ...
> 
>>       ## -w ignores differences in whitespace
>>       ## -I '^[[:space:]]*$' because -B doesn't work for blank lines
>>       ## (on OSX?)
>>       p     = subprocess.Popen(cmd, stdin=open('/dev/null'),
>>                                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
> 
> I don't think you need to specify stdin.
> 
>>       logs  = p.communicate()
>>       diffs = logs[0].splitlines() #stdout
>>       print "diff reply was %i lines long" % len(diffs)
>> 
>> This doesn't work. I've tried escaping the various bits, like the * and
>> $, even though with single quotes that should not be needed.
>> 
>> I tried first removing the blank lines from the file:
>> 
>>       import fileinput
>>       for line in fileinput.FileInput("test.xml",inplace=1):
>>         if line.rstrip():
>>           print line
>> 
>> This makes it worse, as it adds an empty line for each line in the
>> file.
> 
> Add a trailing comma to suppress the newline:
> 
> print line,
> 
>> I've tried various other options. The only thing I can think of, is
>> ditching Python and trying to rewrite the whole script in Bash.
>> (It's quite complicated, as it loops over various things and does some
>> pretty output in between and I'm not very fluent in Bash)
>> 
>> Any suggestions?
> 
> Whatever floats your boat ;)



