extract certain values from file with re

Fabian Braennstroem f.braennstroem at gmx.de
Fri Oct 6 05:51:04 EDT 2006


Hi,

I would like to extract certain lines from log files. I had
some sed/awk scripts for this, but now I want to use Python
with its re module for this task.

Actually, I have two different log files. The first file looks
like:

   ...
   'some text'
   ...
   
       ITER I----------------- GLOBAL ABSOLUTE RESIDUAL -----------------I  I------------ FIELD VALUES AT MONITORING LOCATION  ----------I
        NO    UMOM     VMOM     WMOM     MASS     T EN     DISS     ENTH       U        V        W        P       TE       ED        T
         1  9.70E-02 8.61E-02 9.85E-02 1.00E+00 1.61E+01 7.65E+04 0.00E+00  1.04E-01-8.61E-04 3.49E-02 1.38E-03 7.51E-05 1.63E-05 2.00E+01
         2  3.71E-02 3.07E-02 3.57E-02 1.00E+00 3.58E-01 6.55E-01 0.00E+00  1.08E-01-1.96E-03 4.98E-02 7.11E-04 1.70E-04 4.52E-05 2.00E+01
         3  2.64E-02 1.99E-02 2.40E-02 1.00E+00 1.85E-01 3.75E-01 0.00E+00  1.17E-01-3.27E-03 6.07E-02 4.02E-04 4.15E-04 1.38E-04 2.00E+01
         4  2.18E-02 1.52E-02 1.92E-02 1.00E+00 1.21E-01 2.53E-01 0.00E+00  1.23E-01-4.85E-03 6.77E-02 1.96E-05 9.01E-04 3.88E-04 2.00E+01
         5  1.91E-02 1.27E-02 1.70E-02 1.00E+00 8.99E-02 1.82E-01 0.00E+00  1.42E-01-6.61E-03 7.65E-02 1.78E-04 1.70E-03 9.36E-04 2.00E+01
   ...
   ...
   ...
   
      2997  3.77E-04 2.89E-04 3.05E-04 2.71E-02 5.66E-04 6.28E-04 0.00E+00 -3.02E-01 3.56E-02-7.97E-02-7.11E-02 4.08E-02 1.86E-01 2.00E+01
      2998  3.77E-04 2.89E-04 3.05E-04 2.71E-02 5.65E-04 6.26E-04 0.00E+00 -3.02E-01 3.63E-02-8.01E-02-7.10E-02 4.02E-02 1.83E-01 2.00E+01
      2999  3.76E-04 2.89E-04 3.05E-04 2.70E-02 5.64E-04 6.26E-04 0.00E+00 -3.02E-01 3.69E-02-8.04E-02-7.10E-02 3.96E-02 1.81E-01 2.00E+01
      3000  3.78E-04 2.91E-04 3.07E-04 2.74E-02 5.64E-04 6.26E-04 0.00E+00 -3.01E-01 3.75E-02-8.07E-02-7.09E-02 3.91E-02 1.78E-01 2.00E+01
    &&&&&&  --------------------------------------------------------------  ----
   
   ....
   'some text'
   ....

I actually want to extract the lines with the numbers, write
them to a file and finally use gnuplot for plotting them. A
nicer and more Pythonic way would be to extract those numbers,
store them in arrays, one per column, and plot those using the
gnuplot or matplotlib module :-)

Unfortunately, I am pretty new to the re module and tried
the following so far:


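  import re
  # DOTALL lets '.' match newlines, so one match spans the whole table
  pat = re.compile(r'\ \ \ NO.*?&&&&&&', re.DOTALL)
  # sub() deletes the matched block and keeps everything else
  print(pat.sub('', open('log_star_orig').read()))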
  

but this does the opposite of what I want: the original log
file is printed without the number part. So the next step
would be to delete everything from the first line up to
'\ \ \ NO' and everything from '&&&&&&' to the end, but I do
not know how to address the first and last line!?
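
One possible way to keep the number block instead of deleting it
is to search for it with a capturing group rather than calling
re.sub. A minimal sketch, assuming the header line starts with
'NO' after leading blanks and the block ends at the '&&&&&&' line,
as in the excerpt above; the names BLOCK and extract_block are
made up for illustration:

```python
import re

# Group 1 captures everything between the header line that starts with
# 'NO' and the line that starts with '&&&&&&'.
# MULTILINE makes '^' match at every line start; DOTALL lets '.' cross newlines.
BLOCK = re.compile(
    r'^[ \t]*NO\b[^\n]*\n(.*?)^[ \t]*&&&&&&',
    re.DOTALL | re.MULTILINE,
)

def extract_block(text):
    """Return the lines between the 'NO' header and the '&&&&&&' marker."""
    match = BLOCK.search(text)
    return match.group(1) if match else ''
```

Because the match itself is returned, there is no need to address
the first or last line of the file at all.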

It would be nice if you could give me a hint. I would be
especially interested in any idea for how to put those
columns into arrays, so I can plot them right away!
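
A rough sketch of how the columns could end up in lists. Note that
some fields in the log touch each other (e.g. '1.04E-01-8.61E-04'),
so splitting on whitespace alone would not work; the sketch matches
each number with a regular expression instead. It assumes every
data row has an integer iteration number followed by 14 values in
E notation, as in the excerpt above. FLOAT, ROW and
parse_residuals are hypothetical names:

```python
import re

# One number in E notation, e.g. 9.70E-02 or -8.61E-04.
FLOAT = re.compile(r'[-+]?\d+\.\d+E[-+]\d+')
# A candidate data row: leading blanks, an integer, then the rest of the line.
ROW = re.compile(r'^\s*(\d+)\s+(.*)$')

def parse_residuals(text, n_fields=14):
    """Return (iterations, columns); columns[i] holds field i of every row."""
    iterations, rows = [], []
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        values = [float(v) for v in FLOAT.findall(match.group(2))]
        if len(values) != n_fields:
            continue  # not a data row (assumption: data rows have n_fields values)
        iterations.append(int(match.group(1)))
        rows.append(values)
    # Transpose the list of rows into a list of columns.
    columns = [list(col) for col in zip(*rows)]
    return iterations, columns
```

With the header order shown above, columns[0] would be UMOM and
columns[3] would be MASS; such lists could then be handed to
matplotlib or written out for gnuplot.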


A more difficult log file looks like:

 ======================================================================
 OUTER LOOP ITERATION =    1                     CPU SECONDS = 2.40E+01
 ----------------------------------------------------------------------
 |       Equation       | Rate | RMS Res | Max Res |  Linear Solution |
 +----------------------+------+---------+---------+------------------+
 | U-Mom                | 0.00 | 1.0E-02 | 5.0E-01 |       4.9E-03  OK|
 | V-Mom                | 0.00 | 2.4E-14 | 5.6E-13 |       3.8E+09  ok|
 | W-Mom                | 0.00 | 2.5E-14 | 8.2E-13 |       8.3E+09  ok|
 | P-Mass               | 0.00 | 1.1E-02 | 3.4E-01 |  8.9  2.7E-02  OK|
 +----------------------+------+---------+---------+------------------+
 | K-TurbKE             | 0.00 | 1.8E+00 | 1.8E+00 |  5.8  2.2E-08  OK|
 | E-Diss.K             | 0.00 | 1.9E+00 | 2.0E+00 | 12.4  2.2E-08  OK|
 +----------------------+------+---------+---------+------------------+

 ======================================================================
 OUTER LOOP ITERATION =    2                     CPU SECONDS = 8.57E+01
 ----------------------------------------------------------------------
 |       Equation       | Rate | RMS Res | Max Res |  Linear Solution |
 +----------------------+------+---------+---------+------------------+
 | U-Mom                | 1.44 | 1.5E-02 | 5.3E-01 |       9.6E-03  OK|
 | V-Mom                |99.99 | 1.1E-03 | 6.2E-02 |       5.7E-02  OK|
 | W-Mom                |99.99 | 1.9E-03 | 6.0E-02 |       5.9E-02  OK|
 | P-Mass               | 0.27 | 3.0E-03 | 2.0E-01 |  8.9  7.9E-02  OK|
 +----------------------+------+---------+---------+------------------+
 | K-TurbKE             | 0.03 | 5.4E-02 | 4.4E-01 |  5.8  2.9E-08  OK|
 | E-Diss.K             | 0.05 | 8.9E-02 | 9.3E-01 | 12.4  2.6E-08  OK|
 +----------------------+------+---------+---------+------------------+



...
...
...


 ======================================================================
 OUTER LOOP ITERATION =  416                     CPU SECONDS = 2.28E+04
 ----------------------------------------------------------------------
 |       Equation       | Rate | RMS Res | Max Res |  Linear Solution |
 +----------------------+------+---------+---------+------------------+
 | U-Mom                | 0.96 | 1.8E-04 | 5.8E-03 |       1.8E-02  OK|
 | V-Mom                | 0.98 | 3.6E-05 | 1.5E-03 |       4.4E-02  OK|
 | W-Mom                | 0.99 | 4.5E-05 | 2.1E-03 |       4.3E-02  OK|
 | P-Mass               | 0.96 | 8.3E-06 | 3.0E-04 | 12.9  4.0E-02  OK|
 +----------------------+------+---------+---------+------------------+
 | K-TurbKE             | 0.98 | 1.5E-03 | 3.0E-02 |  5.7  2.5E-06  OK|
 | E-Diss.K             | 0.97 | 4.2E-04 | 1.1E-02 | 12.3  3.9E-08  OK|
 +----------------------+------+---------+---------+------------------+


With my sed/awk/grep/gnuplot script I would extract the
'U-Mom' rows using grep, print a certain column (e.g.
'Max Res') to a file and plot it with gnuplot. Maybe I would
have to remove those '|' characters using sed first...
Do you have an idea how I can do this completely in
Python?
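
One way this might be done without sed or grep is to split each
table row on '|' and pick the wanted field. A minimal sketch,
assuming the table layout shown above stays fixed; COLUMNS and
extract_column are hypothetical names:

```python
# Position of each numeric column after splitting a table row on '|'.
# Index 0 is the text before the first '|', index 1 the equation name.
COLUMNS = {'Rate': 2, 'RMS Res': 3, 'Max Res': 4}

def extract_column(text, equation='U-Mom', column='Max Res'):
    """Collect one column of one equation row, one value per outer iteration."""
    index = COLUMNS[column]
    values = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split('|')]
        # Separator and header lines either have too few fields
        # or a different name in the equation slot.
        if len(fields) > index and fields[1] == equation:
            values.append(float(fields[index]))
    return values
```

Calling extract_column(open('logfile').read()) with the real file
name would give one 'Max Res' value per iteration, ready to plot.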

Thanks for your help!


Greetings!
 Fabian



