extract certain values from file with re

johnzenger at gmail.com johnzenger at gmail.com
Fri Oct 6 11:47:57 EDT 2006


Can you safely assume that the lines you want to extract all contain
numbers, and that the lines you do not wish to extract do not contain
numbers?

If so, you could just use the Unix grep utility:  "grep '[0-9]'
filename"

Or, in Python:

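import re

# A raw string keeps the backslash in \d intact for the regex engine
digits = re.compile(r"\d")

# open() replaces the old file() builtin; "with" closes both files for us
with open("your-filename-here.txt") as inf, open("result-file.txt", "w") as outf:
    for line in inf:
        # Keep only the lines that contain at least one digit
        if digits.search(line):
            outf.write(line)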
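
If you also want the numbers in per-column lists for plotting, something
along these lines might work. It is only a sketch: parse_columns, ROW and
FLOAT are names made up for illustration, the sample lines are copied from
your log, and it assumes every value is written in the d.ddE+dd form shown
there. Note that a plain line.split() would not be enough, because a
negative value can touch the previous column ("1.04E-01-8.61E-04").

```python
import re

# Two data lines and one header line copied from the quoted log.
SAMPLE = [
    "        NO    UMOM     VMOM     WMOM     MASS     T EN     DISS     ENTH       U        V        W        P       TE       ED        T",
    "         1  9.70E-02 8.61E-02 9.85E-02 1.00E+00 1.61E+01 7.65E+04 0.00E+00  1.04E-01-8.61E-04 3.49E-02 1.38E-03 7.51E-05 1.63E-05 2.00E+01",
    "         2  3.71E-02 3.07E-02 3.57E-02 1.00E+00 3.58E-01 6.55E-01 0.00E+00  1.08E-01-1.96E-03 4.98E-02 7.11E-04 1.70E-04 4.52E-05 2.00E+01",
]

# Assumption: a data line starts with an integer iteration number ...
ROW = re.compile(r"^\s*(\d+)\s")
# ... followed by values in fixed scientific notation, optionally negative.
FLOAT = re.compile(r"-?\d\.\d+E[+-]\d+")


def parse_columns(lines):
    """Return (iterations, columns), where columns[i] lists the i-th value of each row."""
    iterations = []
    columns = []
    for line in lines:
        match = ROW.match(line)
        values = FLOAT.findall(line)
        if not match or not values:
            continue  # header, separator, or free-text line
        iterations.append(int(match.group(1)))
        if not columns:
            columns = [[] for _ in values]
        for column, value in zip(columns, values):
            column.append(float(value))
    return iterations, columns


iterations, columns = parse_columns(SAMPLE)
print(iterations)
print(columns[0])  # the UMOM residuals
```

With a real log you would pass the open file object instead of SAMPLE, and
the resulting lists can be handed to matplotlib or written out for gnuplot.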

As for your "more difficult" file, take a look at the csv module.  I
think that by changing the delimiter from a comma to a |, you will be
95% of the way to your goal.
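
A rough sketch of that idea, assuming the table rows look exactly like the
ones in your message. The function name column_for_equation and its
parameters are hypothetical, and the sample text is copied from your log;
only csv.reader and its delimiter argument come from the standard library.

```python
import csv
import io

# Sample rows copied from the quoted log; a real script would pass an
# open file object instead of this in-memory text.
SAMPLE = """\
 | U-Mom                | 0.00 | 1.0E-02 | 5.0E-01 |       4.9E-03  OK|
 | V-Mom                | 0.00 | 2.4E-14 | 5.6E-13 |       3.8E+09  ok|
 +----------------------+------+---------+---------+------------------+
 | U-Mom                | 1.44 | 1.5E-02 | 5.3E-01 |       9.6E-03  OK|
"""


def column_for_equation(lines, equation="U-Mom", column=4):
    """Return the floats found in one column of every row for one equation.

    Splitting a table row on "|" gives these field indexes:
    0 = leading blank, 1 = Equation, 2 = Rate, 3 = RMS Res,
    4 = Max Res, 5 = Linear Solution.
    """
    values = []
    for row in csv.reader(lines, delimiter="|"):
        fields = [field.strip() for field in row]
        # Separator and header lines have too few fields or a different name.
        if len(fields) > max(column, 1) and fields[1] == equation:
            values.append(float(fields[column]))
    return values


max_res = column_for_equation(io.StringIO(SAMPLE))
print(max_res)
```

The remaining 5% is mostly the "Linear Solution" column, which holds
several values in one field and would need an extra split() if you want
to plot it.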

Fabian Braennstroem wrote:
> Hi,
>
> I would like to remove certain lines from log files. I had
> some sed/awk scripts for this, but now I want to use Python
> with its re module for this task.
>
> Actually, I have two different log files. The first file looks
> like:
>
>    ...
>    'some text'
>    ...
>
>        ITER I----------------- GLOBAL ABSOLUTE RESIDUAL -----------------I  I------------ FIELD VALUES AT MONITORING LOCATION  ----------I
>         NO    UMOM     VMOM     WMOM     MASS     T EN     DISS     ENTH       U        V        W        P       TE       ED        T
>          1  9.70E-02 8.61E-02 9.85E-02 1.00E+00 1.61E+01 7.65E+04 0.00E+00  1.04E-01-8.61E-04 3.49E-02 1.38E-03 7.51E-05 1.63E-05 2.00E+01
>          2  3.71E-02 3.07E-02 3.57E-02 1.00E+00 3.58E-01 6.55E-01 0.00E+00  1.08E-01-1.96E-03 4.98E-02 7.11E-04 1.70E-04 4.52E-05 2.00E+01
>          3  2.64E-02 1.99E-02 2.40E-02 1.00E+00 1.85E-01 3.75E-01 0.00E+00  1.17E-01-3.27E-03 6.07E-02 4.02E-04 4.15E-04 1.38E-04 2.00E+01
>          4  2.18E-02 1.52E-02 1.92E-02 1.00E+00 1.21E-01 2.53E-01 0.00E+00  1.23E-01-4.85E-03 6.77E-02 1.96E-05 9.01E-04 3.88E-04 2.00E+01
>          5  1.91E-02 1.27E-02 1.70E-02 1.00E+00 8.99E-02 1.82E-01 0.00E+00  1.42E-01-6.61E-03 7.65E-02 1.78E-04 1.70E-03 9.36E-04 2.00E+01
>    ...
>    ...
>    ...
>
>       2997  3.77E-04 2.89E-04 3.05E-04 2.71E-02 5.66E-04 6.28E-04 0.00E+00 -3.02E-01 3.56E-02-7.97E-02-7.11E-02 4.08E-02 1.86E-01 2.00E+01
>       2998  3.77E-04 2.89E-04 3.05E-04 2.71E-02 5.65E-04 6.26E-04 0.00E+00 -3.02E-01 3.63E-02-8.01E-02-7.10E-02 4.02E-02 1.83E-01 2.00E+01
>       2999  3.76E-04 2.89E-04 3.05E-04 2.70E-02 5.64E-04 6.26E-04 0.00E+00 -3.02E-01 3.69E-02-8.04E-02-7.10E-02 3.96E-02 1.81E-01 2.00E+01
>       3000  3.78E-04 2.91E-04 3.07E-04 2.74E-02 5.64E-04 6.26E-04 0.00E+00 -3.01E-01 3.75E-02-8.07E-02-7.09E-02 3.91E-02 1.78E-01 2.00E+01
>     &&&&&&  --------------------------------------------------------------  ----
>
>    ....
>    'some text'
>    ....
>
> I actually want to extract the lines with the numbers, write
> them to a file and finally use gnuplot for plotting them. A
> nicer and more Pythonic way would be to extract those numbers,
> write them into an array according to their column and plot
> those using the gnuplot or matplotlib module :-)
>
> Unfortunately, I am pretty new to the re module and tried
> the following so far:
>
>
>   import re
>   pat = re.compile(r'\ \ \ NO.*?&&&&&&', re.DOTALL)
>   print(re.sub(pat, '', open('log_star_orig').read()))
>
>
> but this works just the other way around, which means that
> the original log file is printed without the number part. So
> the next step would be to delete the part from the first
> line to '\ \ \ \ NO' and the part from '&&&&&&' to the end,
> but I do not know how to address the first and last line!?
>
> It would be nice if you could give me a hint. I would be
> especially interested in any idea of how to put those
> columns in arrays, so I can plot them right away!
>
>
> A more difficult log file looks like:
>
>  ======================================================================
>  OUTER LOOP ITERATION =    1                     CPU SECONDS = 2.40E+01
>  ----------------------------------------------------------------------
>  |       Equation       | Rate | RMS Res | Max Res |  Linear Solution |
>  +----------------------+------+---------+---------+------------------+
>  | U-Mom                | 0.00 | 1.0E-02 | 5.0E-01 |       4.9E-03  OK|
>  | V-Mom                | 0.00 | 2.4E-14 | 5.6E-13 |       3.8E+09  ok|
>  | W-Mom                | 0.00 | 2.5E-14 | 8.2E-13 |       8.3E+09  ok|
>  | P-Mass               | 0.00 | 1.1E-02 | 3.4E-01 |  8.9  2.7E-02  OK|
>  +----------------------+------+---------+---------+------------------+
>  | K-TurbKE             | 0.00 | 1.8E+00 | 1.8E+00 |  5.8  2.2E-08  OK|
>  | E-Diss.K             | 0.00 | 1.9E+00 | 2.0E+00 | 12.4  2.2E-08  OK|
>  +----------------------+------+---------+---------+------------------+
>
>  ======================================================================
>  OUTER LOOP ITERATION =    2                     CPU SECONDS = 8.57E+01
>  ----------------------------------------------------------------------
>  |       Equation       | Rate | RMS Res | Max Res |  Linear Solution |
>  +----------------------+------+---------+---------+------------------+
>  | U-Mom                | 1.44 | 1.5E-02 | 5.3E-01 |       9.6E-03  OK|
>  | V-Mom                |99.99 | 1.1E-03 | 6.2E-02 |       5.7E-02  OK|
>  | W-Mom                |99.99 | 1.9E-03 | 6.0E-02 |       5.9E-02  OK|
>  | P-Mass               | 0.27 | 3.0E-03 | 2.0E-01 |  8.9  7.9E-02  OK|
>  +----------------------+------+---------+---------+------------------+
>  | K-TurbKE             | 0.03 | 5.4E-02 | 4.4E-01 |  5.8  2.9E-08  OK|
>  | E-Diss.K             | 0.05 | 8.9E-02 | 9.3E-01 | 12.4  2.6E-08  OK|
>  +----------------------+------+---------+---------+------------------+
>
>
>
> ...
> ...
> ...
>
>
>  ======================================================================
>  OUTER LOOP ITERATION =  416                     CPU SECONDS = 2.28E+04
>  ----------------------------------------------------------------------
>  |       Equation       | Rate | RMS Res | Max Res |  Linear Solution |
>  +----------------------+------+---------+---------+------------------+
>  | U-Mom                | 0.96 | 1.8E-04 | 5.8E-03 |       1.8E-02  OK|
>  | V-Mom                | 0.98 | 3.6E-05 | 1.5E-03 |       4.4E-02  OK|
>  | W-Mom                | 0.99 | 4.5E-05 | 2.1E-03 |       4.3E-02  OK|
>  | P-Mass               | 0.96 | 8.3E-06 | 3.0E-04 | 12.9  4.0E-02  OK|
>  +----------------------+------+---------+---------+------------------+
>  | K-TurbKE             | 0.98 | 1.5E-03 | 3.0E-02 |  5.7  2.5E-06  OK|
>  | E-Diss.K             | 0.97 | 4.2E-04 | 1.1E-02 | 12.3  3.9E-08  OK|
>  +----------------------+------+---------+---------+------------------+
>
>
> With my sed/awk/grep/gnuplot script I would extract the
> values in the 'U-Mom' row using grep and print a certain
> column (e.g. 'Max Res') to a file and plot it with gnuplot.
> Maybe I have to remove those '|' using sed before...
> Do you have an idea how I can do this completely in
> Python?
> 
> Thanks for your help!
> 
> 
> Greetings!
>  Fabian



