extract text from log file using re

Paul McGuire ptmcg at austin.rr.com
Fri Sep 14 09:13:50 EDT 2007


On Sep 13, 4:09 pm, Fabian Braennstroem <f.braennstr... at gmx.de> wrote:
> Hi,
>
> I would like to delete a region on a log file which has this
> kind of structure:
>

Rather than deleting the region you don't want, how about just
searching for the lines you do want?  Here are two approaches,
one using pyparsing, one using the batteries-included re module.

-- Paul


# -*- coding: iso-8859-15 -*-
data = """\
   498 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04
3.8560e-03 4.8384e-02 11:40:01  499
   499 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04
3.8560e-03 4.8384e-02 11:40:01  499
reversed flow in 1 faces on pressure-outlet 35.

Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/
fluent-0500.cas"...
 5429199 mixed cells, zone 29, binary.
11187656 mixed interior faces, zone 30, binary.
   20004 triangular wall faces, zone 31, binary.
    1104 mixed velocity-inlet faces, zone 32, binary.
  133638 triangular wall faces, zone 33, binary.
   14529 triangular wall faces, zone 34, binary.
    1350 mixed pressure-outlet faces, zone 35, binary.
   11714 mixed wall faces, zone 36, binary.
 1232141 nodes, binary.
 1232141 node flags, binary.
Done.

Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/
fluent-0500.dat"...
Done.


   500 1.0049e-03 2.4630e-04 9.8395e-05 1.4865e-04 8.3913e-04
3.8545e-03 1.3315e-01 11:14:10  500


 reversed flow in 2 faces on pressure-outlet 35.
   501 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04
3.8560e-03 4.8384e-02 11:40:01  499
"""

print "search using pyparsing"
from pyparsing import *

integer = Word(nums).setParseAction(lambda t:int(t[0]))
scireal = Regex(r"\d*\.\d*e\-\d\d").setParseAction(
    lambda t:float(t[0]))
time = Regex(r"\d\d:\d\d:\d\d")

logline = (integer("testNum") +
           And([scireal]*7)("data") +
           time("testTime") +
           integer("result"))

for tRes in logline.searchString(data):
    print "Test#:",tRes.testNum
    print "Data:", tRes.data
    print "Time:", tRes.testTime
    print "Output:", tRes.result
    print

print
print "search using re's"
import re
integer = r"\d+"  # one or more digits; \d* would also match an empty string
scireal = r"\d*\.\d*e\-\d\d"
time = r"\d\d:\d\d:\d\d"
ws = r"\s*"

namedField = lambda reStr,n: "(?P<%s>%s)" % (n,reStr)
logline = re.compile(
            namedField(integer,"testNum") + ws +
            namedField( (scireal+ws)*7,"data" ) +
            namedField(time,"testTime") + ws +
            namedField(integer,"result") )
for m in logline.finditer(data):
    print "Test#:",int(m.group("testNum"))
    print "Data:", map(float,m.group("data").split())
    print "Time:", m.group("testTime")
    print "Output:", int(m.group("result"))
    print

Prints:

search using pyparsing
Test#: 498
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499

Test#: 499
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499

Test#: 500
Data: [0.0010049, 0.00024630000000000002, 9.8394999999999996e-005,
0.00014865000000000001, 0.00083913, 0.0038544999999999999,
0.13314999999999999]
Time: 11:14:10
Output: 500

Test#: 501
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499


search using re's
Test#: 498
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499

Test#: 499
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499

Test#: 500
Data: [0.0010049, 0.00024630000000000002, 9.8394999999999996e-005,
0.00014865000000000001, 0.00083913, 0.0038544999999999999,
0.13314999999999999]
Time: 11:14:10
Output: 500

Test#: 501
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499
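
For anyone running Python 3, the re version above might be ported
roughly as follows.  This is only a sketch and not part of the script
that produced the output shown: the names INTEGER, SCIREAL, LOGLINE and
parse_records are made up for illustration, and two assumptions are
baked in that go beyond the patterns above (at least one digit is
required for an integer, and the exponent may have either sign and any
number of digits).

```python
import re

# Building blocks.  Assumed generalisations, not taken from the post:
# \d+ instead of \d*, and an optionally signed exponent of any length.
INTEGER = r"\d+"
SCIREAL = r"\d*\.\d+[eE][-+]?\d+"
TIME = r"\d\d:\d\d:\d\d"

# One record: iteration number, seven residuals, a time, a trailing integer.
# \s+ also matches newlines, so records wrapped over two lines still match.
LOGLINE = re.compile(
    r"(?P<testNum>{i})\s+"
    r"(?P<data>(?:{r}\s+){{7}})"
    r"(?P<testTime>{t})\s+"
    r"(?P<result>{i})".format(i=INTEGER, r=SCIREAL, t=TIME)
)


def parse_records(text):
    """Yield one dict per record found anywhere in text."""
    for m in LOGLINE.finditer(text):
        yield {
            "testNum": int(m.group("testNum")),
            "data": [float(x) for x in m.group("data").split()],
            "testTime": m.group("testTime"),
            "result": int(m.group("result")),
        }


# Shortened excerpt of the log shown earlier in this post.
sample = """\
   498 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04
3.8560e-03 4.8384e-02 11:40:01  499
reversed flow in 1 faces on pressure-outlet 35.
 5429199 mixed cells, zone 29, binary.
Done.

   500 1.0049e-03 2.4630e-04 9.8395e-05 1.4865e-04 8.3913e-04
3.8545e-03 1.3315e-01 11:14:10  500
"""

records = list(parse_records(sample))
for rec in records:
    print(rec["testNum"], rec["testTime"], rec["result"], rec["data"])
```

The "reversed flow" and "Writing ... Done." lines never satisfy the
pattern, so finditer simply steps over them and nothing has to be
deleted from the log first.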




