extract text from log file using re
Paul McGuire
ptmcg at austin.rr.com
Fri Sep 14 09:13:50 EDT 2007
On Sep 13, 4:09 pm, Fabian Braennstroem <f.braennstr... at gmx.de> wrote:
> Hi,
>
> I would like to delete a region on a log file which has this
> kind of structure:
>
How about just searching for what you want. Here are two approaches,
one using pyparsing, one using the batteries-included re module.
-- Paul
# -*- coding: iso-8859-15 -*-
data = """\
498 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04
3.8560e-03 4.8384e-02 11:40:01 499
499 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04
3.8560e-03 4.8384e-02 11:40:01 499
reversed flow in 1 faces on pressure-outlet 35.
Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/
fluent-0500.cas"...
5429199 mixed cells, zone 29, binary.
11187656 mixed interior faces, zone 30, binary.
20004 triangular wall faces, zone 31, binary.
1104 mixed velocity-inlet faces, zone 32, binary.
133638 triangular wall faces, zone 33, binary.
14529 triangular wall faces, zone 34, binary.
1350 mixed pressure-outlet faces, zone 35, binary.
11714 mixed wall faces, zone 36, binary.
1232141 nodes, binary.
1232141 node flags, binary.
Done.
Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/
fluent-0500.dat"...
Done.
500 1.0049e-03 2.4630e-04 9.8395e-05 1.4865e-04 8.3913e-04
3.8545e-03 1.3315e-01 11:14:10 500
reversed flow in 2 faces on pressure-outlet 35.
501 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04
3.8560e-03 4.8384e-02 11:40:01 499
"""
print "search using pyparsing"
from pyparsing import *
integer = Word(nums).setParseAction(lambda t:int(t[0]))
scireal = Regex(r"\d*\.\d*e\-\d\d").setParseAction(lambda
t:float(t[0]))
time = Regex(r"\d\d:\d\d:\d\d")
logline = (integer("testNum") +
And([scireal]*7)("data") +
time("testTime") +
integer("result"))
for tRes in logline.searchString(data):
print "Test#:",tRes.testNum
print "Data:", tRes.data
print "Time:", tRes.testTime
print "Output:", tRes.result
print
print
print "search using re's"
import re
integer = r"\d*"
scireal = r"\d*\.\d*e\-\d\d"
time = r"\d\d:\d\d:\d\d"
ws = r"\s*"
namedField = lambda reStr,n: "(?P<%s>%s)" % (n,reStr)
logline = re.compile(
namedField(integer,"testNum") + ws +
namedField( (scireal+ws)*7,"data" ) +
namedField(time,"testTime") + ws +
namedField(integer,"result") )
for m in logline.finditer(data):
print "Test#:",int(m.group("testNum"))
print "Data:", map(float,m.group("data").split())
print "Time:", m.group("testTime")
print "Output:", int(m.group("result"))
print
Prints:
search using pyparsing
Test#: 498
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499
Test#: 499
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499
Test#: 500
Data: [0.0010049, 0.00024630000000000002, 9.8394999999999996e-005,
0.00014865000000000001, 0.00083913, 0.0038544999999999999,
0.13314999999999999]
Time: 11:14:10
Output: 500
Test#: 501
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499
search using re's
Test#: 498
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499
Test#: 499
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499
Test#: 500
Data: [0.0010049, 0.00024630000000000002, 9.8394999999999996e-005,
0.00014865000000000001, 0.00083913, 0.0038544999999999999,
0.13314999999999999]
Time: 11:14:10
Output: 500
Test#: 501
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499
More information about the Python-list
mailing list