[Tutor] text processing, reformatting

Steven D'Aprano steve at pearwood.info
Wed Dec 12 16:37:55 CET 2012


On 13/12/12 01:34, lconrad at go2france.com wrote:
>
>  From a much larger, messy report file, I extracted these lines:
>
>
> OPM010 HUNT INGR FRI 16/11/12 00:00:00 QRTR
> HTGP PEG OVFL
> 0012 00000 00000
> 0022 00000 00000
> 0089 00000 00000
> 0379 00000 00000
> OPM010 HUNT INGR FRI 16/11/12 00:15:00 QRTR
> HTGP PEG OVFL
> 0012 00000 00000
> 0022 00000 00000
[snip]

> the engineer needs that reformatted into the "log line" the original
>machine should have written anyway, for importing into Excel:
>
> yyyy-mm-dd;hh:mm:ss;<htgp>;<peg>;<ovfl>;
>
> With Bourne shell, I could eventually whack this out as I usually do,
>but as a Python pupil, I'd like to see how, learn from you aces and
>python could do it. :)

Well, it's not entirely clear what the relationship between the "before"
and "after" text should be. It's always useful to give an actual example
so as to avoid misunderstandings.

In the absence of a detailed specification, I will just have to guess,
and then you can complain when I guess wrongly :-)

My guess is that the above report lines should be reformatted into:

2012-11-16;00:00:00;0012;00000;00000;
2012-11-16;00:00:00;0022;00000;00000;
2012-11-16;00:00:00;0089;00000;00000;
2012-11-16;00:00:00;0379;00000;00000;
2012-11-16;00:15:00;0012;00000;00000;
2012-11-16;00:15:00;0022;00000;00000;


Here is a quick and dirty version, with little in the way of error
checking or sophistication, suitable for a throw-away script:

# === cut ===

date, time = "unknown", "unknown"
for line in open("input.txt"):
     line = line.strip()  # remove whitespace
     if not line:
         # skip blanks
         continue
     if line.startswith("OPM010") and line.endswith("QRTR"):
         # extract the date and time from a line similar to
         # OPM010 HUNT INGR FRI 16/11/12 00:15:00 QRTR
         # (fields 4 and 5; the slice end is exclusive, hence 4:6)
         date, time = line.split()[4:6]
         # convert date from DD/MM/YY to YYYY-MM-DD
         dd, mm, yy = date.split("/")
         date = "20%s-%s-%s" % (yy, mm, dd)
     elif line == "HTGP PEG OVFL":
         continue
     else:
         # output log lines
         htgp, peg, ovfl = line.split()
         # the requested format ends each log line with a semicolon
         print(";".join([date, time, htgp, peg, ovfl]) + ";")

# === cut ===
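
If the script has to survive more than one use, here is a slightly more
careful sketch of the same idea. It is only a sketch under my own
assumptions: the function name reformat_report and the sample text are made
up for illustration, data lines are assumed to have exactly three fields,
and anything unrecognised is silently skipped rather than reported. It uses
datetime.strptime for the date conversion, so an impossible date raises
ValueError instead of producing rubbish.

```python
from datetime import datetime


def reformat_report(lines):
    """Yield 'YYYY-MM-DD;HH:MM:SS;htgp;peg;ovfl;' strings from report lines.

    Hypothetical helper: the name and the skip-unknown-lines policy are
    assumptions, not part of the original report format.
    """
    date, time = "unknown", "unknown"
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "OPM010" and fields[-1] == "QRTR":
            # header such as: OPM010 HUNT INGR FRI 16/11/12 00:15:00 QRTR
            # %y maps a two-digit year like 12 to 2012
            date = datetime.strptime(fields[4], "%d/%m/%y").strftime("%Y-%m-%d")
            time = fields[5]
        elif fields == ["HTGP", "PEG", "OVFL"]:
            continue  # column headings
        elif len(fields) == 3:
            # data line: htgp, peg, ovfl, with a trailing semicolon
            yield ";".join([date, time] + fields) + ";"
        # any other line is assumed to be noise and is ignored


# made-up sample, copied from the report excerpt quoted above
sample = """\
OPM010 HUNT INGR FRI 16/11/12 00:00:00 QRTR
HTGP PEG OVFL
0012 00000 00000
0022 00000 00000
OPM010 HUNT INGR FRI 16/11/12 00:15:00 QRTR
HTGP PEG OVFL
0012 00000 00000
"""

for row in reformat_report(sample.splitlines()):
    print(row)
```

Because the function accepts any iterable of lines, you can hand it an open
file object just as easily as a list of strings, which also makes it easy
to try out on a small sample before running it on the real report.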



-- 
Steven

