[Tutor] write program to extract data
Dave Angel
davea at ieee.org
Fri Aug 14 22:18:47 CEST 2009
Michael Miesner wrote:
> Hi-
> I work in a research lab and part of the lab I'm not usually associated with
> uses a program that outputs data in a .txt file for each participant that is
> run.
> The participant # is the title of the text document (ie E00343456.txt) style
> and I'd like to be able to take this and other data in the file and draw it
> into a spreadsheet.
> The first 3/4 of the output is the scenario. I've bolded the areas that I
> really want to be able to draw out. In case some people can't see the bold,
> they are the sections called "driver mistakes" and individual mistakes.
>
>
> Preferably, what I'd really like to do is make the script so that I execute
> it, and in doing so, tell it what folder to look in, and it takes all the
> .txt's out of that folder, and adds them to the spreadsheet. I'd like the
> title of the .txt to be the first column, and the data held in the
> spreadsheet to be the next columns.
>
> Below is output of 1 data file.
> ----------------------------------------------------------------
> Date: August 13, 2009
> Time: 12:16:18:151 PM
> ID:
> Scenario file: C:\Documents and Settings\APL02\Desktop\Driving Sim
> Files\Sim 10-21-08.txt
> Configuration file:
>
> <Snip lots of lines>
>
> 7 0 0 0 204 130.55 0 999.00
> 8 0 0 0
> 9 0 0 0
> 10 0 0 0
>
> *Driver mistakes:
>
> Total number of off road accidents = 1
> Total number of collisions = 3
> Total number of pedestrians hit = 3
> Total number of speed exceedances = 11
> Total number of speeding tickets = 0
> Total number of traffic light tickets = 1
> Total number of stop signs missed = 0
> Total number of centerline crossings = 5
> Total number of road edge excursions = 4
> Total number of stops at traffic lights = 2
> Total number of correct DA responses = 0
> Total number of incorrect DA responses = 0
> Total number of DAs with no response = 0
> Total run length (Drive T, X, Total T) = 761.92 34000 761.92
>
> Total number of illegal turns = 0
> Total number of low speed warnings = 0
> Total number of high speed warnings = 0
> Over speed limit (% Time, % Distance) = 27.15 47.77
> Out of lane (% Time, % Distance) = 4.22 3.74
>
> Individual mistakes (Time, Distance, Elapsed distance or object number,
> Elapsed time, Maximum value):
>
> Centerline crossing 68.16 3311.11 110.69 2.60
> -3.46
> Centerline crossing 94.26 4598.99 252.83 4.45
> -3.19
> Centerline crossing 127.85 6758.39 109.70 1.92
> -2.78
> Centerline crossing 162.14 9082.09 273.04 3.27
> -9.72
> Speed exceedance 162.80 9135.05 4596.82 48.87
> 120.00
> Road edge excursion 166.34 9438.89 77.08 0.83
> 13.45
> Hit pedestrian 204.46 13731.87 1
> Speed exceedance 221.51 13829.40 912.17 22.75
> 67.13
> Hit pedestrian 237.05 14741.43 5
> Speed exceedance 259.93 15171.25 746.18 13.35
> 60.35
> Centerline crossing 317.69 16126.40 14.76 8.15
> -4.37
> Vehicle collision (F) 318.63 16141.15 66
> Vehicle collision (F) 341.91 16166.52 91
> Red light ticket 452.04 16548.74 2
> Speed exceedance 458.24 16802.82 1484.23 18.15
> 95.58
> Speed exceedance 516.14 20362.78 685.87 10.13
> 68.67
> Speed exceedance 551.24 21996.05 2111.93 24.31
> 91.76
> Vehicle collision (F) 580.58 24459.77 163
> Speed exceedance 606.49 25376.76 3929.19 42.84
> 120.00
> Road edge excursion 622.00 26916.68 86.74 0.77
> 13.30
> Road edge excursion 623.62 27100.59 251.81 2.17
> 14.42
> Road edge excursion 641.31 29212.44 93.52 8.02
> 19.52
> Off road accident 642.11 29305.93
> Speed exceedance 664.99 30081.20 777.80 10.76
> 82.64
> Hit pedestrian 695.29 31013.94 30
> Speed exceedance 719.30 31513.55 915.64 14.16
> 71.88
> Speed exceedance 742.46 32837.36 81.72 1.57
> 52.90
> Speed exceedance 749.76 33199.52 801.52 12.16
> 76.04
>
>
> *
>
Did you really have to quote all 1100 lines, when what you really wanted
to say was there's a bunch of junk at the beginning of the file, then
the following stuff?
Anyway, the most important question is how rigid is this file format?
Will there always be exactly one line starting "* Driver mistakes"?
Will the lines between that and the final "*" always be the same ones,
and in the same order? Or is "Road edge excursion" somehow an optional
line, or whatever? Are the numbers identified by the column they start
in, or by spaces between them? Is a zero stored sometimes as an empty
field?
It already appears that some lines are appearing multiple times, and not
in order. Assuming that's deliberate, is there a maximum number of
times for any given line? If not, how do you plan to encode it in a
spreadsheet, which generally has fixed columns, with a specific meaning
for each one?
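One possible answer to the fixed-column question, offered only as a sketch:
give each kind of individual mistake its own column and store how many times
it occurred. The regular expression and the name count_mistakes are my own
assumptions, based purely on the sample above (a text label followed by
numeric fields); adjust them once the real format is pinned down.

```python
import re
from collections import Counter

# Assumption: a mistake line is a text label (letters, spaces, parentheses)
# followed by at least one numeric field, e.g.
# "Vehicle collision (F) 318.63 16141.15 66".
MISTAKE_RE = re.compile(r"^\s*([A-Za-z][A-Za-z ()]*?)\s+-?\d")


def count_mistakes(lines):
    """Count how many times each mistake label appears in the given lines."""
    counts = Counter()
    for line in lines:
        match = MISTAKE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "Centerline crossing 68.16 3311.11 110.69 2.60 -3.46",
    "Centerline crossing 94.26 4598.99 252.83 4.45 -3.19",
    "Vehicle collision (F) 318.63 16141.15 66",
    "Off road accident 642.11 29305.93",
]
print(count_mistakes(sample))
```

Each label then becomes one spreadsheet column, and a label that never shows
up for a participant is simply a zero.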
Once you've got a spec for the file requirements, you're left with three
pieces to write. And if the file requirements are tricky, you may want
to write a validator for the file. GIGO, you know.
1) loop through all the txt files in a given directory. Check out
glob, or os.path, or even fileinput.
2) parse the lines of the file. A simple loop, checking the line
against "* Driver mistakes". Then another loop, building a list of the
fields from those lines. This should do some integrity checking, in
case some line is short one of its numeric fields, for example.
3) write the results to a csv file, one call per input file. Check out
the csv module.
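The three pieces above might fit together roughly like this. It is a sketch
under assumptions, not a finished program: it assumes the heading line
contains the text "Driver mistakes", that the summary lines look like
"Total number of collisions = 3", and that the section ends at the
"Individual mistakes" heading. The names parse_summary and folder_to_csv are
made up for the example. Only the integer "Total number of" lines are
collected; the multi-value lines are skipped.

```python
import csv
import glob
import os
import re

# Assumed line shape, taken from the sample: "Total number of collisions = 3"
SUMMARY_RE = re.compile(r"^\s*(Total number of .+?)\s*=\s*(-?\d+)\s*$")


def parse_summary(path):
    """Return {label: count} for the summary lines after 'Driver mistakes'."""
    counts = {}
    in_section = False
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if not in_section:
                # Skip everything until the section heading is seen.
                in_section = "Driver mistakes" in line
                continue
            if line.lstrip().startswith("Individual mistakes"):
                break
            match = SUMMARY_RE.match(line)
            if match:
                counts[match.group(1)] = int(match.group(2))
    return counts


def folder_to_csv(folder, out_path):
    """Write one CSV row per .txt file in folder; return the number of files."""
    rows = []
    for path in sorted(glob.glob(os.path.join(folder, "*.txt"))):
        participant = os.path.splitext(os.path.basename(path))[0]
        rows.append((participant, parse_summary(path)))
    # Column order: labels in the order they are first seen across all files.
    labels = []
    for _, counts in rows:
        for label in counts:
            if label not in labels:
                labels.append(label)
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["participant"] + labels)
        for participant, counts in rows:
            # A label missing from one file becomes an empty cell.
            writer.writerow([participant] + [counts.get(label, "") for label in labels])
    return len(rows)
```

Calling folder_to_csv("some_folder", "mistakes.csv") would then produce a
file that any spreadsheet program can open, with the file name (minus .txt)
in the first column. An empty cell is a cheap integrity check: it means that
file was missing a line the others had.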
I'd suggest building a dummy set of files, with just a few lines in
each, but the same general format. Then when you have the program
working for those, generalize it.
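For instance, a few lines of Python can generate such dummy files. The
template below is a cut-down imitation of the sample output, and the
participant names and counts are invented for testing; none of it comes from
real data.

```python
import os
import tempfile

# A tiny stand-in for the real output: a little junk, then the two sections.
DUMMY_TEMPLATE = """\
Date: August 13, 2009
ID:

Driver mistakes:

Total number of collisions = {collisions}
Total number of pedestrians hit = {pedestrians}

Individual mistakes (Time, Distance, Elapsed distance or object number):

Hit pedestrian 204.46 13731.87 1
"""


def make_dummy_files(folder, specs):
    """Write one small file per (participant, collisions, pedestrians) tuple."""
    paths = []
    for participant, collisions, pedestrians in specs:
        path = os.path.join(folder, participant + ".txt")
        with open(path, "w") as handle:
            handle.write(DUMMY_TEMPLATE.format(collisions=collisions,
                                               pedestrians=pedestrians))
        paths.append(path)
    return paths


# Hypothetical participant IDs, written to a throwaway directory.
with tempfile.TemporaryDirectory() as folder:
    for path in make_dummy_files(folder, [("E00000001", 3, 0), ("E00000002", 0, 2)]):
        print(os.path.basename(path))
```

Because you chose the numbers yourself, you know exactly what the
spreadsheet should contain, which makes mistakes in the parser easy to spot.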
DaveA