help with processing text file
Dave Angel
davea at davea.name
Fri Dec 5 00:46:04 EST 2014
On 12/04/2014 11:46 PM, C. Ng wrote:
> Hi,
>
> Given the sample text file below (where the gibberish represent the irrelevant portions) :
>
> ....
> abcddsdfffgfg
> ggfhghghgfhghgh round 5 xccdcxcfd
> sdfdffdfbcvcvbbvnghg score = 0.4533
> abcddsdfffgfg round 5 level = 0.15
> ggfhghghgfhghgh round 10 dfsdfdcdsd
> sdfdffdfbcvcvbbvnghg score = 0.4213
> sdsdaawddddsds round 10 level = 0.13
> ......and so on....
>
>
> I would like to extract the values for round, score and level:
> 5 0.4533 0.15
> 10 0.4213 0.13
> ....and so on...
>
> Please advise me how it can be done, and what Python functions are useful.
>
There's lots of ambiguity in that "specification." Can you be sure, for
example, that the gibberish never includes the string "round",
"score", or "level"?
Can you be sure that the relevant 3 lines for a given record are
adjacent, and in that order? Do you happen to know that "round" always
starts in a particular column, and that "score" starts in another
particular column?
How would you solve it by hand? Something like the following?
Open the file.
Skip all lines until columns 19-23 contain "round".
Find the first space-delimited field starting in column 25, and call it
round_num.
On the next line, split the line into words, and save the last word into
score_val.
On the next line, take a substring of the line starting with column 23,
parse it into words, and store the second word in level_num.
Save the values round_num, score_val, and level_num in a tuple, or a
string, or whatever you find useful, and append it to a result list.
Repeat till end of file.
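A minimal sketch of those steps in Python might look like the following. It is only an illustration under some assumptions: the three lines of a record are adjacent and in the order shown, and the gibberish never contains the bare word "round". Since the real column positions aren't known, it splits on whitespace rather than slicing fixed columns. The name extract_records is made up for this example, and there is no error checking.

```python
def extract_records(lines):
    """Return a list of (round_num, score_val, level_val) tuples.

    Assumes each record is three adjacent lines: a "round" line,
    then a "score" line, then a "level" line.
    """
    results = []
    it = iter(lines)
    for line in it:
        words = line.split()
        # Skip lines until one contains the word "round"
        # (and is not a stray "level" line, which also mentions it).
        if "round" not in words or "level" in words:
            continue
        # The field right after "round" is the round number.
        round_num = int(words[words.index("round") + 1])
        # On the next line, the last word is the score.
        score_val = float(next(it).split()[-1])
        # On the line after that, the last word is the level.
        level_val = float(next(it).split()[-1])
        results.append((round_num, score_val, level_val))
    return results


sample = """\
abcddsdfffgfg
ggfhghghgfhghgh round 5 xccdcxcfd
sdfdffdfbcvcvbbvnghg score = 0.4533
abcddsdfffgfg round 5 level = 0.15
ggfhghghgfhghgh round 10 dfsdfdcdsd
sdfdffdfbcvcvbbvnghg score = 0.4213
sdsdaawddddsds round 10 level = 0.13
"""

records = extract_records(sample.splitlines())
for record in records:
    print(*record)
```

For a real file, an open file object can be passed in place of the list of lines, since iterating over it also yields one line at a time.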
Lots more error checking is possible, and advisable, but without knowing
what the file really looks like, I see no point in guessing.
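As one illustration of what such checking could look like, here is a sketch that uses the re module instead of column positions. It assumes (and this is a guess about the file) that the values always appear as "round N", "score = X" and "level = X", and that the "level" line repeats the round number as in the sample. The function name parse_checked and the error messages are hypothetical.

```python
import re

# Patterns assumed from the sample; the real file may differ.
ROUND_RE = re.compile(r"\bround\s+(\d+)\b")
SCORE_RE = re.compile(r"\bscore\s*=\s*(\d+(?:\.\d+)?)")
LEVEL_RE = re.compile(r"\blevel\s*=\s*(\d+(?:\.\d+)?)")


def parse_checked(lines):
    """Collect (round, score, level) tuples, raising ValueError
    when the lines do not arrive in the expected order."""
    results = []
    round_num = score_val = None
    for lineno, line in enumerate(lines, start=1):
        m_round = ROUND_RE.search(line)
        m_score = SCORE_RE.search(line)
        m_level = LEVEL_RE.search(line)
        if m_score:
            if round_num is None:
                raise ValueError(f"line {lineno}: score before any round")
            score_val = float(m_score.group(1))
        elif m_level:
            if score_val is None:
                raise ValueError(f"line {lineno}: level without a score")
            # The level line repeats the round number; check that it agrees.
            if m_round and int(m_round.group(1)) != round_num:
                raise ValueError(f"line {lineno}: round number mismatch")
            results.append((round_num, score_val, float(m_level.group(1))))
            round_num = score_val = None
        elif m_round:
            round_num = int(m_round.group(1))
    return results


good = [
    "ggfhghghgfhghgh round 5 xccdcxcfd",
    "sdfdffdfbcvcvbbvnghg score = 0.4533",
    "abcddsdfffgfg round 5 level = 0.15",
]
checked = parse_checked(good)
```

Unlike the by-hand steps, this does not require the three lines to be adjacent, only that they come in round, score, level order.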
--
DaveA