help with processing text file

Dave Angel davea at davea.name
Fri Dec 5 00:46:04 EST 2014


On 12/04/2014 11:46 PM, C. Ng wrote:
> Hi,
>
> Given the sample text file below (where the gibberish represent the irrelevant portions) :
>
> ....
> abcddsdfffgfg
> ggfhghghgfhghgh   round 5 xccdcxcfd
> sdfdffdfbcvcvbbvnghg score = 0.4533
> abcddsdfffgfg     round 5 level = 0.15
> ggfhghghgfhghgh   round 10 dfsdfdcdsd
> sdfdffdfbcvcvbbvnghg score = 0.4213
> sdsdaawddddsds    round 10 level = 0.13
> ......and so on....
>
>
> I would like to extract the values for round, score and level:
> 5 0.4533 0.15
> 10 0.4213 0.13
> ....and so on...
>
> Please advise me how it can be done, and what Python functions are useful.
>
There's lots of ambiguity in that "specification."  Can you be sure, for
example, that the gibberish never includes the strings "round",
"score", or "level"?

Can you be sure that the relevant 3 lines for a given record are
adjacent, and in that order?  Do you happen to know that "round" always
starts in a particular column, and that "score" starts in another
particular column?
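
One way to answer the column questions before writing any parser is to let Python count where each keyword actually starts. This is only a rough sketch: the function name keyword_columns is made up for illustration, and it looks at just the first occurrence of each keyword on a line.

```python
from collections import Counter

def keyword_columns(lines, keywords=("round", "score", "level")):
    """For each keyword, count the 1-based column where it first appears on a line."""
    columns = {kw: Counter() for kw in keywords}
    for line in lines:
        for kw in keywords:
            pos = line.find(kw)  # -1 when the keyword is absent
            if pos != -1:
                columns[kw][pos + 1] += 1
    return columns
```

If a keyword turns up in more than one column, or more often than you have records, then fixed-column slicing is not safe for that keyword and the gibberish may contain it too.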

How would you solve it by hand?  Something like the following?

Open the file.
Skip all lines until columns 19-23 contain "round".
Find the first space-delimited field starting in column 25, and call it
round_num.

On the next line, split the line into words, and save the last word into 
score_val

On the next line, take the substring of the line starting at column 25,
parse it into words, and store the last word in level_num

Save the values round_num, score_val, and level_num in a tuple, or a
string, or whatever you find useful, and append it to a result list.

Repeat till end of file.
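
As a rough sketch only, the by-hand steps could look like this in Python. It assumes the layout of the posted sample: "round" in columns 19-23, the three lines of a record adjacent and in order, and the score and level values being the last word on their lines. The name extract_records is made up for illustration, and the values are kept as strings.

```python
def extract_records(lines):
    """Collect (round_num, score_val, level_num) string tuples from an iterable of lines."""
    results = []
    it = iter(lines)
    for line in it:
        # Columns 19-23 (1-based) correspond to the slice [18:23].
        if line[18:23] != "round":
            continue
        fields = line[24:].split()               # text starting in column 25
        score_words = next(it, "").split()       # the line after the "round" line
        level_words = next(it, "")[24:].split()  # the line after that, from column 25
        if not (fields and score_words and level_words):
            continue  # incomplete record; a real script should report it
        results.append((fields[0], score_words[-1], level_words[-1]))
    return results
```

Since an open file is itself an iterable of lines, calling extract_records(f) inside a "with open(...) as f:" block should be enough to try it on real data.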

Lots more error checking is possible, and advisable, but without knowing 
what the file really looks like, I see no point in guessing.

-- 
DaveA


