How to efficiently extract information from structured text file

Imaginationworks xiajunyi at gmail.com
Wed Feb 17 18:37:02 EST 2010


On Feb 17, 1:40 pm, Paul McGuire <pt... at austin.rr.com> wrote:
> On Feb 16, 5:48 pm, Imaginationworks <xiaju... at gmail.com> wrote:
>
> > Hi,
>
> > I am trying to read object information from a text file (approx.
> > 30,000 lines) in the following format (each line below corresponds
> > to a line in the text file).  Currently, the whole file is read into
> > a list of strings using readlines(), and a for loop searches for
> > "= {" and "};" to determine the Object, SubObject, and SubSubObject.
>
> If you open(filename).read() this file into a variable named data, the
> following pyparsing parser will pick out your nested brace
> expressions:
>
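> from pyparsing import *
>
> # punctuation that must be present but is dropped from the results
> EQ,LBRACE,RBRACE,SEMI = map(Suppress,"={};")
> ident = Word(alphas, alphanums)
> # contents is recursive, so declare it now and define it below
> contents = Forward()
> # name = { contents };  ->  [name, [contents...]]
> defn = Group(ident + EQ + Group(LBRACE + contents + RBRACE + SEMI))
>
> # a body is any mix of nested definitions and other tokens that
> # do not start with a brace
> contents << ZeroOrMore(defn | ~(LBRACE|RBRACE) + Word(printables))
>
> results = defn.parseString(data)
>
> print(results)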
>
> Prints:
>
> [
>  ['Object1',
>    ['...',
>     ['SubObject1',
>       ['....',
>         ['SubSubObject1',
>           ['...']
>         ]
>       ]
>     ],
>     ['SubObject2',
>       ['....',
>        ['SubSubObject21',
>          ['...']
>        ]
>       ]
>     ],
>     ['SubObjectN',
>       ['....',
>        ['SubSubObjectN',
>          ['...']
>        ]
>       ]
>     ]
>    ]
>  ]
> ]
>
> -- Paul

Wow, that is great! Thanks.
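For comparison, the readlines()/for-loop approach described in the original question can build the same kind of nested structure without pyparsing by keeping a stack of open blocks. This is a minimal sketch, not the poster's actual code. It assumes (hypothetically, since the real file format was snipped from the quote) that each block opens on its own line containing "= {" with the name before the "=", and closes on a line that is just "};". The function name parse_blocks and the sample text are made up for illustration.

```python
def parse_blocks(lines):
    """Turn brace-delimited lines into nested [name, body] lists.

    Assumed layout (hypothetical): a block opens on a line containing
    "= {" with the name before "=", and closes on a line that is "};".
    Every other non-blank line is kept as a plain string in its body.
    """
    root = []       # top-level definitions
    stack = [root]  # stack[-1] is the body currently being filled
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if "= {" in line:
            name = line.split("=", 1)[0].strip()
            body = []
            stack[-1].append([name, body])
            stack.append(body)  # descend into the new block
        elif line == "};":
            if len(stack) == 1:
                raise ValueError("'};' without a matching '= {'")
            stack.pop()  # block finished, return to its parent
        else:
            stack[-1].append(line)
    if len(stack) != 1:
        raise ValueError("input ended inside an open block")
    return root


# made-up sample in the assumed layout
sample = """\
Object1 = {
  ...
  SubObject1 = {
    ....
    SubSubObject1 = {
      ...
    };
  };
};
"""

tree = parse_blocks(sample.splitlines())
print(tree)
# [['Object1', ['...', ['SubObject1', ['....', ['SubSubObject1', ['...']]]]]]]
```

Each [name, body] pair mirrors a Group in the pyparsing output. For a real file, passing the open file object works too (e.g. "with open(filename) as f: tree = parse_blocks(f)"), because strip() removes the trailing newlines. Unlike the pyparsing grammar, this version relies on the one-brace-per-line layout, so input that puts "= {" or "};" on the same line as other content would need a proper tokenizer such as the pyparsing parser.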

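The parsed result is a nested list in which every definition is a [name, body] pair, and each body mixes plain tokens with further pairs. A short recursive walk can then list the object hierarchy. This is a hedged sketch: it assumes the results have already been converted to plain Python lists (pyparsing's ParseResults offers asList() for that), and it uses a hand-written literal shaped like the printed output so it runs without pyparsing. The function name outline is hypothetical.

```python
def outline(defns, depth=0):
    """Yield (depth, name) for each [name, body] pair, depth-first."""
    for name, body in defns:
        yield depth, name
        # keep only nested definitions; plain tokens are strings
        children = [item for item in body if isinstance(item, list)]
        yield from outline(children, depth + 1)


# hypothetical data, shaped like the pyparsing output after asList()
tree = [
    ['Object1',
        ['...',
            ['SubObject1', ['....', ['SubSubObject1', ['...']]]],
            ['SubObject2', ['....', ['SubSubObject21', ['...']]]],
        ]],
]

for depth, name in outline(tree):
    print("  " * depth + name)
# Object1
#   SubObject1
#     SubSubObject1
#   SubObject2
#     SubSubObject21
```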