How to efficiently extract information from structured text file

Paul McGuire ptmcg at austin.rr.com
Wed Feb 17 14:40:17 EST 2010


On Feb 16, 5:48 pm, Imaginationworks <xiaju... at gmail.com> wrote:
> Hi,
>
> I am trying to read object information from a text file (approx.
> 30,000 lines) with the following format, where each line corresponds
> to a line in the text file.  Currently, the whole file is read into a
> list of strings using readlines(), and then a for loop searches for
> "= {" and "};" to determine the Object, SubObject, and SubSubObject.

If you read this file into a variable named data, using
open(filename).read(), the following pyparsing parser will pick out
your nested brace expressions:

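from pyparsing import *

# punctuation is matched but suppressed from the results
EQ,LBRACE,RBRACE,SEMI = map(Suppress,"={};")
ident = Word(alphas, alphanums)
# contents is recursive, so declare it here and define it below
contents = Forward()
defn = Group(ident + EQ + Group(LBRACE + contents + RBRACE + SEMI))

# contents is any mix of nested definitions and non-brace words
contents << ZeroOrMore(defn | ~(LBRACE|RBRACE) + Word(printables))

results = defn.parseString(data)

print(results)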

Prints:

[
 ['Object1',
   ['...',
    ['SubObject1',
      ['....',
        ['SubSubObject1',
          ['...']
        ]
      ]
    ],
    ['SubObject2',
      ['....',
       ['SubSubObject21',
         ['...']
       ]
      ]
    ],
    ['SubObjectN',
      ['....',
       ['SubSubObjectN',
         ['...']
       ]
      ]
    ]
   ]
 ]
]
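
For comparison with the readlines() loop described in the question,
here is a rough standard-library sketch that builds the same kind of
nested list with an explicit stack instead of pyparsing. It assumes
that every "Name = {" opener and every "};" closer sits on its own
line. The function name parse_blocks and the sample text are made up
for illustration, with the sample shaped to match the output above.
Unlike the pyparsing version, it keeps each content line whole rather
than splitting it into words.

```python
def parse_blocks(lines):
    """Build nested [name, [contents...]] lists from 'Name = {' ... '};' lines."""
    root = []        # top-level definitions
    stack = [root]   # stack[-1] is the contents list currently being filled
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith("{") and "=" in line:
            # Opener such as 'SubObject1 = {': start a new nested contents list.
            name = line.split("=", 1)[0].strip()
            body = []
            stack[-1].append([name, body])
            stack.append(body)
        elif line == "};":
            # Closer: return to the enclosing contents list.
            if len(stack) == 1:
                raise ValueError("unbalanced '};'")
            stack.pop()
        else:
            # Any other line is plain content of the current block.
            stack[-1].append(line)
    if len(stack) != 1:
        raise ValueError("missing '};'")
    return root


# Hypothetical sample input, assumed to resemble the original file.
sample = """\
Object1 = {
...
SubObject1 = {
....
SubSubObject1 = {
...
};
};
};
"""

result = parse_blocks(sample.splitlines())
print(result)
```

This makes a single pass over the lines, so it should cope with a
30,000-line file without trouble, but it is much less forgiving of
layout changes (for example, an opening brace on the following line)
than the pyparsing grammar.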

-- Paul


