Ideas for parsing this text?

Paul McGuire ptmcg at austin.rr.com
Wed Apr 23 22:05:35 EDT 2008


On Apr 23, 8:00 pm, "Eric Wertman" <ewert... at gmail.com> wrote:
> I have a set of files with this kind of content (it's dumped from WebSphere):
>
> [propertySet "[[resourceProperties "[[[description "This is a required
> property. This is an actual database name, and its not the locally
> catalogued database name. The Universal JDBC Driver does not rely on
> ...

A couple of comments first:
- What is the significance of '"[' vs. '['? I replaced every '"[' with
  a plain '[' using
    text = text.replace('"[', '[')
- Your input text was missing 5 trailing ]'s.
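
As a rough sketch of that cleanup step (not part of the parser itself), something like the following could report how many closing brackets a dump is missing. The helper name missing_closers and the sample string are made up for illustration, and the count is naive: it assumes no stray brackets appear inside the quoted strings.

```python
def missing_closers(text, opener="[", closer="]"):
    """Return how many closers would be needed to balance the openers in text."""
    depth = 0
    for ch in text:
        if ch == opener:
            depth += 1
        elif ch == closer:
            depth -= 1
            if depth < 0:
                raise ValueError("closing bracket without a matching opener")
    return depth

# Hypothetical, shortened sample in the style of the WebSphere dump
sample = '[propertySet "[[resourceProperties "[[[name databaseName] [value DB2Foo]]'

cleaned = sample.replace('"[', '[')       # same replacement as above
missing = missing_closers(cleaned)        # number of ]'s still owed
repaired = cleaned + "]" * missing        # pad the end so the brackets balance
```

Padding the end only makes sense if the text was truncated at the tail, which is what the input here looked like.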

Here's the parser I used, built with pyparsing:


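from pyparsing import nestedExpr, Word, alphanums, QuotedString
from pprint import pprint

# text holds the file contents after the replace() cleanup above
# an item is either a bare word or a double-quoted (possibly multi-line) string
content = Word(alphanums + "_.") | QuotedString('"', multiline=True)
# nestedExpr matches arbitrarily deep [...] groups and returns nested lists
structure = nestedExpr("[", "]", content).parseString(text)

pprint(structure.asList())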


Prints (I've truncated the long lines, but the long quoted strings do
parse intact):

[['propertySet',
  [['resourceProperties',
    [[['description',
       'This is a required \nproperty. This is an actual data...
      ['name', 'databaseName'],
      ['required', 'true'],
      ['type', 'java.lang.String'],
      ['value', 'DB2Foo']],
     [['description',
       'The JDBC connectivity-type of a data \nsource. If you...
      ['name', 'driverType'],
      ['required', 'true'],
      ['type', 'java.lang.Integer'],
      ['value', '4']],
     [['description',
       '"The TCP/IP address or host name for the DRDA server."'],
      ['name', 'serverName'],
      ['required', 'false'],
      ['type', 'java.lang.String'],
      ['value', 'ServerFoo']],
     [['description',
       'The TCP/IP port number where the \nDRDA server resides.'],
      ['name', 'portNumber'],
      ['required', 'false'],
      ['type', 'java.lang.Integer'],
      ['value', '007']],
     [['description', '"The description of this datasource."'],
      ['name', 'description'],
      ['required', 'false'],
      ['type', 'java.lang.String'],
      ['value', []]],
     [['description',
       'The DB2 trace level for logging to the \nlogWriter ...
      ['name', 'traceLevel'],
      ['required', 'false'],
      ['type', 'java.lang.Integer'],
      ['value', []]],
     [['description',
       'The trace file to store the trace output. \nIf you ...
       ]]]]]]]
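
If the nested lists are awkward to work with, one possible follow-up is to turn each group of [key, value] pairs into a dict keyed by the property name. This is only a sketch: pairs_to_dict is a made-up helper, the structure below is a hand-written, shortened stand-in for the parsed output, and mapping an empty value ([]) to None is my own assumption about what you would want.

```python
def pairs_to_dict(pairs):
    """Turn [[key, value], ...] into a dict; an empty-list value becomes None."""
    return {key: (None if value == [] else value) for key, value in pairs}

# Hand-written stand-in with the same shape as structure.asList() above
structure = [
    ['propertySet', [
        ['resourceProperties', [
            [['name', 'databaseName'], ['required', 'true'], ['value', 'DB2Foo']],
            [['name', 'traceLevel'], ['required', 'false'], ['value', []]],
        ]],
    ]],
]

# structure[0] is the propertySet group; its second item holds resourceProperties
resource_properties = structure[0][1][0][1]

# Index every property by its 'name' entry
properties = {p['name']: p for p in map(pairs_to_dict, resource_properties)}
```

With that, properties['databaseName']['value'] gives 'DB2Foo', and a property whose value was empty in the dump comes back as None.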

-- Paul
The pyparsing wiki is at http://pyparsing.wikispaces.com.


