Ideas for parsing this text?

Gerard Flanagan grflanagan at gmail.com
Thu Apr 24 05:52:21 EDT 2008


On Apr 24, 4:05 am, Paul McGuire <pt... at austin.rr.com> wrote:
> On Apr 23, 8:00 pm, "Eric Wertman" <ewert... at gmail.com> wrote:
>
> > I have a set of files with this kind of content (it's dumped from WebSphere):
>
> > [propertySet "[[resourceProperties "[[[description "This is a required
> > property. This is an actual database name, and its not the locally
> > catalogued database name. The Universal JDBC Driver does not rely on
> > ...
>
> A couple of comments first:
> - What is the significance of '"[' vs. '[' ?  I stripped them all out
> using

The data can be thought of as a serialised object. A simple attribute
looks like:

[name someWebsphereObject]

or

[jndiName []]

if 'jndiName is None'.

A complex attribute is an attribute whose value is itself an object
(or dict if you prefer). The *value* is indicated with "[...]":

[connectionPool "[[agedTimeout 0]
[connectionTimeout 180]
[freePoolDistributionTableSize 0]
[maxConnections 10]
[minConnections 1]
[numberOfFreePoolPartitions 0]
[numberOfSharedPoolPartitions 0]
[unusedTimeout 1800]]"]
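
As a rough illustration of the simple/complex distinction above, here is a minimal standard-library sketch that reads such text into Python dicts. It is not the pyparsing approach quoted in this thread; the names parse_nested, is_attribute and to_value are made up for the example, and it assumes the quotes around a complex value can simply be dropped and that every attribute has exactly one value.

```python
import re

# Tokens: "[", "]", a double-quoted string (may span lines), or a bare word.
TOKEN = re.compile(r'\[|\]|"[^"]*"|[^\s\[\]"]+')

def parse_nested(text):
    """Turn the bracketed text into nested lists of strings."""
    # Assumption: '"[' and ']"' only mark a complex value, so they are
    # reduced to plain brackets before tokenizing.
    text = text.replace('"[', '[').replace(']"', ']')
    stack = [[]]
    for tok in TOKEN.findall(text):
        if tok == '[':
            stack.append([])
        elif tok == ']':
            if len(stack) == 1:
                raise ValueError("unexpected ']'")
            closed = stack.pop()
            stack[-1].append(closed)
        else:
            # Bare words contain no quotes; quoted strings lose theirs here.
            stack[-1].append(tok.strip('"'))
    if len(stack) != 1:
        raise ValueError("missing %d closing bracket(s)" % (len(stack) - 1))
    return stack[0]

def is_attribute(node):
    """An attribute is a two-item list whose first item is the name."""
    return isinstance(node, list) and len(node) == 2 and isinstance(node[0], str)

def to_value(node):
    """Map nested lists to str / None / dict / list of rows."""
    if isinstance(node, str):
        return node
    if not node:                      # "[]" stands for None
        return None
    if all(is_attribute(item) for item in node):
        return {name: to_value(value) for name, value in node}
    return [to_value(item) for item in node]

sample = '''[name someWebsphereObject]
[jndiName []]
[connectionPool "[[agedTimeout 0]
[connectionTimeout 180]]"]'''

obj = to_value(parse_nested(sample))
print(obj)
```

All values stay strings here; converting "180" to an int would need type information the dump does not carry for plain attributes.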

However, 'propertySet' is effectively a keyword, and its value may be
thought of as a 'data table' or 'list of data rows', where each 'data
row' == dict/object.

You can see how the posted example is incomplete because the last
'row' is missing all but one 'column'.
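
To make the 'data table' reading concrete, here is a small sketch that starts from nested lists of the shape a bracket parser might return for a propertySet (shortened by hand from the example data; the variable names and the rows_to_dicts helper are hypothetical) and turns each row into a dict.

```python
# Hand-shortened nested lists for a propertySet value; "[]" is shown as an
# empty list, as a generic bracket parser would likely return it.
property_set = ['propertySet', [['resourceProperties', [
    [['name', 'databaseName'], ['required', 'true'],
     ['type', 'java.lang.String'], ['value', 'DB2Foo']],
    [['name', 'driverType'], ['required', 'true'],
     ['type', 'java.lang.Integer'], ['value', '4']],
    [['name', 'description'], ['required', 'false'],
     ['type', 'java.lang.String'], ['value', []]],
]]]]

def rows_to_dicts(rows):
    """Each row is a list of [column, value] pairs; [] becomes None."""
    return [
        {column: (None if value == [] else value) for column, value in row}
        for row in rows
    ]

# propertySet -> its single 'resourceProperties' attribute -> the rows
rows = property_set[1][0][1]
table = rows_to_dicts(rows)

# Index the table by the 'name' column for convenient lookup.
by_name = {row['name']: row for row in table}
print(by_name['driverType']['value'])
```

A row that is missing its 'name' column, like the truncated last row in the posted example, would raise a KeyError when building by_name.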

>     text = text.replace('"[','[')
> - Your input text was missing 5 trailing ]'s.
>

I think only 2 (the original isn't Python). To fix the example, remove
the last 'description' and add two ]'s.
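
A quick way to check a count like this is to compare opening and closing brackets. The sketch below uses a hypothetical helper, missing_closers, and assumes no brackets occur inside the quoted descriptions (true of the example data here, but not guaranteed in general).

```python
def missing_closers(text):
    """Return how many ']' would be needed to balance the '[' in text.

    Assumption: quoted descriptions contain no brackets, so a plain
    character count is enough. A negative result means surplus ']'.
    """
    return text.count('[') - text.count(']')

# A truncated fragment in the same style as the dump.
fragment = '[propertySet "[[name databaseName]'
print(missing_closers(fragment))
```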

> Here's the parser I used, using pyparsing:
>
> from pyparsing import nestedExpr,Word,alphanums,QuotedString
> from pprint import pprint
>
> content = Word(alphanums+"_.") | QuotedString('"',multiline=True)
> structure = nestedExpr("[", "]", content).parseString(text)
>
> pprint(structure.asList())
>

By the way, I think this would be a good example for the pyparsing
recipes page (or even an IBM developerWorks article?):

http://www.ibm.com/developerworks/websphere/library/techarticles/0801_simms/0801_simms.html

Gerard

example data (copied and pasted; doesn't have the case where a complex
attribute has a complex attribute):

[authDataAlias []]
[authMechanismPreference BASIC_PASSWORD]
[connectionPool "[[agedTimeout 0]
[connectionTimeout 180]
[freePoolDistributionTableSize 0]
[maxConnections 10]
[minConnections 1]
[numberOfUnsharedPoolPartitions 0]
[properties []]
[purgePolicy FailingConnectionOnly]
[reapTime 180]
[surgeThreshold -1]
[testConnection false]
[testConnectionInterval 0]
[unusedTimeout 1800]]"]
[propertySet "[[resourceProperties "[[[description "This is a required
property. This is an actual database name, and its not the locally
catalogued database name. The Universal JDBC Driver does not rely on
information catalogued in the DB2 database directory."]
[name databaseName]
[required true]
[type java.lang.String]
[value DB2Foo]] [[description "The JDBC connectivity-type of a data
source. If you want to use a type 4 driver, set the value to 4. If you
want to use a type 2 driver, set the value to 2. Use of driverType 2
is not supported on WAS z/OS."]
[name driverType]
[required true]
[type java.lang.Integer]
[value 4]] [[description "The TCP/IP address or name for the DRDA
server."]
[name serverName]
[required false]
[type java.lang.String]
[value ServerFoo]] [[description "The TCP/IP port number where the
DRDA server resides."]
[name portNumber]
[required false]
[type java.lang.Integer]
[value 007]] [[description "The description of this datasource."]
[name description]
[required false]
[type java.lang.String]
[value []]] [[description "The DB2 trace level for logging to the
logWriter or trace file. Possible trace levels are: TRACE_NONE =
0,TRACE_CONNECTION_CALLS = 1,TRACE_STATEMENT_CALLS =
2,TRACE_RESULT_SET_CALLS = 4,TRACE_DRIVER_CONFIGURATION =
16,TRACE_CONNECTS = 32,TRACE_DRDA_FLOWS =
64,TRACE_RESULT_SET_META_DATA = 128,TRACE_PARAMETER_META_DATA =
256,TRACE_DIAGNOSTICS = 512,TRACE_SQLJ = 1024,TRACE_ALL = -1, ."]
[name traceLevel]
[required false]
[type java.lang.Integer]
[value []]]
]]


