Pyparsing help

rh0dium steven.klass at gmail.com
Sun Mar 23 17:04:24 EDT 2008


On Mar 23, 1:48 pm, rh0dium <steven.kl... at gmail.com> wrote:
> On Mar 23, 12:26 am, Paul McGuire <pt... at austin.rr.com> wrote:
>
> > There are a couple of bugs in our program so far.
>
> > First of all, our grammar isn't parsing the METAL2 entry at all.  We
> > should change this line:
>
> >     md = mainDict.parseString(test1)
>
> > to
>
> >     md = (mainDict+stringEnd).parseString(test1)
>
> > The parser is reading as far as it can, but then stopping once
> > successful parsing is no longer possible.  Since there is at least one
> > valid entry matching the OneOrMore expression, then parseString raises
> > no errors.  By adding "+stringEnd" to our expression to be parsed, we
> > are saying "once parsing is finished, we should be at the end of the
> > input string".  By making this change, we now get this parse
> > exception:
>
> > pyparsing.ParseException: Expected stringEnd (at char 1948), (line:54,
> > col:1)
>
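
A minimal sketch of the stringEnd behaviour described above. The toy "name = integer" grammar and the two-line input are hypothetical stand-ins, not the original test data:

```python
from pyparsing import OneOrMore, ParseException, Word, alphas, nums, stringEnd

# Hypothetical miniature grammar: one or more "name = integer" entries.
entry = Word(alphas) + "=" + Word(nums)
entries = OneOrMore(entry)

# The second entry is malformed (its value is not an integer).
text = "a = 1\nb = oops\n"

# Without stringEnd, parsing stops quietly after the last valid entry.
partial = entries.parseString(text).asList()

# With stringEnd, the unparsed remainder is reported as an error.
try:
    (entries + stringEnd).parseString(text)
    error = None
except ParseException as exc:
    error = exc

print(partial)
print(error)
```

The first call returns only the tokens for the first entry and raises nothing, which is how a skipped METAL2 entry can go unnoticed.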
> > So what is the matter with the METAL2 entries?  After using brute
> > force "divide and conquer" (I deleted half of the entries and got a
> > successful parse, then restored half of the entries I removed, until I
> > added back the entry that caused the parse to fail), I found these
> > lines in the input:
>
> >     fatTblThreshold                 = (0,0.39,10.005)
> >     fatTblParallelLength            = (0,1,0)
>
> > Both of these violate the atflist definition, because they contain
> > integers, not just floatnums.  So we need to expand the definition of
> > atflist:
>
> >     floatnum = Combine(Word(nums) + "." + Word(nums) +
> >         Optional('e'+oneOf("+ -")+Word(nums)))
> >     floatnum.setParseAction(lambda t:float(t[0]))
> >     integer = Word(nums).setParseAction(lambda t:int(t[0]))
> >     atflist = Suppress("(") + delimitedList(floatnum|integer) + \
> >                 Suppress(")")
>
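
One detail worth checking in the definition above is the order of the alternatives: floatnum has to be tried before integer. A small sketch that parses one of the failing lines, with the swapped order (a hypothetical variant, shown only for contrast) alongside it:

```python
from pyparsing import (Combine, Optional, ParseException, Suppress, Word,
                       delimitedList, nums, oneOf)

floatnum = Combine(Word(nums) + "." + Word(nums) +
                   Optional("e" + oneOf("+ -") + Word(nums)))
floatnum.setParseAction(lambda t: float(t[0]))
integer = Word(nums).setParseAction(lambda t: int(t[0]))

# floatnum comes first so that "0.39" is not cut short as the integer 0.
atflist = Suppress("(") + delimitedList(floatnum | integer) + Suppress(")")
values = atflist.parseString("(0,0.39,10.005)").asList()

# Hypothetical swapped order: integer matches the "0" of "0.39" and the
# parser then trips over the ".".
swapped = Suppress("(") + delimitedList(integer | floatnum) + Suppress(")")
try:
    swapped.parseString("(0,0.39,10.005)")
    swapped_failed = False
except ParseException:
    swapped_failed = True

print(values, swapped_failed)
```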
> > Then we need to tackle the issue of adding nesting for those entries
> > that have sub-keys.  This is actually kind of tricky for your data
> > example, because nesting within Dict expects input data to be nested.
> > That is, nesting Dict's is normally done with data that is input like:
>
> > main
> >   Technology
> >   Layer
> >     PRBOUNDARY
> >     METAL2
> >   Tile
> >     unit
>
> > But your data is structured slightly differently:
>
> > main
> >   Technology
> >   Layer PRBOUNDARY
> >   Layer METAL2
> >   Tile unit
>
> > Because Layer is repeated, the second entry creates a new node named
> > "Layer" at the second level, and the first "Layer" entry is lost.  To
> > fix this, we need to combine Layer and the layer id into a composite
> > key.  I did this by using Group, and adding the Optional alias
> > (which I see now is a poor name, "layerId" would be better) as a
> > second element of the key:
>
> >     mainDict = dictOf(
> >         Group(Word(alphas)+Optional(quotedString)),
> >         Suppress("{") + attrDict + Suppress("}")
> >         )
>
> > But now if we parse the input with this mainDict, we see that the keys
> > are no longer nice simple strings, but they are 1- or 2-element
> > ParseResults objects.  Here is what I get from the command "print
> > md.keys()":
>
> > [(['Technology'], {}), (['Tile', 'unit'], {}), (['Layer',
> > 'PRBOUNDARY'], {}), (['Layer', 'METAL2'], {})]
>
> > So to finally clear this up, we need one more parse action, attached
> > to the mainDict expression, that rearranges the subdicts using the
> > elements in the keys.  The parse action looks like this, and it will
> > process the overall parse results for the entire data structure:
>
> >     def rearrangeSubDicts(toks):
> >         # iterate over all key-value pairs in the dict
> >         for key,value in toks.items():
> >             # key is of the form ['name'] or ['name', 'name2']
> >             # and the value is the attrDict
>
> >             # if key has just one element, use it to define
> >             # a simple string key
> >             if len(key)==1:
> >                 toks[key[0]] = value
> >             else:
> >                 # if the key has two elements, create a
> >                 # subnode with the first element
> >                 if key[0] not in toks:
> >                     toks[key[0]] = ParseResults([])
>
> >                 # add an entry for the second key element
> >                 toks[key[0]][key[1]] = value
>
> >             # now delete the original key that is of the form
> >             # ['name'] or ['name', 'name2']
> >             del toks[key]
>
> > It looks a bit messy, but the point is to modify the tokens in place,
> > by rearranging the attrdicts to nodes with simple string keys, instead
> > of keys nested in structures.
>
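
The regrouping idea can also be sketched on plain tuples and dicts instead of ParseResults. This is not the parse action above, only an illustration of the same rearrangement; the function name nest_by_key and the attribute values are made up for the example:

```python
def nest_by_key(flat):
    """Regroup {(name,) or (name, sub): attrs} into nested plain dicts."""
    nested = {}
    for key, attrs in flat.items():
        if len(key) == 1:
            # one-element key: use it directly as a simple string key
            nested[key[0]] = attrs
        else:
            # two-element key: create the subnode on first use, then
            # file the attributes under the second element
            nested.setdefault(key[0], {})[key[1]] = attrs
    return nested


# Hypothetical data shaped like the keys printed by md.keys().
flat = {
    ("Technology",): {"name": "demo"},
    ("Layer", "PRBOUNDARY"): {"layerNumber": 0},
    ("Layer", "METAL2"): {"lineStyle": "solid"},
    ("Tile", "unit"): {"width": 1},
}
nested = nest_by_key(flat)
print(nested["Layer"]["METAL2"]["lineStyle"])
```

Building a new dict sidesteps modifying the container while iterating over it, at the cost of losing attribute-style access such as md.Layer.METAL2.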
> > Lastly, we attach the parse action in the usual way:
>
> >     mainDict.setParseAction(rearrangeSubDicts)
>
> > Now you can access the fields of the different layers as:
>
> >     print md.Layer.METAL2.lineStyle
>
> > I guess this all looks pretty convoluted.  You might be better off
> > just doing your own Group'ing, and then navigating the nested lists to
> > build your own dict or other data structure.
>
> > -- Paul
>
> Hi Paul,
>
> Before I continue this I must thank you for your help.  You really did
> do an outstanding job on this code and it is really straightforward
> to use and learn from.  This was a fun weekend task and I really
> wanted to use pyparsing to do it, because this is one of several types
> of files I want to parse.  I (as I'm sure you would agree) think the
> rearrangeSubDicts is a bit of a hack, but nevertheless absolutely
> required due to the limitations of the data I am parsing.  Once
> again thanks for your great help.  Now the problem...
>
> I attempted to use this code on another testcase.  This testcase had
> tabs in it.  I think 1.4.11 is missing the expandtabs attribute.  I
> ran my code (which had tabs) and I got this..
>
> AttributeError: 'builtin_function_or_method' object has no attribute
> 'expandtabs'
>
> Uh oh.  Is this a pyparsing problem or am I just an idiot?
>
> Thanks again!

Doh!!  Never mind, I am an idiot.  Nope, I got it.  What a bonehead.

I needed to tweak it a bit to ignore the comments.  Namely, this fixed
it up:

    mainDict = dictOf(
            Group(Word(alphas)+Optional(quotedString)),
            Suppress("{") + attrDict + Suppress("}")
            ) | cStyleComment.suppress()
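
For comparison, pyparsing also has an ignore() method that skips a given expression wherever it turns up, including in the middle of an entry. This is a different technique from the alternation above, and the toy grammar and input below are hypothetical rather than the real file format:

```python
from pyparsing import (Group, OneOrMore, Suppress, Word, alphas,
                       cStyleComment, nums, stringEnd)

# Hypothetical miniature grammar: blocks of the form  Name { key = 1 ... }
attr = Group(Word(alphas) + Suppress("=") + Word(nums))
block = Group(Word(alphas) + Suppress("{") + Group(OneOrMore(attr)) +
              Suppress("}"))
grammar = OneOrMore(block) + stringEnd

# Skip /* ... */ comments anywhere between tokens.
grammar.ignore(cStyleComment)

text = """
/* header comment */
Technology { unit = 1 }
Tile { width = 2 /* inline comment */ height = 3 }
"""
result = grammar.parseString(text).asList()
print(result)
```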

Thanks again.  Now I just need to figure out how to use your dicts to
do some work..



