[Tutor] Parsing problem

Liam Clarke cyresse at gmail.com
Mon Jul 25 04:49:31 CEST 2005


Hi Paul, 

I am kicking myself for the pp.OneOrMore(assignments) bit. It's parsing well
now; it stops whenever it hits a character it can't handle (a minus sign
snuck in instead of an equals sign at one point, but I think I put it there),
which makes tweaking quite easy.

Just a quick query on how Word works. 

These two lines - 

identifier = pp.Word(pp.alphas, pp.alphanums + "_/:.")
integer = pp.Word(pp.nums+"-+.", pp.nums)

It stops at numeric values that contain a decimal point, which I thought I'd
taken care of with my additions above. How do the initChars and bodyChars
affect a token?
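For reference, pp.Word(initChars, bodyChars) matches a word in two parts. The first character must come from initChars, and every remaining character must come from bodyChars. When bodyChars is omitted, initChars is used for both.

With pp.Word(pp.nums+"-+.", pp.nums), a "." is therefore only legal as the first character. So 3.14 matches as 3, and parsing stops at the ".". The sketch below follows the same first-character/body-character rule. It is a hypothetical pure-Python stand-in: match_word is not part of pyparsing, and it ignores Word's other options.

```python
import string

# Hypothetical stand-in for pyparsing.Word(initChars, bodyChars), not the real
# class: the first character must come from init_chars, every later one from
# body_chars (which defaults to init_chars). Leading whitespace is skipped.
def match_word(text, init_chars, body_chars=None, loc=0):
    """Return the longest word starting at loc, or None if there is no match."""
    if body_chars is None:
        body_chars = init_chars
    while loc < len(text) and text[loc].isspace():
        loc += 1
    if loc >= len(text) or text[loc] not in init_chars:
        return None
    end = loc + 1
    while end < len(text) and text[end] in body_chars:
        end += 1  # greedily take as many body characters as possible
    return text[loc:end]

nums = string.digits
print(match_word("3.14", nums + "-+.", nums))        # 3    ("." only allowed first)
print(match_word(".5", nums + "-+.", nums))          # .5
print(match_word("3.14", nums + "-+.", nums + "."))  # 3.14 ("." allowed in the body)
```

Putting "." into bodyChars as well, e.g. pp.Word(pp.nums+"-+.", pp.nums+"."), lets 3.14 through. It also accepts things like 1.2.3. If that matters, a stricter expression such as pp.Regex(r"[+-]?\d+(\.\d+)?") may fit better, assuming your pyparsing release includes Regex.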

Regards, 

Liam Clarke



On 7/25/05, Paul McGuire <paul at alanweberassociates.com> wrote:
> 
> Liam -
> 
> I just uploaded an update to pyparsing, version 1.3.2, that should fix the
> problem with using nested Dicts. Now you won't need to use [0] to
> dereference the "0'th" element; just reference the nested elements as
> a.b.c, or a["b"]["c"].
> 
> -- Paul
> 
> 
> -----Original Message-----
> From: Liam Clarke [mailto:cyresse at gmail.com]
> Sent: Sunday, July 24, 2005 10:21 AM
> To: Paul McGuire
> Cc: tutor at python.org
> Subject: Re: [Tutor] Parsing problem
> 
> Hi Paul,
> 
> That is fantastic. It works, and using that pp.Group is the key with the
> nested braces.
> 
> I just ran this on the actual file after adding a few more possible values
> inside the group, and it parsed the entire header structure rather nicely.
> 
> Now this will probably sound silly, but from the bit
> 
> header = {...
> ...
> }
> 
> it continues on with
> 
> province = {...
> }
> 
> and so forth.
> 
> Now, once it reads up to the closing bracket of the header section, it
> returns that, parsed nicely. Is there a way I can tell it to continue
> onwards? I can see that it's stopping at one group.
> 
> Pyparsing is wonderful, but boy... as learning curves go, I'm somewhat over
> my head.
> 
> I've tried this -
> 
> Code http://www.rafb.net/paste/results/3Dm7FF35.html
> Current data http://www.rafb.net/paste/results/3cWyt169.html
> 
> assignment << (pp.OneOrMore(pp.Group( LHS + EQUALS + RHS )))
> 
> to try and continue the parsing, but no luck.
> 
> I've been running into the
> 
>   File "c:\python24\Lib\site-packages\pyparsing.py", line 1427, in parseImpl
>     raise maxException
> pyparsing.ParseException: Expected "}" (at char 742), (line:35, col:5)
> 
> hassle again. From the CPU loading, I'm worried I've got myself something
> very badly recursive going on, but I'm unsure of how to use validate()
> 
> I've noticed that a few of the sections in between contain values like this -
> 
> foo = { BAR = { HUN = 10 SOB = 6 } oof = { HUN = { } SOB = 4 } }
> 
> and so I've stuck pp.empty into my RHS possible values. What unintended side
> effects may I get from using pp.empty? From the docs, it sounds like a
> wildcard token, rather than matching a null.
> 
> Using pp.empty has resolved my apparent problem with empty {}'s causing my
> favourite exception, but I'm just worried that I'm casting my net too wide.
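On pp.empty: it is not a wildcard. It always matches, and it consumes no text at all. As one alternative among the RHS values, that means a name followed by "=" and then nothing recognisable is still accepted, which is exactly the "net too wide" risk.

One common alternative is to keep a value mandatory and let the braces hold zero or more assignments, i.e. pp.ZeroOrMore instead of pp.OneOrMore. This sketch assumes the only goal is to allow empty braces, since the full grammar isn't shown here. The helpers below are tiny hypothetical parser combinators written for illustration, not pyparsing itself.

```python
import re

# Tiny hypothetical parser combinators (not pyparsing). Each parser takes
# (text, loc) and returns (new_loc, tokens) on success, or None on failure.
def token(pattern):
    rx = re.compile(r"\s*(" + pattern + ")")
    def parse(text, loc):
        m = rx.match(text, loc)
        return (m.end(), [m.group(1)]) if m else None
    return parse

def empty(text, loc):
    return (loc, [])  # like pp.empty: always succeeds, consumes nothing

def first_of(*alts):
    def parse(text, loc):
        for alt in alts:
            result = alt(text, loc)
            if result is not None:
                return result
        return None
    return parse

def seq(*parts):
    def parse(text, loc):
        tokens = []
        for part in parts:
            result = part(text, loc)
            if result is None:
                return None
            loc, toks = result
            tokens.extend(toks)
        return (loc, tokens)
    return parse

def zero_or_more(inner):
    def parse(text, loc):
        tokens = []
        while True:
            result = inner(text, loc)
            if result is None or result[0] == loc:  # stop on failure or no progress
                return (loc, tokens)
            loc, toks = result
            tokens.extend(toks)
    return parse

def parses(parser, text):
    """True if parser matches the whole text (trailing whitespace allowed)."""
    result = parser(text, 0)
    return result is not None and text[result[0]:].strip() == ""

NAME, NUMBER = token(r"[A-Za-z_]\w*"), token(r"-?\d+")
EQ, LBRACE, RBRACE = token("="), token(r"\{"), token(r"\}")

# Grammar A: empty is one of the value alternatives.
assignment_a = seq(NAME, EQ, first_of(NUMBER, empty))

# Grammar B: no empty; braces take zero or more assignments instead.
def value_b(text, loc):
    return first_of(NUMBER, block_b)(text, loc)  # late lookup acts like pp.Forward
assignment_b = seq(NAME, EQ, value_b)
block_b = seq(LBRACE, zero_or_more(assignment_b), RBRACE)

print(parses(assignment_a, "a ="))      # True: the missing value is silently accepted
print(parses(assignment_b, "a ="))      # False
print(parses(assignment_b, "a = { }"))  # True
print(parses(assignment_b, "foo = { BAR = { HUN = 10 SOB = 6 } oof = { HUN = { } SOB = 4 } }"))  # True
```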
> 
> Oh, and if there's a way to get a 'last line parsed' value so as to start
> parsing onwards, it would ease my day. The only way I've found to get the
> whole thing parsed is to wrap the whole of the data in another x = { ... },
> and now I'm only getting the 'x' returned, so if I could parse by section,
> it would help my understanding of what's happening.
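One way to get at the "where did it stop" information is to have each section report its end offset and resume parsing from there. The helper below is a hypothetical sketch; iter_sections is not a pyparsing function.

- It splits top-level name = { ... } sections by counting braces.
- For each section it yields the name, body text, start offset and end offset.
- Each body could then be handed to the grammar separately.

It assumes braces never appear inside quoted strings and that every top-level entry is a braced section. Depending on your pyparsing version, scanString may also help, since it yields (tokens, start, end) for each match it finds.

```python
import re

# Hypothetical helper, not part of pyparsing. Matches "name = {" at the
# current position (after optional whitespace).
SECTION_START = re.compile(r"\s*([A-Za-z_]\w*)\s*=\s*\{")

def iter_sections(text, loc=0):
    """Yield (name, body, start, end) for each top-level 'name = { ... }'.

    `end` is the offset just past the closing brace, i.e. where parsing of the
    next section resumes. Braces inside quoted strings are not handled."""
    while True:
        m = SECTION_START.match(text, loc)
        if m is None:
            return  # nothing more that looks like a section
        depth, pos = 1, m.end()
        while depth and pos < len(text):
            if text[pos] == "{":
                depth += 1
            elif text[pos] == "}":
                depth -= 1
            pos += 1
        if depth:
            raise ValueError("unclosed '{' in section %r" % m.group(1))
        yield m.group(1), text[m.end():pos - 1], m.start(1), pos
        loc = pos  # resume right after the closing brace

data = "header = { a = 1 } province = { b = { c = 2 } }"
for name, body, start, end in iter_sections(data):
    print(name, start, end, body.strip())
# header 0 18 a = 1
# province 19 47 b = { c = 2 }
```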
> 
> I'm still trial and error-ing a bit too much at the moment.
> 
> Regards,
> 
> Liam Clarke
> 
> 
> 
> 
> 
> On 7/24/05, Paul McGuire <paul at alanweberassociates.com> wrote:
> 
> Liam -
> 
> Glad you are sticking with pyparsing through some of these idiosyncrasies!
> 
> One thing that might simplify your life is if you are a bit more strict in
> specifying your grammar, especially in using pp.printables as the character
> set for your various words and values. Is this statement really valid?
> 
> Lw)r*)*dsflkj = sldjouwe)r#jdd
> 
> According to your grammar, it is. Also, by using printables, you force your
> users to insert whitespace between the assignment target and the equals sign.
> I'm sure your users would like to enter a quick "a=1" once in a while, but
> since there is no whitespace, it will all be slurped into the left-hand side
> identifier.
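To see the whitespace problem concretely, the regular expressions below approximate the two approaches. They are stand-ins chosen for illustration, not pyparsing:

- \S+ behaves roughly like pp.Word(pp.printables).
- The second pattern accepts only an identifier, a signed integer, or "=" as a token.

```python
import re

# Regex stand-ins (assumptions, not pyparsing): r"\S+" plays the role of
# pp.Word(pp.printables); the second pattern mimics identifier | integer | "=".
printable_word = re.compile(r"\S+")
strict_token = re.compile(r"[A-Za-z][A-Za-z0-9_]*|[-+]?[0-9]+|=")
identifier = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

for text in ["a = 1", "a=1"]:
    print(repr(text), printable_word.findall(text), strict_token.findall(text))
# 'a = 1' ['a', '=', '1'] ['a', '=', '1']
# 'a=1' ['a=1'] ['a', '=', '1']

# With printables, anything non-blank is a valid name; with identifier it is not.
print(bool(printable_word.fullmatch("Lw)r*)*dsflkj")),
      bool(identifier.fullmatch("Lw)r*)*dsflkj")))  # True False
```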
> 
> Let's create two expressions, LHS and RHS, to dictate what is valid on the
> left and right-hand side of the equals sign. (Well, it turns out I create a
> bunch of expressions here, in the process of defining LHS and RHS, but
> hopefully, this will make some sense):
> 
> import pyparsing as pp
> 
> EQUALS = pp.Suppress("=")
> LBRACE = pp.Suppress("{")
> RBRACE = pp.Suppress("}")
> identifier = pp.Word(pp.alphas, pp.alphanums + "_")
> integer = pp.Word(pp.nums+"-+", pp.nums)
> assignment = pp.Forward()
> LHS = identifier
> RHS = pp.Forward().setName("RHS")
> RHS << ( pp.dblQuotedString ^ identifier ^ integer ^
>          pp.Group( LBRACE + pp.OneOrMore(assignment) + RBRACE ) )
> assignment << pp.Group( LHS + EQUALS + RHS )
> 
> I leave it to you to flesh out what other possible value types can be
> included in RHS.
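For comparison, here is a hypothetical hand-written recursive-descent parser for roughly the same grammar, without pyparsing. The token patterns are assumptions that approximate identifier, integer and dblQuotedString. The pieces map onto the pyparsing grammar like this:

- parse_value plays the role of the RHS Forward.
- Each [name, value] list mirrors what pp.Group produces.
- The top-level loop in parse_assignments keeps reading assignments until the input runs out, which is the job pp.OneOrMore(assignment) does at the top level.

```python
import re

class ParseError(Exception):
    pass

# Hypothetical hand-written recursive-descent version of the grammar above,
# for comparison only (not pyparsing). Tokens: double-quoted strings,
# identifiers, signed integers, and the punctuation = { }.
TOKEN = re.compile(r'\s*("(?:[^"\\]|\\.)*"|[A-Za-z][A-Za-z0-9_]*|[-+]?[0-9]+|[={}])')

def tokenize(text):
    text, pos, out = text.rstrip(), 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ParseError("unexpected character at offset %d" % pos)
        out.append(m.group(1))
        pos = m.end()
    return out

def parse_assignments(tokens, pos, closing=None):
    """One or more 'LHS = RHS' pairs (like pp.OneOrMore(assignment)).

    Stops at the `closing` token, or at the end of input when closing is None.
    Each pair becomes a [name, value] list, mirroring pp.Group."""
    items = []
    while pos < len(tokens) and tokens[pos] != closing:
        name = tokens[pos]
        if not name[0].isalpha() or tokens[pos + 1:pos + 2] != ["="]:
            raise ParseError("expected 'identifier =' at token %d" % pos)
        value, pos = parse_value(tokens, pos + 2)
        items.append([name, value])
    if not items:
        raise ParseError("expected at least one assignment at token %d" % pos)
    return items, pos

def parse_value(tokens, pos):
    """RHS: a quoted string, identifier, integer, or a braced group."""
    if pos >= len(tokens):
        raise ParseError("expected a value at end of input")
    tok = tokens[pos]
    if tok == "{":
        items, pos = parse_assignments(tokens, pos + 1, closing="}")
        if pos >= len(tokens):
            raise ParseError('expected "}" at end of input')
        return items, pos + 1  # drop the closing brace, like pp.Suppress
    if tok in ("=", "}"):
        raise ParseError("unexpected %r at token %d" % (tok, pos))
    return tok, pos + 1

def parse(text):
    items, _ = parse_assignments(tokenize(text), 0)
    return items

print(parse('j = { line = { foo = 10 bar = 20 } } name = "x y" a=1'))
```

Note that "a=1" tokenizes fine here, because no token pattern swallows "=". Note also that "a = { }" is rejected, just as pp.OneOrMore inside the braces would reject it.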
> 
> Note also the use of the Group. Try running this snippet with and without
> Group and see how the results change. I think using Group will help you to
> build up a good parse tree for the matched tokens.
> 
> Lastly, please note in the '<<' assignment to RHS that the expression is
> enclosed in parens. I originally left this as
> 
> RHS << pp.dblQuotedString ^ identifier ^ integer ^ pp.Group( LBRACE +
>     pp.OneOrMore(assignment) + RBRACE )
> 
> And it failed to match! A bug! In my own code! The shame...
> 
> This fails because '<<' has a higher precedence than '^', so RHS only worked
> if it was handed a quoted string. Probably good practice to always enclose
> in parentheses the expression being assigned to a Forward using '<<'.
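The precedence point is easy to check with Python's own parser. The sketch below uses the standard ast module to print how an expression is grouped. The names are placeholders for the pyparsing expressions; only the operator grouping is examined.

```python
import ast

# Placeholder names stand in for the pyparsing expressions; only Python's own
# operator grouping is being examined here.
SYMBOLS = {ast.LShift: "<<", ast.BitXor: "^", ast.BitOr: "|", ast.Add: "+"}

def grouping(source):
    """Fully parenthesize an expression the way Python's parser groups it."""
    def render(node):
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.BinOp):
            op = SYMBOLS[type(node.op)]
            return "(%s %s %s)" % (render(node.left), op, render(node.right))
        raise ValueError("unsupported node: %s" % type(node).__name__)
    return render(ast.parse(source, mode="eval").body)

print(grouping("RHS << quoted ^ ident ^ integer ^ group"))
# ((((RHS << quoted) ^ ident) ^ integer) ^ group)
print(grouping("RHS << (quoted ^ ident ^ integer ^ group)"))
# (RHS << (((quoted ^ ident) ^ integer) ^ group))
print(grouping("sel << word + equals + word"))
# (sel << ((word + equals) + word))
```

Because '+' binds more tightly than '<<', a Forward assigned a plain sequence such as a + b + c comes out right even without parentheses. It is '^' and '|', which bind more loosely than '<<', that need them.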
> 
> -- Paul
> 
> 
> -----Original Message-----
> From: Liam Clarke [mailto:cyresse at gmail.com]
> Sent: Saturday, July 23, 2005 9:03 AM
> To: Paul McGuire
> Cc: tutor at python.org
> Subject: Re: [Tutor] Parsing problem
> 
> *sigh* I just read the documentation more carefully and found the
> difference between the | operator and the ^ operator.
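For the record, neither operator is an exclusive or. In pyparsing the two operators behave differently:

- a | b builds a MatchFirst, which tries the alternatives left to right and keeps the first one that matches.
- a ^ b builds an Or, which tries every alternative and keeps the longest match.

The two hypothetical functions below mimic those two strategies using plain regular expressions, not pyparsing.

```python
import re

# Stand-ins (not pyparsing) for the two alternation strategies, using
# regexes as the alternatives. Each returns the matched text, or None.
def match_first(alternatives, text):
    """Like pyparsing's '|' (MatchFirst): the first alternative that matches wins."""
    for alt in alternatives:
        m = re.match(alt, text)
        if m:
            return m.group(0)
    return None

def match_longest(alternatives, text):
    """Like pyparsing's '^' (Or): every alternative is tried, the longest match wins."""
    matches = [m.group(0) for m in (re.match(alt, text) for alt in alternatives) if m]
    return max(matches, key=len) if matches else None

alternatives = [r"\w+", r"\w+\s*=\s*\w+"]        # a bare word, or "word = word"
print(match_first(alternatives, "foo = 10"))     # foo
print(match_longest(alternatives, "foo = 10"))   # foo = 10
```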
> 
> Input -
> 
> j = { line = { foo = 10 bar = 20 } }
> 
> New code
> 
> sel = pp.Forward()
> values = ((pp.Word(pp.printables) + pp.Suppress("=") +
>            pp.Word(pp.printables)) ^ sel)
> sel << (pp.Word(pp.printables) + pp.Suppress("=") + pp.Suppress("{") +
>         pp.OneOrMore(values) + pp.Suppress("}"))
> 
> Output -
> 
> (['j', 'line', 'foo', '10', 'bar', '20'], {})
> 
> My apologies for the deluge.
> 
> Regards,
> 
> Liam Clarke
> 
> 
> On 7/24/05, Liam Clarke <cyresse at gmail.com> wrote:
> 
> Hmmm... just a quick update: I've been poking around and I'm obviously
> making some error of logic.
> 
> Given a line -
> 
> f = "j = { line = { foo = 10 bar = 20 } }"
> 
> And given the following code -
> 
> sel = pp.Forward()
> sel << (pp.Word(pp.printables) + pp.Suppress("=") + pp.Suppress("{") +
>         pp.OneOrMore((pp.Word(pp.printables) + pp.Suppress("=") +
>                       pp.Word(pp.printables)) | sel) +
>         pp.Suppress("}"))
> 
> sel.parseString(f) gives -
> 
> (['j', 'line', '{', 'foo', '10', 'bar', '20'], {})
> 
> So I've got a bracket sneaking through there. Argh. My brain hurts.
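A likely explanation for the stray "{" is that "{" is itself printable, so pp.Word(pp.printables) will match it. Tracing the code with '|':

1. The plain name = value alternative is tried first at "line = { foo ...".
2. It succeeds, taking "{" as the value.
3. The "{" therefore lands in the results, which reproduces the output above.

The regular expressions below are rough stand-ins (assumptions, not pyparsing). They show the same effect, and that excluding braces from the value characters prevents it.

```python
import re

# r"\S+" approximates pp.Word(pp.printables): "{" is printable, so the plain
# "name = value" alternative happily takes "{" as the value.
simple_assignment = re.compile(r"\s*(\S+)\s*=\s*(\S+)")
rest = "line = { foo = 10 bar = 20 } }"
print(simple_assignment.match(rest).groups())   # ('line', '{')

# Excluding braces and "=" from the word characters stops that match.
strict_assignment = re.compile(r"\s*([^\s{}=]+)\s*=\s*([^\s{}=]+)")
print(strict_assignment.match(rest))            # None
print(strict_assignment.match("foo = 10 bar = 20").groups())  # ('foo', '10')
```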
> 
> Is the | operator an exclusive or?
> 
> Befuddled,
> 
> Liam Clarke
> 
> 
> 
> On 7/23/05, Liam Clarke <cyresse at gmail.com> wrote:
> 
> Howdy,
> 
> I've attempted to follow your lead and have started from scratch. I could
> just copy and paste your solution (which works pretty well), but I want to
> understand what I'm doing *grin*
> 
> However, I've been hitting a couple of ruts in the path to enlightenment. Is
> there a way to tell pyparsing to treat specific escaped characters as just a
> backslash followed by a letter? For the time being I've converted all
> backslashes to forward slashes, as it was choking on \a in a file path.
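On the \a problem, one possible cause is the test data itself. If it is embedded in the script as an ordinary Python string literal (an assumption, since that code isn't shown), Python converts \a into the BEL control character before pyparsing ever sees it. A raw string, or a doubled backslash, keeps the backslash. Text read from a file with open() is not escape-processed at all.

```python
normal = "C:\art"   # Python turns "\a" into chr(7), the BEL control character
raw = r"C:\art"     # raw string: the backslash and the "a" both survive

print(len(normal), len(raw))        # 5 6
print("\a" in normal, "\\" in raw)  # True True
print(raw == "C:\\art")             # True: a doubled backslash does the same job
```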
> 
> But my latest hitch takes this form (apologies for the large traceback):
> 
> Traceback (most recent call last):
>   File "<interactive input>", line 1, in ?
>   File "parse.py", line 336, in parse
>     parsedEntries = dicts.parseString(test_data)
>   File "c:\python24\Lib\site-packages\pyparsing.py", line 616, in parseString
>     loc, tokens = self.parse( instring.expandtabs(), 0 )
>   File "c:\python24\Lib\site-packages\pyparsing.py", line 558, in parse
>     loc,tokens = self.parseImpl( instring, loc, doActions )
>   File "c:\python24\Lib\site-packages\pyparsing.py", line 1518, in parseImpl
>     return self.expr.parse( instring, loc, doActions )
>   File "c:\python24\Lib\site-packages\pyparsing.py", line 558, in parse
>     loc,tokens = self.parseImpl( instring, loc, doActions )
>   File "c:\python24\Lib\site-packages\pyparsing.py", line 1367, in parseImpl
>     loc, exprtokens = e.parse( instring, loc, doActions )
>   File "c:\python24\Lib\site-packages\pyparsing.py", line 558, in parse
>     loc,tokens = self.parseImpl( instring, loc, doActions )
>   File "c:\python24\Lib\site-packages\pyparsing.py", line 1518, in parseImpl
>     return self.expr.parse( instring, loc, doActions )
>   File "c:\python24\Lib\site-packages\pyparsing.py", line 560, in parse
>     raise ParseException, ( instring, len(instring), self.errmsg, self )
> 
> ParseException: Expected "}" (at char 9909), (line:325, col:5)
> 
> The offending code can be found here (includes the data) -
> http://www.rafb.net/paste/results/L560wx80.html
> 
> It's like pyparsing isn't recognising a lot of my "}"s: if I add another
> one, it throws the same error, and the same for adding another two...
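When an 'Expected "}"' error seems to ignore extra braces, it can help to check the brace balance independently of the grammar. The helpers below are a hypothetical sketch, not part of pyparsing:

- line_col turns a character offset into a 1-based (line, column) pair. It uses essentially the same arithmetic as the (line:..., col:...) part of the error message.
- check_braces reports stray closing braces and unclosed opening ones. It skips braces inside double-quoted strings, but does not handle escaped quotes.

Note that parseString expands tabs before parsing (the expandtabs() call in the traceback), so the offsets in the message refer to the tab-expanded text.

```python
def line_col(text, loc):
    """1-based (line, column) of offset loc, as in '(line:..., col:...)'."""
    line = text.count("\n", 0, loc) + 1
    col = loc - text.rfind("\n", 0, loc)  # rfind gives -1 on the first line
    return line, col

def check_braces(text):
    """List stray '}' and unclosed '{' as (problem, line, col) tuples.

    Braces inside double-quoted strings are skipped; escaped quotes are not handled."""
    problems, open_positions = [], []
    in_string = False
    for loc, ch in enumerate(text):
        if ch == '"':
            in_string = not in_string
        elif in_string:
            continue
        elif ch == "{":
            open_positions.append(loc)
        elif ch == "}":
            if open_positions:
                open_positions.pop()
            else:
                problems.append(("stray '}'",) + line_col(text, loc))
    for loc in open_positions:
        problems.append(("unclosed '{'",) + line_col(text, loc))
    return problems

sample = 'a = {\n  b = { c = 1 }\n}\n}\nd = {\n'
for problem in check_braces(sample):
    print(problem)
# ("stray '}'", 4, 1)
# ("unclosed '{'", 5, 5)
```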
> 
> No doubt I've done something silly, but any help in finding the tragic flaw
> would be much appreciated. I need to get a ParseResults object out so I can
> learn how to work with the basic structure!
> 
> Much regards,
> 
> Liam Clarke
> 
> 
> 
> On 7/21/05, Paul McGuire <paul at alanweberassociates.com> wrote:
> 
> Liam, Kent, and Danny -
> 
> It sure looks like pyparsing is taking on a life of its own! I can see I no
> longer am the only one pitching pyparsing at some of these applications!
> 
> Yes, Liam, it is possible to create dictionary-like objects, that is,
> ParseResults objects that have named values in them. I looked into your
> application, and the nested assignments seem very similar to a ConfigParser
> type of structure. Here is a pyparsing version that handles the test data
> in your original post (I kept Danny Yoo's recursive list values, and added
> recursive dictionary entries):
> 
> --------------------------
> import pyparsing as pp
> 
> listValue = pp.Forward()
> listSeq = pp.Suppress('{') + pp.Group(pp.ZeroOrMore(listValue)) + pp.Suppress('}')
> listValue << ( pp.dblQuotedString.setParseAction(pp.removeQuotes) |
>                pp.Word(pp.alphanums) |
>                listSeq )
> 
> keyName = pp.Word( pp.alphas )
> 
> entries = pp.Forward()
> entrySeq = pp.Suppress('{') + pp.Group(pp.OneOrMore(entries)) + pp.Suppress('}')
> entries << pp.Dict(
>     pp.OneOrMore(
>         pp.Group( keyName + pp.Suppress('=') + (entrySeq | listValue) ) ) )
> --------------------------
> 
> 
> Dict is one of the most confusing classes to use, and there are some
> examples in the examples directory that comes with pyparsing (see
> dictExample2.py), but it is still tricky. Here is some code to access your
> input test data, repeated here for easy reference:
> 
> --------------------------
> testdata = """\
> country = {
>     tag = ENG
>     ai = {
>         flags = { }
>         combat = { DAU FRA ORL PRO }
>         continent = { }
>         area = { }
>         region = { "British Isles" "NorthSeaSea" "ECAtlanticSea" "NAtlanticSea"
>                    "TagoSea" "WCAtlanticSea" }
>         war = 60
>         ferocity = no
>     }
> }
> """
> parsedEntries = entries.parseString(testdata)
> 
> def dumpEntries(dct,depth=0):
>     keys = dct.keys()
>     keys.sort()
>     for k in keys:
>         print (' '*depth) + '- ' + k + ':',
>         if isinstance(dct[k],pp.ParseResults):
>             if dct[k][0].keys():
>                 print
>                 dumpEntries(dct[k][0],depth+1)
>             else:
>                 print dct[k][0]
>         else:
>             print dct[k]
> 
> dumpEntries( parsedEntries )
> 
> print
> print parsedEntries.country[0].tag
> print parsedEntries.country[0].ai[0].war
> print parsedEntries.country[0].ai[0].ferocity
> --------------------------
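The same sorted, indented walk can be sketched in Python 3 over plain nested dicts. This sketch makes a few assumptions:

- The results have already been converted into dicts and lists; ParseResults is not involved.
- dump_entries and the hand-built data below are hypothetical.
- The function returns the lines instead of printing them, so the output is easy to check.

```python
def dump_entries(dct, depth=0, lines=None):
    """Return an indented, key-sorted outline of a nested dict as a list of lines."""
    if lines is None:
        lines = []
    for key in sorted(dct):
        value = dct[key]
        prefix = "  " * depth + "- " + key + ":"
        if isinstance(value, dict):
            lines.append(prefix)                  # nested section: recurse one level deeper
            dump_entries(value, depth + 1, lines)
        else:
            lines.append(prefix + " " + str(value))
    return lines

parsed = {  # hand-built from part of the country/ai test data above
    "country": {
        "tag": "ENG",
        "ai": {
            "flags": [],
            "combat": ["DAU", "FRA", "ORL", "PRO"],
            "war": "60",
            "ferocity": "no",
        },
    },
}
print("\n".join(dump_entries(parsed)))
```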
> 
> This will print out:
> 
> --------------------------
> - country:
>  - ai:
>   - area: []
>   - combat: ['DAU', 'FRA', 'ORL', 'PRO']
>   - continent: []
>   - ferocity: no
>   - flags: []
>   - region: ['British Isles', 'NorthSeaSea', 'ECAtlanticSea', 'NAtlanticSea', 'TagoSea', 'WCAtlanticSea']
>   - war: 60
>  - tag: ENG
> 
> ENG
> 60
> no
> --------------------------
> 
> But I really dislike having to dereference those nested values using the
> 0'th element. So I'm going to fix pyparsing so that in the next release,
> you'll be able to reference the sub-elements as:
> 
> print parsedEntries.country.tag
> print parsedEntries.country.ai.war
> print parsedEntries.country.ai.ferocity
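To illustrate the access pattern only, here is a hypothetical dict subclass that supports both a.b.c and a["b"]["c"] lookups. It is not how ParseResults is implemented. It just shows what the planned interface looks like from the caller's side.

```python
class AttrDict(dict):
    """Hypothetical dict that also allows attribute access (not ParseResults)."""
    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, i.e. for data keys.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

def to_attrdict(value):
    """Recursively wrap nested dicts so every level supports attribute access."""
    if isinstance(value, dict):
        return AttrDict((k, to_attrdict(v)) for k, v in value.items())
    return value

parsed = to_attrdict({"country": {"tag": "ENG", "ai": {"war": "60", "ferocity": "no"}}})
print(parsed.country.tag)                   # ENG
print(parsed.country.ai.war)                # 60
print(parsed["country"]["ai"]["ferocity"])  # no
```

One design caveat of this approach concerns keys that collide with dict method names (keys, items, and so on). Under attribute access such keys return the method rather than the data, so bracket access remains the safer form for them.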
> 
> This *may* break some existing code, but Dict is not heavily used, based on
> feedback from users, and this may make it more useful in general, especially
> when data parses into nested Dicts.
> 
> Hope this sheds more light than confusion!
> -- Paul McGuire
> 
> 
> _______________________________________________
> Tutor maillist - Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor
> 
> --
> 'There is only one basic human right, and that is to do as you damn well
> please.
> And with it comes the only basic human duty, to take the consequences.'