[Tutor] Parsing problem

Paul McGuire paul at alanweberassociates.com
Sat Jul 23 22:50:02 CEST 2005


Liam -

Glad you are sticking with pyparsing through some of these idiosyncrasies!

One thing that might simplify your life is if you are a bit more strict on
specifying your grammar, especially using pp.printables as the character set
for your various words and values.  Is this statement really valid?

Lw)r*)*dsflkj = sldjouwe)r#jdd

According to your grammar, it is.  Also, by using printables, you force your
user to insert whitespace between the assignment target and the equals sign.
I'm sure your users would like to enter a quick "a=1" once in a while, but
since there is no whitespace, it will all be slurped into the left-hand side
identifier.
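
Here is a minimal sketch of both problems, assuming pyparsing is installed
and imported as pp.  The names loose and strict are made up for this
illustration and are not part of your code:

```python
import pyparsing as pp

# "loose" mimics a grammar built on pp.printables; "strict" uses an
# identifier for the left-hand side.  Both names are illustrative only.
loose = pp.Word(pp.printables) + pp.Suppress("=") + pp.Word(pp.printables)
identifier = pp.Word(pp.alphas, pp.alphanums + "_")
strict = identifier + pp.Suppress("=") + pp.Word(pp.alphanums)

# With whitespace around "=", even a nonsense statement is accepted.
junk = loose.parseString("Lw)r*)*dsflkj = sldjouwe)r#jdd").asList()

# Without whitespace, Word(printables) swallows "a=1" whole, so the
# parser never finds a separate "=".
try:
    loose.parseString("a=1")
    loose_accepts_compact = True
except pp.ParseException:
    loose_accepts_compact = False

# The stricter left-hand side stops at "=", so "a=1" parses.
compact = strict.parseString("a=1").asList()

print(junk)                   # ['Lw)r*)*dsflkj', 'sldjouwe)r#jdd']
print(loose_accepts_compact)  # False
print(compact)                # ['a', '1']
```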

Let's create two expressions, LHS and RHS, to dictate what is valid on the
left and right-hand side of the equals sign.  (Well, it turns out I create a
bunch of expressions here, in the process of defining LHS and RHS, but
hopefully, this will make some sense):

EQUALS = pp.Suppress("=")
LBRACE = pp.Suppress("{")
RBRACE = pp.Suppress("}")
identifier = pp.Word(pp.alphas, pp.alphanums + "_")
integer = pp.Word(pp.nums+"-+", pp.nums)
assignment = pp.Forward()
LHS = identifier
RHS = pp.Forward().setName("RHS")
RHS << ( pp.dblQuotedString ^ identifier ^ integer ^ pp.Group( LBRACE +
pp.OneOrMore(assignment) + RBRACE ) )
assignment << pp.Group( LHS + EQUALS + RHS )

I leave it to you to flesh out what other possible value types can be
included in RHS.

Note also the use of the Group.  Try running this snippet with and without
Group and see how the results change.  I think using Group will help you to
build up a good parse tree for the matched tokens.
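
As a rough sketch of that comparison, the grammar above can be built twice,
once with Group and once without, and run on the sample line from your
earlier message.  The helper build_assignment and its group flag are made
up for this illustration; the rest follows the expressions defined above:

```python
import pyparsing as pp

def build_assignment(group):
    """Build the assignment grammar, with or without pp.Group."""
    # wrap is pp.Group when grouping is requested, otherwise a no-op.
    wrap = pp.Group if group else (lambda expr: expr)
    EQUALS = pp.Suppress("=")
    LBRACE = pp.Suppress("{")
    RBRACE = pp.Suppress("}")
    identifier = pp.Word(pp.alphas, pp.alphanums + "_")
    integer = pp.Word(pp.nums + "-+", pp.nums)
    assignment = pp.Forward()
    RHS = pp.Forward()
    # Note the parens around everything to the right of '<<'.
    RHS << (pp.dblQuotedString ^ identifier ^ integer
            ^ wrap(LBRACE + pp.OneOrMore(assignment) + RBRACE))
    assignment << wrap(identifier + EQUALS + RHS)
    return assignment

text = "j = { line = { foo = 10 bar = 20 } }"
grouped = build_assignment(True).parseString(text).asList()
flat = build_assignment(False).parseString(text).asList()

print(grouped)  # [['j', [['line', [['foo', '10'], ['bar', '20']]]]]]
print(flat)     # ['j', 'line', 'foo', '10', 'bar', '20']
```

Without Group, the nesting in the input is lost and every token lands in
one flat list; with Group, each assignment and each braced block becomes
its own sublist.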

Lastly, please note in the '<<' assignment to RHS that the expression is
enclosed in parens.  I originally left this as

RHS << pp.dblQuotedString ^ identifier ^ integer ^ pp.Group( LBRACE +
pp.OneOrMore(assignment) + RBRACE )

And it failed to match!  A bug! In my own code!  The shame...

This fails because '<<' has a higher precedence than '^', so Python evaluates
the statement as (RHS << pp.dblQuotedString) ^ identifier ^ ..., and RHS only
worked if it was handed a quoted string.  It is probably good practice to
always enclose in parens the expression being assigned to a Forward using '<<'.
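
The precedence itself can be checked without pyparsing.  The sketch below
uses the standard ast module to ask Python which operator it applies last
in each form of the statement; top_level_operator is a helper name made up
for this illustration:

```python
import ast

def top_level_operator(source):
    """Return the name of the operator Python applies last in source."""
    tree = ast.parse(source, mode="eval")
    return type(tree.body.op).__name__

# Without parens, '<<' binds first, so '^' ends up as the outermost
# operation and RHS only receives the first alternative.
unparenthesized = top_level_operator("RHS << a ^ b ^ c")

# With parens, the whole alternation is built first and then handed
# to '<<'.
parenthesized = top_level_operator("RHS << (a ^ b ^ c)")

print(unparenthesized)  # BitXor
print(parenthesized)    # LShift
```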

-- Paul


-----Original Message-----
From: Liam Clarke [mailto:cyresse at gmail.com] 
Sent: Saturday, July 23, 2005 9:03 AM
To: Paul McGuire
Cc: tutor at python.org
Subject: Re: [Tutor] Parsing problem

*sigh* I just read the documentation more carefully and found the difference
between the 
| operator and the ^ operator. 
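
For reference, a small sketch of that difference, assuming pyparsing is
imported as pp.  The expressions short and longer are made up for this
illustration: '|' takes the first alternative that matches, while '^'
tries all of them and keeps the longest match.

```python
import pyparsing as pp

short = pp.Literal("a")
longer = pp.Literal("ab")

# '|' (MatchFirst) stops at the first alternative that matches.
first_match = (short | longer).parseString("ab").asList()

# '^' (Or) evaluates every alternative and keeps the longest match.
longest_match = (short ^ longer).parseString("ab").asList()

print(first_match)    # ['a']
print(longest_match)  # ['ab']
```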

Input - 

j = { line = { foo = 10 bar = 20 } }

New code

sel = pp.Forward()
values = ((pp.Word(pp.printables) + pp.Suppress("=") +
pp.Word(pp.printables)) ^ sel)
sel << (pp.Word(pp.printables) + pp.Suppress("=") + pp.Suppress("{") +
pp.OneOrMore(values) + pp.Suppress("}"))

Output - 

(['j', 'line', 'foo', '10', 'bar', '20'], {})

My apologies for the deluge. 

Regards, 

Liam Clarke


On 7/24/05, Liam Clarke <cyresse at gmail.com> wrote:

	Hmmm... just a quick update, I've been poking around and I'm
	obviously making some error of logic. 
	
	Given a line - 
	
	 f = "j = { line = { foo = 10 bar = 20 } }"
	
	And given the following code - 
	
	select = pp.Forward()
	select << (
	    pp.Word(pp.printables) + pp.Suppress("=") + pp.Suppress("{") + 
	    pp.OneOrMore( (pp.Word(pp.printables) + pp.Suppress("=") + 
	    pp.Word(pp.printables) ) | select ) + pp.Suppress("}") )
	
	select.parseString(f) gives - 
	
	(['j', 'line', '{', 'foo', '10', 'bar', '20'], {})
	
	So I've got a bracket sneaking through there. Argh. My brain hurts. 
	
	Is the | operator an exclusive or? 
	
	Befuddled, 
	
	Liam Clarke
	
	
	
	On 7/23/05, Liam Clarke <cyresse at gmail.com> wrote:

		Howdy, 
		
		I've attempted to follow your lead and have started from
		scratch. I could just copy and paste your solution (which works
		pretty well), but I want to understand what I'm doing *grin*
		
		However, I've been hitting a couple of ruts in the path to
		enlightenment. Is there a way to tell pyparsing to treat specific
		escaped characters as just a slash followed by a letter? For the
		time being I've converted all backslashes to forward slashes, as it
		was choking on \a in a file path.
		
		But my latest hitch takes this form (apologies for the large
		traceback)
		
		Traceback (most recent call last):
		  File "<interactive input>", line 1, in ?
		  File "parse.py", line 336, in parse
		    parsedEntries = dicts.parseString(test_data)
		  File "c:\python24\Lib\site-packages\pyparsing.py", line 616, in parseString
		    loc, tokens = self.parse( instring.expandtabs(), 0 )
		  File "c:\python24\Lib\site-packages\pyparsing.py", line 558, in parse
		    loc,tokens = self.parseImpl( instring, loc, doActions )
		  File "c:\python24\Lib\site-packages\pyparsing.py", line 1518, in parseImpl
		    return self.expr.parse( instring, loc, doActions )
		  File "c:\python24\Lib\site-packages\pyparsing.py", line 558, in parse
		    loc,tokens = self.parseImpl( instring, loc, doActions )
		  File "c:\python24\Lib\site-packages\pyparsing.py", line 1367, in parseImpl
		    loc, exprtokens = e.parse( instring, loc, doActions )
		  File "c:\python24\Lib\site-packages\pyparsing.py", line 558, in parse
		    loc,tokens = self.parseImpl( instring, loc, doActions )
		  File "c:\python24\Lib\site-packages\pyparsing.py", line 1518, in parseImpl
		    return self.expr.parse( instring, loc, doActions )
		  File "c:\python24\Lib\site-packages\pyparsing.py", line 560, in parse
		    raise ParseException, ( instring, len(instring), self.errmsg, self )
		
		ParseException: Expected "}" (at char 9909), (line:325, col:5)
		
		The offending code can be found here (includes the data) -
		http://www.rafb.net/paste/results/L560wx80.html 
		
		It's like pyparsing isn't recognising a lot of my "}"'s, as if I
		add another one, it throws the same error, same for adding another
		two...
		
		No doubt I've done something silly, but any help in finding the
		tragic flaw would be much appreciated. I need to get a ParseResults
		object out so I can learn how to work with the basic structure!
		
		Much regards,
		
		Liam Clarke
		
		
		
		On 7/21/05, Paul McGuire <paul at alanweberassociates.com> wrote:

			Liam, Kent, and Danny -
			
			It sure looks like pyparsing is taking on a life of its own!  I can
			see I no longer am the only one pitching pyparsing at some of these
			applications!
			
			Yes, Liam, it is possible to create dictionary-like objects, that is,
			ParseResults objects that have named values in them.  I looked into
			your application, and the nested assignments seem very similar to a
			ConfigParser type of structure.  Here is a pyparsing version that
			handles the test data in your original post (I kept Danny Yoo's
			recursive list values, and added recursive dictionary entries):
			
			--------------------------
			import pyparsing as pp
			
			listValue = pp.Forward()
			listSeq = ( pp.Suppress('{') + pp.Group(pp.ZeroOrMore(listValue)) +
			            pp.Suppress('}') )
			listValue << ( pp.dblQuotedString.setParseAction(pp.removeQuotes) |
			               pp.Word(pp.alphanums) | listSeq )
			
			keyName = pp.Word( pp.alphas )
			
			entries = pp.Forward()
			entrySeq = ( pp.Suppress('{') + pp.Group(pp.OneOrMore(entries)) +
			             pp.Suppress('}') )
			entries << pp.Dict(
			            pp.OneOrMore (
			                pp.Group( keyName + pp.Suppress('=') +
			                          (entrySeq | listValue) ) ) )
			--------------------------
			
			
			Dict is one of the most confusing classes to use, and there are some
			examples in the examples directory that comes with pyparsing (see
			dictExample2.py), but it is still tricky.  Here is some code to access
			your input test data, repeated here for easy reference:
			
			--------------------------
			testdata = """\
			country = {
			tag = ENG 
			ai = {
			flags = { }
			combat = { DAU FRA ORL PRO }
			continent = { }
			area = { }
			region = { "British Isles" "NorthSeaSea" "ECAtlanticSea"
			"NAtlanticSea" "TagoSea" "WCAtlanticSea" } 
			war = 60
			ferocity = no
			}
			}
			"""
			parsedEntries = entries.parseString(testdata)
			
			def dumpEntries(dct,depth=0):
			    keys = dct.keys()
			    keys.sort()
			    for k in keys:
			        print ('  '*depth) + '- ' + k + ':', 
			        if isinstance(dct[k],pp.ParseResults):
			            if dct[k][0].keys():
			                print
			                dumpEntries(dct[k][0],depth+1)
			            else:
			                print dct[k][0]
			        else:
			            print dct[k]
			
			dumpEntries( parsedEntries )
			
			print
			print parsedEntries.country[0].tag
			print parsedEntries.country[0].ai[0].war
			print parsedEntries.country[0].ai[0].ferocity 
			--------------------------
			
			This will print out:
			
			--------------------------
			- country:
			  - ai:
			    - area: []
			    - combat: ['DAU', 'FRA', 'ORL', 'PRO']
			    - continent: []
			    - ferocity: no 
			    - flags: []
			    - region: ['British Isles', 'NorthSeaSea', 'ECAtlanticSea', 'NAtlanticSea', 'TagoSea', 'WCAtlanticSea']
			    - war: 60
			  - tag: ENG
			
			ENG
			60
			no
			--------------------------
			
			But I really dislike having to dereference those nested values using
			the 0'th element.  So I'm going to fix pyparsing so that in the next
			release, you'll be able to reference the sub-elements as:
			
			print parsedEntries.country.tag 
			print parsedEntries.country.ai.war
			print parsedEntries.country.ai.ferocity
			
			This *may* break some existing code, but Dict is not heavily used,
			based on feedback from users, and this may make it more useful in
			general, especially when data parses into nested Dict's.
			
			Hope this sheds more light than confusion!
			-- Paul McGuire
			
			




		-- 
		'There is only one basic human right, and that is to do as you damn
		well please. 
		And with it comes the only basic human duty, to take the consequences.' 










