Getting pyparsing to backtrack

John Nagle nagle at animats.com
Mon Jul 5 18:19:53 EDT 2010


   I'm working on street address parsing again, and I'm trying to deal
with some of the harder cases.

   Here's a subparser, intended to take in things like "N MAIN" and 
"SOUTH" and break out the "directional" from the street name.

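from pyparsing import (CaselessKeyword, Combine, MatchFirst, OneOrMore,
                       Optional, Word, alphanums)

directionals = ['southeast', 'northeast', 'north', 'northwest',
                'west', 'east', 'south', 'southwest', 'SE', 'NE', 'N', 'NW',
                'W', 'E', 'S', 'SW']

# A directional keyword, optionally followed by a period ("N." -> "N")
direction = Combine(MatchFirst([CaselessKeyword(d) for d in directionals]) +
                    Optional(".").suppress())

# Optional predirectional, then one or more words joined into the street name
streetNameParser = (Optional(direction.setResultsName("predirectional")) +
                    Combine(OneOrMore(Word(alphanums)), adjacent=False,
                            joinString=" ").setResultsName("streetname"))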



This parses something like "N WEBB" fine; "N" is the "predirectional",
and "WEBB" is the street name.

Parsing "SOUTH" (which, when not followed by another word, is a street
name, not a predirectional) raises a parsing exception:

  Street address line parse failed for SOUTH : Expected W:(abcd...) (at char 5), (line:1, col:6)
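A minimal standalone reproduction, written against pyparsing 3.x (which also accepts the camelCase names used in the grammar above). The helper name try_parse is hypothetical, and the exact exception text differs between pyparsing versions, so it is not reproduced here. The point is only that "N WEBB" parses while "SOUTH" raises ParseException.

```python
from pyparsing import (CaselessKeyword, Combine, MatchFirst, OneOrMore,
                       Optional, ParseException, Word, alphanums)

DIRECTIONALS = ['southeast', 'northeast', 'north', 'northwest',
                'west', 'east', 'south', 'southwest', 'SE', 'NE', 'N', 'NW',
                'W', 'E', 'S', 'SW']

direction = Combine(MatchFirst([CaselessKeyword(d) for d in DIRECTIONALS]) +
                    Optional(".").suppress())
street_words = Combine(OneOrMore(Word(alphanums)), adjacent=False,
                       join_string=" ")

# The grammar as posted: optional directional, then a required street name.
street_name_parser = (Optional(direction.set_results_name("predirectional")) +
                      street_words.set_results_name("streetname"))

def try_parse(text):
    """Return (predirectional, streetname), or None if the parse fails."""
    try:
        result = street_name_parser.parse_string(text, parse_all=True)
    except ParseException:
        return None
    return result.get("predirectional"), result["streetname"]

print(try_parse("N WEBB"))  # ('N', 'WEBB')
print(try_parse("SOUTH"))   # None: the Optional consumed SOUTH and never gives it back
```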

The problem is that "direction" matched SOUTH, and even though
"direction" is inside an "Optional" and is followed by a required
word, the parser didn't back up and retry without the directional when
it reached the end of the input without satisfying the OneOrMore clause.
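One possible workaround, sketched here under the assumption of pyparsing 3.x: since an Optional that has already matched is not retried, make the Optional check the following text itself. The directional is accepted only when FollowedBy (a lookahead that consumes no input) sees another word after it; otherwise the whole optional group fails and the Optional matches nothing. The names DIRECTIONALS and parse_street_name are hypothetical, and this is not necessarily the approach the pyparsing author would recommend.

```python
from pyparsing import (CaselessKeyword, Combine, FollowedBy, MatchFirst,
                       OneOrMore, Optional, Word, alphanums)

DIRECTIONALS = ['southeast', 'northeast', 'north', 'northwest',
                'west', 'east', 'south', 'southwest', 'SE', 'NE', 'N', 'NW',
                'W', 'E', 'S', 'SW']

word = Word(alphanums)
direction = Combine(MatchFirst([CaselessKeyword(d) for d in DIRECTIONALS]) +
                    Optional(".").suppress())

# Take the directional only if a street-name word follows it. FollowedBy
# looks ahead without consuming input, so for "SOUTH" alone this group
# fails and the Optional falls back to matching nothing.
predirectional = Optional(direction.set_results_name("predirectional") +
                          FollowedBy(word))
street_name_parser = (predirectional +
                      Combine(OneOrMore(word), adjacent=False,
                              join_string=" ").set_results_name("streetname"))

def parse_street_name(text):
    """Return (predirectional or None, streetname)."""
    result = street_name_parser.parse_string(text, parse_all=True)
    return result.get("predirectional"), result["streetname"]

print(parse_street_name("N WEBB"))  # ('N', 'WEBB')
print(parse_street_name("SOUTH"))   # (None, 'SOUTH')
```

Note that CaselessKeyword returns the keyword as written in the list, so "n. webb" yields the predirectional "N".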

Pyparsing does some backup, but I'm not clear on how much,
or how to force it to happen.  There's some discussion at
"http://www.mail-archive.com/python-list@python.org/msg169559.html".
Apparently the "Or" operator will force some backup, but it's not
clear how much lookahead and backtracking are supported.
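For reference, pyparsing does restart between alternatives: MatchFirst (the | operator) tries each alternative from the same starting position and moves on to the next one only when the whole alternative fails, while Or (the ^ operator) tries all of them and keeps the longest match. A sketch that relies on this, assuming pyparsing 3.x, spells out both readings as alternatives; parse_with_alternatives is a hypothetical name.

```python
from pyparsing import (CaselessKeyword, Combine, MatchFirst, OneOrMore,
                       Optional, Word, alphanums)

DIRECTIONALS = ['southeast', 'northeast', 'north', 'northwest',
                'west', 'east', 'south', 'southwest', 'SE', 'NE', 'N', 'NW',
                'W', 'E', 'S', 'SW']

direction = Combine(MatchFirst([CaselessKeyword(d) for d in DIRECTIONALS]) +
                    Optional(".").suppress())
street_words = Combine(OneOrMore(Word(alphanums)), adjacent=False,
                       join_string=" ").set_results_name("streetname")

# MatchFirst ("|") restarts each alternative at the same position, so when
# "directional + street name" fails as a whole, the bare street name is tried.
street_name_parser = ((direction.set_results_name("predirectional") + street_words)
                      | street_words)

def parse_with_alternatives(text):
    """Return (predirectional or None, streetname)."""
    result = street_name_parser.parse_string(text, parse_all=True)
    return result.get("predirectional"), result["streetname"]

print(parse_with_alternatives("N WEBB"))  # ('N', 'WEBB')
print(parse_with_alternatives("SOUTH"))   # (None, 'SOUTH')
```

The order of the alternatives matters: the bare street-name alternative would also accept all of "N WEBB", so the directional reading has to be listed first.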

				John Nagle


