pyparsing Catch-22

Paul McGuire ptmcg at austin.rr.com
Mon Apr 16 08:25:54 EDT 2007


On Apr 16, 3:27 am, "7stud" <bbxx789_0... at yahoo.com> wrote:
> <sample problem snipped>
> Any tips?

7stud -

Here is the modified code, followed by my comments.

Oh, one general comment - you mention that you are quite facile with
regexps.  pyparsing has a slightly different philosophy from that of
regular expressions, especially in the areas of whitespace skipping
and backtracking.  pyparsing will automatically skip whitespace
between parsing expressions, whereas regexps require an explicit
'\s*' (unless you specify the magic "whitespace between elements
allowed" flag, whose name I don't remember at the moment, and which I
rarely see regexp examples use).
And pyparsing is purely a left-to-right recursive descent parser
generator.  It won't look ahead to the next element past a repetition
operation to see when to stop repeating.  There's an FAQ on this on
the wiki.
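
To make those two points concrete, here is a minimal sketch.  It assumes
a pyparsing version that still accepts the camelCase names used in this
post, and the sample strings are made up for illustration:

```python
import re
from pyparsing import Word, alphas, nums, OneOrMore, Literal, ParseException

text = "x   =   42"

# Regex: whitespace has to be spelled out, or the match fails.
strict_match = re.match(r"(\w+)=(\d+)", text)
loose_match = re.match(r"(\w+)\s*=\s*(\d+)", text)

# pyparsing: whitespace between expressions is skipped by default.
assignment = Word(alphas) + "=" + Word(nums)
tokens = assignment.parseString(text).asList()

# No lookahead past a repetition: Word(alphas) also consumes "end",
# so the trailing literal never gets a chance to match.
greedy = OneOrMore(Word(alphas)) + Literal("end")
try:
    greedy.parseString("start middle end")
    greedy_failed = False
except ParseException:
    greedy_failed = True

# One way out is an explicit negative lookahead inside the repetition.
guarded = OneOrMore(~Literal("end") + Word(alphas)) + Literal("end")
guarded_tokens = guarded.parseString("start middle end").asList()

print(tokens, greedy_failed, guarded_tokens)
```

The "~" operator builds a negative lookahead, so the repetition stops
just before the word it must not swallow.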

------------------
from pyparsing import (Word, alphas, commaSeparatedList, delimitedList,
    sglQuotedString, removeQuotes)

name = Word(alphas)
lookFor = name + "=" + "[" + commaSeparatedList + "]"

# comment #0
my_file = """\
mara = [
'U2FsdGVkX185IX5PnFbzUYSKg+wMyYg9',
'U2FsdGVkX1+BCxltXVTQ2+mo83Si9oAV0sasmIGHVyk=',
'U2FsdGVkX18iUS8hYBXgyWctqpWPypVz6Fj49KYsB8s='
]"""
my_file = "".join(my_file.splitlines())
# uncomment next line once debugging of grammar is finished
# my_file = open("aaa.txt").read()


# comment #1
#~ my_file = open("aaa.txt")
#~ for line in my_file:
for line in [my_file,]:
    alist = lookFor.parseString(line)

    globals()[alist[0]] = [alist[3].strip("'"), alist[4].strip("'"),
                           alist[5].strip("'")]


# comment #2
def stripSingleQuotes(s):
    return s.strip("'")
# list(...) keeps the result indexable on Python 3, where map is lazy
globals()[alist[0]] = list(map(stripSingleQuotes, alist[3:-1]))

print(mara[2])
mara = None


# comment #3
lookFor = name.setResultsName("var") + "=" + "[" + \
    commaSeparatedList.setResultsName("listValues") + "]"
alist = lookFor.parseString(my_file)

# evaluate parsed assignment
globals()[alist.var] = list(map(stripSingleQuotes, alist.listValues))
print("%d %s" % (len(mara), mara[1]))


# comment #4
lookFor = name.setResultsName("var") + "=" + "[" + \
    delimitedList( sglQuotedString.setParseAction(removeQuotes) )\
    .setResultsName("listValues") + "]"

alist = lookFor.parseString(my_file)
globals()[ alist.var ] = list( alist.listValues )
print("%d %s" % (len(mara), mara[1]))

------------------
Comment #0:
When I am debugging a pyparsing application, I find it easier to embed
the input text, or a subset of it, into the program itself using a
triple-quoted string.  Then later, I'll go back and change to reading
data from an input file.  Purely a matter of taste, but it simplifies
posting to mailing lists and newsgroups.

Comment #1:
Since you are going line by line in reading the input file, be *sure*
you have the complete assignment expression on each line.  Since
pyparsing will read past line breaks for you, and since your input
file contains only this one assignment, you might be better off
reading the whole file and calling parseString just once:

alist = lookFor.parseString( my_file.read() )
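
A small sketch of the same idea without a file on disk.  The text below
is a made-up stand-in for the contents of aaa.txt, line breaks included,
and the grammar uses delimitedList rather than commaSeparatedList purely
to keep the example short:

```python
from pyparsing import Word, alphas, delimitedList, sglQuotedString

# Hypothetical stand-in for the file contents.
text = """\
mara = [
'first',
'second',
'third'
]"""

assignment = Word(alphas) + "=" + "[" + delimitedList(sglQuotedString) + "]"

# One call over the whole text; newlines are skipped like any other
# whitespace, so there is no need to join the lines first.
tokens = assignment.parseString(text).asList()
print(tokens)
```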

Comment #2:
Your assignment of the "mara" global is a bit clunky on two fronts:
- the explicit accessing of elements 3, 4, and 5
- the repeated calls to strip("'")
You can access the pyparsing returned tokens (passed as a ParseResults
object) using slices.  In your case, you want the elements 3 through
n-1, so alist[3:-1] will give you this.  It's nice to avoid hard-
coding things like list lengths and numbers of list elements.  Note
that you can also use len to find out the length of the list.

As for calling strip("'") for each of these elements, have you learned
to use Python's map built-in yet?  Define a function or lambda that
takes a single element and returns what you want done with that
element, then call map with that function and the list you want to
process.  This modified version of your call is more resilient than
the original.
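
Here is that slice-plus-map pattern on its own, using a plain list as a
hypothetical stand-in for the parsed tokens.  On Python 3, map returns a
lazy iterator, so the sketch wraps it in list() to get something
indexable:

```python
def strip_single_quotes(s):
    """Remove leading and trailing single quotes from one token."""
    return s.strip("'")

# Made-up token list, shaped like the parse results discussed above.
tokens = ["mara", "=", "[", "'a'", "'b'", "'c'", "]"]

# Slice off the name and punctuation, then clean each remaining element.
values = list(map(strip_single_quotes, tokens[3:-1]))

# The same thing written as a list comprehension.
same_values = [t.strip("'") for t in tokens[3:-1]]

print(values)
```

Neither version cares how many elements the list holds, which is the
point of dropping the hard-coded indexes.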

Comment #3:
Personally, I am not keen on using too much explicit indexing into the
returned results.  This is another area where pyparsing goes beyond
typical lexing and tokenizing.  Just as you can assign names to groups
in regexps, pyparsing allows you to give names to elements within the
parsed results.  You can then access these by name, using either dict
or object attribute syntax.  This gets rid of most if not all of the
magic numbers from your code, and makes it yet again more resilient in
light of changes in the future.  (Say for example you decided to
suppress the "=", "[", and "]" punctuation from the parsed results.
The parsing logic would remain the same, but the returned tokens would
contain only the significant content, the variable name and list
contents.  Using explicit list indexing would force you to renumber
the list elements you are extracting, but with results names, no
change would be required.)
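
The parenthetical above can be sketched as follows.  The input string is
made up, and the two grammar names (noisy, quiet) are only labels for
this illustration:

```python
from pyparsing import Word, alphas, Suppress, delimitedList, sglQuotedString

text = "mara = ['a', 'b', 'c']"   # made-up input

name = Word(alphas)
items = delimitedList(sglQuotedString)

# Punctuation kept in the results.
noisy = (name.setResultsName("var") + "=" + "["
         + items.setResultsName("listValues") + "]")
# Same grammar, punctuation suppressed.
quiet = (name.setResultsName("var") + Suppress("=") + Suppress("[")
         + items.setResultsName("listValues") + Suppress("]"))

noisy_result = noisy.parseString(text)
quiet_result = quiet.parseString(text)

# Positional access has to change between the two grammars...
by_index_noisy = noisy_result.asList()[3:-1]
by_index_quiet = quiet_result.asList()[1:]

# ...while named access, by attribute or by key, stays the same.
by_name_noisy = noisy_result.listValues.asList()
by_name_quiet = quiet_result["listValues"].asList()

print(noisy_result.var, by_name_noisy, by_name_quiet)
```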

Comment #4:
I thought I'd show you an alternative to commaSeparatedList, called
delimitedList.  delimitedList is a helper function that gives you more
control over the elements you expect to find within the list, and what
to do with them when you find them.  It takes a pyparsing expression
'expr' and expands it to 'expr + ZeroOrMore(Suppress(",") + expr)'.
You can also change the delimiter from ',' to some other character, or
even to a pyparsing expression.
Pyparsing includes predefined expressions for some common text
patterns, such as single and double quoted strings, and comments of
various forms.  Look for a directory called htmldoc in your pyparsing
directory tree, and open the index.html file there to look through the
classes and methods defined for you in pyparsing.  Or just type
"help(pyparsing)" in the Python interpreter (after typing "import
pyparsing" first, of course).
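
A short sketch of delimitedList next to its hand-written equivalent, and
with the delimiter changed.  The inputs are made up, and the sketch
assumes the delimiter keyword is spelled delim, as in the pyparsing
versions I am aware of:

```python
from pyparsing import (Word, nums, alphas, Suppress, ZeroOrMore, Literal,
                       delimitedList)

number = Word(nums)

# The expansion spelled out by hand, and the helper that builds it.
by_hand = number + ZeroOrMore(Suppress(",") + number)
built_in = delimitedList(number)

hand_tokens = by_hand.parseString("1, 2, 3").asList()
builtin_tokens = built_in.parseString("1, 2, 3").asList()

# A different delimiter character.
colon_tokens = delimitedList(number, delim=":").parseString("10:20:30").asList()

# A pyparsing expression as the delimiter.
arrow_list = delimitedList(Word(alphas), delim=Literal("->"))
arrow_tokens = arrow_list.parseString("a -> b -> c").asList()

print(hand_tokens, builtin_tokens, colon_tokens, arrow_tokens)
```

In every case the delimiters are suppressed, so only the list elements
come back.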

Now that we have access to the expression defined to be matched within
the list, we can attach a parse action.  A parse action will get run
against the matched tokens at parse time, and can be used to modify
the matched data before continuing.  In this example, we'd like to
remove those annoying opening and closing quotation marks.  Again,
this is such a common task that pyparsing includes a built-in for
this, called removeQuotes.  It is equivalent to the following:

removeQuotes = lambda tokens : tokens[0][1:-1]

What?!  No verifying that the first and last characters are in fact
quotation marks?  Nope.  Another part of the pyparsing philosophy is
that parse actions *know* that they will only be called with text that
matches their associated input pattern.  removeQuotes is a parse
action that *knows* that the string passed to it will have opening and
closing "'" or '"' characters.  You'll also see this quite often when
parsing integers:

integer = Word(nums).setParseAction(lambda toks: int(toks[0]))

No testing for "are the characters all numeric?" or trapping for
ValueError.  We *know* that the only time this lambda will be invoked
is after having matched a word group composed only of numeric digits.
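
Both parse actions can be tried in isolation with a sketch like this one.
The inputs are made up, and the copy() call is there only so the parse
action is not attached to the shared built-in sglQuotedString expression:

```python
from pyparsing import Word, nums, delimitedList, sglQuotedString, removeQuotes

# Convert matched digits to int at parse time; no validation needed,
# because the action only runs after Word(nums) has matched.
integer = Word(nums).setParseAction(lambda toks: int(toks[0]))
values = delimitedList(integer).parseString("1, 20, 300").asList()
total = sum(values)

# Strip the surrounding quotes at parse time.
quoted = sglQuotedString.copy().setParseAction(removeQuotes)
word = quoted.parseString("'hello'")[0]

print(values, total, word)
```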

Anyway, to wrap up this comment, now that we have attached a parse
action to remove the "'" characters as we parse, the listValues field
is ready to use as is from the parseString method, without having to
clutter our code up with maps, or lambdas, or other post-processing
junk.

Enjoy!

-- Paul




