pyparsing Combine without merging sub-expressions
Steven Bethard
steven.bethard at gmail.com
Sat Jan 20 15:49:52 EST 2007
Within a larger pyparsing grammar, I have something that looks like::
wsj/00/wsj_0003.mrg
When parsing this, I'd like to keep both the full string and its
AAA_NNNN substring (here, wsj_0003), so I'd like a result like::
>>> foo.parseString('wsj/00/wsj_0003.mrg')
(['wsj/00/wsj_0003.mrg', 'wsj_0003'], {})
How do I go about this? I was using something like::
>>> import pyparsing as pp
>>> digits = pp.Word(pp.nums)
>>> alphas = pp.Word(pp.alphas)
>>> wsj_name = pp.Combine(alphas + '_' + digits)
>>> wsj_path = pp.Combine(alphas + '/' + digits + '/' + wsj_name +
...                       '.mrg')
But of course then all I get back is the full path::
>>> wsj_path.parseString('wsj/00/wsj_0003.mrg')
(['wsj/00/wsj_0003.mrg'], {})
I could leave off the final Combine and add a parse action::
>>> wsj_path = alphas + '/' + digits + '/' + wsj_name + '.mrg'
>>> def parse_wsj_path(string, index, tokens):
...     wsj_name = tokens[4]
...     return ''.join(tokens), wsj_name
...
>>> wsj_path.setParseAction(parse_wsj_path)
>>> wsj_path.parseString('wsj/00/wsj_0003.mrg')
([('wsj/00/wsj_0003.mrg', 'wsj_0003')], {})
But without the outer Combine, pyparsing skips whitespace between the
pieces of the path, so input that should be rejected is accepted::
>>> wsj_path.parseString('wsj / 00 / wsj_0003.mrg')
([('wsj/00/wsj_0003.mrg', 'wsj_0003')], {})
How do I make sure no whitespace intervenes, and still have access to
the sub-expression?
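
One direction that might work is sketched below. It is an unverified idea rather than a confirmed answer: it assumes that calling leaveWhitespace() on the un-combined expression turns off whitespace skipping for that expression and its sub-expressions, and that a parse action returning a list has its items become the resulting tokens. The try_parse helper is hypothetical and exists only for illustration.

```python
import pyparsing as pp

digits = pp.Word(pp.nums)
alphas = pp.Word(pp.alphas)
wsj_name = pp.Combine(alphas + '_' + digits)

# No outer Combine, so wsj_name survives as its own token (index 4).
# Assumption: leaveWhitespace() disables whitespace skipping for this
# expression and for the sub-expressions it contains.
wsj_path = (alphas + '/' + digits + '/' + wsj_name + '.mrg').leaveWhitespace()


def parse_wsj_path(string, index, tokens):
    # Return a list so that the full path and the name both end up
    # as top-level tokens instead of a single tuple token.
    return [''.join(tokens), tokens[4]]


wsj_path.setParseAction(parse_wsj_path)


def try_parse(text):
    """Hypothetical helper: return the token list, or None on failure."""
    try:
        return wsj_path.parseString(text).asList()
    except pp.ParseException:
        return None
```

With this, try_parse('wsj/00/wsj_0003.mrg') should give the full path followed by the name, while the variant with spaces around the slashes should fail to parse. Note that this would presumably also reject leading whitespace before the path, which may or may not be acceptable inside the larger grammar.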
Thanks,
STeVe