pyparsing Combine without merging sub-expressions

Paul McGuire ptmcg at austin.rr.com
Sun Jan 21 17:15:33 EST 2007


Steven Bethard wrote:
> Within a larger pyparsing grammar, I have something that looks like::
>
>      wsj/00/wsj_0003.mrg
>
> When parsing this, I'd like to keep around both the full string, and the
> AAA_NNNN substring of it, so I'd like something like::
>
>      >>> foo.parseString('wsj/00/wsj_0003.mrg')
>      (['wsj/00/wsj_0003.mrg', 'wsj_0003'], {})
>
> How do I go about this? I was using something like::
>
>      >>> digits = pp.Word(pp.nums)
>      >>> alphas = pp.Word(pp.alphas)
>      >>> wsj_name = pp.Combine(alphas + '_' + digits)
>      >>> wsj_path = pp.Combine(alphas + '/' + digits + '/' + wsj_name +
>      ... '.mrg')
>
> But of course then all I get back is the full path::
>
>      >>> wsj_path.parseString('wsj/00/wsj_0003.mrg')
>      (['wsj/00/wsj_0003.mrg'], {})
>
The tokens are what the tokens are, so if you want to replicate a
sub-field, then you'll need a parse action to insert it into the
returned tokens.  BUT, if all you want is to be able to easily *access*
that sub-field, then why not give it a results name?  Like this:

wsj_name = pp.Combine(alphas + '_' + digits).setResultsName("name")

Leave everything else the same, but now you can access the name field
independently from the rest of the combined tokens.

result = wsj_path.parseString('wsj/00/wsj_0003.mrg')
print(result.dump())    # combined tokens plus any named results
print(result.name)      # wsj_0003
print(result.asList())  # ['wsj/00/wsj_0003.mrg']
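
If the sub-field really does have to show up in the token list itself, a
parse action on wsj_path can copy it in.  What follows is only a minimal
sketch of that route, not a tested recipe for the larger grammar: the
helper name append_name is made up for illustration, and it assumes a
pyparsing version in which ParseResults has an append() method.

```python
import pyparsing as pp

digits = pp.Word(pp.nums)
alphas = pp.Word(pp.alphas)

# Name the sub-field so it stays reachable after the outer Combine
# has merged everything into a single string.
wsj_name = pp.Combine(alphas + '_' + digits).setResultsName("name")
wsj_path = pp.Combine(alphas + '/' + digits + '/' + wsj_name + '.mrg')

def append_name(tokens):
    # Hypothetical helper: add the named sub-field after the combined
    # string.  Changing tokens in place (and returning nothing) keeps
    # the results name attached.
    tokens.append(tokens["name"])

wsj_path.setParseAction(append_name)

result = wsj_path.parseString('wsj/00/wsj_0003.mrg')
print(result.asList())
print(result["name"])
```

The parse action modifies the tokens in place rather than returning a
plain list, since a returned list would be wrapped in a fresh result and
the "name" field would no longer be available on it.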

-- Paul



