Pyparsing: Grammar Suggestion

Khoa Nguyen khoa.coffee at gmail.com
Wed May 17 13:57:53 EDT 2006


> record = f1,f2,...,fn END_RECORD
> All the f(i) have to appear in that order.
> Any f(i) can be absent (e.g. f1,,f3,f4,,f6 END_RECORD)
> The number of f(i)'s can vary. For example, the following are allowed:
> f1,f2 END_RECORD
> f1,f2,,f4,,f6 END_RECORD
>
> Any suggestions?
>
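
For reference, the record format described above can be stated as a small plain-Python sketch (not pyparsing). It assumes one record per line, that no field contains a comma, and that END_RECORD ends the line; split_record is a hypothetical helper name.

```python
# Plain-Python sketch of the record format (not pyparsing).
# Assumptions: one record per line, no field contains a comma, and the
# line ends with END_RECORD (optionally preceded by whitespace).
TERMINATOR = "END_RECORD"


def split_record(line):
    """Hypothetical helper: return the positional fields of one record.

    Absent fields (consecutive commas) come back as empty strings, and
    records with fewer fields simply yield a shorter list.
    """
    body = line.strip()
    if not body.endswith(TERMINATOR):
        raise ValueError("record does not end with " + TERMINATOR)
    body = body[:-len(TERMINATOR)].rstrip()
    return body.split(",")


for rec in ["f1,f2 END_RECORD", "f1,f2,,f4,,f6 END_RECORD"]:
    print(split_record(rec))
# ['f1', 'f2']
# ['f1', 'f2', '', 'f4', '', 'f6']
```

Splitting on the delimiter keeps each field in its position, so an empty string marks an absent field; it does not check what each field contains, which is where a real grammar helps.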

>
> --------
> pyparsing includes a built-in expression, commaSeparatedList, for just such
> a case.  Here is a simple pyparsing program to crack your input text:
>
>
> data = """f1,f2,f3,f4,f5,f6 END_RECORD
> f1,f2 END_RECORD
> f1,f2,,f4,,f6 END_RECORD"""
>
> from pyparsing import commaSeparatedList
>
> for tokens,start,end in commaSeparatedList.scanString(data):
>     print(tokens)
>
>
> This returns:
> ['f1', 'f2', 'f3', 'f4', 'f5', 'f6 END_RECORD']
> ['f1', 'f2 END_RECORD']
> ['f1', 'f2', '', 'f4', '', 'f6 END_RECORD']
>
> Note that consecutive commas in the input return empty strings at the
> corresponding places in the results.
>
> Unfortunately, commaSeparatedList embeds its own definition of what is
> allowed between commas, so the last field looks like it always has
> END_RECORD added to the end.  We could copy the definition of
> commaSeparatedList and exclude this, but it is simpler just to add a parse
> action to commaSeparatedList that removes END_RECORD from the last list
> element (t[-1]):
>
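> def stripEND_RECORD(s,l,t):
>     # s is the input string, l the match location, t the matched tokens
>     last = t[-1]
>     if last.endswith("END_RECORD"):
>         # return a copy of t with last element trimmed of "END_RECORD"
>         return t[:-1] + [last[:-(len("END_RECORD"))].rstrip()]
>     # otherwise fall through and return None, leaving t unchanged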
>
> commaSeparatedList.setParseAction(stripEND_RECORD)
>
>
> for tokens,start,end in commaSeparatedList.scanString(data):
>     print(tokens)
>
>
> This returns:
>
> ['f1', 'f2', 'f3', 'f4', 'f5', 'f6']
> ['f1', 'f2']
> ['f1', 'f2', '', 'f4', '', 'f6']
>

Thanks for your reply. This looks promising, but I have a few more questions:
1. If f(i) is a non-terminal (e.g. f(i) is another grammar expression),
how would I adapt your idea in a more generic way?
2. The field delimiter is not always ',' in my case. So I guess I'll
have to use delimitedList instead?
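
Both questions can be pictured with a plain-Python sketch (not pyparsing's API): the delimiter becomes a parameter, and each position gets its own sub-parser, which is roughly the role a sub-expression would play in a grammar. parse_record, FIELDS and the example field types are hypothetical, and the sketch assumes no field ever contains the delimiter (Python 3.7+ for the dict ordering shown).

```python
# Plain-Python sketch (not pyparsing) of a generic record parser.
# Assumptions: fields never contain the delimiter, and the line ends with
# END_RECORD (optionally preceded by whitespace). The field names and
# sub-parsers below are hypothetical stand-ins for grammar expressions.
TERMINATOR = "END_RECORD"


def parse_record(line, field_parsers, delim=","):
    """Parse one record into a dict keyed by field name.

    field_parsers is an ordered list of (name, parser) pairs; each present
    field is passed to its parser, and absent fields map to None.
    """
    body = line.strip()
    if not body.endswith(TERMINATOR):
        raise ValueError("record does not end with " + TERMINATOR)
    parts = body[:-len(TERMINATOR)].rstrip().split(delim)
    if len(parts) > len(field_parsers):
        raise ValueError("too many fields in record")
    # Trailing fields may be omitted entirely; pad them as empty.
    parts += [""] * (len(field_parsers) - len(parts))
    return {name: (parse(text) if text else None)
            for (name, parse), text in zip(field_parsers, parts)}


# Hypothetical field definitions, in their required order.
FIELDS = [
    ("id", int),
    ("name", str),
    ("point", lambda s: tuple(float(x) for x in s.split())),
]

print(parse_record("7;bob;1.5 2 END_RECORD", FIELDS, delim=";"))
# {'id': 7, 'name': 'bob', 'point': (1.5, 2.0)}
print(parse_record("7 END_RECORD", FIELDS, delim=";"))
# {'id': 7, 'name': None, 'point': None}
```

Because the line is split before any field is parsed, this breaks as soon as a field's own syntax can contain the delimiter; matching each field expression in sequence, as a grammar does, is one way around that.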

Thanks again,
Khoa



More information about the Python-list mailing list