Pyparsing: Grammar Suggestion. 2nd thought

Paul McGuire ptmcg at austin.rr._bogus_.com
Wed May 17 15:24:30 EDT 2006


"Khoa Nguyen" <khoa.coffee at gmail.com> wrote in message
news:mailman.5827.1147889120.27775.python-list at python.org...
>
> for tokens,start,end in commaSeparatedList.scanString(data):
>     print tokens
>
>
> This returns:
>
> ['f1', 'f2', 'f3', 'f4', 'f5', 'f6']
> ['f1', 'f2']
> ['f1', 'f2', '', 'f4', '', 'f6']
>

<snip>

> On 2nd thought, I don't think this will check for the correct order of
> the fields. For example, the following would be incorrectly accepted:
>
> f1,f5,f2 END_RECORD
>
> Thanks,
> Khoa

Well, what are the rules for the comma-separated entries?  Are they
distinguished by type, or are they sorted in ascending lexical or numeric
order, or by ascending length?

Two approaches you can take:
- if at parse time you can tell that f5 is out of position because each field
has its own specific format, then you can define your grammar like:

Optional(f1SpecificFormat) + "," + Optional(f2SpecificFormat) + "," + ...
and so on.

Then f5 would match only in the fifth position.

Or, if even the commas are optional (as in f2,f5 END_RECORD), then you would
need a grammar such as:

Optional(f1SpecificFormat) + Optional(Optional(",") + f2SpecificFormat) +
... + "END_RECORD"

- if f5 is out of order because it is followed by f2, but would have been OK
if followed only by f6-fN values, then you'll need to read everything in
and then test for validity, most easily in a parse action.  If the
validation rule fails, have the parse action raise a ParseException so
that the match is rejected.
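A minimal sketch of both approaches, assuming pyparsing 3.x (for the
snake_case names parse_string, set_parse_action, and as_list). The
per-position formats (an integer, a lowercase word, a double-quoted string),
the fN field names, the "field numbers must ascend" rule, and the accepts()
helper are all hypothetical stand-ins for whatever your real records require.

```python
# Sketch only: assumes pyparsing 3.x; all field formats and rules are hypothetical.
from pyparsing import (Optional, ParseException, QuotedString, Regex,
                       Suppress, ZeroOrMore)

comma = Suppress(",")
end_record = Suppress("END_RECORD")

# Approach 1: each position has its own (hypothetical) format, so a value
# can only match in its own slot. Commas are optional, as in "f2,f5 END_RECORD".
f1 = Regex(r"\d+")        # hypothetical f1 format: integer, e.g. 42
f2 = Regex(r"[a-z]+")     # hypothetical f2 format: lowercase word, e.g. abc
f3 = QuotedString('"')    # hypothetical f3 format: "quoted text"
typed_record = (Optional(f1)
                + Optional(Optional(comma) + f2)
                + Optional(Optional(comma) + f3)
                + end_record)

# Approach 2: read every field in, then validate the whole list in a
# parse action; raising ParseException makes the match fail.
field = Regex(r"f\d+")    # hypothetical field names: f1, f2, ... fN


def check_ascending(s, loc, tokens):
    """Hypothetical rule: field numbers must strictly increase."""
    numbers = [int(tok[1:]) for tok in tokens]
    if any(a >= b for a, b in zip(numbers, numbers[1:])):
        raise ParseException(s, loc, "fields out of order")


# Empty fields (",,") are allowed and simply produce no token.
field_list = Optional(field) + ZeroOrMore(comma + Optional(field))
field_list.set_parse_action(check_ascending)
checked_record = field_list + end_record


def accepts(grammar, text):
    """Return the parsed tokens as a list, or None if the record is rejected."""
    try:
        return grammar.parse_string(text, parse_all=True).as_list()
    except ParseException:
        return None


if __name__ == "__main__":
    print(accepts(typed_record, '42,abc,"hello" END_RECORD'))  # ['42', 'abc', 'hello']
    print(accepts(typed_record, '"hello",abc END_RECORD'))     # None: f3 before f2
    print(accepts(checked_record, "f1,f2,,f4,,f6 END_RECORD")) # ['f1', 'f2', 'f4', 'f6']
    print(accepts(checked_record, "f1,f5,f2 END_RECORD"))      # None: f2 after f5
```

The typed grammar needs no parse action, since the grammar itself enforces
the order: a value can only match the slot whose format it fits. The parse
action version defers the check until the whole list has been read, which is
what a rule like "f5 must not be followed by f2" requires.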

-- Paul