[Tutor] pattern expressions
spir
denis.spir at free.fr
Fri Nov 7 16:58:11 CET 2008
Hello,
I'm learning to use parsers: trying pyParsing, construct and simpleparse to
get a better overview. I know a bit about regular expressions and am rather
used to BNF-like formats such as those used to specify languages. But I have
never really used them myself, so the following may be trivial. Below I use a
BNF dialect that I think is clear and unambiguous.
format_code := '+' | '-' | '*' | '#'
I need to specify that a single format_code may be repeated, always the same
one. I do not mean that several different ones may appear in a sequence.
format := (format_code)+
would catch '+-', which is wrong. I want only patterns such as '--', '+++',...
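
For illustration only, and using Python's standard re module rather than any
of the three parser libraries above, a backreference is one way to express
"the same code, repeated". This is a minimal sketch; the names FORMAT and
is_format are hypothetical.

```python
import re

# The character class mirrors format_code := '+' | '-' | '*' | '#'.
# ([+\-*#]) captures one code; \1* then allows only repetitions of that
# same captured code.
FORMAT = re.compile(r"([+\-*#])\1*")

def is_format(text):
    """Return True if text is a single format code repeated one or more times."""
    return FORMAT.fullmatch(text) is not None

print(is_format("--"))   # True
print(is_format("+++"))  # True
print(is_format("+-"))   # False
```

A pure context-free BNF rule cannot say "the same token again" without
listing each case, which is why the sketch relies on a backreference.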
style_code := '/' | '!' | '_'
Similar case, but different. I want patterns like:
styled_text := style plain_text style
where both style instances are identical. As the number of styles may grow (and
even be unpredictable: the style_code line will actually be written at runtime
according to a config file), I don't want to, and anyway can't, specify all
possible kinds of styled_text. Even if it were possible, it would be ugly!
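
As a sketch of the same idea with Python's re module (again not one of the
parser libraries above), the pattern can be built at runtime from whatever
codes the config supplies, with a named backreference forcing the closing
code to equal the opening one. The list style_codes and the names STYLED_TEXT
and parse_styled are assumptions made for the example.

```python
import re

# Assumed to be read from the config file at runtime.
style_codes = ["/", "!", "_"]

# Build the alternation from the configured codes, escaping each one.
code = "|".join(re.escape(c) for c in style_codes)

# (?P<style>...) captures the opening code; (?P=style) requires that the
# closing code is the very same string.
STYLED_TEXT = re.compile(rf"(?P<style>{code})(?P<text>.+?)(?P=style)")

def parse_styled(text):
    """Return (style, plain_text) if text is wrapped in two identical codes."""
    m = STYLED_TEXT.fullmatch(text)
    return (m.group("style"), m.group("text")) if m else None

print(parse_styled("/italic/"))  # ('/', 'italic')
print(parse_styled("!bold_"))    # None
```

Adding a style then only means adding an entry to style_codes; the pattern
itself does not change.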
I would like to specify a "side-condition" for a pattern, meaning that it
should only match when a specific token lies next to it. For instance:
A := A_pattern {X}
X is not part of the pattern, and thus should not be extracted. If X is just
"garbage", I can write an enlarged pattern, then discard X later:
A := A_pattern
A_X := A X
If X itself is a token, I can write a super pattern, then extract both items
from the combination, and discard the As that occur alone:
X := X_pattern
A := A_pattern
A_X := A X
But what if X is part of another production? For example:
B := X B_end_pattern
A_X := A X
I tried it, but I can't get X in both productions. So I catch either B or
A_X, according to mysterious priority rules I don't fully understand (in
pyParsing it seems to be neither the longest string nor the first written
pattern).
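
One way to state such a side-condition without consuming X is a lookahead,
sketched here with Python's re module. The concrete patterns are hypothetical
stand-ins (A is digits, X is ':', B_end is lowercase letters), and the
tokens function is only an illustrative scanner, not how any of the parser
libraries work internally.

```python
import re

# Hypothetical stand-ins for the abstract patterns in the grammar above.
A = r"\d+"
X = r":"
B_END = r"[a-z]+"

# (?=X) is a lookahead: A matches only when X follows, but X is not
# consumed, so it remains available as the start of B.
A_BEFORE_X = re.compile(rf"(?P<A>{A})(?={X})")
B = re.compile(rf"(?P<B>{X}{B_END})")

def tokens(text):
    """Collect (name, string) pairs, trying A-before-X first, then B."""
    pos, found = 0, []
    while pos < len(text):
        for pattern in (A_BEFORE_X, B):
            m = pattern.match(text, pos)
            if m:
                found.append((m.lastgroup, m.group()))
                pos = m.end()
                break
        else:
            pos += 1  # skip a character that starts no token
    return found

print(tokens("12:ab 34"))  # [('A', '12'), ('B', ':ab')]
```

Here "12" is reported as A because ':' follows it, ":ab" is still matched as
B, and the lone "34" is dropped because no X follows it.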
Now, precisely, what about priority? I mean ambiguous cases, when an actual
input string can match several patterns. Parsers have tricks, rules, or
explicit features to cope with such cases, but, as I understand it, these apply
during or after the parsing process, as additional processing. Is there a way
to specify priority in the grammar itself?
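
In some formalisms the order of alternatives is itself the priority rule. As
a small sketch with Python's re module, whose '|' tries alternatives from
left to right and keeps the first one that matches (not the longest), the
same two alternatives give different results depending on the order in which
they are written. The pattern names are hypothetical.

```python
import re

# Alternatives are tried left to right and the first match wins, so the
# order in which they are written encodes their priority.
OP_LONG_FIRST = re.compile(r"<=|<")
OP_SHORT_FIRST = re.compile(r"<|<=")

print(OP_LONG_FIRST.match("<=").group())   # <=
print(OP_SHORT_FIRST.match("<=").group())  # <
```

pyParsing offers both behaviours as separate operators: '|' builds a
first-match alternative (MatchFirst) and '^' builds a longest-match one (Or).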
Denis