[Tutor] pattern expressions

spir denis.spir at free.fr
Fri Nov 7 16:58:11 CET 2008


Hello,

I'm learning to use parsers: I am trying pyParsing, construct and simpleparse
to get a better overview. I know a bit about regular expressions and am rather
used to BNF-like formats such as those used to specify languages. But I have
never really used them myself, so the following may be trivial. The examples
below use a BNF dialect that I think is clear and unambiguous.

format_code	:= '+' | '-' | '*' | '#'
I need to specify that a single format_code may be repeated, always the same 
one. The sequence must not mix several different codes.
format		:= (format_code)+
would catch '+-', which is wrong. I want only patterns such as '--', '+++',...
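
One way to express "the same code, repeated" is a backreference, which plain 
BNF lacks but Python's re module supports. The following is only a sketch 
using re rather than any of the three parser libraries; the names FORMAT and 
is_format are invented for illustration.

```python
import re

# Capture one format code, then allow zero or more repeats of that same
# captured character (\1 refers back to group 1).
FORMAT = re.compile(r"([+\-*#])\1*")

def is_format(text):
    """Return True if text is a run of one single, identical format code."""
    return FORMAT.fullmatch(text) is not None

print(is_format("--"))    # True
print(is_format("+++"))   # True
print(is_format("+-"))    # False: two different codes
```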

style_code	:= '/' | '!' | '_'
Similar case, but different. I want patterns like:
styled_text	:= style plain_text style
where both style instances are identical. As the number of styles may grow (and 
even be unpredictable: the style_code line will actually be written at runtime 
according to a config file) I don't want to, and anyway can't, specify all 
possible kinds of styled_text. Even if it were possible, it would be ugly!
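
The same backreference idea covers matching opening and closing styles, and 
the pattern can be built at runtime from a list of codes. This is a minimal 
sketch with the re module, not a pyParsing solution. The function name 
build_styled_text_pattern is hypothetical, the list of codes stands in for 
whatever the config file provides, and plain_text is assumed to be "any 
characters, as few as possible".

```python
import re

def build_styled_text_pattern(style_codes):
    """Build a pattern for: style plain_text style, with identical styles."""
    # Escape each code so that characters such as '*' are taken literally.
    alternatives = "|".join(re.escape(code) for code in style_codes)
    # (?P<style>...) captures the opening code; (?P=style) requires the
    # closing code to be exactly the same text.
    return re.compile(r"(?P<style>%s)(?P<text>.+?)(?P=style)" % alternatives)

# Assumption: in real use this list would be read from the config file.
STYLED = build_styled_text_pattern(["/", "!", "_"])

m = STYLED.search("some /italic/ words")
print(m.group("style"), m.group("text"))        # / italic
print(STYLED.search("a /mismatched! span"))     # None
```

Adding a style then only means adding a code to the list; no new 
styled_text rule is needed.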

I would like to specify a "side-condition" for a pattern, meaning that it 
should match only when a specific token lies beside it. For instance:
A	:= A_pattern {X}
X is not part of the pattern, and thus should not be extracted. If X is just 
"garbage", I can write an enlarged pattern, then discard X later:
A	:= A_pattern
A_X	:= A X
If X itself is a token, I can write a super pattern, then extract both items 
from the combination, and discard the As that occur alone:
X	:= X_pattern
A	:= A_pattern
A_X	:= A X
But what if X is part of another production? For example:
B	:= X B_end_pattern
A_X	:= A X
I tried it, but I can't get X in both productions, so I catch either B or 
A_X -- according to mysterious priority rules I don't fully understand (with 
pyParsing, it seems to be neither the longest string nor the first written 
pattern).
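
In regular expressions, a side-condition of this kind is usually written as a 
lookahead: (?=X) checks that X follows without consuming it, so X remains 
available to the next production. The sketch below uses the re module, with 
invented stand-ins for the patterns: A_pattern is taken to be a number, X a 
colon, and B_end_pattern a lowercase word.

```python
import re

# Hypothetical stand-ins: A_pattern = digits, X = ':', B_end_pattern = letters.
# A matches only when X follows, but the lookahead does not consume X,
# so the same ':' can still start a B.
TOKEN = re.compile(r"(?P<A>\d+(?=:))|(?P<B>:[a-z]+)")

text = "12:abc 34 56:xyz"
tokens = [(m.lastgroup, m.group()) for m in TOKEN.finditer(text)]
print(tokens)
# [('A', '12'), ('B', ':abc'), ('A', '56'), ('B', ':xyz')]
```

The lone "34" is skipped because no X follows it. pyParsing offers a 
FollowedBy element that appears to play a similar role, though it is not 
exercised here.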

Now, precisely, what about priority? I mean ambiguous cases, when a given 
piece of input can match several patterns. Parsers have tricks, rules, or 
explicit features to cope with such cases, but, as I understand it, these 
apply during or after the parsing process, as additional treatment. Is there 
a way to specify priority in the grammar itself?
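
One common way to put priority in the grammar itself is ordered choice: the 
alternatives are tried in the order written and the first success wins. 
Python's re alternation behaves this way, which the sketch below shows; the 
names SHORT_FIRST and LONG_FIRST are made up for the example. In pyParsing, 
the | operator builds a first-match choice (MatchFirst) and the ^ operator a 
longest-match choice (Or), so the operator chosen states the priority rule.

```python
import re

# Both alternatives can match at the start of "<=".
# The order in which they are written decides which one wins.
SHORT_FIRST = re.compile(r"<|<=")
LONG_FIRST = re.compile(r"<=|<")

print(SHORT_FIRST.match("<=").group())   # <
print(LONG_FIRST.match("<=").group())    # <=
```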

Denis


