regular expression problem

MRAB python at mrabarnett.plus.com
Mon Oct 29 13:16:11 EDT 2018


On 2018-10-29 08:02, Karsten Hilbert wrote:
> On Sun, Oct 28, 2018 at 11:14:15PM +0000, MRAB wrote:
> 
>> > - lines can contain several placeholders
>> > 
>> > - placeholders start and end with '$'
>> > 
>> > - placeholders are parsed in three passes
>> > 
>> > - the pass in which a placeholder is parsed is denoted by the number of '<' and '>' next to the '$':
>> > 
>> > 	$<...>$ / $<<...>>$ / $<<<...>>>$
>> > 
>> > - placeholders for different parsing passes must be nestable:
>> > 
>> > 	$<<<...$<...>$...>>>$
>> > 	....
>> > 	(lower=earlier parsing passes will be inside)
>> > 
>> > - the internal structure is "name::options::range"
>> > 
>> > 	$<name::options::range>$
>> > 
>> > - name will *not* contain '$' '<' '>' ':'
>> > 
>> > - range can be either a length or a "from-until"
>> > 
>> > - a length will be a positive integer (no bounds checking)
>> > 
>> > - "from-until" is: a positive integer, a '-', and a positive integer (no sanity checking)
>> > 
>> > - options needs to be able to contain nearly anything, except '::'
>> > 
>> > 
>> > Is that sufficiently defined and helpful to design the regular expression ?
>> > 
>> How can they be nested inside one another?
>> Is the string scanned, placeholders filled in for that level, and then the
>> string scanned again for the next level? (That would mean that the fill
>> value itself will be scanned in the next pass.)
> 
> Exactly. But *different* levels can be nested inside each other.
> 
>> You could try matching the top level, for each match then match the next
>> level, and for each of those matches then match for the final level.
> 
> So I do.
> 
>> Trying to do it all in one regex is usually a bad idea.
> 
> Right, I am not trying to do that. I was, however, worried
> that I need to make the expression not "trip over" fragments
> of what might seem to constitute part of another placeholder.
> 
> 	$<<ph_1::option=$<ph_2::option=3::10>$::15>>$
> 
> Pass 1 might fill in to:
> 
> 	$<<ph_1::option=3 '>s'::15>>$
> 
> and I was worried to make sure the second pass does not stop here:
> 
> 	$<<ph_1::option=3 '>s'::15>>$
>                         ^
> 
> Logically it should not because
> 
> 	>s'::15>>$
> 
> does not match
> 
> 	::\d*>>$
> 
> but I am not sure how to tell it that :-)
> 
For something like that, I'd use a recursive-descent parser.
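
As a rough illustration of the idea, here is a minimal sketch written
against the rules quoted above. The names (parse, parse_items,
parse_placeholder) and the dict layout are made up for this example,
not taken from any existing code. It assumes that "$<", "$<<", "$<<<"
and ">$", ">>$", ">>>$" never occur as literal text, and it does not
check that inner placeholders belong to an earlier pass than the outer
one. A lone '>' that is not followed by '$' is plain text, so the
"3 '>s'" case above is not a problem.

```python
import re

# Opener: '$' plus 1-3 '<'.  Closer: 1-3 '>' plus '$'.
DELIM = re.compile(r"\$(<{1,3})|(>{1,3})\$")
RANGE = re.compile(r"\d+(?:-\d+)?")      # length or from-until
BAD_NAME_CHARS = set("$<>:")


def parse_items(text, pos, closing_level):
    """Collect literal text and placeholders starting at pos.

    Stops at the closer for closing_level (0 means: run to end of text).
    Returns (items, new_pos); items alternates str, dict, str, ...
    """
    items = []
    while True:
        m = DELIM.search(text, pos)
        if m is None:
            if closing_level:
                raise ValueError("unterminated level-%d placeholder"
                                 % closing_level)
            items.append(text[pos:])
            return items, len(text)
        items.append(text[pos:m.start()])
        if m.group(1):
            # Opener: descend into the nested placeholder.
            node, pos = parse_placeholder(text, m.end(), len(m.group(1)))
            items.append(node)
        elif len(m.group(2)) == closing_level:
            # Closer for the placeholder we are currently inside.
            return items, m.end()
        else:
            raise ValueError("unexpected closer at offset %d" % m.start())


def parse_placeholder(text, pos, level):
    """Parse 'name::options::range' up to the matching closer."""
    items, pos = parse_items(text, pos, level)
    # The first and last items are always strings: the name is cut off
    # the front of the first, the range off the back of the last.
    name, sep, rest = items[0].partition("::")
    if not sep or not name or BAD_NAME_CHARS & set(name):
        raise ValueError("bad placeholder name: %r" % name)
    items[0] = rest
    rest, sep, rng = items[-1].rpartition("::")
    if not sep or not RANGE.fullmatch(rng):
        raise ValueError("bad range: %r" % rng)
    items[-1] = rest
    node = {"level": level, "name": name, "options": items, "range": rng}
    return node, pos


def parse(text):
    """Parse a whole line into a list of strings and placeholder dicts."""
    items, _ = parse_items(text, 0, 0)
    return items


tree = parse("a $<<ph_1::option=$<ph_2::option=3::10>$::15>>$ b")
outer = tree[1]
inner = outer["options"][1]
print(outer["name"], outer["range"], inner["name"], inner["range"])
```

Each pass would then walk the tree and fill in the placeholders of its
own level, innermost first.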

It might be worth looking at pyparsing.
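
If staying with one regex per pass is preferred, one way to "tell it
that" is to make the closer part of the pattern, so that a '>' only
ends a placeholder when it directly follows "::range" and is followed
by '$'. The sketch below is only an assumption of how that could look;
placeholder_regex and run_pass are hypothetical helpers, and the fill
function simply returns the "3 '>s'" value from the example above.

```python
import re


def placeholder_regex(level):
    """Build the pattern for one pass (level 1, 2 or 3)."""
    return re.compile(
        r"\$" + "<" * level + r"(?!<)"      # exactly `level` '<'
        r"(?P<name>[^$<>:]+)::"             # no '$', '<', '>', ':'
        r"(?P<options>(?:(?!::).)*?)::"     # anything except '::'
        r"(?P<range>\d+(?:-\d+)?)"          # length or from-until
        + ">" * level + r"\$"               # closer must follow range
    )


def run_pass(text, level, fill):
    """Replace every placeholder of this level with fill(name, options, range)."""
    def repl(m):
        return fill(m.group("name"), m.group("options"), m.group("range"))
    return placeholder_regex(level).sub(repl, text)


line = "$<<ph_1::option=$<ph_2::option=3::10>$::15>>$"

# Pass 1 only sees the inner placeholder; the outer one starts with
# '$<<' and is rejected by the (?!<) lookahead.
after_pass_1 = run_pass(line, 1, lambda name, options, rng: "3 '>s'")
print(after_pass_1)

# Pass 2 does not stop at the '>' inside the options, because that '>'
# is not preceded by '::<digits>' and not followed by '$'.
m = placeholder_regex(2).search(after_pass_1)
print(m.group("name"), repr(m.group("options")), m.group("range"))
```

This still relies on the passes running in order: before pass 1 has
filled in the inner placeholder, the pass-2 pattern does not match the
outer one, because its options would contain '::'.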


