regular expression problem

MRAB python at mrabarnett.plus.com
Sun Oct 28 19:14:15 EDT 2018


On 2018-10-28 21:04, Karsten Hilbert wrote:
> On Sun, Oct 28, 2018 at 09:43:27PM +0100, Karsten Hilbert wrote:
> 
>> Let me try to explain the expression I am actually after
>> (assuming .compile with re.VERBOSE):
>> 
>> rx_works = r'''
>> 	\$<				# start of match is literal '$<' anywhere inside string
>> 	[^<:]+?::		# followed by at least one "character", except '<' or ':', until the next '::'		(this is the placeholder "name")
>> 	.*?::			# followed by any number of any "character", until the next '::'					(this is the placeholder "options")
>> 	\d*?			# followed by any number of digits													(the max length of placeholder output)
>> 	>\$				# followed by '>$'
>> 	|				# -- OR (in *either* order) --
>> 	\$<				# start of match is literal '$<' anywhere inside string
>> 	[^<:]+?::		# followed by at least one "character", except '<' or ':', until the next '::'		(this is the placeholder "name")
>> 	.*?::			# followed by any number of any "character", until the next '::'					(this is the placeholder "options")
>> 					# now the difference:
>> 	\d+-\d+			# followed by one-or-many digits, a '-', and one-or-many digits						(this is the *range* from within placeholder output)
>> 	>\$				# followed by '>$'
>> '''
> 
> Another try:
> 
> - lines can contain several placeholders
> 
> - placeholders start and end with '$'
> 
> - placeholders are parsed in three passes
> 
> - the pass in which a placeholder is parsed is denoted by the number of '<' and '>' next to the '$':
> 
> 	$<...>$ / $<<...>>$ / $<<<...>>>$
> 
> - placeholders for different parsing passes must be nestable:
> 
> 	$<<<...$<...>$...>>>$
> 	....
> 	(lower=earlier parsing passes will be inside)
> 
> - the internal structure is "name::options::range"
> 
> 	$<name::options::range>$
> 
> - name will *not* contain '$' '<' '>' ':'
> 
> - range can be either a length or a "from-until"
> 
> - a length will be a positive integer (no bounds checking)
> 
> - "from-until" is: a positive integer, a '-', and a positive integer (no sanity checking)
> 
> - options needs to be able to contain nearly anything, except '::'
> 
> 
> Is that sufficiently defined and helpful to design the regular expression ?
> 
How can they be nested inside one another?
Is the string scanned, placeholders filled in for that level, and then 
the string scanned again for the next level? (That would mean that the 
fill value itself will be scanned in the next pass.)

You could try matching the top level first, then, for each match, matching 
the next level, and, for each of those matches, matching the final level.
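
A minimal sketch of that level-by-level approach, assuming one deliberately
simple pattern per pass. The names LEVEL_PATTERNS and find_level are
hypothetical, and the patterns are not the expression quoted above; they also
assume that a placeholder body neither starts with '<' nor ends with '>'.

```python
import re

# One simple pattern per parsing pass (an assumption for illustration).
# The lookarounds stop '$<' from matching the start of '$<<' and stop
# '>$' from matching the end of '>>$'.
LEVEL_PATTERNS = {
    3: re.compile(r"\$<<<(?!<)(.+?)(?<!>)>>>\$", re.DOTALL),
    2: re.compile(r"\$<<(?!<)(.+?)(?<!>)>>\$", re.DOTALL),
    1: re.compile(r"\$<(?!<)(.+?)(?<!>)>\$", re.DOTALL),
}


def find_level(text, level):
    """Return the bodies of all placeholders of the given pass found in text."""
    return [m.group(1) for m in LEVEL_PATTERNS[level].finditer(text)]


line = "x $<<<outer::pre $<inner::opt::5>$ post::1-3>>>$ y"

# Match the outermost level first, then look inside each match.
for outer in find_level(line, 3):
    print("pass 3:", outer)
    for inner in find_level(outer, 1):
        print("  pass 1:", inner)
```

Each pattern only has to recognise one kind of delimiter, so none of them
needs to know about the other passes.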

Trying to do it all in one regex is usually a bad idea. Keep it simple! 
(Do you even need to use a regex?)
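
For example, once a placeholder body has been isolated, the
"name::options::range" structure can be split with plain string methods. This
is only a sketch: parse_body is a hypothetical helper, and treating an empty
range as "no limit" is an assumption based on the quoted expression allowing
zero digits there.

```python
def parse_body(body):
    """Split 'name::options::range' into (name, options, range) without a regex.

    range is None (empty), an int (a length), or a (start, stop) tuple.
    """
    # The name ends at the first '::'; the range starts after the last '::'.
    name, sep1, rest = body.partition("::")
    options, sep2, range_spec = rest.rpartition("::")
    if not (name and sep1 and sep2):
        raise ValueError("expected name::options::range, got %r" % body)
    if any(c in name for c in "$<>:"):
        raise ValueError("invalid character in name %r" % name)

    if range_spec == "":
        # Assumption: an empty range means "no limit".
        return name, options, None
    if range_spec.isdecimal():
        return name, options, int(range_spec)
    start, dash, stop = range_spec.partition("-")
    if dash and start.isdecimal() and stop.isdecimal():
        return name, options, (int(start), int(stop))
    raise ValueError("invalid range %r" % range_spec)


print(parse_body("inner::opt::5"))
print(parse_body("outer::pre $<inner::opt::5>$ post::1-3"))
```

Because the name is taken up to the first '::' and the range from the last
one, a nested placeholder in the options does not confuse the split.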
