regular expression problem

Karsten Hilbert Karsten.Hilbert at gmx.net
Sun Oct 28 17:04:39 EDT 2018


On Sun, Oct 28, 2018 at 09:43:27PM +0100, Karsten Hilbert wrote:

> Let me try to explain the expression I am actually after
> (assuming .compile with re.VERBOSE):
> 
> rx_works = r"""
> 	\$<				# start of match is literal '$<' anywhere inside string
> 	[^<:]+?::		# followed by at least one "character", except '<' or ':', until the next '::'		(this is the placeholder "name")
> 	.*?::			# followed by any number of any "character", until the next '::'					(this is the placeholder "options")
> 	\d*?			# followed by any number of digits													(the max length of placeholder output)
> 	>\$				# followed by '>$'
> 	|				# -- OR (in *either* order) --
> 	\$<				# start of match is literal '$<' anywhere inside string
> 	[^<:]+?::		# followed by at least one "character", except '<' or ':', until the next '::'		(this is the placeholder "name")
> 	.*?::			# followed by any number of any "character", until the next '::'					(this is the placeholder "options")
> 					# now the difference:
> 	\d+-\d+			# followed by one-or-many digits, a '-', and one-or-many digits						(this is the *range* to take from the placeholder output)
> 	>\$"""			# followed by '>$'
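
Since the two alternatives quoted above differ only in their last field, they can probably be folded into a single expression. The following is only a sketch under that assumption; the name rx_folded and the sample string are made up for illustration:

```python
import re

# Same idea as the quoted expression, with the shared prefix written once.
# Only the final field (length vs. from-until range) is an alternation.
rx_folded = re.compile(r"""
    \$<                 # literal '$<'
    [^<:]+?::           # placeholder "name", up to the next '::'
    .*?::               # placeholder "options", up to the next '::'
    (?:\d+-\d+|\d*?)    # either a from-until range or an (optional) length
    >\$                 # literal '>$'
""", re.VERBOSE)

# Hypothetical sample line containing two placeholders.
line = "a $<name::opts::10>$ b $<other::x y::3-7>$ c"
print(rx_folded.findall(line))
# -> ['$<name::opts::10>$', '$<other::x y::3-7>$']
```

Note that re.VERBOSE only ignores whitespace in the pattern itself; the '.' in the options part still matches blanks in the searched string.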

Another try:

- lines can contain several placeholders

- placeholders start and end with '$'

- placeholders are parsed in three passes

- the pass in which a placeholder is parsed is denoted by the number of '<' and '>' next to the '$':

	$<...>$ / $<<...>>$ / $<<<...>>>$

- placeholders for different parsing passes must be nestable:

	$<<<...$<...>$...>>>$
	....
	(lower, that is earlier, parsing passes will be inside)

- the internal structure is "name::options::range"

	$<name::options::range>$

- name will *not* contain '$' '<' '>' ':'

- range can be either a length or a "from-until"

- a length will be a positive integer (no bounds checking)

- "from-until" is: a positive integer, a '-', and a positive integer (no sanity checking)

- options needs to be able to contain nearly anything, except '::'
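
To make the rules above concrete, here is a rough sketch of how they might translate into one expression per parsing pass. It is not meant as the definitive answer. The helper names placeholder_rx and expand, the resolve callback and the sample strings are all hypothetical, and it assumes (as in the expression quoted earlier) that the range part may be empty:

```python
import re

def placeholder_rx(level):
    """Build the expression for one parsing pass (level 1, 2 or 3)."""
    opener = r"\$" + "<" * level      # '$<', '$<<' or '$<<<'
    closer = ">" * level + r"\$"      # '>$', '>>$' or '>>>$'
    return re.compile(opener + r"""
        (?P<name>[^$<>:]+) ::             # name: no '$', '<', '>' or ':'
        (?P<options>(?:(?!::).)*?) ::     # options: anything except '::'
        (?P<range>\d+-\d+|\d*)            # from-until, a length, or empty (assumed)
    """ + closer, re.VERBOSE)

def expand(text, resolve):
    """Run the three passes in order, innermost placeholders first."""
    for level in (1, 2, 3):
        text = placeholder_rx(level).sub(
            lambda m: resolve(m.group("name"), m.group("options"), m.group("range")),
            text,
        )
    return text

# Splitting a single placeholder into its three parts.
m = placeholder_rx(1).search("a $<name::some opts::10-20>$ b")
print(m.group("name", "options", "range"))
# -> ('name', 'some opts', '10-20')

# Nested placeholders: the inner one is replaced in pass 1, the outer in pass 2.
print(expand("$<<outer::x $<inner::opt::3>$ y::2-5>>$",
             lambda name, options, rng: name.upper()))
# -> OUTER
```

Because name may not contain '<', the pass 1 expression cannot accidentally match the start of a '$<<...' placeholder. The inner placeholder's '::' also never conflicts with the "no '::' in options" rule of the outer one, because the inner one has already been substituted by the time the outer pass runs.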


Is that sufficiently well defined, and helpful for designing the regular expression?

Karsten
-- 
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B


