regular expression problem

Karsten Hilbert Karsten.Hilbert at gmx.net
Sun Oct 28 16:43:27 EDT 2018


Now that MRAB has shown me the follies of my ways I would
like to learn how to properly write the regular expression I
need.

This part:

> rx_works = '\$<[^<:]+?::.*?::\d*?>\$|\$<[^<:]+?::.*?::\d+-\d+>\$'
> # it fails if switched around:
> rx_fails = '\$<[^<:]+?::.*?::\d+-\d+>\$|\$<[^<:]+?::.*?::\d*?>\$'

suggests that I already have a solution. However, in reality this line:

> line = 'junk  $<match_A::options A::4>$  junk  $<match_B::options B::4-5>$  junk'

can be either way round (match_A, then match_B, or vice
versa) which, in turn, swaps which of rx_works/rx_fails actually works.
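
To make the problem reproducible, here is a small sketch that runs both
patterns against both orderings of the line. The names line_ab, line_ba
and results are only illustrative; the patterns and the first line are
the ones quoted above, written as raw strings.

```python
import re

# The two patterns from above, as raw strings.
rx_works = r'\$<[^<:]+?::.*?::\d*?>\$|\$<[^<:]+?::.*?::\d+-\d+>\$'
rx_fails = r'\$<[^<:]+?::.*?::\d+-\d+>\$|\$<[^<:]+?::.*?::\d*?>\$'

# The original line, plus the same line with the two placeholders swapped.
line_ab = 'junk  $<match_A::options A::4>$  junk  $<match_B::options B::4-5>$  junk'
line_ba = 'junk  $<match_B::options B::4-5>$  junk  $<match_A::options A::4>$  junk'

results = {}
for rx_name, rx in (('rx_works', rx_works), ('rx_fails', rx_fails)):
    for order, line in (('A then B', line_ab), ('B then A', line_ba)):
        # findall returns the full match text since there are no groups
        results[(rx_name, order)] = re.findall(rx, line)
        print(rx_name, '|', order, '|', results[(rx_name, order)])
```

Each pattern finds two separate placeholders for one ordering and a
single overlong match for the other: the lazy .*? in the first
alternative is allowed to run across the end of the first placeholder
until it reaches a '::' whose tail fits.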

Let me try to explain the expression I am actually after
(assuming re.compile with re.VERBOSE):

rx_works = r'''
	\$<				# start of match is literal '$<' anywhere inside string
	[^<:]+?::		# followed by at least one "character", except '<' or ':', until the next '::'		(this is the placeholder "name")
	.*?::			# followed by any number of any "character", until the next '::'					(this is the placeholder "options")
	\d*?			# followed by any number of digits													(the max length of placeholder output)
	>\$				# followed by '>$'
	|				# -- OR (in *either* order) --
	\$<				# start of match is literal '$<' anywhere inside string
	[^<:]+?::		# followed by at least one "character", except '<' or ':', until the next '::'		(this is the placeholder "name")
	.*?::			# followed by any number of any "character", until the next '::'					(this is the placeholder "options")
					# now the difference:
	\d+-\d+			# followed by one-or-many digits, a '-', and one-or-many digits						(this is the *range* from within the placeholder output)
	>\$				# followed by '>$'
'''

I want this to work for

	any number of matches

	in any order of max-length or output-range

inside one string.
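
One direction that might meet this requirement, offered only as a
sketch: keep a single copy of the shared prefix and move the
alternation to the last field, so that the lazy .*? stops at the first
'::' followed by either form. The name rx_merged is hypothetical, and
the sketch assumes the options field itself never contains '::'. It has
not been checked against nested placeholders.

```python
import re

# Hypothetical merged pattern: one shared prefix, with the alternation
# limited to the last field (range first, then optional max length).
rx_merged = re.compile(r'''
    \$<                  # literal '$<'
    [^<:]+?::            # placeholder name, up to the next '::'
    .*?::                # placeholder options, up to the next '::'
    (?:\d+-\d+|\d*)      # either a range 'N-M' or an optional max length
    >\$                  # literal '>$'
''', re.VERBOSE)

line_ab = 'junk  $<match_A::options A::4>$  junk  $<match_B::options B::4-5>$  junk'
line_ba = 'junk  $<match_B::options B::4-5>$  junk  $<match_A::options A::4>$  junk'

found_ab = rx_merged.findall(line_ab)
found_ba = rx_merged.findall(line_ba)
print(found_ab)
print(found_ba)
```

With the two sample lines used here, both orderings yield two separate
matches, because the tail after the second '::' is decided in one place
instead of once per alternative.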

Now, why the [^<:]+? dance ?

Because three levels of placeholders

	$<...::...::>$
	$<<...::...::>>$
	$<<<...::...::>>>$

need to be nestable inside each other ;-)
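
A small sketch of what excluding '<' from the name buys, using made-up
sample strings (outer_only and nested are purely illustrative): the
first-level pattern cannot start at '$<<', but it still finds a
first-level placeholder sitting inside a second-level one.

```python
import re

# First-level pattern only (max-length form), as used above.
level1 = re.compile(r'\$<[^<:]+?::.*?::\d*?>\$')

# Hypothetical sample strings, not taken from real data.
outer_only = '$<<outer::opts::>>$'
nested = '$<<outer::$<inner::opts::5>$::>>$'

# '$<' followed by '<' cannot start a first-level match.
no_match = level1.search(outer_only)

# The inner first-level placeholder is still found on its own.
inner = level1.search(nested)

print(no_match)
print(inner.group(0))
```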

Anyone able to help ?

This seems beyond my current grasp of regular expressions.

Thanks,
Karsten
-- 
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B


