regular expression problem

Karsten Hilbert Karsten.Hilbert at gmx.net
Mon Oct 29 07:04:22 EDT 2018


On Sun, Oct 28, 2018 at 11:57:48PM +0100, Brian Oney wrote:

> On Sun, 2018-10-28 at 22:04 +0100, Karsten Hilbert wrote:
> > [^<:]
> 
> Would a simple regex work?

This brought about the solution.

However, not this way:

> >>> import re
> >>> t = '$<name::options::range>$'
> >>> re.findall('[^<>:$]+', t)
> ['name', 'options', 'range']

because I am not trying to parcel out the placeholder *parts*
(but rather the placeholders from a given line).

I eventually figured that denoting the parsing stages
differently made for easier matching. Rather than

	$<>$
	$<<>>$
	$<<<>>>$

do this

	$1<>1$
	$2<>2$
	$3<>3$

which makes it way less ambiguous, and more matchable:

regexen = [
	r'\$1{0,1}<[^<].*?>1{0,1}\$',
	r'\$2<[^<].*?>2\$',
	r'\$3<[^<].*?>3\$'
]
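
As a minimal sketch of how such a list might be applied to one line of a template: the helper name find_placeholders and the sample line (including its placeholder names) are made up for illustration; only the three patterns come from the message itself.

```python
import re

# Stage patterns copied from the message above.
regexen = [
    r'\$1{0,1}<[^<].*?>1{0,1}\$',
    r'\$2<[^<].*?>2\$',
    r'\$3<[^<].*?>3\$',
]

def find_placeholders(line):
    """Return {stage: [placeholders]} for one line; stages count from 1."""
    return {
        stage: re.findall(regex, line)
        for stage, regex in enumerate(regexen, start=1)
    }

# Hypothetical template line; the placeholder names are invented.
line = 'Dear $<title::::>$ $1<lastname::::>1$, seen $2<today::%Y-%m-%d::>2$'
found = find_placeholders(line)
print(found)
```

Each stage returns whole placeholders rather than their parts. Note that in the first pattern the two optional 1s are independent of each other, so a mismatched $1<...>$ would be accepted as well.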

The [^<] part ("the single < is NOT to be followed directly
by another <") is actually superfluous but does protect
against legacy document templates still having
$<<(<)...(>)>>$ in them.
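
The effect of the guard can be checked with a small comparison. This is only a sketch: the pattern without [^<] is a hypothetical variant written for this comparison, and the legacy line is invented.

```python
import re

with_guard = r'\$1{0,1}<[^<].*?>1{0,1}\$'
# Hypothetical variant with the [^<] guard dropped, for comparison only.
without_guard = r'\$1{0,1}<.*?>1{0,1}\$'

# Invented legacy line using the old doubled-bracket notation.
legacy = 'old template: $<<name::options::range>>$'

guarded = re.findall(with_guard, legacy)
unguarded = re.findall(without_guard, legacy)
print(guarded)     # the old-style placeholder is not picked up
print(unguarded)   # without the guard it is matched as a stage-1 placeholder
```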

$<>$ is still retained as an alias for $1<>1$ because there are
A LOT of them in existing document templates. It is
normalized explicitly inside Python before fill-in values are
generated.
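
One way such a normalization could look is sketched below. This is an assumption, not the code actually used: the function name normalize and the pattern for bare placeholders are hypothetical.

```python
import re

# Hypothetical pattern: a bare $<...>$ whose "<" is not followed by another "<".
_bare_placeholder = re.compile(r'\$<([^<].*?)>\$')

def normalize(line):
    """Rewrite bare $<...>$ placeholders to the explicit $1<...>1$ form."""
    return _bare_placeholder.sub(
        lambda match: '$1<' + match.group(1) + '>1$',
        line,
    )

# Invented sample line mixing the alias and the explicit form.
sample = 'a $<name::::>$ b $1<other::::>1$'
normalized = normalize(sample)
print(normalized)
```

Placeholders already written as $1<>1$ are left alone because their $ is followed by a digit rather than by <, so running the function twice gives the same result as running it once.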

Karsten
-- 
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B


