[Doc-SIG] References in the same line as the target text

David Goodger goodger@users.sourceforge.net
Fri, 05 Jul 2002 19:10:24 -0400


[David Goodger:]
>> The "Inliner" class has to use one large regular expression.  If we
>> have some text like this::
>> 
>>     Here is an ``inline **literal**``.
>> 
>> If we check for "strong" (**) first, the result will be wrong.  No
>> ordering would get it right for all constructs.  We have to check
>> for each start-string simultaneously, because there are no
>> precedence rules (almost); first occurrence from left to right in
>> the text is the determinant.
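
To make the ordering problem concrete, here is a minimal sketch. The two
patterns are simplified stand-ins invented for illustration, not the actual
Docutils regexps, and the function names are hypothetical.

```python
import re

TEXT = "Here is an ``inline **literal**``."

# Hypothetical, simplified patterns; the real start-string rules are stricter.
STRONG = re.compile(r"\*\*.+?\*\*")
COMBINED = re.compile(r"(?P<literal>``.+?``)|(?P<strong>\*\*.+?\*\*)")


def strong_first(text):
    """Check for strong on its own: it finds the ** pair inside the literal."""
    m = STRONG.search(text)
    return ("strong", m.group(0)) if m else None


def simultaneous(text):
    """Check every construct in one regexp: the leftmost construct wins."""
    m = COMBINED.search(text)
    return (m.lastgroup, m.group(0)) if m else None


print(strong_first(TEXT))
print(simultaneous(TEXT))
```

Checked on its own, the strong pattern claims the text inside the literal;
the combined pattern reports the literal, because it begins further left.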

[Simon Budig:]
> This is why I meant that it might be necessary to remember which
> match starts first. To emulate the behaviour of a big regex we have
> to match against all regexes, check which one starts closest to the
> beginning of the string and, if this is ambiguous, check which one is
> the longest match.
> 
> Advantage: This would immediately give the matching construct.
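
The emulation described above might look roughly like the following sketch.
The patterns and the name first_construct are assumptions made for
illustration; nothing here is taken from the Docutils source.

```python
import re

# Hypothetical, simplified patterns for illustration only.
PATTERNS = {
    "literal": re.compile(r"``.+?``"),
    "strong": re.compile(r"\*\*.+?\*\*"),
    "emphasis": re.compile(r"\*.+?\*"),
}


def first_construct(text, patterns=PATTERNS):
    """Return (name, matched_text) for the earliest match in text.

    Every pattern is searched separately.  The match that starts closest
    to the beginning wins; on a tie, the longest match wins.
    """
    best = None
    for name, pattern in patterns.items():
        m = pattern.search(text)
        if m is None:
            continue
        # Smaller start sorts first; for equal starts, longer sorts first.
        key = (m.start(), -(m.end() - m.start()))
        if best is None or key < best[0]:
            best = (key, name, m)
    if best is None:
        return None
    return best[1], best[2].group(0)
```

Note that the longest-match tie-break is part of this proposal, not something
a single Python regexp does by itself: alternation in the re module takes the
first alternative that succeeds at a given position, so the order of the
alternatives matters there.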

But at what cost?  Sounds very complex.  It ain't broke.  Why fix it?

Let's just use the big regexp, and not try to emulate it.

>> But that idea is close to the solution I'm thinking of.  My idea
>> is to break up the one huge regexp into several lists of
>> individual regexps, one list per construct/regexp type (find
>> start-string only, find the whole construct, etc.), and join them
>> dynamically into compound OR-groups, building the large regexp
>> from components at runtime.  Dynamic syntax directives can install
>> new regexps and rebuild the master regexp.
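
A rough sketch of that idea, under stated assumptions: the class name
PatternRegistry, the kind labels "start" and "whole", and the sample patterns
are all hypothetical and do not reflect the actual Inliner code.

```python
import re


class PatternRegistry:
    """Hypothetical registry of component regexps, one list per kind."""

    def __init__(self):
        # One list per regexp type, e.g. "find start-string only" and
        # "find the whole construct".
        self.groups = {"start": [], "whole": []}
        self.master = None

    def install(self, kind, regexp):
        """Add a component regexp and rebuild the master regexp."""
        self.groups[kind].append(regexp)
        self.rebuild()

    def rebuild(self):
        """Join each non-empty list into an OR-group, then join the groups."""
        parts = []
        for regexps in self.groups.values():
            if regexps:
                parts.append("(?:%s)" % "|".join(regexps))
        self.master = re.compile("|".join(parts)) if parts else None


registry = PatternRegistry()
registry.install("start", r"\*\*")
registry.install("whole", r"\bPEP \d+")
print(registry.master.pattern)
```

A dynamic syntax directive would then only need to call install(); the
master regexp is rebuilt from its components each time.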
> 
> The advantage of this approach is that it might be a bit quicker
> since it is inside a single regular expression. It makes it a bit
> harder to detect what actually was the matching regex. Of course
> this is doable via
> ((?P<regex1>blablabla)|(?P<regex2>blu(?P<data>b*)lubb)) and then
> check which of the named groups regex1 or regex2 matched.  It might
> be a problem because you have to be careful with the naming of
> additional groups in the different regexes to avoid conflicts.
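
For reference, a small sketch of the named-group check using the example
pattern quoted above. The helper names which_matched and has_name_clash are
made up for illustration.

```python
import re

MASTER = re.compile(r"(?P<regex1>blablabla)|(?P<regex2>blu(?P<data>b*)lubb)")


def which_matched(text):
    """Return (group_name, data) for the alternative that matched."""
    m = MASTER.search(text)
    if m is None:
        return None
    # Exactly one top-level alternative takes part in a match; the
    # groups of the other alternative are None.
    for name in ("regex1", "regex2"):
        if m.group(name) is not None:
            return name, m.group("data")
    return None


def has_name_clash(pattern):
    """Return True if the pattern fails to compile, e.g. a reused name."""
    try:
        re.compile(pattern)
    except re.error:
        return True
    return False


print(which_matched("blablabla"))
print(which_matched("blubblubb"))
print(has_name_clash(r"(?P<data>a)|(?P<data>b)"))
```

The re module refuses to compile a pattern that defines the same group name
twice, so a clash between component regexps would show up as soon as the
combined regexp is built rather than passing silently.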

If it ever does become a problem, we'll deal with it.  Until then, I
don't see the point of redesigning something that works well.  I don't
think we'll be adding much more to the regexp, so I don't anticipate
running into name clashes any time soon.

If you think it's worth doing though, please try it and show us.

-- 
David Goodger  <goodger@users.sourceforge.net>  Open-source projects:
  - Python Docutils: http://docutils.sourceforge.net/
    (includes reStructuredText: http://docutils.sf.net/rst.html)
  - The Go Tools Project: http://gotools.sourceforge.net/