a little parsing challenge ☺

Thomas 'PointedEars' Lahn PointedEars at web.de
Sun Jul 17 17:43:48 EDT 2011


Thomas 'PointedEars' Lahn wrote:

> It is possible [to parse the parentheses language], with Perl-compatible
> Regular Expressions (PCRE), provided that you have enough memory, to use
> such an extended Regular Expression (not to be confused with EREs³)⁴:
> 
>   \((([^()]*|(?R))*)\)
> 
> However, even Python 3.2 does not support those expressions (although it
> supports some other PCRE patterns, like named subexpressions)⁵, neither do
> standard and forked versions of sed(1) (BREs, EREs, using an NFA) nor awk
> (EREs, using a DFA or NFA).  [That is not to say it would not be possible
> with Python, or sed or awk (both of which are off-topic here), but that
> more than a Regular Expression would be required.]

Supplemental: Further research shows that the Python `re' module is not 
going to implement (PCRE) recursive Regular Expressions.  The maintainer's, 
Matthew Barnett's, argument (of 2009-03-24) is that such things are better 
left to modules such as `pyparsing' [1][2].

However, FWIW, here is the Python port of the start of a language parser 
originally written in (and for) ECMAScript:

import re

def matchingBraces(s):
    level = 0

    for match in re.finditer(r'[{}]', s):
        paren = match.group(0)

        if paren == "{":
            level += 1
        else:
            if level == 0: return False
            level -= 1

    return level == 0

As you can see, the theoretically necessary PDA¹ implementation can be 
simplified to a braces counter with range checks by iterative use of a 
Regular Expression.  Extensions to meet the "challenge" are left as an 
exercise to the reader.

It has also occurred to me that because parentheses (`(', `)') and brackets 
(`[', `]') have special meaning in Regular Expressions (grouping and 
character classes, respectively), you could escape all other special 
characters in a text and use the RE evaluator itself to find out whether 
they are properly nested, having it evaluate one large RE.  But I have not 
tested this idea yet.  (Obviously it cannot be used to satisfy the 
"challenge"'s condition that braces – `{', `}' – and other parenthesis-like 
characters are to be considered as well, and that the parenthesis-like 
characters are to be printed.)

______
¹ Pushdown automaton

References:
[1] <http://bugs.python.org/issue694374>
[2] <http://pyparsing.wikispaces.com/>
-- 
PointedEars

Bitte keine Kopien per E-Mail. / Please do not Cc: me.



More information about the Python-list mailing list