a little parsing challenge ☺
Thomas 'PointedEars' Lahn
PointedEars at web.de
Sun Jul 17 17:43:48 EDT 2011
Thomas 'PointedEars' Lahn wrote:
> It is possible [to parse the parentheses language], with Perl-compatible
> Regular Expressions (PCRE), provided that you have enough memory, to use
> such an extended Regular Expression (not to be confused with EREs³)⁴:
>
> \((([^()]*|(?R))*)\)
>
> However, even Python 3.2 does not support those expressions (although it
> supports some other PCRE patterns, like named subexpressions)⁵, neither do
> standard and forked versions of sed(1) (BREs, EREs, using an NFA) nor awk
> (EREs, using a DFA or NFA). [That is not to say it would not be possible
> with Python, or sed or awk (both of which are off-topic here), but that
> more than a Regular Expression would be required.]
Supplemental: Further research shows that the Python `re' module is not
going to implement (PCRE) recursive Regular Expressions. The maintainer's,
Matthew Barnett's, argument (of 2009-03-24) is that such things are better
left to modules such as `pyparsing' [1][2].
However, FWIW, here is the Python port of the start of a language parser
originally written in (and for) ECMAScript:
import re
def matchingBraces(s):
level = 0
for match in re.finditer(r'[{}]', s):
paren = match.group(0)
if paren == "{":
level += 1
else:
if level == 0: return False
level -= 1
return level == 0
As you can see, the theoretically necessary PDA¹ implementation can be
simplified to a braces counter with range checks by iterative use of a
Regular Expression. Extensions to meet the "challenge" are left as an
exercise to the reader.
It has also occurred to me that because parentheses (`(', `)') and brackets
(`[', `]') have special meaning in Regular Expressions (grouping and
character classes, respectively), you could escape all other special
characters in a text and use the RE evaluator itself to find out whether
they are properly nested, having it evaluate one large RE. But I have not
tested this idea yet. (Obviously it cannot be used to satisfy the
"challenge"'s condition that braces – `{', `}' – and other parenthesis-like
characters are to be considered as well, and that the parenthesis-like
characters are to be printed.)
______
¹ Pushdown automaton
References:
[1] <http://bugs.python.org/issue694374>
[2] <http://pyparsing.wikispaces.com/>
--
PointedEars
Bitte keine Kopien per E-Mail. / Please do not Cc: me.
More information about the Python-list
mailing list