[Python-Dev] one last SRE headache

Andrew Kuchling akuchlin@mems-exchange.org
Thu, 31 Aug 2000 15:46:03 -0400


On Thu, Aug 31, 2000 at 09:46:54PM +0200, Fredrik Lundh wrote:
>can anyone tell me how Perl treats this pattern?
>    r'((((((((((a))))))))))\41'

>if I understand this correctly, Perl treats it as an *octal* escape
>(chr(041) == "!").

Correct.  From perlre:

       You may have as many parentheses as you wish.  If you have more
       than 9 substrings, the variables $10, $11, ... refer to the
       corresponding substring.  Within the pattern, \10, \11,
       etc. refer back to substrings if there have been at least that
       many left parentheses before the backreference.  Otherwise (for
       backward compatibility) \10 is the same as \010, a backspace,
       and \11 the same as \011, a tab.  And so on.  (\1 through \9
       are always backreferences.)  

In other words, if there were 41 groups, \41 would be a backref to
group 41; if there aren't, it's an octal escape.  This magical
behaviour was deemed not Pythonic, so pre uses a different rule: a
numeric escape is always a character inside a character class ([\41]
isn't a syntax error), and outside a character class it's a character
if there are exactly 3 octal digits; otherwise it's a backref.  So in
pre \41 is always a backref to group 41, no matter how many groups the
pattern has, but \041 is the literal character ASCII 33 ("!").
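
A minimal sketch of that rule, for illustration only.  It assumes the
standard re module in a current Python applies the same rule described
above for pre (it has not been run against pre itself), and the helper
name compiles() is made up for this example.

```python
import re

def compiles(pattern):
    """Hypothetical helper: True if re accepts the pattern, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

ten_groups = r'((((((((((a))))))))))'

# Two digits outside a character class: read as a backref to group 41.
# Only 10 groups exist, so the pattern is rejected rather than being
# silently reinterpreted as an octal escape.
backref_rejected = not compiles(ten_groups + r'\41')

# Exactly three octal digits: the literal character chr(0o41) == '!'.
octal_match = re.fullmatch(ten_groups + r'\041', 'a!') is not None

# Inside a character class a numeric escape is always a character.
class_match = re.fullmatch(r'[\41]', '!') is not None

# With 41 groups present, \41 is a genuine backref to group 41.
backref_match = re.fullmatch('(a)' * 41 + r'\41', 'a' * 42) is not None

print(backref_rejected, octal_match, class_match, backref_match)
```

If re follows the rule as assumed, all four flags should come out
true; the point is that the meaning of \41 never depends on how many
groups precede it.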

--amk