How to decipher :re.split(r"(\(\([^)]+\)\))" in the example

Steven D'Aprano steve at pearwood.info
Fri Jul 11 05:04:02 EDT 2014


On Thu, 10 Jul 2014 23:33:27 -0400, Roy Smith wrote:

> In article <mailman.11747.1405046292.18130.python-list at python.org>,
>  Tim Chase <python.list at tim.thechases.com> wrote:
> 
>> On 2014-07-10 22:18, Roy Smith wrote:
>> > > Outside this are \( and \): these are literal opening and closing
>> > > bracket characters. So:
>> > > 
>> > >    \(\([^)]+\)\)
>> >
>> > although, even better would be to use the utterly awesome
>> > re.VERBOSE flag, and write it as:
>> > 
>> >      \({2} [^)]+ \){2}
>> 
>> Or heck, use a multi-line verbose expression and comment it for
>> clarity:
>> 
>>   r = re.compile(r"""
>>     (            # begin a capture group
>>      \({2}       # two literal "(" characters
>>      [^)]+       # one or more non-close-paren characters
>>      \){2}       # two literal ")" characters
>>     )            # close the capture group
>>     """, re.VERBOSE)
>> 
>> -tkc
> 
> Ugh.  That reminds me of the classic commenting anti-pattern:

The sort of dead-simple commenting shown below is not just harmless but 
can be *critically important* for beginners, who otherwise may not know 
what "l = []" means.

> l = []                  # create an empty list 
> for i in range(10):     # iterate over the first 10 integers
>     l.append(i)         # append each one to the list


The difference is, most people get beyond that level of competence in a 
matter of a few weeks or months, whereas regexes are a different story. 

(1) It's possible to have spent a decade programming in Python without 
ever developing more than a basic understanding of regexes. Regular 
expressions are a specialist mini-language for a specialist task, and one 
might go months or even *years* between needing to use them.

(2) We're *Python* programmers, not *Regex* programmers, so regular 
expressions are as much a foreign language to us as Perl or Lisp or C 
might be. (And if you personally read any of those languages, 
congratulations. How about APL, J, REBOL, Smalltalk, Forth, or PL/I?)

(3) The syntax for regexes is painfully terse and violates a number of 
important rules of good design. Larry Wall has listed no fewer than 19 
problems with regex syntax/culture:

http://perl6.org/archive/doc/design/apo/A05.html


So all things considered, for the average Python programmer who has a 
basic understanding of regexes but has to keep turning to the manual to 
find out how to do even simple things, comments explaining what the regex 
does are an excellent idea.
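
As a minimal sketch of how such a commented regex might be used with 
re.split, here is the pattern from the subject line written in verbose 
form. The sample string and the names "pattern", "text" and "parts" are 
made up for illustration and do not come from the original example.

```python
import re

# Verbose, commented form of the terse pattern r"(\(\([^)]+\)\))".
pattern = re.compile(r"""
    (            # begin a capture group, so re.split keeps the delimiter
     \({2}       # two literal "(" characters
     [^)]+       # one or more non-close-paren characters
     \){2}       # two literal ")" characters
    )            # close the capture group
    """, re.VERBOSE)

# Hypothetical input, invented for this sketch.
text = "Hello ((name)), your order ((id)) has shipped."

parts = pattern.split(text)
print(parts)
# ['Hello ', '((name))', ', your order ', '((id))', ' has shipped.']
```

Because the whole pattern sits inside a capture group, re.split returns 
the matched "((...))" pieces along with the text between them, and the 
verbose form behaves the same as the terse one.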



-- 
Steven


