How to decipher :re.split(r"(\(\([^)]+\)\))" in the example
Steven D'Aprano
steve at pearwood.info
Fri Jul 11 05:04:02 EDT 2014
On Thu, 10 Jul 2014 23:33:27 -0400, Roy Smith wrote:
> In article <mailman.11747.1405046292.18130.python-list at python.org>,
> Tim Chase <python.list at tim.thechases.com> wrote:
>
>> On 2014-07-10 22:18, Roy Smith wrote:
>> > > Outside this are \( and \): these are literal opening and closing
>> > > bracket characters. So:
>> > >
>> > > \(\([^)]+\)\)
>> >
>> > although, even better would be to use the utterly awesome
>> > re.VERBOSE
>> > flag, and write it as:
>> >
>> > \({2} [^)]+ \){2}
>>
>> Or heck, use a multi-line verbose expression and comment it for
>> clarity:
>>
>> r = re.compile(r"""
>>     (           # begin a capture group
>>       \({2}     # two literal "(" characters
>>       [^)]+     # one or more non-close-paren characters
>>       \){2}     # two literal ")" characters
>>     )           # close the capture group
>>     """, re.VERBOSE)
>>
>> -tkc
>
> Ugh. That reminds me of the classic commenting anti-pattern:
The sort of dead-simple commenting shown below is not just harmless but
can be *critically important* for beginners, who otherwise may not know
what "l = []" means.
> l = []                  # create an empty list
> for i in range(10):     # iterate over the first 10 integers
>     l.append(i)         # append each one to the list
The difference is, most people get beyond that level of competence in a
matter of a few weeks or months, whereas regexes are a different story.
(1) It's possible to have spent a decade programming in Python without
ever developing more than a basic understanding of regexes. Regular
expressions are a specialist mini-language for a specialist task, and one
might go months or even *years* between needing to use them.
(2) We're *Python* programmers, not *Regex* programmers, so regular
expressions are as much a foreign language to us as Perl or Lisp or C
might be. (And if you personally read any of those languages,
congratulations. How about APL, J, REBOL, Smalltalk, Forth, or PL/I?)
(3) The syntax for regexes is painfully terse and violates a number of
important rules of good design. Larry Wall has listed no fewer than 19
problems with regex syntax/culture:
http://perl6.org/archive/doc/design/apo/A05.html
So all things considered, for the average Python programmer who has a
basic understanding of regexes but has to keep turning to the manual to
find out how to do even simple things, comments explaining what the regex
does are an excellent idea.
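
For what it's worth, here is a minimal sketch of the commented pattern put
to work with re.split, next to the terse form from the subject line. The
names DOUBLE_PARENS, TERSE and sample are made up for illustration, and the
sample string is hypothetical; only the regex itself comes from this thread.

```python
import re

# Verbose, commented form of the pattern discussed in this thread.
DOUBLE_PARENS = re.compile(r"""
    (           # begin a capture group, so re.split keeps the delimiter
      \({2}     # two literal "(" characters
      [^)]+     # one or more non-close-paren characters
      \){2}     # two literal ")" characters
    )           # close the capture group
    """, re.VERBOSE)

# Terse form, as written in the subject line.
TERSE = re.compile(r"(\(\([^)]+\)\))")

# Hypothetical input, invented for this example.
sample = "Hello ((name)), your order ((id)) has shipped."

# Because the whole pattern is a capture group, the matched
# "((...))" pieces are returned in the result alongside the text
# between them.
parts = DOUBLE_PARENS.split(sample)
print(parts)

# Both spellings compile to the same pattern, so they split identically.
print(TERSE.split(sample) == parts)
```

On that sample, the first print shows
['Hello ', '((name))', ', your order ', '((id))', ' has shipped.']
and the second shows True. The two forms behave the same; the verbose one
just tells the reader what each piece is for.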
--
Steven