splitting words with brackets

Paul McGuire ptmcg at austin.rr._bogus_.com
Thu Jul 27 04:20:01 EDT 2006


"Tim Chase" <python.list at tim.thechases.com> wrote in message
news:mailman.8598.1153966351.27775.python-list at python.org...
> >> >>> r = re.compile(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+')
> >> >>> r.findall(s)
> >>['(a c)b(c d)', 'e']
> >
> > Ah, it's exactly what I want!  I thought the left and right
> > sides of "|" were interchangeable, but that is not true.
>
> In theory, they *should* be equal. I was baffled by the nonparity
> of the situation.  You *should* be able to swap the two sides of
> the "|" and have it treated the same.  Yet, when I tried it with
> the above regexp, putting the \S first, it seemed to choke and
> give different results.  I'd love to know why.
>
Does the re module try the alternatives from left to right?  If so, then \S
will eat the opening parens/brackets, and the other alternatives will never
get a chance to match them.  \S is the most "matchable" pattern, so if it
comes ahead of the other alternatives, it will always be the one matched.
My guess is that if you put \S first, you will only get the contiguous
non-whitespace runs, regardless of ()'s and []'s.  The expression might as
well just be \S+.

Or I could be completely wrong...
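
A quick way to check the guess is to run both orderings side by side.  Python's
re is a backtracking engine that tries the alternatives of "|" left to right and
keeps the first one that lets the overall match succeed, so order matters.  The
sample string s below is an assumption, chosen to reproduce the
['(a c)b(c d)', 'e'] output quoted above, since the original message defining
it isn't quoted here.

```python
import re

# Assumed sample input (hypothetical; consistent with the output quoted above).
s = '(a c)b(c d) e'

# Bracket alternatives first: a "(...)" or "[...]" run is consumed whole,
# so a space inside the brackets does not end the token.
brackets_first = re.compile(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+')

# \S first: it matches any non-space char, including "(" and "[", so the
# bracket alternatives only get tried at whitespace, where they fail too.
s_first = re.compile(r'(?:\S|\([^\)]*\)|\[[^\]]*\])+')

print(brackets_first.findall(s))  # ['(a c)b(c d)', 'e']
print(s_first.findall(s))         # ['(a', 'c)b(c', 'd)', 'e']
print(re.findall(r'\S+', s))      # ['(a', 'c)b(c', 'd)', 'e']
```

With \S first, the pattern behaves exactly like plain \S+ here: nothing follows
the repeated group, so the engine never has to backtrack into the bracket
alternatives.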

-- Paul
