regex question

proctor 12cc104 at gmail.com
Fri Apr 27 10:10:18 EDT 2007


On Apr 27, 1:33 am, Paul McGuire <p... at austin.rr.com> wrote:
> On Apr 27, 1:33 am, proctor <12cc... at gmail.com> wrote:
>
>
>
> > hello,
>
> > i have a regex:  rx_test = re.compile('/x([^x])*x/')
>
> > which is part of this test program:
>
> > ============
>
> > import re
>
> > rx_test = re.compile('/x([^x])*x/')
>
> > s = '/xabcx/'
>
> > if rx_test.findall(s):
> >         print rx_test.findall(s)
>
> > ============
>
> > i expect the output to be ['abc'] however it gives me only the last
> > single character in the group: ['c']
>
> > C:\test>python retest.py
> > ['c']
>
> > can anyone point out why this is occurring?  i can capture the entire
> > group by doing this:
>
> > rx_test = re.compile('/x([^x]+)*x/')
>
> > but why isn't the 'star' grabbing the whole group?  and why isn't each
> > letter 'a', 'b', and 'c' present, either individually, or as a group
> > (group is expected)?
>
> > any clarification is appreciated!
>
> > sincerely,
> > proctor
>
> As Josiah already pointed out, the * needs to be inside the grouping
> parens.
>
> Since re's do lookahead/backtracking, you can also write:
>
> rx_test = re.compile('/x(.*?)x/')
>
> The '?' is there to make sure the .* repetition stops at the first
> occurrence of x/.
>
> -- Paul

i am working through an example from the O'Reilly book Mastering
Regular Expressions (2nd edition) by Jeffrey Friedl.  my post was a
snippet from a regex to match C comments.  every 'x' in the regex
stands in for a 'star' in actual usage, so that backslash escaping is
not needed in the example (on page 275).  it looks like this:

===========

/x([^x]|x+[^/x])*x+/

it is supposed to match '/x', the opening delimiter, then

(
either anything that is 'not x',

or,

'x' one or more times, followed by one character that is neither a
slash nor an 'x'
) any number of times (the 'star')

followed finally by the closing delimiter ('x' one or more times, then
a slash).

===========
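
for reference, here is a minimal sketch of that pattern as a runnable test, once in the 'x' form above and once with real stars escaped. the sample string and the variable names are made up for illustration and are not from the book. it uses finditer() and group(0) so that the whole match is reported no matter what the capturing group holds:

```python
import re

# the 'x' form from the post, and the same pattern with real stars escaped
rx_x = re.compile(r'/x([^x]|x+[^/x])*x+/')
rx_star = re.compile(r'/\*([^*]|\*+[^/*])*\*+/')

# hypothetical sample text: two "comments" separated by other characters
s_x = 'a /xabcx/ b /xx d x e xx/ c'
s_star = s_x.replace('x', '*')

# group(0) is always the entire match, independent of any capturing groups
matches_x = [m.group(0) for m in rx_x.finditer(s_x)]
matches_star = [m.group(0) for m in rx_star.finditer(s_star)]

print(matches_x)     # ['/xabcx/', '/xx d x e xx/']
print(matches_star)  # ['/*abc*/', '/** d * e **/']
```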

this does not seem to work in python the way i understand it should
from the book, and i simplified the example in my first post to
concentrate on just one part of the alternation that i felt was not
acting as expected.

so my question remains: why doesn't the star quantifier seem to grab
all the data?  isn't findall() intended to return all matches?  i
would expect either 'abc' or 'a', 'b', 'c' or at least just
'a' (because that would be the first match).  why does it give only
one letter, and at that, the /last/ letter in the sequence?
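
for anyone reproducing this, a minimal sketch of the behavior in question, relying on two documented properties of the re module: when a pattern has exactly one capturing group, findall() returns that group and not the whole match, and a group that repeats keeps only the text from its last iteration. the variable names are made up for illustration:

```python
import re

s = '/xabcx/'

# one capturing group, repeated by the star outside it: findall returns
# the group, which holds only what its last iteration matched
last_only = re.findall(r'/x([^x])*x/', s)

# star moved inside the parens: the group captures the whole run
whole_group = re.findall(r'/x([^x]*)x/', s)

# non-capturing group: with no groups, findall returns the entire match
whole_match = re.findall(r'/x(?:[^x])*x/', s)

print(last_only)    # ['c']
print(whole_group)  # ['abc']
print(whole_match)  # ['/xabcx/']
```

the second form matches the earlier advice in this thread to put the star inside the grouping parens. the third is an alternative when the group is only needed for the alternation.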

thanks again for replying!

sincerely,
proctor



