regex question

proctor 12cc104 at gmail.com
Fri Apr 27 11:49:16 EDT 2007


On Apr 27, 8:50 am, Paul McGuire <p... at austin.rr.com> wrote:
> On Apr 27, 9:10 am, proctor <12cc... at gmail.com> wrote:
>
>
>
> > On Apr 27, 1:33 am, Paul McGuire <p... at austin.rr.com> wrote:
>
> > > On Apr 27, 1:33 am, proctor <12cc... at gmail.com> wrote:
>
> > > > hello,
>
> > > > i have a regex:  rx_test = re.compile('/x([^x])*x/')
>
> > > > which is part of this test program:
>
> > > > ============
>
> > > > import re
>
> > > > rx_test = re.compile('/x([^x])*x/')
>
> > > > s = '/xabcx/'
>
> > > > if rx_test.findall(s):
> > > >         print rx_test.findall(s)
>
> > > > ============
>
> > > > i expect the output to be ['abc'] however it gives me only the last
> > > > single character in the group: ['c']
>
> > > > C:\test>python retest.py
> > > > ['c']
>
> > > > can anyone point out why this is occurring?  i can capture the entire
> > > > group by doing this:
>
> > > > rx_test = re.compile('/x([^x]+)*x/')
> > > > but why isn't the 'star' grabbing the whole group?  and why isn't each
> > > > letter 'a', 'b', and 'c' present, either individually, or as a group
> > > > (group is expected)?
>
> > > > any clarification is appreciated!
>
> > > > sincerely,
> > > > proctor
>
> > > As Josiah already pointed out, the * needs to be inside the grouping
> > > parens.
>
> > > Since re's do lookahead/backtracking, you can also write:
>
> > > rx_test = re.compile('/x(.*?)x/')
>
> > > The '?' is there to make sure the .* repetition stops at the first
> > > occurrence of x/.
>
> > > -- Paul
>
> > i am working through an example from the oreilly book mastering
> > regular expressions (2nd edition) by jeffrey friedl.  my post was a
> > snippet from a regex to match C comments.   every 'x' in the regex
> > represents a 'star' in actual usage, so that backslash escaping is not
> > needed in the example (on page 275).  it looks like this:
>
> > ===========
>
> > /x([^x]|x+[^/x])*x+/
>
> > it is supposed to match '/x', the opening delimiter, then
>
> > (
> > either anything that is 'not x',
>
> > or,
>
> > 'x' one or more times, 'not followed by a slash or an x'
> > ) any number of times (the 'star')
>
> > followed finally by the closing delimiter.
>
> > ===========
>
> > this does not seem to work in python the way i understand it should
> > from the book, and i simplified the example in my first post to
> > concentrate on just one part of the alternation that i felt was not
> > acting as expected.
>
> > so my question remains, why doesn't the star quantifier seem to grab
> > all the data.  isn't findall() intended to return all matches?  i
> > would expect either 'abc' or 'a', 'b', 'c' or at least just
> > 'a' (because that would be the first match).  why does it give only
> > one letter, and at that, the /last/ letter in the sequence??
>
> > thanks again for replying!
>
> > sincerely,
> > proctor
>
> Again, I'll repeat some earlier advice:  you need to move the '*'
> inside the parens - you are still leaving it outside.  Also, get in
> the habit of using raw literal notation (that is r"slkjdfljf" instead
> of "lsjdlfkjs") when defining re strings - you don't have backslash
> issues yet, but you will as soon as you start putting real '*'
> characters in your expression.
>
> However, when I test this,
>
> restr = r'/x(([^x]|x+[^/])*)x+/'
> re_ = re.compile(restr)
> print re_.findall("/xabxxcx/ /x123xxx/")
>
> findall now starts to give a tuple for each "comment",
>
> [('abxxc', 'xxc'), ('123xx', 'xx')]
>
> so you have gone beyond my limited re skill, and will need help from
> someone else.
>
> But I suggest you add some tests with multiple consecutive 'x'
> characters in the middle of your comment, and multiple consecutive 'x'
> characters before the trailing comment.  In fact, from my
> recollections of trying to implement this type of comment recognizer
> by hand a long time ago in a job far, far away, test with both even
> and odd numbers of 'x' characters.
>
> -- Paul

thanks paul,

the reason the regex now gives tuples is that there are now 2 groups,
the outer and inner parens.  group 1 (the outer parens) holds everything
the starred inner group matched, and group 2 (the inner parens) holds
only its last repetition.
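
here is a minimal sketch of that behaviour, using the patterns and test
strings from this thread.  it assumes python 3 syntax for print (the
snippets quoted above are python 2), and the variable names and the
(?:...) variant at the end are my own additions for illustration, not
something from the book:

```python
import re

s = '/xabcx/'

# star outside the group: the group is re-captured on every repetition,
# so only the last repetition is left once the match finishes
last_only = re.findall(r'/x([^x])*x/', s)
print(last_only)      # ['c']

# star inside the group: a single capture that spans the whole run
whole = re.findall(r'/x([^x]*)x/', s)
print(whole)          # ['abc']

text = '/xabxxcx/ /x123xxx/'

# two capturing groups: findall returns one tuple per match,
# (outer group, last repetition of the inner group)
nested = re.findall(r'/x(([^x]|x+[^/])*)x+/', text)
print(nested)         # [('abxxc', 'xxc'), ('123xx', 'xx')]

# inner group made non-capturing with (?:...): only the outer group
# is reported, so findall goes back to returning plain strings
flat = re.findall(r'/x((?:[^x]|x+[^/])*)x+/', text)
print(flat)           # ['abxxc', '123xx']
```

so findall() does return every match; it is the capturing group that
only remembers its most recent repetition.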

sincerely,
proctor



