regex question

proctor 12cc104 at gmail.com
Sun Apr 29 02:23:00 EDT 2007


On Apr 27, 8:26 am, Michael Hoffman <cam.ac... at mh391.invalid> wrote:
> proctorwrote:
> > On Apr 27, 1:33 am, Paul McGuire <p... at austin.rr.com> wrote:
> >> On Apr 27, 1:33 am,proctor<12cc... at gmail.com> wrote:
> >>> rx_test = re.compile('/x([^x])*x/')
> >>> s = '/xabcx/'
> >>> if rx_test.findall(s):
> >>>         print rx_test.findall(s)
> >>> ============
> >>> i expect the output to be ['abc'] however it gives me only the last
> >>> single character in the group: ['c']
>
> >> As Josiah already pointed out, the * needs to be inside the grouping
> >> parens.
> > so my question remains, why doesn't the star quantifier seem to grab
> > all the data.
>
> Because you didn't use it *inside* the group, as has been said twice.
> Let's take a simpler example:
>
>  >>> import re
>  >>> text = "xabc"
>  >>> re_test1 = re.compile("x([^x])*")
>  >>> re_test2 = re.compile("x([^x]*)")
>  >>> re_test1.match(text).groups()
> ('c',)
>  >>> re_test2.match(text).groups()
> ('abc',)
>
> There are three places that match ([^x]) in text. But each time you find
> one you overwrite the previous example.
>
> > isn't findall() intended to return all matches?
>
> It returns all matches of the WHOLE pattern, /x([^x])*x/. Since you used
> a grouping parenthesis in there, it only returns one group from each
> pattern.
>
> Back to my example:
>
>  >>> re_test1.findall("xabcxaaaxabc")
> ['c', 'a', 'c']
>
> Here it finds multiple matches, but only because the x occurs multiple
> times as well. In your example there is only one match.
>
> > i would expect either 'abc' or 'a', 'b', 'c' or at least just
> > 'a' (because that would be the first match).
>
> You are essentially doing this:
>
> group1 = "a"
> group1 = "b"
> group1 = "c"
>
> After those three statements, you wouldn't expect group1 to be "abc" or
> "a". You'd expect it to be "c".
> --
> Michael Hoffman

thank you all again for helping to clarify this for me.  of course you
were exactly right, and the problem lay not with python or the text,
but with me.  i mistakenly understood the text to be attempting to
capture the C style comment, when in fact it was merely matching it.

apologies.

sincerely,
proctor




More information about the Python-list mailing list