regex question
Michael Hoffman
cam.ac.uk at mh391.invalid
Fri Apr 27 10:26:51 EDT 2007
proctor wrote:
> On Apr 27, 1:33 am, Paul McGuire <p... at austin.rr.com> wrote:
>> On Apr 27, 1:33 am, proctor <12cc... at gmail.com> wrote:
>>> import re
>>> rx_test = re.compile('/x([^x])*x/')
>>> s = '/xabcx/'
>>> if rx_test.findall(s):
>>>     print(rx_test.findall(s))
>>> ============
>>> I expect the output to be ['abc']; however, it gives me only the last
>>> single character in the group: ['c']
>
>> As Josiah already pointed out, the * needs to be inside the grouping
>> parens.
> So my question remains: why doesn't the star quantifier seem to grab
> all the data?
Because you didn't use it *inside* the group, as has been said twice.
Let's take a simpler example:
>>> import re
>>> text = "xabc"
>>> re_test1 = re.compile("x([^x])*")
>>> re_test2 = re.compile("x([^x]*)")
>>> re_test1.match(text).groups()
('c',)
>>> re_test2.match(text).groups()
('abc',)
There are three places that match ([^x]) in text. But each time one is
found, it overwrites the previous capture, so only the last one, 'c', is kept.
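To see the overwriting directly, here is a short sketch (assuming Python 3 and the standard `re` module) that compares the whole match with group 1 and uses `Match.span()` to show where group 1's stored value came from:

```python
import re

text = "xabc"
m = re.match(r"x([^x])*", text)

# The whole match consumed all of "xabc"...
whole = m.group(0)
# ...but group 1 only remembers its final repetition.
last = m.group(1)
# span(1) shows the stored capture covers only the last character.
where = m.span(1)

print(whole, last, where)  # xabc c (3, 4)
```

The pattern itself matched all four characters; it is only the group that retains a single value.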
> Isn't findall() intended to return all matches?
It returns all matches of the WHOLE pattern, /x([^x])*x/. Since you used
a grouping parenthesis in there, it returns only that group's value from
each match, not the whole match.
Back to my example:
>>> re_test1.findall("xabcxaaaxabc")
['c', 'a', 'c']
Here it finds multiple matches, but only because the x occurs multiple
times as well. In your example there is only one match.
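Applied to the original string, a sketch (Python 3, standard `re`) of how the placement of the star and the kind of group change what findall() returns: one capturing group gives that group's value per match, no capturing groups give the whole match, and several groups give tuples.

```python
import re

s = "/xabcx/"

# Star outside the group: the group keeps only its last repetition.
last_only = re.findall(r"/x([^x])*x/", s)

# Star inside the group: the group captures the whole run.
whole_run = re.findall(r"/x([^x]*)x/", s)

# Non-capturing group (?:...): no capturing groups, so whole matches come back.
whole_match = re.findall(r"/x(?:[^x])*x/", s)

# Two capturing groups: each match is returned as a tuple of group values.
tuples = re.findall(r"(x)([^x]*)", "xabcxaaa")

print(last_only, whole_run, whole_match, tuples)
```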
> I would expect either 'abc', or 'a', 'b', 'c', or at least just
> 'a' (because that would be the first match).
You are essentially doing this:
group1 = "a"
group1 = "b"
group1 = "c"
After those three statements, you wouldn't expect group1 to be "abc" or
"a". You'd expect it to be "c".
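If 'a', 'b', 'c' is what you actually want, one workaround (a sketch, assuming Python 3 and the standard `re` module, which keeps only one value per group) is to capture the whole run first and split it afterwards:

```python
import re

s = "/xabcx/"

# Step 1: capture the whole run between the x's (star inside the group).
m = re.search(r"/x([^x]*)x/", s)
run = m.group(1)

# Step 2: split the captured run into its individual characters.
chars = list(run)

# The "first match" the question expected is simply the first character.
first = chars[0] if chars else None

print(run, chars, first)  # abc ['a', 'b', 'c'] a
```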
--
Michael Hoffman
More information about the Python-list mailing list