regex question

Michael Hoffman cam.ac.uk at mh391.invalid
Fri Apr 27 10:26:51 EDT 2007


proctor wrote:
> On Apr 27, 1:33 am, Paul McGuire <p... at austin.rr.com> wrote:
>> On Apr 27, 1:33 am, proctor <12cc... at gmail.com> wrote:

>>> rx_test = re.compile('/x([^x])*x/')
>>> s = '/xabcx/'
>>> if rx_test.findall(s):
>>>         print rx_test.findall(s)
>>> ============
>>> I expect the output to be ['abc']; however, it gives me only the last
>>> single character in the group: ['c']
>
>> As Josiah already pointed out, the * needs to be inside the grouping
>> parens.

> So my question remains: why doesn't the star quantifier seem to grab
> all the data?

Because you didn't use it *inside* the group, as has been said twice. 
Let's take a simpler example:

>>> import re
>>> text = "xabc"
>>> re_test1 = re.compile("x([^x])*")
>>> re_test2 = re.compile("x([^x]*)")
>>> re_test1.match(text).groups()
('c',)
>>> re_test2.match(text).groups()
('abc',)

The group ([^x]) matches three times in text, once each for 'a', 'b' and
'c'. But each time it matches, the new capture overwrites the previous
one, so only the last capture, 'c', is left when the match ends.
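A minimal sketch (written for Python 3) that makes the overwriting visible: Match.span(1) reports where the group's final capture sits, which is the position of 'c', not of 'a'.

```python
import re

text = "xabc"

# Quantifier outside the group: the group is re-entered once per character,
# and each pass replaces the previous capture.
m1 = re.match(r"x([^x])*", text)
print(m1.group(1))  # prints c
print(m1.span(1))   # prints (3, 4): the span of the last capture only

# Quantifier inside the group: the group is entered once and captures the run.
m2 = re.match(r"x([^x]*)", text)
print(m2.group(1))  # prints abc
print(m2.span(1))   # prints (1, 4)
```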

> Isn't findall() intended to return all matches?

It returns all matches of the WHOLE pattern, /x([^x])*x/. Since your
pattern contains a capturing group, findall() returns that group's text
from each match rather than the whole matched text, and that group holds
only its last capture.
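The following Python 3 sketch shows how the capturing groups in the pattern from the original question change what findall() returns. The non-capturing (?:...) variant and the two-group pattern are additions for comparison, not something from the thread.

```python
import re

s = "/xabcx/"

# No capturing group: findall() returns the whole matched text.
print(re.findall(r"/x[^x]*x/", s))       # prints ['/xabcx/']

# One capturing group: findall() returns that group's text for each match.
print(re.findall(r"/x([^x])*x/", s))     # prints ['c'] (star outside the group)
print(re.findall(r"/x([^x]*)x/", s))     # prints ['abc'] (star inside the group)

# A non-capturing group repeats like a capturing one but is not reported,
# so findall() falls back to returning the whole match.
print(re.findall(r"/x(?:[^x])*x/", s))   # prints ['/xabcx/']

# Two or more capturing groups: findall() returns one tuple per match.
print(re.findall(r"/(x)([^x]*)x/", s))   # prints [('x', 'abc')]
```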

Back to my example:

>>> re_test1.findall("xabcxaaaxabc")
['c', 'a', 'c']

Here it finds three matches, but only because x occurs three times in
the text and each x starts a new match. In your example there is only
one match.
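To see which substring each of those three matches covers, re.finditer() yields match objects, so the whole match and the group can be printed side by side. A short Python 3 sketch reusing the pattern from the example:

```python
import re

re_test1 = re.compile(r"x([^x])*")

# Each match starts at an x; group(1) holds only the last character the
# repeated group consumed within that match.
pairs = [(m.group(0), m.group(1)) for m in re_test1.finditer("xabcxaaaxabc")]
print(pairs)  # prints [('xabc', 'c'), ('xaaa', 'a'), ('xabc', 'c')]
```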

> I would expect either 'abc', or 'a', 'b', 'c', or at least just
> 'a' (because that would be the first match).

You are essentially doing this:

group1 = "a"
group1 = "b"
group1 = "c"

After those three statements, you wouldn't expect group1 to be "abc" or 
"a". You'd expect it to be "c".
-- 
Michael Hoffman


