[ #456742 ] Failing test case for .*?

Tim Peters tim.one at home.com
Sat Nov 3 13:06:03 EST 2001


[A.M. Kuchling]
> +? is a non-greedy +, and it's not equivalent to  (...+)?.
>
> Here's a test program:
>
> import sre
> s = "a\nb\na1"
>
> # Original, buggy pattern
> p = sre.compile(r" [^\n]+? \d", sre.DEBUG | sre.VERBOSE)
> m = p.search(s)
> print (m and m.groups())

This is chasing an illusion:  print m.group(0) instead.  Since this pattern
contains no explicit capturing groups, m.groups() can't produce anything
other than an empty tuple.  Here in a simpler setting:

>>> import re
>>> m = re.match('a', 'a')
>>> m.groups() # useless for a pattern without capturing groups
()
>>> m.group(0) # useful
'a'
>>>
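
The same distinction can be sketched as a small script.  This is only an
illustration in present-day Python 3 syntax (an assumption, since the session
above is from 2001), and the helper name describe() is hypothetical, not
part of the re module:

```python
import re

def describe(pattern, text):
    """Return (group(0), groups()) for a match at the start of text, or None."""
    m = re.match(pattern, text)
    if m is None:
        return None
    # group(0) is always the whole match; groups() lists only capturing groups.
    return m.group(0), m.groups()

no_group = describe(r'a', 'a')      # pattern without capturing groups
one_group = describe(r'(a)', 'a')   # pattern with one capturing group

print(no_group)   # ('a', ())
print(one_group)  # ('a', ('a',))
```

In both cases group(0) is 'a'; only groups() depends on whether the pattern
has parentheses.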

> # Add a group
> p = sre.compile(r" ([^\n]+?) \d", sre.DEBUG | sre.VERBOSE)
> m = p.search(s)
> print (m and m.groups())
>
> When I run with the current CVS Python, two different results are
> produced, even though the only difference is adding a pair of
> parentheses:

Yes, and that means .groups() is working correctly in both cases.  Back to
the simpler example:

>>> m = re.match('(a)', 'a')
>>> m.groups()
('a',)
>>> m.group(0)
'a'
>>>

The bug is that the pattern should have found just the 'a1' tail, not all of
s; it's the same bug in both cases:

import re
s = "a\nb\na1"

m = re.search(r'[^\n]+?\d', s)
print m and `m.group(0)`  # prints 'a\nb\na1'; should have been 'a1'

m = re.search(r'([^\n]+?)\d', s)
print m and `m.group(0)`  # also prints 'a\nb\na1'
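
For comparison, here is a sketch of what a correct matcher is expected to
return, written in Python 3 syntax (an assumption; the snippet above uses the
print statement and backticks of its day).  Since [^\n] can never match a
newline, no match may span one, which leaves the 'a1' tail as the only
candidate.  The re.VERBOSE variant mirrors the spaced-out pattern from the
original report:

```python
import re

s = "a\nb\na1"

plain = re.search(r'[^\n]+?\d', s)
grouped = re.search(r'([^\n]+?)\d', s)
# With re.VERBOSE, whitespace outside a character class is ignored,
# so this is the same pattern as `grouped`.
verbose = re.search(r' ([^\n]+?) \d', s, re.VERBOSE)

for m in (plain, grouped, verbose):
    assert m is not None
    assert '\n' not in m.group(0)   # a match must never cross a newline
    print(repr(m.group(0)), m.groups())
```

All three searches should report 'a1' as group(0); the two patterns with
parentheses additionally report ('a',) from groups().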




