Regex Question

Bill Mill bill.mill at gmail.com
Tue Jan 16 14:36:49 EST 2007


James Stroud wrote:
> Bill Mill wrote:
> > Hello all,
> >
> > I've got a test script:
> >
> > ==== start python code =====
> >
> > import re
> >
> > tests2 = ["item1: alpha; item2: beta. item3 - gamma--",
> >           "item1: alpha; item3 - gamma--"]
> >
> > def test_re(regex):
> >    r = re.compile(regex, re.MULTILINE)
> >    for test in tests2:
> >        res = r.search(test)
> >        if res:
> >            print res.groups()
> >        else:
> >            print "Failed"
> >
> > ==== end python code ====
> >
> > And a simple question:
> >
> > Why does the first regex that follows successfully grab "beta", while
> > the second one doesn't?
> >
> > In [131]: test_re(r"(?:item2: (.*?)\.)")
> > ('beta',)
> > Failed
> >
> > In [132]: test_re(r"(?:item2: (.*?)\.)?")
> > (None,)
> > (None,)
> >
> > Shouldn't the '?' greedily grab the group match?
> >
> > Thanks
> > Bill Mill
> > bill.mill at gmail.com
>
> The question mark matches zero or one occurrence of the group. The
> first match search() finds is an empty one at the start of the string,
> which satisfies the zero case. Perhaps you mean "+"?
>
> e.g.
>
> py> import re
> py> rgx = re.compile('(1)?')
> py> rgx.search('a1').groups()
> (None,)
> py> rgx = re.compile('(1)+')
> py> rgx.search('a1').groups()
> ('1',)

But shouldn't the ? be greedy, and thus prefer the one match to the
zero? This is my sticking point - I've seen that plus works, and this
just confuses me more.
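
To pin down what I'm seeing, here is a minimal sketch that runs the
optional pattern two ways on my first test string: once with search()
from the start of the string, and once with match() told to start
exactly where "item2" begins. The names text, optional, from_start and
at_item2 are just ones I made up for this illustration.

```python
import re

# First test string from the script above.
text = "item1: alpha; item2: beta. item3 - gamma--"

# The optional version of the pattern (the one that returns None).
optional = re.compile(r"(?:item2: (.*?)\.)?")

# search() tries position 0 first. The group cannot match there, but the
# whole pattern is optional, so it succeeds with an empty match.
from_start = optional.search(text)

# Start the attempt exactly where "item2" begins (illustrative choice).
# Here the '?' can take the one-occurrence branch.
at_item2 = optional.match(text, text.index("item2"))

print(from_start.span())
print(from_start.groups())
print(at_item2.span())
print(at_item2.groups())
```

When the attempt starts at "item2", the ? does take the one match and
captures beta; searching from the start instead gives an empty match at
position 0 with the group unset. So my guess, and it is only a guess, is
that search() settles for the first position where the pattern can
succeed at all, and greediness only chooses between zero and one at that
position.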

-Bill Mill
bill.mill at gmail.com



