How to write this repeat matching?

rxjwg98 at gmail.com
Mon Jul 7 09:30:47 EDT 2014


On Sunday, July 6, 2014 3:26:44 PM UTC-4, Ian wrote:
> On Sun, Jul 6, 2014 at 12:57 PM,  <rxjwg98 at gmail.com> wrote:
> > I write the following code:
> >
> > .......
> > import re
> >
> > line = "abcdb"
> >
> > matchObj = re.match( 'a[bcd]*b', line)
> >
> > if matchObj:
> >    print "matchObj.group() : ", matchObj.group()
> >    print "matchObj.group(0) : ", matchObj.group()
> >    print "matchObj.group(1) : ", matchObj.group(1)
> >    print "matchObj.group(2) : ", matchObj.group(2)
> > else:
> >    print "No match!!"
> > .........
> >
> > In which I have used its match pattern, but the result is not 'abcb'
>
> You're never going to get a match of 'abcb' on that string, because
> 'abcb' is not found anywhere in that string.
>
> There are two possible matches for the given pattern over that string:
> 'abcdb' and 'ab'.  The first one matches the [bcd]* three times, and
> the second one matches it zero times.  Because the matching is greedy,
> you get the result that matches three times.  It cannot match one, two
> or four times because then there would be no 'b' following the [bcd]*
> portion as required by the pattern.
>
> > Only matchObj.group(0): abcdb
> >
> > displays. All other group(s) have no content.
>
> Calling match.group(0) is equivalent to calling match.group without
> arguments. In that case it returns the matched string of the entire
> regular expression.  match.group(1) and match.group(2) will return the
> value of the first and second matching group respectively, but the
> pattern does not have any matching groups.  If you want a matching
> group, then enclose the part that you want it to match in parentheses.
> For example, if you change the pattern to:
>
>     matchObj = re.match('a([bcd]*)b', line)
>
> then the value of matchObj.group(1) will be 'bcd'
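
For reference, the points in the quoted reply can be checked with a short
sketch. It assumes Python 3 print syntax (unlike the Python 2 code quoted
above), and the variable names greedy, lazy and grouped are only illustrative:

```python
import re

line = "abcdb"

# Greedy: [bcd]* takes as many characters as it can while still
# leaving a final 'b' for the rest of the pattern.
greedy = re.match(r'a[bcd]*b', line)
print(greedy.group())      # abcdb

# Non-greedy: [bcd]*? takes as few characters as possible.
lazy = re.match(r'a[bcd]*?b', line)
print(lazy.group())        # ab

# With parentheses the pattern has one capturing group.
grouped = re.match(r'a([bcd]*)b', line)
print(grouped.group(0))    # abcdb
print(grouped.group(1))    # bcd

# Without parentheses there is no group 1; asking for it raises IndexError.
try:
    greedy.group(1)
except IndexError as exc:
    print("IndexError:", exc)
```

Neither pattern can give 'abcb' here, since that substring does not occur in
"abcdb".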

Because I am new to Python, I may not have described the question clearly.
Could you read the original example on the web:

https://docs.python.org/2/howto/regex.html

It says that the match is 'abcb'. Could you explain it to me? Thanks again.


A step-by-step example will make this more obvious. Let's consider the
expression a[bcd]*b. This matches the letter 'a', zero or more letters from
the class [bcd], and finally ends with a 'b'. Now imagine matching this RE
against the string abcbd.


Step  Matched  Explanation

1     a        The a in the RE matches.
2     abcbd    The engine matches [bcd]*, going as far as it can, which is
               to the end of the string.
3     Failure  The engine tries to match b, but the current position is at
               the end of the string, so it fails.
4     abcb     Back up, so that [bcd]* matches one less character.
5     Failure  Try b again, but the current position is at the last
               character, which is a 'd'.
6     abc      Back up again, so that [bcd]* is only matching bc.
7     abcb     Try b again. This time the character at the current position
               is 'b', so it succeeds.
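
The steps in the table can be reproduced with a small hand-written sketch.
Note that the documentation matches against the string abcbd, while the code
earlier in this thread uses abcdb. The trace() function below is hypothetical
and hard-codes the single pattern a[bcd]*b; it only illustrates the idea of
backtracking and is not how the re module is implemented:

```python
import re

def trace(text):
    """Illustrative backtracking for the one pattern a[bcd]*b."""
    steps = []
    if not text.startswith("a"):
        return steps, None
    steps.append("a")
    # Greedy phase: let [bcd]* consume as many characters as possible.
    end = 1
    while end < len(text) and text[end] in "bcd":
        end += 1
    steps.append(text[:end])
    # Backtracking phase: try the final 'b'; on failure give back one
    # character from [bcd]* and try again.
    while end >= 1:
        if end < len(text) and text[end] == "b":
            steps.append(text[:end + 1])
            return steps, text[:end + 1]
        steps.append("Failure")
        end -= 1
        if end >= 1:
            steps.append(text[:end])
    return steps, None

steps, result = trace("abcbd")
for number, matched in enumerate(steps, 1):
    print(number, matched)
print("result:", result)

# The real engine agrees on both strings.
print(re.match(r'a[bcd]*b', "abcbd").group())   # abcb
print(re.match(r'a[bcd]*b', "abcdb").group())   # abcdb
```

With "abcbd" the sketch produces the seven entries of the "Matched" column
above and ends on 'abcb'; with "abcdb" the result is 'abcdb'.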


