Help for a complex RE

Sergio Spina sergio.am.spina at gmail.com
Sun May 8 12:32:02 EDT 2016


On Sunday, 8 May 2016 at 18:16:56 UTC+2, Peter Otten wrote:
> Sergio Spina wrote:
> 
> > In the following ipython session:
> > 
> >> Python 3.5.1+ (default, Feb 24 2016, 11:28:57)
> >> Type "copyright", "credits" or "license" for more information.
> >>
> >> IPython 2.3.0 -- An enhanced Interactive Python.
> >>
> >> In [1]: import re
> >>
> >> In [2]: patt = r"""  # the match pattern is:
> >> ...:     .+          # one or more characters
> >> ...:     [ ]         # followed by a space
> >> ...:     (?=[@#D]:)  # that is followed by one of the
> >> ...:                 # chars "@#D" and a colon ":"
> >> ...:    """
> >> 
> >> In [3]: pattern = re.compile(patt, re.VERBOSE)
> >> 
> >> In [4]: m = pattern.match("Jun@i Bun#i @:Janji")
> >> 
> >> In [5]: m.group()
> >> Out[5]: 'Jun@i Bun#i '
> >> 
> >> In [6]: m = pattern.match("Jun@i Bun#i @:Janji D:Banji")
> >> 
> >> In [7]: m.group()
> >> Out[7]: 'Jun@i Bun#i @:Janji '
> >> 
> >> In [8]: m = pattern.match("Jun@i Bun#i @:Janji D:Banji #:Junji")
> >> 
> >> In [9]: m.group()
> >> Out[9]: 'Jun@i Bun#i @:Janji D:Banji '
> > 
> > Why does the regex engine stop the search at the last piece of the string?
> > Why not at the first match of the group "@:"?
> > What regex pattern would give the following result?
> > 
> >> In [1]: m = pattern.match("Jun@i Bun#i @:Janji D:Banji #:Junji")
> >> 
> >> In [2]: m.group()
> >> Out[2]: 'Jun@i Bun#i '
> 
> Compare:
> 
> >>> re.compile("a+").match("aaaa").group()
> 'aaaa'
> >>> re.compile("a+?").match("aaaa").group()
> 'a'
> 
> By default pattern matching is "greedy" -- the ".+" part of your regex 
> matches as many characters as possible. Adding a ? like in ".+?" triggers 
> non-greedy matching.
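
To make the greedy/non-greedy difference concrete when something must follow the quantifier, here is a small sketch. The comma-separated sample string is made up for illustration and is not taken from the session above.

```python
import re

# Hypothetical sample string, chosen only for illustration.
sample = "a,b,c,"

# Greedy: ".+" first grabs everything, then backs off only as far as
# needed for the trailing "," to match, so it stops at the LAST comma.
greedy = re.match(r".+,", sample).group()

# Non-greedy: ".+?" grows one character at a time and stops as soon as
# the trailing "," can match, so it stops at the FIRST comma.
lazy = re.match(r".+?,", sample).group()

print(repr(greedy))  # 'a,b,c,'
print(repr(lazy))    # 'a,'
```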

>  In [2]: patt = r"""  # the match pattern is:
>  ...:     .+          # one or more characters
>  ...:     [ ]         # followed by a space
>  ...:     (?=[@#D]:)  # ONLY IF it is followed by one of the <<< please note
>  ...:                 # chars "@#D" and a colon ":"
>  ...:    """ 

From the Python documentation:

>  (?=...)
>      Matches if ... matches next, but doesn't consume any of the string.
>      This is called a lookahead assertion. For example,
>      Isaac (?=Asimov) will match 'Isaac ' only if it's followed by 'Asimov'.
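
The documentation example can be checked directly. This is a minimal sketch; the second test string ("Isaac Newton") is an assumption added here to show the failing case.

```python
import re

# The lookahead checks what follows, but that text is not part of the match.
m = re.match(r"Isaac (?=Asimov)", "Isaac Asimov")
matched = m.group()
end_pos = m.end()

# With different following text the assertion fails and there is no match.
no_match = re.match(r"Isaac (?=Asimov)", "Isaac Newton")

print(repr(matched))  # 'Isaac '
print(end_pos)        # 6
print(no_match)       # None
```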

I know about greedy and non-greedy matching, but the problem remains.
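
For comparison, here is a minimal sketch of the same pattern with and without the `?` suggested above, run against the last test string from the session. The names `greedy_patt` and `lazy_patt` are illustrative and not part of the original session.

```python
import re

text = "Jun@i Bun#i @:Janji D:Banji #:Junji"

greedy_patt = re.compile(r"""
    .+          # one or more characters (greedy)
    [ ]         # followed by a space
    (?=[@#D]:)  # only if followed by one of "@#D" and a colon
    """, re.VERBOSE)

lazy_patt = re.compile(r"""
    .+?         # one or more characters (non-greedy)
    [ ]         # followed by a space
    (?=[@#D]:)  # only if followed by one of "@#D" and a colon
    """, re.VERBOSE)

greedy_result = greedy_patt.match(text).group()
lazy_result = lazy_patt.match(text).group()

print(repr(greedy_result))  # 'Jun@i Bun#i @:Janji D:Banji '
print(repr(lazy_result))    # 'Jun@i Bun#i '
```

In this sketch the lookahead is satisfied at three different spaces in the string; the greedy form ends at the last of them and the non-greedy form at the first.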



