[Python-Dev] Finding overlapping matches with re assertions: bug or feature?

Tim Peters tim.peters at gmail.com
Fri Nov 15 07:48:33 CET 2013


I was surprised to find that "this works":  if you want to find all
_overlapping_ matches for a regexp R, wrap it in

     (?=(R))

and feed it to (say) finditer.  Here's a very simple example, finding
all overlapping occurrences of "xx":

    import re

    pat = re.compile("(?=(xx))")
    for it in pat.finditer("xxxx"):
        print(it.span(1))

That displays:

    (0, 2)
    (1, 3)
    (2, 4)
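
A minimal sketch of what seems to be going on, assuming CPython's re module: the lookahead itself matches the empty string at each position, so finditer steps forward one character at a time, while group 1 still records the text the assertion looked at. Printing both spans side by side shows this, and the findall call shows the usual non-overlapping result for comparison. The names `spans` and `plain` are only illustrative.

```python
import re

pat = re.compile("(?=(xx))")

# span(0) is the overall match (always empty here, since a lookahead
# consumes nothing); span(1) is the text captured inside the lookahead.
spans = [(m.span(0), m.span(1)) for m in pat.finditer("xxxx")]
for whole, group in spans:
    print(whole, group)

# Without the lookahead wrapper, matches consume text and cannot overlap.
plain = re.findall("xx", "xxxx")
print(plain)
```

Note that wrapping R this way adds one capturing group in front, so any groups inside R are renumbered by one.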

Is that a feature?  Or an accident?  It's very surprising to find a
non-empty match inside an empty match (the outermost lookahead
assertion).  If it's intended behavior, it's just in time for the
holiday season; e.g., to generate ASCII art for half an upside-down
Christmas tree:

    pat = re.compile("(?=(x+))")
    for it in pat.finditer("xxxxxxxxxx"):
        print(it.group(1))
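
As a sketch of why that yields a tree shape: `x+` is greedy, so at each starting position group 1 captures every remaining "x", and each successive match is one character shorter than the last. Collecting the rows into a list (the name `rows` is just for illustration) makes that easy to check:

```python
import re

pat = re.compile("(?=(x+))")

# One row per starting position; greedy x+ grabs all the x's that remain.
rows = [m.group(1) for m in pat.finditer("xxxxxxxxxx")]
print("\n".join(rows))

# Row lengths count down from 10 to 1.
lengths = [len(row) for row in rows]
print(lengths)
```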

