Powerful perl paradigm I don't find in python

Peter Otten __peter__ at web.de
Fri Jan 15 08:34:40 EST 2016


Charles T. Smith wrote:

> What the original snippet does is parse *and consume* a string - actually,
> to avoid maintaining a cursor traverse the string.  The perl feature is
> that substitute allows the found pattern to be replaced, but retains the
> group after the expression is complete.

That is too technical for my taste. When is your "paradigm" more useful than 
a simple

re.finditer(), re.findall(), or re.split()

? 
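
To make the question concrete, here is a minimal sketch of the built-in route. The sample string and pattern are assumptions borrowed from the docstring example further down, not taken from the original Perl snippet:

```python
import re

text = "foo 12 bar 34 baz"
pattern = re.compile(r"[a-z]+ \d+\s*")

# finditer() keeps track of the position itself: no cursor, no slicing.
chunks = [m.group() for m in pattern.finditer(text)]
print(chunks)  # ['foo 12 ', 'bar 34 ']

# findall() with two groups returns the captured pieces as tuples.
pairs = re.findall(r"([a-z]+) (\d+)", text)
print(pairs)  # [('foo', '12'), ('bar', '34')]
```

One difference worth keeping in mind: finditer() and findall() search, so they silently skip text that does not match, whereas a loop built on re.match() stops at the first position the pattern cannot consume.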

>> things = []
>> while some_str != tail:
>>      m = re.match(pattern_str, some_str)
>>      things.append(some_str[:m.end()])
>>      some_str = some_str[m.end():]
 
If that were common (or even ever occurred) I'd write a helper which avoids 
the brittle some_str != tail comparison and exposes the functionality in a 
for loop:

import re


class MissingTailError(ValueError):
    pass


class UnparsedRestError(ValueError):
    pass


def shave_off(regex, text, tail=None):
    """
    >>> for s in shave_off(r"[a-z]+ \\d+\\s*",
    ...        "foo 12 bar 34 baz", tail="baz"):
    ...     s
    'foo 12 '
    'bar 34 '
    """
    if tail is not None:
        if text.endswith(tail):
            end = len(text) - len(tail)
        else:
            raise MissingTailError("%r does not end with %r" % (text, tail))
    else:
        end = len(text)

    start = 0
    r = re.compile(regex)
    while start != end:
        m = r.match(text, start, end)
        if m is None:
            raise UnparsedRestError(
                "%r does not match pattern %r"
                % (text[start:end], r.pattern))
        yield text[m.start():m.end()]
        start = m.end()
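
A usage sketch may help show how the helper behaves in a for loop and when the two exceptions fire. The function below is a condensed copy of shave_off() so the snippet runs on its own; the failing input strings are made up for illustration:

```python
import re


class MissingTailError(ValueError):
    pass


class UnparsedRestError(ValueError):
    pass


def shave_off(regex, text, tail=None):
    # Condensed copy of the helper above; intended to behave the same.
    if tail is not None and not text.endswith(tail):
        raise MissingTailError("%r does not end with %r" % (text, tail))
    end = len(text) - len(tail or "")
    start = 0
    r = re.compile(regex)
    while start != end:
        m = r.match(text, start, end)
        if m is None:
            raise UnparsedRestError(
                "%r does not match pattern %r" % (text[start:end], r.pattern))
        yield m.group()
        start = m.end()


pattern = r"[a-z]+ \d+\s*"

# Normal use: collect the chunks in front of the tail.
chunks = list(shave_off(pattern, "foo 12 bar 34 baz", tail="baz"))
print(chunks)  # ['foo 12 ', 'bar 34 ']

# Hypothetical input without the expected tail.
try:
    list(shave_off(pattern, "foo 12 bar 34", tail="baz"))
except MissingTailError as exc:
    missing = str(exc)

# Hypothetical input with a stretch the pattern cannot consume.
try:
    list(shave_off(pattern, "foo 12 ??? baz", tail="baz"))
except UnparsedRestError as exc:
    unparsed = str(exc)
```

Because shave_off() is a generator, neither error is raised when it is called; both surface only once iteration starts, here inside list().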




