match point

Thierry Closen no at mail.com
Tue Dec 22 12:18:25 EST 2015


I found the story behind the creation of re.fullmatch(). 

I had no luck before because I was searching under "www.python.org/dev",
while in reality it sprang out of a bug report:
https://bugs.python.org/issue16203

In summary, there were repeated bugs where the trailing $ was lost from
patterns during code maintenance, hence the decision to create a
function that requires the match to extend to the end of the string
regardless of whether that symbol is present.

I am perplexed by what I discovered, as I would never have thought that
such a prominent function would be created to scratch such a minor itch.
The creation of fullmatch() might address this very specific issue, but
I would tend to think that if symbols really do disappear from patterns
inside a code base, this should be seen as a sign of deeper problems in
the code maintenance process.

Anyway, the discussion around that bug suggested another argument to me,
one that I find more satisfying:

When I said that
        re.fullmatch(pattern, string)
is exactly the same as
        re.search(r'\A'+pattern+r'\Z', string)
I was wrong.

For example, if the pattern starts with an inline flag like (?i), we
cannot simply stick \A in front of it.
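
To make this concrete, here is a small sketch. How the failure shows up
depends on the Python version: as far as I know, 3.6 to 3.10 emit a
DeprecationWarning for a global flag that is not at the start of the
pattern, and 3.11 and later raise re.error. The helper name naive_search
is made up for illustration only.

```python
import re
import warnings

def naive_search(pattern, string):
    """Hypothetical helper: anchor by plain concatenation, report problems."""
    try:
        with warnings.catch_warnings():
            # Treat the "flags not at the start" deprecation warning as an error
            warnings.simplefilter('error')
            return re.search(r'\A' + pattern + r'\Z', string)
    except (re.error, DeprecationWarning) as exc:
        return 'rejected: %s' % exc

pattern = r'(?i)abc'
print(re.fullmatch(pattern, 'ABC'))   # a match object: the flag is still first
print(naive_search(pattern, 'ABC'))   # rejected on the versions mentioned above
```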

Another example: suppose the pattern is 'a|b'. We end up with:
        re.search(r'\Aa|b\Z', string)
which is not what we want: it is parsed as "\Aa" or "b\Z", so each
anchor applies to only one of the alternatives.
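
A quick sketch of that precedence problem (the test strings are just
examples I picked):

```python
import re

pattern = 'a|b'
naive = r'\A' + pattern + r'\Z'      # parsed as (\Aa) or (b\Z)

# 'ab' is not a full match for a|b, yet the naive version finds one:
print(re.fullmatch(pattern, 'ab'))   # None
print(re.search(naive, 'ab'))        # matches the leading 'a'
print(re.search(naive, 'xxb'))       # matches the trailing 'b'
```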

To avoid that problem we need to add parentheses:
        re.search(r'\A('+pattern+r')\Z', string)
But now we have created a capturing group, which shifts the numbering of
every group already in the pattern, so if it contained backreferences we
may just have broken it.
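
A minimal sketch of how the extra group can silently change what a
backreference means (the pattern is an illustrative one, not taken from
the bug report):

```python
import re

pattern = r'(a)(b)\2'                  # \2 refers to (b): matches 'abb'
wrapped = r'\A(' + pattern + r')\Z'    # now (a) is group 2, so \2 means 'a'

print(re.fullmatch(pattern, 'abb'))    # a match
print(re.search(wrapped, 'abb'))       # None: the backreference changed meaning
print(re.search(wrapped, 'aba'))       # a match, for the wrong string
```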

So we need to use a non-capturing group:
        re.search(r'\A(?:'+pattern+r')\Z', string)
...and now I think we can say we are at a level of complexity where we
cannot reasonably expect the average user to always remember to write
exactly this, so it makes sense to add an easy-to-use fullmatch function
to the re namespace.
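
As a rough check, the non-capturing version can be compared against
re.fullmatch on the examples above. The function name
fullmatch_via_search is hypothetical, and this is only a sketch: a
pattern that starts with an inline flag such as (?i) would still be a
problem for it.

```python
import re

def fullmatch_via_search(pattern, string, flags=0):
    """Hypothetical stand-in for re.fullmatch using a non-capturing group."""
    return re.search(r'\A(?:' + pattern + r')\Z', string, flags)

cases = [('a|b', 'a'), ('a|b', 'ab'),
         (r'(a)(b)\2', 'abb'), (r'(a)(b)\2', 'aba')]

for pattern, string in cases:
    expected = re.fullmatch(pattern, string)
    actual = fullmatch_via_search(pattern, string)
    # Both approaches should agree on whether the whole string matched
    print(repr(pattern), repr(string), bool(expected), bool(actual))
```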

It may not be the real historical reason behind re.fullmatch, but
personally I will stick with that one :)

Cheers,

Thierry




