[Python-Dev] why we have both re.match and re.search?

Wed Feb 10 18:05:51 EST 2016

On Wed, Feb 10, 2016 at 10:59:18PM +0100, Luca Sangiacomo wrote:
> Hi,
> I hope the question is not too silly, but I would like to understand 
> the advantages of having both re.match() and re.search(). Wouldn't it be 
> clearer to have just one function with one additional parameter, like 
> this:
> 
> re.search(regexp, text, from_beginning=True|False) ?

I guess the most important reason now is backwards compatibility. The 
oldest Python I have installed here is version 1.5, and it has the brand 
new "re" module (intended as a replacement for the old "regex" module). 
Both modules have top-level search() and match() functions. So to learn 
the original rationale, you would probably have to track down the author 
of the original "regex" module.

But a more general answer is the principle, "Functions shouldn't take 
constant bool arguments". It is an API design principle which (if I 
remember correctly) Guido has stated a number of times. Functions should 
not take a boolean argument which (1) exists only to select between two 
different modes and (2) is nearly always given as a constant.
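For illustration, here is a minimal sketch of the combined function 
proposed above, written as a thin wrapper over the existing functions. 
The name find() and the default from_beginning=False are hypothetical 
choices for this sketch only; the re module has nothing like this. Note 
that the flag does no work of its own: it only picks which of two 
behaviours runs.

```python
import re


def find(pattern, string, from_beginning=False):
    """Hypothetical combined API; not part of the re module.

    The boolean does nothing except choose between two different
    behaviours, which is the kind of argument the principle warns about.
    """
    if from_beginning:
        # Anchored: only try to match at position 0 of the string.
        return re.match(pattern, string)
    # Unanchored: scan for the first position where the pattern matches.
    return re.search(pattern, string)


print(find(r"\d+", "abc 123", from_beginning=True))  # prints None
print(find(r"\d+", "abc 123").group())                # prints 123
```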

Do you ever find yourself writing code like this?

if some_calculation():
    result = re.match(regex, string)
else:
    result = re.search(regex, string)

If you do, that would be a hint that perhaps match() and search() should 
be combined so you can write:

result = re.search(regex, string, some_calculation())

But I expect that you almost never do. I would expect that if we 
combined the two functions into one, we would nearly always call it 
with a constant bool:

# I always forget whether True means match from the start or not, 
# and which is the default...
result = re.search(regex, string, False)

which suggests that such a combined search() is really two different 
functions, and should be split into two, just as we have now.

It's a general principle, not a law of nature, so you may find 
exceptions in the standard library. But if I were designing the re 
module from scratch, I would either keep the two distinct functions, or 
just provide search() and let users use ^ to anchor the search to the 
beginning.
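One caveat with the ^ approach, as I understand the current re 
documentation: re.search() with a leading ^ only behaves like 
re.match() when re.MULTILINE is not in effect. With MULTILINE, ^ also 
matches just after each newline, while match() stays anchored at the 
start of the string; \A means "start of the string" in both modes. A 
small sketch (the sample text is arbitrary):

```python
import re

text = "spam\neggs"

# Without flags, a leading ^ anchors search() at the start of the string,
# so these two calls agree: "eggs" is not at position 0.
print(re.match(r"eggs", text))      # prints None
print(re.search(r"^eggs", text))    # prints None

# With re.MULTILINE, ^ also matches right after each newline, so the
# "anchored" search now finds "eggs" at the start of the second line.
print(re.search(r"^eggs", text, re.MULTILINE).span())  # prints (5, 9)

# \A always means the start of the string, even in MULTILINE mode.
print(re.search(r"\Aeggs", text, re.MULTILINE))  # prints None

# match() stays anchored at the start of the string regardless of MULTILINE.
print(re.match(r"eggs", text, re.MULTILINE))     # prints None
```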

> In this way we would prevent people from writing ".*" in front of the 
> regexp used with re.match(), as mentioned in the documentation.

I only see one example in the docs that does that:

https://docs.python.org/3/library/re.html#checking-for-a-pair

Perhaps it should be changed.
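For comparison, here is a sketch of that docs pattern (as I recall it, 
r".*(.).*\1") used with match(), next to search() with the prefix 
dropped. The two are not always interchangeable: whatever ".*" consumes 
becomes part of group(0), and since "." does not match a newline unless 
re.DOTALL is set, a ".*" prefix cannot skip past the first line, whereas 
search() scans the whole string. The sample strings are arbitrary.

```python
import re

# Docs-style pattern with a ".*" prefix, and the same idea for search().
with_prefix = re.compile(r".*(.).*\1")
without_prefix = re.compile(r"(.).*\1")

# Both find the repeated "7" in "717ak".
print(with_prefix.match("717ak").group(1))      # prints 7
print(without_prefix.search("717ak").group(1))  # prints 7

# The ".*" prefix ends up in the overall match.
print(re.match(r".*7", "ab7").group())   # prints ab7
print(re.search(r"7", "ab7").group())    # prints 7

# "." does not match "\n" by default, so ".*" cannot reach the second line.
text = "ab\ncc"
print(with_prefix.match(text))               # prints None
print(without_prefix.search(text).group())   # prints cc
```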

-- 
Steve