[Python-ideas] Give regex operations more sugar

Steven D'Aprano steve at pearwood.info
Thu Jun 14 03:10:28 EDT 2018


On Wed, Jun 13, 2018 at 11:12:43PM -0700, Brendan Barnwell wrote:

> >How would this work? If I say:
> >
> >"My string".re.match(...)
> >
> >if str.re is "just a namespace" how will the match function know the
> >string it is to operate on?
> 
> 	str.re can be a descriptor object which "knows" which string 
> 	instance it is bound to.

Obviously, but then it's not "just a namespace".

This idea might be common in libraries like pandas, but I don't like it. 
Common is not necessarily good.

Unless str.re is something meaningful on its own, what purpose does it 
hold? If str.re doesn't carry its own weight as a meaningful object, 
then it shouldn't exist.

Particularly since we're only talking about a handful of new methods.

Looking at re, we have these public functions:

- match
- fullmatch
- search

match and fullmatch are redundant; they're the same as calling search 
with a pattern that matches "start of string" and "end of string".
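
A rough sketch of that equivalence, for pattern strings only. The helper
names are made up for this example, and it assumes the pattern carries no
inline global flags such as (?i), which recent Python versions only accept
at the very start of an expression.

```python
import re


def match_via_search(pattern, string):
    # \A anchors the search at the very start of the string.
    return re.search(r"\A(?:%s)" % pattern, string)


def fullmatch_via_search(pattern, string):
    # \Z additionally requires the match to reach the end of the string.
    return re.search(r"\A(?:%s)\Z" % pattern, string)
```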

str.find() could easily take a pattern object instead of needing a 
separate search object, particularly if we have dedicated syntax for 
regexes.

- sub
- subn

sub is redundant since it does the same as subn; or the str.replace() 
method could take a pattern object.
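
To make the overlap concrete: subn returns the same string that sub
would, plus the number of substitutions made. The sample text here is
invented for the example.

```python
import re

text = "spam, spam and eggs"

# subn gives back a (new_string, number_of_substitutions) pair.
new_text, count = re.subn(r"spam", "ham", text)

# sub gives back only the string, which is the first item of that pair.
same_text = re.sub(r"spam", "ham", text)
```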

- split

Likewise str.split() could take a pattern object.

- findall
- finditer 

findall is just list(finditer); search, match and fullmatch are just 
next(finditer).
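
A small sketch of the relationship, with made-up sample data. It holds
most directly for patterns without groups: findall returns strings (or
tuples of groups) rather than match objects, and match and fullmatch
would also need the pattern anchored.

```python
import re

text = "a1 b22 c333"
pattern = r"\d+"

# findall, rebuilt from finditer (pattern has no groups).
all_matches = [m.group() for m in re.finditer(pattern, text)]

# search, rebuilt as "the first item finditer yields, or None".
first = next(re.finditer(pattern, text), None)
```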

The re module API is full of redundancies. That's okay; I'm not 
proposing we "fix" that. But we don't have to duplicate that in string 
objects.

Rather than add eight new methods, we could allow the existing string 
methods to take pattern objects as arguments. That gives us potentially:

    count, endswith, find, index, lstrip, partition, replace, rfind, 
    rindex, rpartition, rsplit, rstrip, split, startswith, strip

(15 methods) that support regex pattern objects, pretty much covering 
all the functionality of:

    match, fullmatch, search, split, sub, subn

and then some. re.findall is redundant. That leaves (potentially) only a 
single re function to turn into a string method: finditer.
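
As a rough illustration of the idea, here is how three of those methods
might dispatch on a compiled pattern. These are hypothetical free
functions standing in for the string methods, not an existing API; they
assume Python 3.7 or later, where re.Pattern is available for isinstance
checks.

```python
import re


def find(string, sub):
    """Hypothetical str.find that also accepts a compiled pattern."""
    if isinstance(sub, re.Pattern):
        m = sub.search(string)
        return m.start() if m else -1
    return string.find(sub)


def split(string, sep=None, maxsplit=-1):
    """Hypothetical str.split that also accepts a compiled pattern."""
    if isinstance(sep, re.Pattern):
        if maxsplit == 0:
            return [string]  # str semantics: 0 means "do not split"
        # re semantics: maxsplit=0 means "no limit"
        return sep.split(string, maxsplit=max(maxsplit, 0))
    return string.split(sep, maxsplit)


def replace(string, old, new, count=-1):
    """Hypothetical str.replace that also accepts a compiled pattern."""
    if isinstance(old, re.Pattern):
        if count == 0:
            return string  # str semantics: 0 means "replace nothing"
        return old.sub(new, string, count=max(count, 0))
    return string.replace(old, new, count)


digits = re.compile(r"\d+")
parts = split("a1b22c", digits)
```

The only subtlety is that str and re disagree about what a limit of 0
means, so the sketch translates between the two conventions.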

How do you get the pattern object? We have three possible tactics:

- import re and call re.compile;

- add a compile method to str;

- add special regex syntax, let's say /pattern/ for the sake of the 
argument.


With pattern literals, we can do this with a single new string method, 
finditer. (Or whatever name we choose.)

Without pattern literals, it won't be so convenient, but we could do 
this with just one more method: compile.

Or we could simply require people to import re to compile their 
patterns, which would be even less convenient, but it would work.

(But maybe that's a good thing, to encourage people to think before 
reaching for a regular expression, not to encourage them to see 
every problem as a nail and regexes as the hammer.)


-- 
Steve

