[Python-ideas] Give regex operations more sugar
Steven D'Aprano
steve at pearwood.info
Thu Jun 14 03:10:28 EDT 2018
On Wed, Jun 13, 2018 at 11:12:43PM -0700, Brendan Barnwell wrote:
> >How would this work? If I say:
> >
> >"My string".re.match(...)
> >
> >if str.re is "just a namespace" how will the match function know the
> >string it is to operate on?
>
> str.re can be a descriptor object which "knows" which string
> instance it is bound to.
Obviously, but then it's not "just a namespace".
This idea might be common in libraries like pandas, but I don't like it.
Common is not necessarily good.
Unless str.re is something meaningful on its own, what purpose does it
hold? If str.re doesn't carry its own weight as a meaningful object,
then it shouldn't exist.
Particularly since we're only talking about a handful of new methods.
Looking at re, we have these public functions:
- match
- fullmatch
- search
match and fullmatch are redundant; they're the same as calling search
with a pattern that matches "start of string" and "end of string".
str.find() could easily take a pattern object instead of needing a
separate search method, particularly if we have dedicated syntax for
regexes.
- sub
- subn
sub is redundant since it does the same as subn; or the str.replace()
method could take a pattern object.
- split
Likewise str.split() could take a pattern object.
- findall
- finditer
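findall is essentially list(finditer), returning the matched text rather
than match objects; search, match and fullmatch are essentially
next(finditer), given suitable anchors in the pattern.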
The re module API is full of redundancies. That's okay; I'm not
proposing we "fix" that. But we don't have to duplicate that in string
objects.
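The redundancies listed above can be checked with the existing re module. This is only a sketch using a made-up sample string, and the equivalences hold up to the shape of the return value: findall returns strings (or tuples of groups) where finditer yields match objects, and next(finditer) needs a default to mirror the None that search returns on failure.

```python
import re

s = "spam eggs spam"

# match is search anchored at the start; fullmatch is anchored at both ends.
assert re.match(r"spam", s).span() == re.search(r"\A(?:spam)", s).span()
assert re.fullmatch(r"spam.*", s).span() == re.search(r"\A(?:spam.*)\Z", s).span()

# sub returns the first element of the tuple that subn returns.
assert re.sub(r"spam", "ham", s) == re.subn(r"spam", "ham", s)[0]

# For a pattern without groups, findall is the matched text of each finditer result.
assert re.findall(r"\w+", s) == [m.group() for m in re.finditer(r"\w+", s)]

# search finds the same first match as next(finditer).
first = next(re.finditer(r"eggs", s), None)
assert first.span() == re.search(r"eggs", s).span()
```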
Rather than add eight new methods, we could allow the existing string
methods to take pattern objects as arguments. That gives us potentially:
count, endswith, find, index, lstrip, partition, replace, rfind,
rindex, rpartition, rsplit, rstrip, split, startswith, strip
(15 methods) that support regex pattern objects, pretty much covering
all the functionality of:
match, fullmatch, search, split, sub, subn
and then some. re.findall is redundant. That leaves (potentially) only a
single re function to turn into a string method: finditer.
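As a rough illustration of what "existing string methods take pattern objects" could mean, here are three of those methods emulated as free functions. The function names and the dispatch on re.Pattern are assumptions made for this sketch, not an existing or proposed API; the real str methods reject pattern objects today.

```python
import re


def find(s, sub):
    """Like str.find, but sub may be a compiled pattern (hypothetical)."""
    if isinstance(sub, re.Pattern):
        m = sub.search(s)
        return m.start() if m else -1
    return s.find(sub)


def split(s, sep=None, maxsplit=-1):
    """Like str.split, but sep may be a compiled pattern (hypothetical)."""
    if isinstance(sep, re.Pattern):
        if maxsplit == 0:
            return [s]  # str.split semantics: 0 means "do not split"
        # Pattern.split uses 0 for "no limit", where str.split uses -1.
        return sep.split(s, maxsplit=max(maxsplit, 0))
    return s.split(sep, maxsplit)


def startswith(s, prefix):
    """Like str.startswith, but prefix may be a compiled pattern (hypothetical)."""
    if isinstance(prefix, re.Pattern):
        return prefix.match(s) is not None
    return s.startswith(prefix)


digits = re.compile(r"\d+")
print(find("abc123def", digits))     # 3
print(find("abc123def", "def"))      # 6
print(split("a1b22c", digits))       # ['a', 'b', 'c']
print(startswith("123abc", digits))  # True
```

Here find covers search, startswith covers match, and split covers re.split, without any new method names.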
How do you get the pattern object? We have three possible tactics:
- import re and call re.compile;
- add a compile method to str;
- add special regex syntax, let's say /pattern/ for the sake of the
argument.
With pattern literals, we can do this with a single new string method,
finditer. (Or whatever name we choose.)
Without pattern literals, it won't be so convenient, but we could do
this with just one more method: compile.
Or we could simply require people to import re to compile their
patterns, which would be even less convenient, but it would work.
(But maybe that's a good thing, to encourage people to think before
reaching for a regular expression, not to encourage them to see
every problem as a nail and regexes as the hammer.)
--
Steve