[Python-ideas] Give regex operations more sugar
Ken Hilton
kenlhilton at gmail.com
Wed Jun 13 07:06:09 EDT 2018
Hi all,
Regexes are really useful in many places, and to me it's sad that the
standard-library "re" module requires the source string to be passed as an
argument. It would be much more elegant to simply write "s.search(pattern)"
than "re.search(pattern, s)".
I suggest building all regex operations into the str class itself, as well
as a new syntax for regular expressions.
Thus a "findall" for any lowercase letter in a string would look like this:
>>> "1a3c5e7g9i".findall(!%[a-z]%)
['a', 'c', 'e', 'g', 'i']
A "findall" for any letter, case insensitive:
>>> "1A3c5E7g9I".findall(!%[a-z]%i)
['A', 'c', 'E', 'g', 'I']
A substitution of any letter for the string " WOOF WOOF ":
>>> "1a3c5e7g9i".sub(!%[a-z]% WOOF WOOF %)
'1 WOOF WOOF 3 WOOF WOOF 5 WOOF WOOF 7 WOOF WOOF 9 WOOF WOOF '
A substitution of any letter, case insensitive, for the string "hovercraft":
>>> "1A3c5E7g9I".sub(!%[a-z]%hovercraft%i)
'1hovercraft3hovercraft5hovercraft7hovercraft9hovercraft'
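For comparison, here is a minimal sketch of how the four examples above are spelled with the existing "re" module today. The variable names are illustrative only; the calls themselves (re.findall, re.sub, re.IGNORECASE) are the current standard-library API.

```python
import re

# Any lowercase letter.
lower = re.findall(r"[a-z]", "1a3c5e7g9i")

# Any letter, case insensitive.
any_case = re.findall(r"[a-z]", "1A3c5E7g9I", flags=re.IGNORECASE)

# Replace any lowercase letter with " WOOF WOOF ".
woof = re.sub(r"[a-z]", " WOOF WOOF ", "1a3c5e7g9i")

# Replace any letter, case insensitive, with "hovercraft".
hover = re.sub(r"[a-z]", "hovercraft", "1A3c5E7g9I", flags=re.IGNORECASE)

print(lower)
print(any_case)
print(woof)
print(hover)
```

In each call the source string is passed as an argument after the pattern, which is the spelling the proposal wants to replace with a method on the string.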
You may wonder why I chose the regex delimiters as "!%" ... "%" [ ... "%" ]
...
The choice of "%" was purely arbitrary; I just thought of it since there
seems to be a convention to use "%" in PHP regex patterns. The "!" is in
front to disambiguate it from the "%" modulo operator or the "%" string
formatting operator, and because "!" is currently not used on its own in
Python (it only appears in "!=" and in format conversions such as "!r").
Another potential idea is to simply use "!" to denote the start of a regex,
and use the character immediately following it to delimit the regex. Thus
all of the following would be regexes matching a single lowercase letter:
!%[a-z]%
!#[a-z]#
!?[a-z]?
!/[a-z]/
And all of the following would be substitution regexes replacing any single
letter, case-insensitively, with "@":
!%[a-z]%@%i
!#[a-z]#@#i
!?[a-z]?@?i
!/[a-z]/@/i
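To make the delimiter rule concrete, here is a rough sketch of how such a literal could be taken apart at runtime. The function parse_regex_literal and the FLAG_LETTERS table are hypothetical names invented for this illustration, not part of any existing API, and the sketch assumes the delimiter never appears inside the pattern or the replacement (there is no escaping).

```python
import re

# Hypothetical mapping from flag letters to re flags.
# Only "i" appears in the proposal, so only "i" is handled here.
FLAG_LETTERS = {"i": re.IGNORECASE}


def parse_regex_literal(literal):
    """Split '!<d>pattern<d>[flags]' or '!<d>pattern<d>repl<d>[flags]'.

    Returns (compiled_pattern, replacement), where replacement is None
    for a plain (non-substitution) regex.
    """
    if len(literal) < 3 or literal[0] != "!":
        raise ValueError("literal must start with '!' and a delimiter")
    delim = literal[1]  # the character right after "!" is the delimiter
    parts = literal[2:].split(delim)
    if len(parts) == 2:
        pattern, flag_letters = parts
        replacement = None
    elif len(parts) == 3:
        pattern, replacement, flag_letters = parts
    else:
        raise ValueError("unexpected number of delimiters")
    flags = 0
    for letter in flag_letters:
        if letter not in FLAG_LETTERS:
            raise ValueError("unknown flag: " + letter)
        flags |= FLAG_LETTERS[letter]
    return re.compile(pattern, flags), replacement


# All four spellings describe the same substitution.
results = []
for literal in ("!%[a-z]%@%i", "!#[a-z]#@#i", "!?[a-z]?@?i", "!/[a-z]/@/i"):
    regex, repl = parse_regex_literal(literal)
    results.append(regex.sub(repl, "1A3c"))
print(results)
```

Two delimited fields mean a plain regex and three mean a substitution, so the number of delimiters alone tells the two forms apart.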
Some examples of how to use this:
>>> "pneumonoultramicroscopicsilicovolcanokoniosis".findall(!%[aeiou]+%)
['eu', 'o', 'ou', 'a', 'i', 'o', 'o', 'i', 'i', 'i', 'o', 'o', 'a',
'o', 'o', 'io', 'i']
>>> "GMzKqtnnyGdqIQNlQSLidbDlqpdhoRbHrrUAgyhMgkZKYVhQuI".search(!%[^A-Z][A-Z]{3}([a-z])[A-Z]{3}[^A-Z]%)
<regex_match; span=(11, 20); match='qIQNlQSLi'>
>>> "My name is Joanne.".findall(!%[A-Z][a-z]+%)
['My', 'Joanne']
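The method half of the idea can be approximated today without any new syntax. The sketch below uses a hypothetical str subclass, RegexStr (a name made up for this example), whose methods simply delegate to the "re" module; patterns are passed as ordinary raw strings rather than the proposed literals.

```python
import re


class RegexStr(str):
    """Hypothetical str subclass with regex methods that delegate to re."""

    def findall(self, pattern, flags=0):
        return re.findall(pattern, self, flags)

    def search(self, pattern, flags=0):
        return re.search(pattern, self, flags)

    def sub(self, pattern, repl, count=0, flags=0):
        return re.sub(pattern, repl, self, count=count, flags=flags)


names = RegexStr("My name is Joanne.").findall(r"[A-Z][a-z]+")
match = RegexStr(
    "GMzKqtnnyGdqIQNlQSLidbDlqpdhoRbHrrUAgyhMgkZKYVhQuI"
).search(r"[^A-Z][A-Z]{3}([a-z])[A-Z]{3}[^A-Z]")

print(names)
print(match.span(), match.group(0))
```

A subclass only helps for strings that are explicitly wrapped in it, which is part of why the proposal asks for the methods on str itself.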
Thoughts?
Sincerely,
Ken;