[Python-ideas] Give regex operations more sugar

Fri Jun 15 02:22:12 EDT 2018

On Thu, Jun 14, 2018 at 2:12 AM, Brendan Barnwell <brenbarn at brenbarn.net> wrote:
> On 2018-06-13 22:29, Steven D'Aprano wrote:
>>
>> On Wed, Jun 13, 2018 at 10:59:34PM +0200, Michel Desmoulin wrote:
>>
>>> > Attaching an entire module to a type is probably worse than
>>> > adding a slew of extra methods to the type.
>>> >
>>>
>>> Not my point.
>>>
>>> str.re would not be the re module, just a namespace in which to group
>>> all regex-related string methods.
>>
>>
>> That's what a module is :-)
>>
>> How would this work? If I say:
>>
>> "My string".re.match(...)
>>
>> if str.re is "just a namespace" how will the match function know the
>> string it is to operate on?
>
>
>         str.re can be a descriptor object which "knows" which string
> instance it is bound to.  This kind of thing is common in many libraries.
> Pandas for example has all kinds of things like df.loc[1:3],
> df.column.str.startswith('blah'), etc.  The "loc" and "str" attributes give
> objects which are bound (in the sense that bound methods are bound) to the
> objects on which they are accessed, so when you use these attributes to do
> things, the effect takes account of the "root" object on which you
> accessed the attribute.
>
>         Personally I think this is a great way to reduce namespace clutter
> and group related functionality without having to worry about using up all
> the short or "good" names at the top level.  I'm not sure I agree with the
> specific proposal here for allowing regex operations on strings, but if we
> do do it, this would be a good way to do it.

It's a clever idea, but it's a completely new (at least to the standard
library) way to call a function that acts on a given argument. That
means more to learn.

We already have foo.bar(...) and bar(foo):
    "Hello!".count("o")
    len("Hello!")

Nesting is hiding. Hiding can be good or bad. Adding `foo.b.ar()` will
make the method harder to discover. It's also magical: To understand what
`foo.b.ar()` does, you have to think of `foo.b` not as a (semantic)
property of the object, or a method of the object, but as a descriptor
trick which holds more methods of that object.
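
To make the "descriptor trick" concrete, here is a minimal sketch of how
something like `"My string".re.match(...)` could be wired up. Everything
in it is hypothetical: `RegexNamespace`, `RegexAccessor` and `MyStr` are
names made up for illustration, and a `str` subclass stands in for the
built-in type, since attributes can't be added to `str` itself from
Python code.

```python
import re


class RegexNamespace:
    """Hypothetical helper: regex methods bound to one particular string."""

    def __init__(self, text):
        self._text = text

    def match(self, pattern, flags=0):
        return re.match(pattern, self._text, flags)

    def findall(self, pattern, flags=0):
        return re.findall(pattern, self._text, flags)


class RegexAccessor:
    """Descriptor that hands back a namespace bound to the instance."""

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class itself: there is no string to bind to.
            return self
        return RegexNamespace(instance)


class MyStr(str):
    # Stand-in for the proposed `str.re` attribute.
    re = RegexAccessor()


words = MyStr("My string").re.findall(r"\w+")
```

Note that `MyStr("My string").re` is neither a plain value nor a method:
each attribute access builds a fresh `RegexNamespace` that remembers the
string it came from.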

I mainly use Python in a REPL. When I'm in IPython, I can ask what
properties and methods an object has. When I'm in the basic Python
REPL, I use `dir`, or a function which filters and prints `dir` in a
nicer way. Nested method namespaces will be harder to navigate. I
would not be able to programmatically tell whether a
property is just a property. I'd need to manually inspect each
oddly-named property, to make sure it's not hiding more methods of the
object (and that would only work if the docstrings are maintained and
clear enough for me).
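
A rough sketch of what I mean, using a made-up `Demo` class with a
made-up nested namespace `b` (a `property` here, to keep it short).
`public_names` is just the kind of `dir` filter I described, not an
existing function:

```python
class _Namespace:
    """Hypothetical nested namespace holding extra methods."""

    def __init__(self, owner):
        self._owner = owner

    def ar(self):
        return self._owner.upper()


class Demo(str):
    @property
    def b(self):
        # Looks like an ordinary property, but hides more methods.
        return _Namespace(self)


def public_names(obj):
    """Filter `dir` down to names without a leading underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


foo = Demo("hi")
top_level = public_names(foo)   # contains "b", but not "ar"
nested = public_names(foo.b)    # "ar" only shows up here
```

Nothing in `top_level` says that `b` is special. You only find `ar` by
calling `dir` again on `foo.b`, and you'd have to do that for every
attribute that might be a namespace.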

I don't see any advantage of using `foo.b.ar()` over `foo.b_ar()`. In
either case, you'd need to spell out the whole name each time (unlike
with import statements), unless you save the bound method, which you
can do in both cases.
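
For example, with a hypothetical class that offers both spellings
(`Foo`, `b`, `ar` and `b_ar` are all invented for this sketch):

```python
class _B:
    """Hypothetical nested namespace."""

    def __init__(self, text):
        self._text = text

    def ar(self):
        return self._text.upper()


class Foo(str):
    @property
    def b(self):
        return _B(self)

    def b_ar(self):
        return self.upper()


foo = Foo("hello")

# Either spelling can be saved once and reused under a short name.
nested = foo.b.ar
flat = foo.b_ar
results = (nested(), flat())
```

Both saved callables stay bound to `foo`, so the nested form saves no
typing over the flat one.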

P.S.: Is there any way of guessing what proportion of Python programs
use `re`, either explicitly or implicitly? How many programs will, at
some point in their runtime, load the `re` module?
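
For a single program, at least, this can be checked through
`sys.modules`. The sketch below asks a fresh interpreter whether a
module is already loaded right after startup. `loaded_at_startup` is a
name I made up, and the answer for `re` will depend on the Python
version and on what the site configuration imports, so I'm not claiming
a result here:

```python
import subprocess
import sys


def loaded_at_startup(module_name):
    """Return True if a fresh interpreter has `module_name` in sys.modules."""
    # The module name is passed as an argument; with -c it lands in argv[1].
    code = "import sys; print(sys.argv[1] in sys.modules)"
    result = subprocess.run(
        [sys.executable, "-c", code, module_name],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


re_loaded = loaded_at_startup("re")
```

The same `"re" in sys.modules` test at the end of a real program would
show whether anything it imported pulled in `re` along the way.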