[Python-ideas] Match statement brainstorm

Wed May 25 05:26:14 EDT 2016

On 25 May 2016 at 08:38, Ethan Furman <ethan at stoneleaf.us> wrote:
> On 05/24/2016 10:52 PM, Nick Coghlan wrote:
>
>> Using the running demo:
>>
>>      def demo(arg):
>>          given arg:
>>              case x, y, *_: # Tuple matching (implicit name binding)
>>                  ...
>>              case (.x, .y) as p, q: # Attribute matching
>>                  ...
>>              case (["x"], ["y"]) as p, q: # Item matching
>>                  ...
>>              case (.x) as p and isinstance(p, int): # Match + condition
>>                  ...
>>              case if isinstance(arg, int): # Condition only
>>                  ...
>>              else: # Default
>>                  ...
>>
>> The other key change there is introducing "as" to the individual cases
>> in order to be able to separate the match pattern definition from the
>> local name binding.
>
>
> With this one I have a clue as to what's going on.

On first reading, I felt the same way. But on rereading, a number of
odd cases start bothering me: the more I think about it, the less
(["x"], ["y"]) looks like an item match. And the pattern for "a 2-tuple
with ["x"] as the first item" would be ["x"], y, which is very close to
the item matching case.

As an *example* it works, but I don't see any way to describe the
detailed *semantics* without it being a mess of special cases.

One thought that this *does* prompt, though - the details of the
statement syntax are more or less bikeshed material in all this. The
*real* meat of the debate is around how we express matches.

So maybe we should focus solely on pattern matching as a primitive
construct. If we have an agreed syntax for "a match" then working out
how we incorporate that into a switch statement (or a "does this
match" maybe-assignment expression, or whatever) would likely be a lot
easier.

So looking solely at a "match" as a concept, let's see where that takes us:

- Unpacking syntax (NAME, NAME, *REST) handles matching sequences.
- Matches should probably nest, so NAME in the above could be a (sub-)match.
- For matching constants, if NAME is a constant rather than a name,
then that element must equal the literal for the match to succeed.
We'd probably need to restrict this to literals
(https://docs.python.org/3/reference/expressions.html#grammar-token-literal)
to avoid ambiguity.
- Matching mappings could be handled using dict syntax {literal: NAME,
literal: NAME, **REST}. Again allow recursive submatches and literal
matches?
- There's no "obvious" syntax for object matches - maybe use type
constructor syntax TYPE(attr=NAME, attr=NAME). In the case where you
don't care about the type, we could use "object" (in theory, I don't
think it's ambiguous to omit the type, but that may be difficult for
the reader to understand). Also, TYPE() would then be an isinstance
check - do we want that?
- Top-level matches can have "and CONDITION" to do further tests on
the matched values (we don't want to allow this for nested matches,
though!)
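
To make the rules above concrete, here is a rough sketch of a match as
a plain function in today's Python. Everything in it is an assumption
made for illustration: the helper classes Name, Rest and Obj, the
function match(), and the choice to return a dict of bindings (or None
on failure) are hypothetical, not proposed syntax. A top-level "and
CONDITION" would be an ordinary test applied by the caller to the
returned bindings, and **REST for mappings is left out.

```python
from collections.abc import Mapping, Sequence


class Name:
    """Hypothetical bind name: matches any value and binds it."""

    def __init__(self, name):
        self.name = name


class Rest(Name):
    """Hypothetical *REST marker; only valid as the last sequence item."""


class Obj:
    """Hypothetical TYPE(attr=PATTERN, ...) pattern; TYPE defaults to object."""

    def __init__(self, type_=object, **attrs):
        self.type_ = type_
        self.attrs = attrs


def _match_all(pairs):
    """Match every (pattern, value) pair, merging the bindings."""
    bindings = {}
    for sub, item in pairs:
        found = match(sub, item)
        if found is None:
            return None
        bindings.update(found)
    return bindings


def match(pattern, value):
    """Return a dict of bindings if value matches pattern, else None."""
    if isinstance(pattern, Rest):
        raise TypeError("Rest must be the last item of a sequence pattern")
    if isinstance(pattern, Name):
        return {pattern.name: value}
    if isinstance(pattern, Obj):
        # TYPE() with no attributes is just an isinstance check.
        if not isinstance(value, pattern.type_):
            return None
        if not all(hasattr(value, attr) for attr in pattern.attrs):
            return None
        return _match_all(
            (sub, getattr(value, attr)) for attr, sub in pattern.attrs.items()
        )
    if isinstance(pattern, dict):
        # Extra keys in the value are ignored; **REST is not modelled here.
        if not isinstance(value, Mapping):
            return None
        if not all(key in value for key in pattern):
            return None
        return _match_all((sub, value[key]) for key, sub in pattern.items())
    if isinstance(pattern, (tuple, list)):
        # Assumption: strings are not treated as sequences of characters.
        if not isinstance(value, Sequence) or isinstance(value, (str, bytes)):
            return None
        items = list(pattern)
        rest = items.pop() if items and isinstance(items[-1], Rest) else None
        if len(value) < len(items):
            return None
        if rest is None and len(value) != len(items):
            return None
        bindings = _match_all(zip(items, value))
        if bindings is not None and rest is not None:
            bindings[rest.name] = list(value[len(items):])
        return bindings
    # Anything else is a literal and must compare equal.
    return {} if pattern == value else None
```

With this sketch, match((1, Name("p"), {"key": 0, "match": Name("q")}),
(1, "mid", {"key": 0, "match": "hit"})) returns {"p": "mid", "q": "hit"},
and the same pattern against a sequence starting with 2 returns None.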

Translating (and extending a bit) Nick's example:

    def demo(arg):
        given arg:
            case (x, y, *_): # Tuple matching (implicit name binding)
                ...
            case object(x=p, y=q): # Attribute matching
                ...
            case {"x": p, "y": q}: # Item matching
                ...
            case object(x=p) and isinstance(p, int): # Match + condition
                ...
            case int(): # Type match
                ...
            case (1, p, {"key": 0, "match": q}):
                # Match a sequence of length 3, first item must be 1, last
                # must be a mapping with a key "key" with value 0 and
                # a key "match"
                ...
            else: # Default
                ...
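
For comparison, a hand-written approximation of that block in current
Python might look like the sketch below. The assumptions are mine: the
first matching case wins, strings do not count as sequences, and each
case simply returns a tag plus its bindings so the result can be
inspected. The nested (1, p, {...}) case is omitted, because under
first-match-wins the more general (x, y, *_) case listed before it
would already accept any sequence of length 3.

```python
from collections.abc import Mapping, Sequence


def demo(arg):
    """Hand-written approximation of the sketched `given arg:` block."""
    # case (x, y, *_): a sequence with at least two items
    if (
        isinstance(arg, Sequence)
        and not isinstance(arg, (str, bytes))
        and len(arg) >= 2
    ):
        x, y, *_ = arg
        return ("tuple", x, y)
    # case object(x=p, y=q): any object with x and y attributes
    if hasattr(arg, "x") and hasattr(arg, "y"):
        p, q = arg.x, arg.y
        return ("attrs", p, q)
    # case {"x": p, "y": q}: a mapping with keys "x" and "y"
    if isinstance(arg, Mapping) and "x" in arg and "y" in arg:
        p, q = arg["x"], arg["y"]
        return ("items", p, q)
    # case object(x=p) and isinstance(p, int): match plus condition
    if hasattr(arg, "x") and isinstance(arg.x, int):
        p = arg.x
        return ("attr+cond", p)
    # case int(): pure type match
    if isinstance(arg, int):
        return ("int",)
    # else: default
    return ("default",)
```

Even for these simple cases, the explicit isinstance/hasattr tests are
noticeably noisier than the patterns they stand in for.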

The worst one to my mind is the object match (not just in this style,
but basically everywhere) - that's because there's no existing display
or unpacking syntax for objects, so whatever we come up with is
unfamiliar.

I'm still a bit meh on this, though. Every proposal I've seen now
(including the above!) looks natural for simple examples - and would
probably look natural for 99% of real-world uses, which are typically
simple! - but gets awfully messy in the corner cases. It feels like
"If the implementation is hard to explain, it's a bad idea." may apply
here (although it's less the "implementation" and more the "detailed
semantics" that's hard to explain).

On 25 May 2016 at 10:04, Franklin? Lee <leewangzhong+python at gmail.com> wrote:
> Problem: `Point(x, y, ...)` is a legitimate function call, so if
> `Point(x, 0)` is a legal pattern (i.e. no distinguishing syntax
> between values and bindnames), you'd need the syntax to be `Point(x,
> y, *...)`. Personally, I'd require that everything is a bindname
> (unless it looks like a constructor call), and require checks to be in
> guards.

With the above syntax, "bare" values aren't a valid match, so within a
match Point(x, y) can never be a function call. It must be the match
"Object of type Point, with x and y attributes (which we check for but
don't bind)".

Paul