
PEP 642 -- Constraint Pattern Syntax for Structural Pattern Matching

PEP: 642
Title: Constraint Pattern Syntax for Structural Pattern Matching
Author: Nick Coghlan <ncoghlan at gmail.com>
BDFL-Delegate:
Discussions-To: Python-Dev <python-dev at python.org>
Status: Draft
Type: Standards Track
Requires: 634
Created: 26-Sep-2020
Python-Version: 3.10
Post-History: 31-Oct-2020, 8-Nov-2020
Resolution:

Abstract

This PEP covers an alternative syntax proposal for PEP 634's structural pattern matching that explicitly anchors match patterns in the existing syntax for assignment targets, while retaining most semantic aspects of the existing proposal.

Specifically, this PEP adopts an additional design restriction that PEP 634's authors considered unreasonable: that any novel match pattern semantics must offer syntax that future PEPs could plausibly propose for adoption in assignment targets. It is (reluctantly) considered acceptable to offer syntactic sugar that is specific to match patterns, as long as there is an underlying more explicit form that is compatible (or potentially compatible) with assignment targets.

As a consequence, this PEP proposes the following changes to the proposed match pattern syntax:

  • a new pattern type is introduced: "constraint patterns"
  • constraint patterns are either equality constraints or identity constraints
  • equality constraints use == as a prefix marker on an otherwise arbitrary primary expression: == EXPR
  • identity constraints use is as a prefix marker on an otherwise arbitrary primary expression: is EXPR
  • value patterns and literal patterns (with some exceptions) are redefined as "inferred equality constraints", and become a syntactic shorthand for an equality constraint
  • None and ... are defined as "inferred identity constraints" and become a syntactic shorthand for an identity constraint
  • due to ambiguity of intent, neither True nor False is accepted as implying an inferred constraint (instead requiring the use of an explicit constraint, a class pattern, or a capture pattern with a guard expression)
  • inferred constraints are not defined in the Abstract Syntax Tree. Instead, inferred constraints are converted to explicit constraints by the parser
  • The wildcard pattern changes from _ (single underscore) to __ (double underscore), and gains a dedicated SkippedBinding node in the AST
  • Mapping patterns change to allow arbitrary primary expressions as keys

Relationship with other PEPs

This PEP both depends on and competes with PEP 634 - the PEP author agrees that match statements would be a sufficiently valuable addition to the language to be worth the additional complexity that they add to the learning process, but disagrees with the idea that "simple name vs literal or attribute lookup" really offers an adequate syntactic distinction between name binding and value lookup operations in match patterns. (Even though this PEP ultimately retained that shorthand to reduce the verbosity of common use cases, it still redefines it in terms of a more explicit underlying construct).

This PEP agrees with the spirit of PEP 640 (that the chosen wildcard pattern to skip a name binding should be supported everywhere, not just in match patterns), but is now proposing a different spelling for the wildcard syntax (__ rather than ?). As such, it competes with PEP 640 as written, but would complement a proposal to deprecate the use of __ as an ordinary identifier and instead turn it into a general purpose wildcard marker that always skips making a new local variable binding.

Motivation

The original PEP 622 (which was later split into PEP 634, PEP 635, and PEP 636) incorporated an unstated but essential assumption in its syntax design: that neither ordinary expressions nor the existing assignment target syntax provide an adequate foundation for the syntax used in match patterns.

While the PEP didn't explicitly state this assumption, one of the PEP authors explained it clearly on python-dev [1]:

The actual problem that I see is that we have different cultures/intuitions fundamentally clashing here. In particular, so many programmers welcome pattern matching as an "extended switch statement" and find it therefore strange that names are binding and not expressions for comparison. Others argue that it is at odds with current assignment statements, say, and question why dotted names are _/not/_ binding. What all groups seem to have in common, though, is that they refer to _/their/_ understanding and interpretation of the new match statement as 'consistent' or 'intuitive' --- naturally pointing out where we as PEP authors went wrong with our design.

But here is the catch: at least in the Python world, pattern matching as proposed by this PEP is an unprecedented and new way of approaching a common problem. It is not simply an extension of something already there. Even worse: while designing the PEP we found that no matter from which angle you approach it, you will run into issues of seeming 'inconsistencies' (which is to say that pattern matching cannot be reduced to a 'linear' extension of existing features in a meaningful way): there is always something that goes fundamentally beyond what is already there in Python. That's why I argue that arguments based on what is 'intuitive' or 'consistent' just do not make sense _/in this case/_.

PEP 635 (and PEP 622 before it) makes a strong case that treating capture patterns as the default usage for simple names in match patterns is the right approach, and provides a number of examples where having names express value constraints by default would be confusing (this difference from C/C++ switch statement semantics is also a key reason it makes sense to use match as the introductory keyword for the new statement rather than switch).

However, PEP 635 doesn't even try to make the case for the second assertion, that treating match patterns as a variation on assignment targets also leads to inherent contradictions. Even a PR submitted to explicitly list this option in the "Rejected Ideas" section of the original PEP 622 was declined [2].

This PEP instead starts from the assumption that it is possible to treat match patterns as a variation on assignment targets, and the only essential differences that emerge relative to the syntactic proposal in PEP 634 are:

  • a requirement to offer an explicit marker prefix for value lookups rather than only allowing them to be inferred from the use of dotted names or literals; and
  • a requirement to use a non-binding wildcard marker other than _.

This PEP proposes constraint expressions as a way of addressing the first point, and changes the proposed non-binding wildcard marker to a double-underscore to address the latter.

PEP 634 also proposes special casing the literals None, True, and False so that they're compared by identity when written directly as a literal pattern, but by equality when referenced by a value pattern. This PEP eliminates the need for those special cases by proposing distinct syntax for matching by identity and matching by equality, but does accept the convenience and consistency argument in allowing None as a shorthand for is None.

Specification

This PEP retains the overall match/case statement syntax from PEP 634, and retains both the syntax and semantics for the following match pattern variants:

  • class patterns
  • group patterns
  • sequence patterns

Pattern combination (both OR and AS patterns) and guard expressions also remain the same as they are in PEP 634.

Capture patterns are essentially unchanged, except that _ becomes a regular capture pattern, due to the wildcard pattern marker changing to __.

Constraint patterns are added, offering equality constraints and identity constraints.

Literal patterns and value patterns are replaced by inferred constraint patterns, offering inferred equality constraints for strings, numbers and attribute lookups, and inferred identity constraints for None and ....

Mapping patterns change to allow arbitrary primary expressions for keys, rather than being restricted to literal patterns or value patterns.

Wildcard patterns are changed to use __ (double underscore) rather than _ (single underscore), and are also given a new dedicated node in the Abstract Syntax Tree produced by the parser.

Constraint patterns

Constraint patterns use the following simplified syntax:

constraint_pattern: id_constraint | eq_constraint
eq_constraint: '==' primary
id_constraint: 'is' primary

The constraint expression is an arbitrary primary expression - it can be a simple name, a dotted name lookup, a literal, a function call, or any other primary expression.

If this PEP were to be adopted in preference to PEP 634, then all literal and value patterns could instead be written more explicitly as constraint patterns:

# Literal patterns
match number:
    case == 0:
        print("Nothing")
    case == 1:
        print("Just one")
    case == 2:
        print("A couple")
    case == (-1):
        print("One less than nothing")
    case == (1-1j):
        print("Good luck with that...")

# Additional literal patterns
match value:
    case == True:
        print("True or 1")
    case == False:
        print("False or 0")
    case == None:
        print("None")
    case == "Hello":
        print("Text 'Hello'")
    case == b"World!":
        print("Binary 'World!'")
    case == ...:
        print("May be useful when writing __getitem__ methods?")

# Matching by identity rather than equality
SENTINEL = object()
match value:
    case is True:
        print("True, not 1")
    case is False:
        print("False, not 0")
    case is None:
        print("None, following PEP 8 comparison guidelines")
    case is SENTINEL:
        print("Matches the sentinel by identity, not just value")

# Constant value patterns
from enum import Enum
class Sides(str, Enum):
    SPAM = "Spam"
    EGGS = "eggs"
    ...

preferred_side = Sides.EGGS
match entree[-1]:
    case == Sides.SPAM:  # Compares entree[-1] == Sides.SPAM.
        response = "Have you got anything without Spam?"
    case == preferred_side:  # Compares entree[-1] == preferred_side
        response = f"Oh, I love {preferred_side}!"
    case side:  # Assigns side = entree[-1].
        response = f"Well, could I have their Spam instead of the {side} then?"

Note the == preferred_side example: using an explicit prefix marker on constraint expressions removes the restriction to only working with attributes or literals for value lookups. The == (-1) and == (1-1j) examples illustrate the use of parentheses to turn any subexpression into an atomic one.
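
The need for parentheses in == (-1) and == (1-1j) follows from how Python's existing grammar already classifies those expressions. As a minimal illustration (using only the standard ast module on Python 3.8 or later, not this PEP's reference implementation), the following shows that neither expression is a single literal node. The helper name node_kind is introduced here purely for illustration:

```python
import ast


def node_kind(source):
    """Return the AST node type name for a single expression."""
    return type(ast.parse(source, mode="eval").body).__name__


print(node_kind("1"))     # Constant
print(node_kind("-1"))    # UnaryOp
print(node_kind("1-1j"))  # BinOp
```

Since -1 is a unary operation and 1-1j is a binary operation, neither qualifies as a primary expression until it is wrapped in parentheses.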

This PEP retains the caching property specified for value patterns in PEP 634: if a particular constraint pattern occurs more than once in a given match statement, language implementations are explicitly permitted to cache the first calculation on any given match statement execution and re-use it in other clauses. (This implicit caching is less necessary in this PEP, given that explicit local variable caching becomes a valid option, but it still seems a useful property to preserve.)

Inferred constraint patterns

Inferred constraint patterns use the syntax proposed for literal and value patterns in PEP 634, but arrange them differently in the proposed grammar to allow for a straightforward transformation by the parser into explicit constraints in the AST output:

inferred_constraint_pattern:
    | inferred_id_constraint # Emits same parser output as id_constraint
    | inferred_eq_constraint # Emits same parser output as eq_constraint

inferred_id_constraint:
    | 'None'
    | '...'

inferred_eq_constraint:
    | attr_constraint
    | numeric_constraint
    | strings

attr_constraint: attr !('.' | '(' | '=')
attr: name_or_attr '.' NAME
name_or_attr: attr | NAME

numeric_constraint:
    | signed_number !('+' | '-')
    | signed_number '+' NUMBER
    | signed_number '-' NUMBER
signed_number: NUMBER | '-' NUMBER

The terminology changes slightly to refer to them as a kind of constraint rather than as a kind of pattern, clearly separating the subelements inside patterns into "patterns", which define structures and name binding targets to match against, and "constraints", which look up existing values to compare against.

In practice, the key differences between this PEP's inferred constraint patterns and PEP 634's value patterns and literal patterns are that

  • inferred constraint patterns won't actually exist in the AST definition. Instead, they'll be replaced by an explicit constraint node, exactly as if they had been written with the explicit == or is prefix
  • None and ... are handled as part of a separate grammar rule, rather than needing to be handled as a special case of literal patterns in the parser
  • equality constraints are inferred for f-strings in addition to being inferred for string literals
  • inferred constraints for True and False are dropped entirely on grounds of ambiguity
  • Numeric constraints don't enforce the restriction that they be limited to complex literals (only that they be limited to single numbers, or the addition or subtraction of two such numbers)

Note: even with inferred constraints handled entirely at the parser level, it would still be possible to limit the inference of equality constraints to complex numbers if the tokeniser was amended to emit a different token type (e.g. INUMBER) for imaginary numbers. The PEP doesn't currently propose making that change (in line with its generally permissive approach), but it could be amended to do so if desired.
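
The inference rules described above can be approximated with a small classifier. This is only a sketch under stated assumptions: it uses the standard ast module (Python 3.8 or later) rather than the proposed grammar, the function names explicit_form and _is_dotted_name are hypothetical, and signed or complex numbers are deliberately left out to keep it short:

```python
import ast


def _is_dotted_name(node):
    """True for NAME.NAME chains such as 'a.b.c' (hypothetical helper)."""
    while isinstance(node, ast.Attribute):
        node = node.value
    return isinstance(node, ast.Name)


def explicit_form(pattern_source):
    """Return the explicit spelling this PEP's inference rules imply.

    Gives "is EXPR" or "== EXPR" for inferred constraints, "capture" for a
    plain name, and "wildcard" for the double underscore.
    """
    node = ast.parse(pattern_source, mode="eval").body
    if isinstance(node, ast.Constant):
        if node.value is None or node.value is Ellipsis:
            return f"is {pattern_source}"
        if isinstance(node.value, bool):
            # Checked before numbers, since bool is a subclass of int
            raise ValueError("no constraint is inferred for True or False")
        return f"== {pattern_source}"  # numbers, strings, bytes
    if isinstance(node, ast.JoinedStr):
        return f"== {pattern_source}"  # f-strings
    if isinstance(node, ast.Attribute) and _is_dotted_name(node):
        return f"== {pattern_source}"  # attribute lookups
    if isinstance(node, ast.Name):
        return "wildcard" if node.id == "__" else "capture"
    raise ValueError(f"not covered by this sketch: {pattern_source!r}")


print(explicit_form("None"))        # is None
print(explicit_form("443"))         # == 443
print(explicit_form("Sides.SPAM"))  # == Sides.SPAM
print(explicit_form("side"))        # capture
```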

Mapping patterns

Mapping patterns inherit the change to replace literal patterns and value patterns with constraint patterns that allow arbitrary primary expressions:

mapping_pattern: '{' [items_pattern] '}'
items_pattern: ','.key_value_pattern+ ','?
key_value_pattern:
    | primary ':' or_pattern
    | '**' capture_pattern

However, the constraint marker prefix is not needed in this case, as the fact this is a key to be looked up rather than a name to be bound can already be inferred from its position within a mapping pattern.

This means that in simple cases, mapping patterns look exactly as they do in PEP 634:

import constants

match config:
    case {"route": route}:
        process_route(route)
    case {constants.DEFAULT_PORT: sub_config, **rest}:
        process_config(sub_config, rest)

Unlike PEP 634, however, ordinary local and global variables can also be used to match mapping keys:

ROUTE_KEY = "route"
ADDRESS_KEY = "local_address"
PORT_KEY = "port"
match config:
    case {ROUTE_KEY: route}:
        process_route(route)
    case {ADDRESS_KEY: address, PORT_KEY: port}:
        process_address(address, port)

Note: as complex literals are written as binary operations that are evaluated at compile time, this PEP nominally requires that they be written in parentheses when used as a key in a mapping pattern. This requirement could be relaxed to match PEP 634's handling of complex numbers by also accepting numeric_constraint as defining a valid key expression, and this is how the draft reference implementation currently works (so the affected PEP 634 test cases will compile and run as expected).

Wildcard patterns

Wildcard patterns are changed to use __ (double underscore) rather than the _ (single underscore) syntax proposed in PEP 634:

match sequence:
    case [__]:               # any sequence with a single element
        return True
    case [start, *__, end]:  # a sequence with at least two elements
        return start == end
    case __:                 # anything
        return False

This PEP explicitly requires that wildcard patterns be represented in the Abstract Syntax Tree as something other than a regular Name node.

The draft reference implementation uses the node name SkippedBinding, indicating that the node appears where a simple name binding would ordinarily occur but that nothing should actually be bound. The exact name of the node is more an implementation decision than a design one. The key design requirement is to limit the special casing of __ to the parser, allowing the rest of the compiler to distinguish wildcard patterns from capture patterns based entirely on the kind of AST node, rather than needing to inspect the identifier used in Name nodes.

Design Discussion

Treating match pattern syntax as an extension of assignment target syntax

PEP 634 already draws inspiration from assignment target syntax in the design of its sequence pattern matching - while being restricted to sequences for performance and runtime correctness reasons, sequence patterns are otherwise very similar to the existing iterable unpacking and tuple packing features seen in regular assignment statements and function signature declarations.

By requiring that any new semantics introduced by match patterns be given new syntax that is currently disallowed in assignment targets, one of the goals of this PEP is to explicitly leave the door open to one or more future PEPs that enhance assignment target syntax to support some of the new features introduced by match patterns.

In particular, being able to easily deconstruct mappings into local variables seems likely to be generally useful, even when there's only one mapping variant to be matched:

{"host": host, "port": port, "mode": =="TCP"} = settings

While such code could already be written using a match statement (assuming either this PEP or PEP 634 were to be accepted into the language), an assignment statement level variant should be able to provide standardised exceptions for cases where the right hand side either wasn't a mapping (throwing TypeError), didn't have the specified keys (throwing KeyError), or didn't have the specific values for the given keys (throwing ValueError), avoiding the need to write out that exception raising logic in every case.
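
To make the exception behaviour described above concrete, here is a rough approximation written as an ordinary function. The helper name destructure and its signature are hypothetical; no such function is proposed by this PEP, and a real assignment statement variant would presumably be implemented in the compiler rather than as a library call:

```python
from collections.abc import Mapping

_MISSING = object()  # sentinel so that stored None values are not mistaken for absent keys


def destructure(subject, keys, constraints=None):
    """Return the values for `keys` from `subject`, in order.

    `constraints` maps further keys to the values they are required to equal.
    """
    if not isinstance(subject, Mapping):
        raise TypeError(f"expected a mapping, got {type(subject).__name__}")
    for key, expected in (constraints or {}).items():
        actual = subject.get(key, _MISSING)
        if actual is _MISSING:
            raise KeyError(key)
        if not actual == expected:
            raise ValueError(f"{key!r}: expected {expected!r}, got {actual!r}")
    values = []
    for key in keys:
        value = subject.get(key, _MISSING)
        if value is _MISSING:
            raise KeyError(key)
        values.append(value)
    return values


settings = {"host": "localhost", "port": 8080, "mode": "TCP"}
host, port = destructure(settings, ["host", "port"], {"mode": "TCP"})
print(host, port)  # localhost 8080
```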

PEP 635 raises the concern that enough aspects of pattern matching semantics will differ from assignment target semantics that pursuing syntactic parallels will end up creating confusion rather than reducing it. However, the primary examples cited as potentially causing confusion are exactly those where the PEP 634 syntax is already the same as that for assignment targets: the fact that case patterns use iterable unpacking syntax, but only match on sequences (and specifically exclude strings and byte-strings) rather than consuming arbitrary iterables is an aspect of PEP 634 that this PEP leaves unchanged.

These semantic differences are intrinsic to the nature of pattern matching: whereas it is reasonable for a one-shot assignment statement to consume a one-shot iterator, it isn't reasonable to do that in a construct that's explicitly about matching a given value against multiple potential targets, making full use of the available runtime type information to ensure those checks are as side effect free as possible.

It's an entirely orthogonal question to how the distinction is drawn between capture patterns and patterns that check for expected values (constraint patterns in this PEP, literal and value patterns in PEP 634), and it's a big logical leap to take from "these specific semantic differences between iterable unpacking and sequence matching are needed in order to handle checking against multiple potential targets" to "we can reuse attribute binding syntax to mean equality constraints instead and nobody is going to get confused by that".

Interaction with caching of attribute lookups in local variables

The major change between this PEP and PEP 634 is to offer == EXPR for value constraint lookups, rather than only offering NAME.ATTR. The main motivation for this is to avoid the semantic conflict with regular assignment targets, where NAME.ATTR is already used in assignment statements to set attributes. If NAME.ATTR were the only syntax for symbolic value matching, then we would be pre-emptively ruling out any future attempts to allow matching against single patterns using the existing assignment statement syntax. We'd also be failing to provide users with suitable scaffolding to help build correct mental models of what the shorthand forms mean in match patterns (as compared to what they mean in assignment targets).

However, even within match statements themselves, the name.attr syntax for value patterns has an undesirable interaction with local variable assignment, where routine refactorings that would be semantically neutral for any other Python statement introduce a major semantic change when applied to a match statement.

Consider the following code:

while value < self.limit:
    ... # Some code that adjusts "value"

The attribute lookup can be safely lifted out of the loop and only performed once:

_limit = self.limit
while value < _limit:
    ... # Some code that adjusts "value"

With the marker prefix based syntax proposal in this PEP, constraint patterns would be similarly tolerant of match patterns being refactored to use a local variable instead of an attribute lookup, with the following two statements being functionally equivalent:

match expr:
    case {"key": == self.target}:
        ... # Handle the case where 'expr["key"] == self.target'
    case _:
        ... # Handle the non-matching case

_target = self.target
match expr:
    case {"key": == _target}:
        ... # Handle the case where 'expr["key"] == self.target'
    case _:
        ... # Handle the non-matching case

By contrast, when using the syntactic shorthand that omits the marker prefix, the following two statements wouldn't be equivalent at all:

# PEP 634's value pattern syntax / this PEP's attribute constraint syntax
match expr:
    case {"key": self.target}:
        ... # Handle the case where 'expr["key"] == self.target'
    case _:
        ... # Handle the non-matching case

_target = self.target
match expr:
    case {"key": _target}:
        ... # Matches any mapping with "key", binding its value to _target
    case _:
        ... # Handle the non-matching case

This PEP offers a straightforward way to retain the original semantics under this style of simplistic refactoring: use == _target to force interpretation of the result as a constraint pattern instead of a capture pattern (i.e. drop the no longer applicable syntactic shorthand, and switch to the explicit form).

PEP 634's proposal to offer only the shorthand syntax, with no explicitly prefixed form, means that the primary answer on offer is "Well, don't do that, then, only compare against attributes in namespaces, don't compare against simple names".

PEP 622's walrus pattern syntax had another odd interaction where it might not bind the same object as the exact same walrus expression in the body of the case clause, but PEP 634 fixed that discrepancy by replacing walrus patterns with AS patterns (where the fact that the value bound to the name on the RHS might not be the same value as returned by the LHS is a standard feature common to all uses of the "as" keyword).

Using existing comparison operators as the constraint pattern prefix

If the need for a dedicated constraint pattern prefix is accepted, then the next question is to ask exactly what that prefix should be.

The initially published version of this PEP proposed using the previously unused ? symbol as the prefix for equality constraints, and ?is as the prefix for identity constraints. When reviewing the PEP, Steven D'Aprano presented a compelling counterproposal [5] to use the existing comparison operators (== and is) instead.

There were a few concerns with == as a prefix that kept it from being chosen as the prefix in the initial iteration of the PEP:

  • for common use cases, it's even more visually noisy than ?, as a lot of folks with PEP 8 trained aesthetic sensibilities are going to want to put a space between it and the following expression, effectively making it a 3 character prefix instead of 1
  • when used in a class pattern, there needs to be a space between the = keyword separator and the == prefix, or the tokeniser will split them up incorrectly (getting == and = instead of = and ==)
  • when used in a mapping pattern, there needs to be a space between the : key/value separator and the == prefix, or the tokeniser will split them up incorrectly (getting := and = instead of : and ==)
  • when used in an OR pattern, there needs to be a space between the | pattern separator and the == prefix, or the tokeniser will split them up incorrectly (getting |= and = instead of | and ==)
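
The tokenisation behaviour behind the last three concerns can be checked with the standard tokenize module (Python 3.8 or later, so that := is a recognised operator). The helper name operator_tokens is introduced here for illustration only, and the snippets are fragments that are tokenised but never parsed:

```python
import io
import tokenize


def operator_tokens(source):
    """Return the operator tokens the tokeniser finds in `source`."""
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.OP
    ]


print(operator_tokens("x=== y\n"))   # ['==', '=']
print(operator_tokens("x= == y\n"))  # ['=', '==']
print(operator_tokens("k:== v\n"))   # [':=', '=']
print(operator_tokens("k: == v\n"))  # [':', '==']
print(operator_tokens("p|== v\n"))   # ['|=', '=']
print(operator_tokens("p| == v\n"))  # ['|', '==']
```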

Rather than introducing a completely new symbol, Steven's proposed resolution to this verbosity problem was to retain the ability to omit the prefix marker in syntactically unambiguous cases.

This prompted a review of the PEP's goals and underlying concerns. The conclusion was that the author's core concern was with the idea of not even offering users the ability to be explicit when they wanted or needed to be: users could only express the intent that the compiler inferred, with no way to override that default inference when it turned out to be wrong (as it inevitably will be in at least some cases).

Given that perspective, PEP 635's arguments against using ? as part of the pattern matching syntax held for this proposal as well, and so the PEP was amended accordingly.

Using __ as the wildcard pattern marker

PEP 635 makes a solid case that introducing ? solely as a wildcard pattern marker would be a bad idea. With the syntax for constraint patterns now changed to use existing comparison operations rather than ? and ?is, that argument holds for this PEP as well.

However, as noted by Thomas Wouters in [6], PEP 634's choice of _ remains problematic as it would likely mean that match patterns would have a permanent difference from all other parts of Python - the use of _ in software internationalisation and at the interactive prompt means that there isn't really a plausible path towards using it as a general purpose "skipped binding" marker.

__ is an alternative "this value is not needed" marker drawn from a Stack Overflow answer [7] (originally posted by the author of this PEP) on the various meanings of _ in existing Python code.

This PEP also proposes adopting an implementation technique that limits the scope of the associated special casing of __ to the parser: defining a new AST node type (SkippedBinding) specifically for wildcard markers.

Within the parser, __ would still mean either a regular name or a wildcard marker in a match pattern depending on where you were in the parse tree, but within the rest of the compiler, Name("__") would still be a regular name, while SkippedBinding() would always be a wildcard marker.
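
A toy model may help illustrate this division of responsibility. Everything below is hypothetical: the Name and SkippedBinding classes are simple dataclasses standing in for AST nodes, and parse_target and bound_names are stand-ins for the parser and for later compiler stages, not code from the draft reference implementation:

```python
from dataclasses import dataclass


@dataclass
class Name:
    id: str


@dataclass
class SkippedBinding:
    pass


def parse_target(identifier, in_match_pattern):
    """Toy parser step: the only place where '__' is special-cased."""
    if in_match_pattern and identifier == "__":
        return SkippedBinding()
    return Name(identifier)


def bound_names(targets):
    """Toy compiler step: dispatches on node type, never on the identifier."""
    return [target.id for target in targets if isinstance(target, Name)]


pattern = [parse_target(n, in_match_pattern=True) for n in ("start", "__", "end")]
print(bound_names(pattern))   # ['start', 'end']

ordinary = [parse_target("__", in_match_pattern=False)]
print(bound_names(ordinary))  # ['__']
```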

Unlike _, the lack of other use cases for __ means that there would be a plausible path towards restoring identifier handling consistency with the rest of the language by making it mean "skip this name binding" everywhere in Python:

  • in the interpreter itself, deprecate loading variables with the name __. This would make reading from __ emit a deprecation warning, while writing to it would initially be unchanged. To avoid slowing down all name loads, this could be handled by having the compiler emit additional code for the deprecated name, rather than using a runtime check in the standard name loading opcodes.
  • after a suitable number of releases, change the parser to emit SkippedBinding for all uses of __ as an assignment target, not just those appearing inside match patterns
  • consider making __ a true hard keyword rather than a soft keyword

This deprecation path couldn't be followed for _, as there's no way for the interpreter to distinguish between attempts to read back _ when nominally used as a "don't care" marker, and legitimate reads of _ as either an i18n text translation function or as the last statement result at the interactive prompt.

Names starting with double-underscores are also already reserved for use by the language, whether that is for compile time constants (i.e. __debug__), special methods, or class attribute name mangling, so using __ here would be consistent with that existing approach.

Keeping inferred equality constraints

An early (not widely publicised) draft of this proposal considered keeping PEP 634's literal patterns, as they don't inherently conflict with assignment statement syntax the way that PEP 634's value patterns do (trying to assign to a literal is already a syntax error, whereas assigning to a dotted name sets the attribute).

They were removed in the initially published version due to the fact that they have the same syntax sensitivity problem as attribute constraints do, where naively attempting to move the literal pattern out to a local variable for naming clarity turns the value checking literal pattern into a name binding capture pattern:

# PEP 634's literal pattern syntax / this PEP's literal constraint syntax
match expr:
    case {"port": 443}:
        ... # Handle the case where 'expr["port"] == 443'
    case _:
        ... # Handle the non-matching case

HTTPS_PORT = 443
match expr:
    case {"port": HTTPS_PORT}:
        ... # Matches any mapping with "port", binding its value to HTTPS_PORT
    case _:
        ... # Handle the non-matching case

With explicit equality constraints, this style of refactoring keeps the original semantics (just as it would for a value lookup in any other statement):

# This PEP's equality constraints
match expr:
    case {"port": == 443}:
        ... # Handle the case where 'expr["port"] == 443'
    case _:
        ... # Handle the non-matching case

HTTPS_PORT = 443
match expr:
    case {"port": == HTTPS_PORT}:
        ... # Handle the case where 'expr["port"] == 443'
    case _:
        ... # Handle the non-matching case
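
Since the == prefix is not valid syntax in any released interpreter, the intended meaning of the explicit form can only be approximated with ordinary statements. The sketch below assumes the mapping pattern semantics described in PEP 634 (the subject must be a collections.abc.Mapping and the key must be present); the function name matches_port is hypothetical:

```python
from collections.abc import Mapping

_MISSING = object()  # sentinel distinguishing "key absent" from a stored None


def matches_port(expr, expected):
    """Approximate `case {"port": == expected}` using ordinary statements."""
    if not isinstance(expr, Mapping):
        return False
    value = expr.get("port", _MISSING)
    return value is not _MISSING and bool(value == expected)


HTTPS_PORT = 443
print(matches_port({"port": 443}, 443))        # True
print(matches_port({"port": 443}, HTTPS_PORT))  # True
print(matches_port({"port": 80}, HTTPS_PORT))   # False
print(HTTPS_PORT)                               # 443
```

Because the expected value is only ever read, replacing the literal with a named constant changes nothing about the outcome, and the constant itself is never rebound.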

As noted above, both literal patterns and value patterns made their return (in the form of inferred equality constraints) as a way to address the verbosity problem of offering explicit == prefixed equality constraints as the only way to express equality checks.

However, the presence of the explicit constraint nodes in the AST means that these special cases can be limited to the parser, with the implicit forms emitting the same AST nodes as their explicit counterparts.

Inferring equality constraints for f-strings

This is less a design decision in its own right, and more a consequence of other design decisions:

  • the tokeniser and parser don't distinguish f-strings from other kinds of strings, so inferring an explicit equality constraint for f-strings happens by default when defining the match pattern parser rule for string literals
  • the rest of the compiler then treats that output like any other explicit equality constraint in an AST pattern node (i.e. allowing arbitrary expressions)

This combination of factors makes it awkward to implement a special case that disallows inferring equality constraints for f-strings while accepting them for string literals, so the PEP instead opts to just allow them (as they're just as syntactically unambiguous as any other string in a match pattern).

Keeping inferred identity constraints

PEP 635 makes a reasonable case that interpreting a check against None as == None would almost always be incorrect, whereas interpreting it as is None (as advised in PEP 8) would almost always be what the user intended.

Similar reasoning applies to checking against ....

Accordingly, this PEP defines the use of either of these tokens as implying an identity constraint.
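
A short example shows why an equality check against None can give a misleading answer. The AlwaysEqual class is a deliberately contrived, hypothetical type whose __eq__ method claims equality with everything:

```python
class AlwaysEqual:
    """Hypothetical type whose __eq__ reports equality with any object."""

    def __eq__(self, other):
        return True


value = AlwaysEqual()
print(value == None)  # True
print(value is None)  # False
```

Only the identity check reliably answers the question "is this object actually None?".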

However, as with inferred equality constraints, inferred identity constraints become explicit identity constraints in the parser output.

Disallowing inferred constraints for True and False

PEP 635 makes a reasonable case that comparing the True and False literals by equality by default is problematic. PEP 8 advises against writing those comparisons out explicitly in code, so it doesn't make sense for us to implement a construct that does so implicitly inside the interpreter.

Unlike PEP 635, however, this PEP proposes to resolve the discrepancy by leaving these two names out of the initial iteration of the inferred constraint syntax definition entirely, rather than treating them as implying an identity constraint.

This means comparisons against True and False in match patterns would need to be written in one of the following forms:

  • comparison by numeric value:

    case 0:
        ...
    case 1:
        ...
    
  • comparison by equality (equivalent to comparison by numeric value):

    case == False:
        ...
    case == True:
        ...
    
  • comparison by identity:

    case is False:
        ...
    case is True:
        ...
    
  • comparison by value with class check (equivalent to comparison by identity):

    case bool(False):
        ...
    case bool(True):
        ...
    
  • comparison by boolean coercion:

    case (x, p) if not p:
        ...
    case (x, p) if p:
        ...
    

The last approach is the one that would most closely follow PEP 8's guidance for if-elif chains (comparing by boolean coercion), but it's far from clear at this point how True and False literals will end up being used in pattern matching use cases.

In particular, PEP 635's assessment that users will probably mean "comparison by value with class check", which effectively becomes "comparison by identity" due to True and False being singletons, is a genuinely plausible suggestion.

However, rather than attempting to guess up front, this PEP proposes that no shorthand form be offered for these two constants in the initial implementation, and we instead wait and see if a clearly preferred meaning emerges from actual usage of the new construct.
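The candidate meanings listed above give different answers for the same subject value, which is why guessing a default is risky. The sketch below spells each one out as an ordinary expression; the helper name matches_true is hypothetical and exists only for this illustration.

```python
def matches_true(candidate):
    """Report which interpretations of 'case True' would accept candidate."""
    return {
        # Equality (identical to comparing against the number 1)
        "equality": candidate == True,  # noqa: E712 (deliberate)
        # Identity: only the True singleton itself
        "identity": candidate is True,
        # Value comparison combined with a class check
        "class check": isinstance(candidate, bool) and candidate == True,  # noqa: E712
        # Boolean coercion, as PEP 8 recommends for if statements
        "coercion": bool(candidate),
    }


for candidate in (True, 1, 1.0, "yes"):
    print(repr(candidate), matches_true(candidate))
```

Only True itself is accepted by all four checks. The values 1 and 1.0 pass the equality and coercion checks but fail the identity and class checks, while "yes" passes only the coercion check.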

Inferred constraints rather than implied constraints

This PEP uses the term "inferred constraint" to make it clear that the parser is making assumptions about the user's intent when converting an inferred constraint to an explicit one.

Calling them "implied constraints" instead would also be reasonable, but that phrasing has a slightly stronger connotation that the inference is always going to be correct, and one of the motivations of this PEP is that the inference isn't always going to be correct, so we should be offering a way for users to be explicit when the parser's assumptions don't align with their intent.

Deferred Ideas

Allowing negated constraints in match patterns

The requirement that constraint expressions be primary expressions means that it isn't permitted to write != expr or is not expr.

Both of these forms have clear potential interpretations as a negated equality constraint (i.e. x != expr) and a negated identity constraint (i.e. x is not expr).

However, it's far from clear either form would come up often enough to justify the dedicated syntax, so the extension has been deferred pending further community experience with match statements.

Allowing containment checks in match patterns

The syntax used for equality and identity constraints would be straightforward to extend to containment checks: in container.

One downside of the proposals in both this PEP and PEP 634 is that checking for multiple values in the same case doesn't look like any existing set membership check in Python:

# PEP 634's literal patterns / this PEP's inferred constraints
match value:
    case 0 | 1 | 2 | 3:
        ...

Explicit equality constraints also become quite verbose if they need to be repeated:

match value:
    case == one | == two | == three | == four:
        ...

Containment constraints would provide a more concise way to check if the match subject was present in a container:

match value:
    case in {0, 1, 2, 3}:
        ...
    case in {one, two, three, four}:
        ...
    case in range(4): # It would accept any container, not just literal sets
        ...

Such a feature would also be readily extensible to allow all kinds of case clauses without any further syntax updates, simply by defining __contains__ appropriately on a custom class definition.

However, while this does seem like a useful extension, it isn't essential to making match statements a valuable addition to the language, so it seems more appropriate to defer it to a separate proposal, rather than including it here.

Rejected Ideas

Restricting permitted expressions in constraint patterns and mapping pattern keys

While it's entirely technically possible to restrict the kinds of expressions permitted in constraint patterns and mapping pattern keys to just attribute lookups and constant literals (as PEP 634 does), there isn't any clear runtime value in doing so, so this PEP proposes allowing any kind of primary expression (primary expressions are an existing node type in the grammar that includes things like literals, names, attribute lookups, function calls, container subscripts, parenthesised groups, etc).

While PEP 635 does emphasise several times that literal patterns and value patterns are not full expressions, it doesn't ever articulate a concrete benefit that is obtained from that restriction (just a theoretical appeal to it being useful to separate static checks from dynamic checks, which a code style tool could still enforce, even if the compiler itself is more permissive).

The last time we imposed such a restriction was for decorator expressions and the primary outcome of that was that users had to put up with years of awkward syntactic workarounds (like nesting arbitrary expressions inside function calls that just returned their argument) to express the behaviour they wanted before the language definition was finally updated to allow arbitrary expressions and let users make their own decisions about readability.

PEP 634's situation resembles that of decorator expressions: arbitrary expressions are technically supported in value patterns, but they require awkward workarounds. Either all the values to match need to be specified in a helper class that is placed before the match statement:

# Allowing arbitrary match targets with PEP 634's value pattern syntax
class mt:
    value = func()
match expr:
    case (_, mt.value):
        ... # Handle the case where 'expr[1] == func()'

Or else they need to be written as a combination of a capture pattern and a guard expression:

match expr:
    case (_, _matched) if _matched == func():
        ... # Handle the case where 'expr[1] == func()'

This PEP proposes dropping the need for any such workarounds, and instead supporting arbitrary value constraints from the start:

match expr:
    case (__, == func()):
        ... # Handle the case where 'expr[1] == func()'

Whether actually writing that kind of code is a good idea would be a topic for style guides and code linters, not the language compiler.

In particular, if static analysers can't follow certain kinds of dynamic checks, then they can limit the permitted expressions at analysis time, rather than the compiler restricting them at compile time.

There are also some kinds of expressions that are almost certain to give nonsensical results (e.g. yield, yield from, await) due to the pattern caching rule, where the number of times the constraint expression actually gets evaluated will be implementation dependent. Even here, the PEP takes the view of letting users write nonsense if they really want to.

Aside from the recently updated decorator expressions, another situation where Python's formal syntax offers full freedom of expression that is almost never used in practice is in except clauses: the exceptions to match against almost always take the form of a simple name, a dotted name, or a tuple of those, but the language grammar permits arbitrary expressions at that point. This is a good indication that Python's user base can be trusted to take responsibility for finding readable ways to use permissive language features, by avoiding writing hard to read constructs even when they're permitted by the compiler.
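The permissiveness of except clauses is easy to demonstrate. In the sketch below, the exceptions to catch are supplied by a function call rather than by a name or a tuple of names; both helper names are hypothetical.

```python
def lookup_errors():
    """Hypothetical helper returning the exceptions to treat as 'missing'."""
    return (KeyError, IndexError)


def get_or_default(container, key, default=None):
    try:
        return container[key]
    except lookup_errors():  # an arbitrary expression: here, a function call
        return default


print(get_or_default({"a": 1}, "a"))  # 1
print(get_or_default({"a": 1}, "b"))  # None
print(get_or_default([1, 2], 5, 0))   # 0
```

Code like this is legal but rarely written, which is the point being made: the grammar allows it, and readability conventions keep it uncommon.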

This permissiveness comes with a real concrete benefit on the implementation side: dozens of lines of match statement specific code in the compiler are replaced by simple calls to the existing code for compiling expressions (including in the AST validation pass, the AST optimization pass, the symbol table analysis pass, and the code generation pass). This implementation benefit would accrue not just to CPython, but to every other Python implementation looking to add match statement support.

Requiring the use of constraint prefix markers for mapping pattern keys

The initial (unpublished) draft of this proposal suggested requiring mapping pattern keys be constraint patterns, just as PEP 634 requires that they be valid literal or value patterns:

import constants

match config:
    case {?"route": route}:
        process_route(route)
    case {?constants.DEFAULT_PORT: sub_config, **rest}:
        process_config(sub_config, rest)

However, the extra character was syntactically noisy and unlike its use in constraint patterns (where it distinguishes them from capture patterns), the prefix doesn't provide any additional information here that isn't already conveyed by the expression's position as a key within a mapping pattern.

Accordingly, the proposal was simplified to omit the marker prefix from mapping pattern keys.

This omission also aligns with the fact that containers may incorporate both identity and equality checks into their lookup process - they don't purely rely on equality checks, as would be incorrectly implied by the use of the equality constraint prefix.
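A NaN value used as a dictionary key is a compact demonstration of that behaviour: the key never compares equal to itself, yet lookups with the same object succeed because the container checks identity before equality. This is a plain Python sketch with hypothetical variable names.

```python
nan = float("nan")
table = {nan: "found"}

# NaN is never equal to itself ...
print(nan == nan)             # False
# ... but the dict lookup succeeds via the identity shortcut
print(nan in table)           # True
print(table[nan])             # found
# A different NaN object is neither identical nor equal to the key
print(float("nan") in table)  # False
```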

Providing dedicated syntax for binding matched constraint values

The initial (unpublished) draft of this proposal suggested allowing NAME?EXPR as a syntactically unambiguous shorthand for PEP 622's NAME := BASE.ATTR or PEP 634's BASE.ATTR as NAME.

This idea was dropped as it complicated the grammar for no gain in expressiveness over just using the general purpose approach to combining capture patterns with other match patterns (i.e. ?EXPR as NAME at the time, == EXPR as NAME now) when the identity of the matching object is important.

This idea is even less appropriate after the switch to using existing comparison operators as the marker prefix, as both NAME == EXPR and NAME is EXPR would look like ordinary comparison operations, with nothing to suggest that NAME is being bound by the pattern matching process.

Reference Implementation

A reference implementation for this PEP [3] has been derived from Brandt Bucher's reference implementation for PEP 634 [4].

Relative to the text of this PEP, the draft reference implementation currently implements the variant of mapping patterns where numeric constraints are accepted in addition to primary expressions (this allowed the PEP 634 mapping pattern checks for complex keys to run as written).

All other modified patterns have been updated to follow this PEP rather than PEP 634.

The AST validator for match patterns has not yet been implemented.

There is an implementation decision still to be made around representing constraint operators in the AST. The draft implementation adds them as new cases on the existing UnaryOp node, but there's an argument to be made that they would be better implemented as a new Constraint node, since they're accepted at different points in the syntax tree than other unary operators.
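For context, the sketch below inspects how a stock CPython parser (without the changes in this PEP) represents an existing unary operator, using only the standard ast module. It does not show the draft implementation's constraint operators, which are not available in an unmodified interpreter.

```python
import ast

# Parse a simple negation and inspect the node the stock parser produces
node = ast.parse("-x", mode="eval").body
node_kind = type(node).__name__
op_kind = type(node.op).__name__

# The operator classes that stock CPython accepts on a UnaryOp node
stock_unary_ops = sorted(cls.__name__ for cls in ast.unaryop.__subclasses__())

print(node_kind, op_kind)  # UnaryOp USub
print(stock_unary_ops)
```

All of the stock operators can appear anywhere an expression is accepted, whereas the constraint operators would only be valid inside match patterns. That difference is the argument for a separate Constraint node.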

Acknowledgments

The PEP 622 and PEP 634/635/636 authors, as the proposal in this PEP is merely an attempt to improve the readability of an already well-constructed idea. It does so by proposing that reusing the existing attribute binding syntax to mean an attribute lookup will be more easily understood as syntactic sugar for a more explicit underlying expression (one that's compatible with the existing binding target syntax) than it will be as the only way to spell such comparisons in match patterns.

Steven D'Aprano, who made a convincing case that the key goals of this PEP could be achieved by using existing comparison tokens to add the ability to override the compiler when our guesses as to "what most users will want most of the time" are inevitably incorrect for at least some users some of the time, and retaining some of PEP 634's syntactic sugar (with a slightly different semantic definition) to obtain the same level of brevity as PEP 634 in most situations. (Paul Sokolovsky also independently suggested using == instead of ? as a more easily understood prefix for equality constraints).

Thomas Wouters, whose publication of PEP 640 and public review of the structured pattern matching proposals persuaded the author of this PEP to continue advocating for a wildcard pattern syntax that a future PEP could plausibly turn into a hard keyword that always skips binding a reference in any location a simple name is expected, rather than continuing indefinitely as the match pattern specific soft keyword that is proposed here.

References

[1]Post explaining the syntactic novelties in PEP 622 https://mail.python.org/archives/list/python-dev@python.org/message/2VRPDW4EE243QT3QNNCO7XFZYZGIY6N3/
[2]Declined pull request proposing to list this as a Rejected Idea in PEP 622 https://github.com/python/peps/pull/1564
[3]In-progress reference implementation for this PEP https://github.com/ncoghlan/cpython/tree/pep-642-constraint-patterns
[4]PEP 634 reference implementation https://github.com/python/cpython/pull/22917
[5]Steven D'Aprano's cogent criticism of the first published iteration of this PEP https://mail.python.org/archives/list/python-dev@python.org/message/BTHFWG6MWLHALOD6CHTUFPHAR65YN6BP/
[6]Thomas Wouters' initial review of the structured pattern matching proposals https://mail.python.org/archives/list/python-dev@python.org/thread/4SBR3J5IQUYE752KR7C6432HNBSYKC5X/
[7]Stack Overflow answer regarding the use cases for _ as an identifier https://stackoverflow.com/questions/5893163/what-is-the-purpose-of-the-single-underscore-variable-in-python/5893946#5893946

Appendix A -- Full Grammar

Here is the full modified grammar for match_stmt, replacing Appendix A in PEP 634.

Notation used beyond standard EBNF is as per PEP 634:

  • 'KWD' denotes a hard keyword
  • "KWD" denotes a soft keyword
  • SEP.RULE+ is shorthand for RULE (SEP RULE)*
  • !RULE is a negative lookahead assertion

match_stmt: "match" subject_expr ':' NEWLINE INDENT case_block+ DEDENT
subject_expr:
    | star_named_expression ',' [star_named_expressions]
    | named_expression
case_block: "case" patterns [guard] ':' block
guard: 'if' named_expression

patterns: open_sequence_pattern | pattern
pattern: as_pattern | or_pattern
as_pattern: or_pattern 'as' capture_pattern
or_pattern: '|'.closed_pattern+
closed_pattern:
    | capture_pattern
    | wildcard_pattern
    | constraint_pattern
    | inferred_constraint_pattern
    | group_pattern
    | sequence_pattern
    | mapping_pattern
    | class_pattern

capture_pattern: !"__" NAME !('.' | '(' | '=')

wildcard_pattern: "__"

constraint_pattern:
    | eq_constraint
    | id_constraint
eq_constraint: '==' primary
id_constraint: 'is' primary

inferred_constraint_pattern:
    | inferred_id_constraint
    | inferred_eq_constraint

inferred_id_constraint:
    | 'None'
    | '...'

inferred_eq_constraint:
    | attr_constraint
    | numeric_constraint
    | strings

attr_constraint: attr !('.' | '(' | '=')
attr: name_or_attr '.' NAME
name_or_attr: attr | NAME
numeric_constraint:
    | signed_number !('+' | '-')
    | signed_number '+' NUMBER
    | signed_number '-' NUMBER
signed_number: NUMBER | '-' NUMBER

group_pattern: '(' pattern ')'

sequence_pattern:
    | '[' [maybe_sequence_pattern] ']'
    | '(' [open_sequence_pattern] ')'
open_sequence_pattern: maybe_star_pattern ',' [maybe_sequence_pattern]
maybe_sequence_pattern: ','.maybe_star_pattern+ ','?
maybe_star_pattern: star_pattern | pattern
star_pattern: '*' (capture_pattern | wildcard_pattern)

mapping_pattern: '{' [items_pattern] '}'
items_pattern: ','.key_value_pattern+ ','?
key_value_pattern:
    | primary ':' pattern
    | double_star_pattern
double_star_pattern: '**' capture_pattern

class_pattern:
    | name_or_attr '(' [pattern_arguments ','?] ')'
pattern_arguments:
    | positional_patterns [',' keyword_patterns]
    | keyword_patterns
positional_patterns: ','.pattern+
keyword_patterns: ','.keyword_pattern+
keyword_pattern: NAME '=' pattern
Source: https://github.com/python/peps/blob/master/pep-0642.rst