[issue46896] add support for watching writes to selected dictionaries

Thu Mar 10 15:40:35 EST 2022

Carl Meyer <carl at oddbird.net> added the comment:

Thanks for outlining the use cases. They make sense.

The current PR provides a flexible generic API that fully supports all three of those use cases (use cases 2 and 3 are strict subsets of use case 1.) Since the callback is called before the dict is modified, all the necessary information is available to the callback to decide whether the event is interesting to it or not.

The question is how much of the bookkeeping to classify events as "interesting" or "uninteresting" should be embedded in the core dispatch vs being handled by the callback.

One reason to prefer keeping this logic in the callback is that with potentially multiple chained callbacks in play, the filtering logic must always exist in the callback, regardless. E.g. if callback A wants to watch only keys-version changes to dict X, but callback B wants to watch all changes to it, events will fire for all changes, and callback A must still disregard "uninteresting" events that it may receive (just like it may receive events for dicts it never asked to watch at all.) So providing API for different "levels" of watching means that the "is this event interesting to me" predicate must effectively be duplicated both in the callback and in the watch level chosen.

The proposed rationale for this complexity and duplication is the idea that filtering out uninteresting events at dispatch will provide better performance. But this is hypothetical: it assumes the existence of perf-bottleneck code paths that repeatedly rebind globals. The only benchmark workload with this characteristic that I know of is pystone, and it is not even part of the pyperformance suite, I think precisely because it is not representative of real-world code patterns. And even assuming that we do need to optimize for such code, it's also not obvious that it will be noticeably cheaper in practice to filter on the dispatch side.

It may be more useful to focus on API. If we get the API right, internal implementation details can always be adjusted in future if a different implementation can be shown to be noticeably faster for relevant use cases. And if we get existing API right, we can always add new API if we have to. I don't think anything about the proposed simple API precludes adding `PyDict_WatchKeys` as an additional feature, if it turns out to be necessary.

One modification to the simple proposed API that should improve the performance (and ease of implementation) of use case #2 would be to split the current `PyDict_EVENT_MODIFIED` into two separate event types: `PyDict_EVENT_MODIFIED` and `PyDict_EVENT_NEW_KEY`. Then the callback-side event filtering for use case #2 would just be `event == PyDict_EVENT_NEW_KEY` instead of requiring a lookup into the dict to see whether the key was previously set or not.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue46896>
_______________________________________