[Distutils] Changing the "install hooks" mechanism for PEP 426

Thu Aug 15 00:32:14 CEST 2013

On Wed, Aug 14, 2013 at 3:14 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 14 August 2013 14:00, PJ Eby <pje at telecommunity.com> wrote:
>> On Wed, Aug 14, 2013 at 11:36 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>> * group - name of the export group to hook
>>> * preupdate - export to call prior to installing/updating/removing a
>>> distribution that exports this export group
>>> * postupdate - export to call after installing/updating/removing a
>>> distribution that exports this export group
>>> * refresh - export to call to resynchronise any caches with the system
>>> state. This will be invoked for every distribution on the system that
>>> exports this export group any time the distribution defining the
>>> export hook is itself installed or upgraded
>>
>> I've reread your post a few times and I'm not sure I understand it.
>> Let me try and spell out a scenario to see if I've got it:
>>
>> * Distribution A defines a refresh hook for group 'foo.bar' -- but
>> doesn't export anything in that group
>> * Distribution B defines an *export* (fka "entry point") -- any export
>> -- in export group 'foo.bar' -- but doesn't define any hooks
>> * Distribution A's refresh hook will be notified when B is installed,
>> updated, or removed
>
> No, A's preupdate and postupdate hooks would fire when B (or any other
> distro exporting the "foo.bar" group) is installed/updated/removed.
> refresh() would fire only when A was installed or updated.

Huh?  So refresh is only relevant to the package itself?  I guess I
don't understand the point of that, since you get the same info from
postupdate then, no?

> I realised that my proposed signature for the refresh() hook is wrong,
> though, since it doesn't deal with catching up on *removed*
> distributions properly. Rather than being called multiple times,
> refresh() instead needs to be called with an iterable providing the
> metadata for all currently installed distributions that export that
> group.

Ah.  But then why is it called for A, instead of..  oh, I think I see
now.  Gotcha.

This is the sort of thing that examples are really made for, so you
can see the use cases for the different hooks.
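
For instance, here is a minimal sketch of the firing rules as I now understand them from the exchange above. Everything in it (Dist, Recorder, install) is a hypothetical stand-in invented for illustration, not a proposed API: preupdate/postupdate fire on subscribers when some *other* distribution exporting the group changes, while refresh fires once for the distribution that defines the hook, with an iterable of every installed distribution exporting that group.

```python
from collections import namedtuple

# Hypothetical stand-in for an installed distribution's metadata:
# `exports` is the set of export groups it publishes into, and
# `hooks` maps a subscribed group name to a hook object.
Dist = namedtuple("Dist", "name exports hooks")


class Recorder:
    """Hypothetical hook object that just records the calls it receives."""

    def __init__(self):
        self.calls = []

    def preupdate(self, dist):
        self.calls.append(("preupdate", dist.name))

    def postupdate(self, dist):
        self.calls.append(("postupdate", dist.name))

    def refresh(self, dists):
        self.calls.append(("refresh", sorted(d.name for d in dists)))


def install(installed, new):
    """Install or update `new` in `installed` (a dict of name -> Dist)."""
    # Hooks of *other* installed distributions subscribed to a group
    # that `new` exports.
    watchers = [
        hook
        for dist in installed.values()
        if dist.name != new.name
        for group, hook in dist.hooks.items()
        if group in new.exports
    ]
    for hook in watchers:
        hook.preupdate(new)
    installed[new.name] = new
    for hook in watchers:
        hook.postupdate(new)
    # refresh fires only for the hooks `new` itself defines, and receives
    # every currently installed distribution exporting that group.
    for group, hook in new.hooks.items():
        hook.refresh([d for d in installed.values() if group in d.exports])


installed = {}
rec = Recorder()
install(installed, Dist("B", exports={"foo.bar"}, hooks={}))
install(installed, Dist("A", exports=set(), hooks={"foo.bar": rec}))
install(installed, Dist("C", exports={"foo.bar"}, hooks={}))
print(rec.calls)
```

So A catches up on B (installed before A existed) through refresh, and hears about C through preupdate/postupdate. Removal would presumably mirror install(), but I've left it out of the sketch.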

>> If so, my confusion is probably because of overloading of the term
>> "export" in this context; among other things, it's unclear whether
>> this is a separate data structure from exports themselves...  and if
>> so, why?
>
> Where "exports" is about publishing entries into an export group, the
> new "export_hooks" field would be about *subscribing* to an export
> group and being told about changes to it.

That's not actually a justification for not using exports.

> You could use a naming convention to define these hooks
> directly in "exports" without colliding with the export of the group
> itself, but I think it's better to separate them out so you can do
> stricter validation on the permitted keys and values (the rationale is
> similar to that for separating out commands from more general exports,
> and exports from arbitrary metadata extensions).

The separation of commands is (just barely) justifiable because it's
not a runtime use, it's installer use.

Stricter validation, OTOH, is a completely bogus justification for not
using exports, otherwise nobody would ever have any reason to use
exports, everybody would have to define their own extensions so they
could have stricter validation.  ;-)

The solution to providing more validation is to use *more* export groups, e.g.:

[mebs.export_validators]
mebs.refresh = module.that.validates.keys.in.the.refresh.group:somefunc

(In other words, define hooks for validating export groups, the way
setuptools uses an entry point group for validating setup keywords.)
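
A rough sketch of how that could work, assuming a plain dict stands in for the contents of the 'mebs.export_validators' group (a real build system would discover the validators through the exports themselves, much as setuptools looks up its 'distutils.setup_keywords' group). The function names and the particular check are made up for illustration:

```python
def validate_refresh_group(exports):
    """Hypothetical validator for the 'mebs.refresh' group.

    Checks only that every export name looks like a dotted group name.
    """
    for name in exports:
        if not all(part.isidentifier() for part in name.split(".")):
            raise ValueError("not a valid export group name: %r" % (name,))


# Stand-in for [mebs.export_validators]: group name -> validator callable.
EXPORT_VALIDATORS = {"mebs.refresh": validate_refresh_group}


def validate_exports(all_exports):
    """Run the registered validator, if any, for each export group."""
    for group, exports in all_exports.items():
        validator = EXPORT_VALIDATORS.get(group)
        if validator is not None:
            validator(exports)


# Groups without a registered validator pass through untouched.
validate_exports({
    "mebs.refresh": {"foo.bar": "some.module:refresh_hook"},
    "unrelated.group": {"anything goes": "x:y"},
})
```

Any other project can register a validator for its own groups the same way, without the metadata format growing a new field.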

Of course, even without that possibility, the stricter validation
concept is kind of bogus here: the only thing you can really validate
is that syntactically valid group names are being used as export
names, which isn't much of a validation.  You can't *semantically*
validate them, since there is no global registry of group names.  So
what's the point?

The build system *should* reserve at least one (subdivisible)
namespace for itself, and use that mechanism for its own extension,
for two reasons:

1. Entities should not be multiplied beyond necessity,
2. It serves as an example of how exports are to be used, and
3. The API is reusable...

No, three reasons!  Wait, I'll come in again...  the API is reusable,
it serves as an example, no duplication, and namespaces are a good
idea, let's do more of them...  no, four reasons...  chief amongst the
reasonry...

Seriously: I can *sort of* see a reason to keep commands separate, but
that's a "meh".  I admittedly just grabbed it as a handy way to
shoehorn that functionality into setuptools.

But keeping extensions to the build system itself in a separate place?
No, a thousand times no.  This sort of extensibility is *precisely*
what the darn things are *for*.  If the build system doesn't use them,
what's the point?
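
To make that concrete, here is a sketch of a hook living in an ordinary export group under a reserved namespace, looked up with the same machinery any other export would use. The 'mebs.' prefix is borrowed from the validator example above; resolve() and find_hook() are hypothetical helpers, and 'json:dumps' is only a placeholder target so the example runs:

```python
import importlib


def resolve(reference):
    """Import the object named by a 'module:attr.path' reference."""
    module_name, _, attr_path = reference.partition(":")
    obj = importlib.import_module(module_name)
    if attr_path:
        for attr in attr_path.split("."):
            obj = getattr(obj, attr)
    return obj


# Hypothetical metadata: the export *name* is the group being watched,
# and the export group says which kind of hook it is.
exports = {
    "mebs.refresh": {"foo.bar": "json:dumps"},  # placeholder target
}


def find_hook(exports, kind, group):
    """Return the hook callable of `kind` for `group`, or None."""
    reference = exports.get("mebs." + kind, {}).get(group)
    return None if reference is None else resolve(reference)


hook = find_hook(exports, "refresh", "foo.bar")
print(hook)
```

No new data structure and no new lookup API: the build system is just another consumer of exports.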

> Mostly so you can validate them and display them differently, and
> avoid reserving any part of the shared namespace. I find documentation
> is also easier when the core use cases aren't wedged into the
> extension mechanisms (even if they share implementation details under
> the hood).

How is the documentation easier in this case?  Can you give an example?

I personally don't see the problem here, in part because this hook
mechanism can (and perhaps should) be described as a separate PEP.  In
fact, the more that you use extension facilities to implement features
of this sort, the *easier* it is to perform this separation of docs,
because you can write individual docs assuming the extension mechanism
is understood.

Already, PEP 426 is bigger than PEP 333 (!), and it might be wise to
break it into sub-PEPs anyway.  For example:

1. Main PEP, describes the format and core types' syntax, references
the other PEPs
2. Dependency and versioning PEP
3. Exports PEP (including an API proposal)
4. Build system extensions PEP (covering the hooks discussed in this
thread, referencing #3)

This would make the overall thing a lot more comprehensible, and the
audiences for #3 and #4 would be limited compared to #1 and #2.  Build
system developers and extenders would need to read all 4, but there
would be a well-gradated learning curve from "here's the rough concept
and JSON schema" to "now you are a Jedi, my Padawan".  ;-)

In particular, this arrangement means that the language of each PEP
can simply reference assumed-to-be-understood terms from prior PEPs,
and sufficient space can be allotted to explaining the use cases and
providing examples relevant to that layer of the system.  (Also, it
ought to make community review and consensus of the various PEPs
easier, too.)

IOW, another value of reusing the existing export mechanism is that if
you are going to implement something related to PEP #4, you *need* to
be sufficiently versed in the concepts of PEP #3 anyway -- introducing
a different data structure and API for the metadata is just
duplication of entities and an extra thing to learn, instead of
reusing the One Obvious Way to handle importable hooks.