[Import-SIG] PEP 420 issue: extend_path

Wed May 9 10:19:20 CEST 2012

On Wed, May 9, 2012 at 4:48 PM,  <martin at v.loewis.de> wrote:
>>> I'm not sure why we *need* a list of portions, but if we do, simple
>>> return values seem like the way to go.  But the 2-element tuple wins
>>> even in the single path portion case, and the tuple-return protocol is
>>> extensible if we need more data returned in future anyway.
>>
>>
>> Nick laid out a use case in a previous email. It makes sense to me. For
>> example, a zip file could contain multiple portions from the same
>> namespace package. You'd need a new path hook or mods to zipimport, but
>> it's conceivable.
>
>
> I must have missed Nick's message where he explained it, so I still need
> to ask again: how exactly would such a zip file be structured?
>
> I fail to see the need to ever report both a loader and a portion,
> as well as the need to report multiple portions, for a single sys.path
> item. That sounds like an unnecessary complication.

My actual objection is the same as Antoine's: that needing to
introspect the result of find_loader() to handle the PEP 420 use case
is a code smell that suggests the API design is flawed. The problem I
had with it was that find_loader() needs to report on 3 different
scenarios:

1. I am providing a loader to fully load this module, stop scanning
the path hooks
2. I am contributing to a potential namespace package, keep scanning
the path hooks
3. I have nothing to provide for that name, keep scanning the path hooks.

Using the type of the return value (or whether or not it has a
"load_module" attribute) to decide between scenario 1 and 2 just feels
wrong.

My proposed alternative was to treat the "portion_found" event as a
callback rather than as something to be handled via the return value.
Then loaders would be free to report as many portions as they wished,
with the final "continue scanning or not" decision handled via the
existing "loader or None" semantics.

The example I happened to use to illustrate the difference was one
where a loader actually internally implements its *own* path scan of
multiple locations. I wasn't specifically thinking of zipfiles, but
you could certainly use it that way. The core concept was that a
single entry on the main path would be handed off to a finder that
actually knew about *multiple* code locations, and hence may want to
report multiple path portions.

The 3 scenarios above would then correspond to:

1. Loader was returned (doesn't matter if callback was invoked)
2. None was returned, callback was invoked one or more times
3. None was returned, callback was never invoked
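
As a rough sketch of the callback variant described above: the class
name, the scan() helper and the "portion_found" parameter name are all
hypothetical, chosen only for illustration, and are not part of PEP 420
or importlib.

```python
# Hypothetical sketch of the callback-based protocol; none of these
# names come from PEP 420 or importlib.

class MultiLocationFinder:
    """Finder for one path entry that knows about several code locations."""

    def __init__(self, loaders, portions):
        self.loaders = loaders      # module name -> loader object
        self.portions = portions    # package name -> list of portion paths

    def find_loader(self, fullname, portion_found):
        if fullname in self.loaders:
            # Scenario 1: a loader is returned, scanning stops.
            return self.loaders[fullname]
        for portion in self.portions.get(fullname, []):
            # Scenario 2: report each portion through the callback.
            portion_found(portion)
        # Scenario 2 or 3: either way the caller keeps scanning.
        return None


def scan(finders, fullname):
    """Return (loader, namespace_path) after scanning finders in order."""
    namespace_path = []
    for finder in finders:
        loader = finder.find_loader(fullname, namespace_path.append)
        if loader is not None:
            return loader, []
    return None, namespace_path
```

The caller never inspects the type of the return value; it only checks
for None, and an empty namespace_path at the end means nothing was found.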

Eric's counter-proposal is to handle the 3 scenarios as:

1. (<loader>, <don't care>)
2. (None, [<path entries>])
3. (None, [])
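
For comparison, a minimal sketch of how a caller might consume the
tuple-returning form. TupleFinder and scan_tuples() are hypothetical
names for illustration only, and the finder here is a stand-in rather
than a real path hook.

```python
# Hypothetical sketch of the (loader, portions) return protocol.

class TupleFinder:
    """Stand-in finder that always answers with a 2-element tuple."""

    def __init__(self, loaders, portions):
        self.loaders = loaders      # module name -> loader object
        self.portions = portions    # package name -> list of portion paths

    def find_loader(self, fullname):
        if fullname in self.loaders:
            return self.loaders[fullname], []            # scenario 1
        return None, self.portions.get(fullname, [])     # scenario 2 or 3


def scan_tuples(finders, fullname):
    """Return (loader, namespace_path) after scanning finders in order."""
    namespace_path = []
    for finder in finders:
        loader, portions = finder.find_loader(fullname)
        if loader is not None:
            return loader, []
        # An empty list (scenario 3) simply contributes nothing.
        namespace_path.extend(portions)
    return None, namespace_path
```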

Yet another option would be to pass a namespace_path list object
directly into the find_loader() call, instead of passing
namespace_path.append as a callback. Then the loader would append any
portions it finds directly to the list, with the return value again
left as the simple choice between a loader or None.
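
A sketch of that list-passing variant, assuming a two-argument
find_loader() signature. ListFinder and scan_with_list() are hypothetical
names used only for illustration.

```python
# Hypothetical sketch: the finder mutates the list it is handed.

class ListFinder:
    def __init__(self, loaders, portions):
        self.loaders = loaders      # module name -> loader object
        self.portions = portions    # package name -> list of portion paths

    def find_loader(self, fullname, namespace_path):
        if fullname in self.loaders:
            return self.loaders[fullname]
        # Append any portions directly; the return value stays loader-or-None.
        namespace_path.extend(self.portions.get(fullname, []))
        return None


def scan_with_list(finders, fullname):
    """Return (loader, namespace_path) after scanning finders in order."""
    namespace_path = []
    for finder in finders:
        loader = finder.find_loader(fullname, namespace_path)
        if loader is not None:
            return loader, []
    return None, namespace_path
```

The trade-off against the callback form is that the finder gets the whole
list object rather than just its append method, so it could also read or
reorder entries contributed by earlier finders.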

One final option would be to add an optional "extend_namespace" method to
*loader* objects. Then the logic would become, instead of type
introspection, more like the following:

    loader = find_loader(fullpath)
    try:
        extend_namespace = loader.extend_namespace
    except AttributeError:
        pass
    else:
        if extend_namespace(namespace_path):
            # The loader contributed to the namespace package rather
            # than loading the full module
            continue
    if loader is not None:
        return loader
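
To make the fragment above runnable, here is a self-contained sketch that
wraps it in a loop. PortionLoader, ModuleLoader and scan_path() are
hypothetical names, and each finder is modelled as a plain callable
taking the module name; none of this is existing importlib API.

```python
# Hypothetical sketch of the optional extend_namespace() loader method.

class PortionLoader:
    """Loader-like object that only contributes a namespace portion."""

    def __init__(self, portion):
        self.portion = portion

    def extend_namespace(self, namespace_path):
        namespace_path.append(self.portion)
        return True     # "I contributed a portion, keep scanning"


class ModuleLoader:
    """Ordinary loader with no extend_namespace attribute."""

    def load_module(self, fullname):
        raise NotImplementedError("illustration only")


def scan_path(finders, fullname):
    """Return (loader, namespace_path); each finder is a callable."""
    namespace_path = []
    for find_loader in finders:
        loader = find_loader(fullname)
        try:
            extend_namespace = loader.extend_namespace
        except AttributeError:
            # Covers both None and ordinary loaders.
            pass
        else:
            if extend_namespace(namespace_path):
                continue
        if loader is not None:
            return loader, namespace_path
    return None, namespace_path
```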

It's definitely the switch-statement feel of the proposed type checks
that rubs me the wrong way, though. Supporting multiple portions from
a single loader was just the most straightforward example I could
think of to illustrate a limitation imposed by that mechanism.

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia