[Distutils] API for finding plugins

Wed Feb 8 06:36:25 CET 2006

At 09:40 PM 2/7/2006 -0600, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>I'm assuming here that the problem is needing to import each command to 
>>get its description and display it?
>
>Oh, yes, that too ;)  That probably is the bigger problem, and 
>inevitable.  That doesn't happen except with help.  So maybe I am worried 
>about nothing.

Probably.  ;)  A fix, however, would be to change your entry point names to 
include a description.  Currently, entry point names can contain any 
characters you like except '='.  (Leading and trailing whitespace is stripped.)

This means that you can define entry points like this, as long as you do 
the name parsing in your own tools:

    commit (ci,checkin) - Commit the current version = some.module:commit
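For illustration, here's a minimal sketch of the name parsing your own tools would have to do, assuming the "name (alias,alias) - description" convention above. The function parse_entry_line and its regular expression are hypothetical and not part of pkg_resources. It splits on the first '=', since names can't contain one.

```python
import re

# Hypothetical parser for the "name (aliases) - description" convention;
# this is not a pkg_resources API.
NAME_RE = re.compile(
    r"^(?P<name>[^\s(]+)"               # primary command name
    r"(?:\s*\((?P<aliases>[^)]*)\))?"   # optional "(alias,alias)"
    r"(?:\s*-\s*(?P<desc>.*))?$"        # optional "- description"
)


def parse_entry_line(line):
    """Split 'NAME = module:attr' on the first '=' and then parse NAME."""
    raw_name, sep, target = line.partition("=")
    if not sep:
        raise ValueError("entry point line has no '=': %r" % line)
    raw_name, target = raw_name.strip(), target.strip()
    m = NAME_RE.match(raw_name)
    if m is None:
        raise ValueError("unparseable entry point name: %r" % raw_name)
    aliases = [a.strip() for a in (m.group("aliases") or "").split(",")
               if a.strip()]
    return {
        "name": m.group("name"),
        "aliases": aliases,
        "description": (m.group("desc") or "").strip(),
        "target": target,      # left as a string: nothing is imported
        "raw_name": raw_name,  # what the entry point system itself sees
    }
```

Because the target stays an unimported "module:attr" string, a help command can list names, aliases and descriptions without touching any command's extras.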

So, this entry point would have a name of "commit (ci,checkin) - Commit the 
current version", and you can then parse it to get what you need.  I plan 
to use this format for "nest" command entry points in 0.7, with the 
parenthesized names being used to identify aliases that refer to the same 
command.  This will let a help command operate without needing to import 
anything, and in particular it means that any extras required for the 
command won't have to be downloaded and installed unless you try to do 
something with it more directly.

The downside to this approach is that you can't just look up a command name 
directly; you have to iterate over all entry points in the command group 
until you find a match.  However, this search would take place anyway; it's 
just hidden under the API, so the performance is the same apart from the 
parsing overhead.
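A rough sketch of that lookup-by-iteration, assuming entry points only need a .name and a .value string. StubEntryPoint, names_for and find_command are hypothetical names for this example; note that the matching entry point is returned, not loaded, so nothing gets imported.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for an entry point: just the name and "module:attr".
StubEntryPoint = namedtuple("StubEntryPoint", "name value")

_NAME_RE = re.compile(r"^(?P<name>[^\s(]+)(?:\s*\((?P<aliases>[^)]*)\))?")


def names_for(ep_name):
    """Return the primary name plus any parenthesized aliases."""
    m = _NAME_RE.match(ep_name.strip())
    if m is None:
        return []
    names = [m.group("name")]
    if m.group("aliases"):
        names.extend(a.strip() for a in m.group("aliases").split(",")
                     if a.strip())
    return names


def find_command(entry_points, command):
    """Linear scan over the whole group; there is no direct key lookup."""
    for ep in entry_points:
        if command in names_for(ep.name):
            return ep  # caller decides whether to actually load it
    return None
```

The "update (up)" entry in the usage below is likewise made up; it just shows aliases resolving to the same command.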

>>I'm not aware of any existing standards, either, but I'm thinking that 
>>what's needed is more of an API for resource retrieval keyed on some set 
>>of simple values or wildcards, with some way to aggregate search results 
>>from multiple sources (so that e.g. databases, eggs, and directory trees) 
>>can all "offer" resources on demand.
>>I'm thinking that you would call this API (at the low level) via 
>>something like:
>>      my_page_source = resource_set.find_resource(
>>          resource=['my_page'], for_project=['MyProject'],
>>          locale=['en','de'], layer=['some_layer', 'other_layer'],
>>      ).as_string()
>
>Are the arguments arbitrary?  I.e., can I add my own?  Like 
>domain=['blog.ianbicking.org', 'ianbicking.org'], or user=['ianb', None]?

Yeah, that's the idea.
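To make "arbitrary arguments" concrete, here is a toy in-memory sketch, assuming each resource is just content plus a dict of attributes. ResourceSet and its methods are hypothetical. A None in a value list is taken to mean "attribute may be absent", as in user=['ianb', None]; results come back in insertion order, with no precedence applied.

```python
class ResourceSet:
    """Hypothetical in-memory resource set keyed on arbitrary attributes."""

    def __init__(self):
        self._resources = []  # list of (attrs dict, content string)

    def add(self, content, **attrs):
        self._resources.append((attrs, content))

    def find_resource(self, **criteria):
        """Return contents whose attributes match every criterion.

        Each criterion is a list of acceptable values; a missing attribute
        reads as None, so listing None makes the attribute optional.
        """
        matches = []
        for attrs, content in self._resources:
            if all(attrs.get(key) in allowed
                   for key, allowed in criteria.items()):
                matches.append(content)
        return matches
```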

>Are resources typed in any way?  Similar to entry point groups...?

I think resources should only provide access to string/stream/filename (à la 
pkg_resources resources) and their metadata attributes (like locale, layer, 
etc.)  If you want to have more elaborate typing, you can simply use 
another attribute to define it.  For example, a content_type attribute or 
an attribute that says what entry point to use to adapt the resource to 
some interface.
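Something like the following minimal sketch, assuming resources hold their data in memory. The Resource class and its method names are hypothetical, loosely modeled on the string/stream/filename trio; "typing" is just another metadata key such as content_type.

```python
import io
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Resource:
    """Hypothetical resource: raw data access plus untyped metadata."""
    data: bytes
    attrs: dict = field(default_factory=dict)  # e.g. locale, layer, content_type
    filename: Optional[str] = None  # only set when backed by a real file

    def as_string(self, encoding="utf-8"):
        return self.data.decode(encoding)

    def as_stream(self):
        # A fresh in-memory stream per call, so readers don't share a position.
        return io.BytesIO(self.data)

    def get(self, key, default=None):
        """Read a metadata attribute; any 'type' is just another attribute."""
        return self.attrs.get(key, default)
```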

>>Of course, this would in most cases be wrapped by some higher-level API 
>>that eliminates most of the parameters from needing to be specified (e.g. 
>>by a framework that knows what locales and skin layers are in effect and 
>>what project the requesting code is calling from).  For performance, you 
>>could extract subset resource sets and use them instead of querying a 
>>top-level resource set.
>
>I often find myself wanting to just override one little bit.  Subset 
>resources would potentially break that, unless they are a subset that is 
>resolved all at once instead of a subset that has to provide all the 
>necessary resources.  At least if you are describing what I think you are 
>describing.

I'm not sure I follow you.  If it's something you "override", then you'd 
have to leave it out of your subset criteria.  What I'm describing is the 
ability to have a subset snapshot for performance reasons, not simply a 
restricted view over a larger set.  (Although that also sounds like a 
useful thing to have.)
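To pin down the distinction, a toy sketch contrasting the two: a snapshot copies the matching resources once, while a restricted view re-queries its parent on every lookup. ResourceSet, ResourceView, snapshot() and view() are all hypothetical names.

```python
class ResourceSet:
    """Hypothetical resource set; each resource is a dict of attributes."""

    def __init__(self, resources=()):
        self._resources = list(resources)

    def add(self, **attrs):
        self._resources.append(attrs)

    def find(self, **criteria):
        return [a for a in self._resources
                if all(a.get(k) in allowed for k, allowed in criteria.items())]

    def snapshot(self, **criteria):
        """Copy the matches now; resources added to self later are not seen."""
        return ResourceSet(self.find(**criteria))

    def view(self, **criteria):
        """Restricted view that re-queries self on every lookup."""
        return ResourceView(self, criteria)


class ResourceView:
    def __init__(self, parent, criteria):
        self.parent = parent
        self.criteria = criteria

    def find(self, **more):
        # Merge fixed criteria with per-call ones; per-call keys win on conflict.
        return self.parent.find(**{**self.criteria, **more})
```

The snapshot is the one that can be tuned for speed, precisely because it is allowed to go stale.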

My main concern about this approach is that, without tuning or hinting 
for a particular access profile, it's going to be tough to have a data 
structure that is both fast and memory-efficient.  Creating indexes on all 
attributes means consuming roughly one dictionary or set per unique 
attribute value.  That is, every relatively-unique key consumes a 
dictionary of its own, costing hundreds of bytes.  Every resource will 
have at least one relatively-unique attribute value, namely its ID.

On the plus side, even as the total number of resources grows due to 
variants, the raw overhead for the dictionaries should remain the same, 
since each new language or layer will only add one new unique key (the 
language or layer).  So, it's probably not as bad to just index everything 
as I'm worrying it would be.
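Roughly, the index-everything structure looks like this hypothetical sketch: one set of resource ids per (attribute, value) key, with a query intersecting the per-attribute unions. Here the resource name is shared by all of its variants and the per-variant ids are set members rather than keys, which is what lets a new language add just one key.

```python
from collections import defaultdict


class IndexedResources:
    """Hypothetical index: one set of resource ids per (attribute, value)."""

    def __init__(self):
        self._index = defaultdict(set)  # (attr, value) -> set of ids
        self._resources = []            # id -> attribute dict

    def add(self, **attrs):
        rid = len(self._resources)
        self._resources.append(attrs)
        for key, value in attrs.items():
            self._index[(key, value)].add(rid)
        return rid

    def find(self, **criteria):
        """Intersect, across attributes, the union of ids for allowed values."""
        result = None
        for key, allowed in criteria.items():
            ids = set()
            for value in allowed:
                # .get() avoids creating empty entries in the defaultdict
                ids |= self._index.get((key, value), set())
            result = ids if result is None else result & ids
        if result is None:  # no criteria: everything matches
            result = set(range(len(self._resources)))
        return sorted(result)

    def index_keys(self):
        """Number of dictionaries/sets the index is paying for."""
        return len(self._index)
```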

Efficiently handling search precedence across multiple resource providers 
is also an interesting problem.  You really want the result precedence to 
be based on stuff like the locale and layers, *not* on which provider found 
the data.  This means that searches like the example I gave have to either 
be broken down into a series of single-value searches done in sequence, 
each one executed in "parallel" against all backends, or there has to be a 
kind of sort-merge done against results yielded by the backends to ensure 
that the "best" results are yielded first.

Probably the best approach will be to require searches to be prioritized 
on input, e.g.:

     find_resource(
        ('resource', ['my_page']),
        ('for_project', ['MyProject']),
        ('layer', ['some_layer','other_layer']),
        ('locale', ['en','de']),
     ...

The above is saying, "first look for resources named my_page that are for 
MyProject, and of those you find, give precedence to 'some_layer' over 
'other_layer' ones.  And within those, give precedence to locales of 'en' 
then 'de'."
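A minimal sketch of what prioritized criteria could mean, assuming resources are plain dicts of attributes: filter on every criterion, then sort by the position of each attribute's value in its list, with earlier criteria dominating. This stand-alone find_resource is hypothetical, not an existing API.

```python
def find_resource(resources, *criteria):
    """Filter on every (attr, values) pair, then order by criteria priority.

    Earlier criteria dominate the ordering; within a criterion, earlier
    values win (e.g. 'some_layer' before 'other_layer').
    """
    def matches(res):
        return all(res.get(attr) in values for attr, values in criteria)

    def precedence(res):
        # One position per criterion, compared left to right as a tuple.
        return tuple(values.index(res.get(attr)) for attr, values in criteria)

    return sorted(filter(matches, resources), key=precedence)
```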

This approach has the benefit of allowing entire backends to be excluded 
early from the search, since it doesn't matter what layers or locales an 
egg has resources for if it doesn't have 'my_page' for 'MyProject'.
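For example, if each backend could cheaply summarize which (resource, project) pairs it holds, whole backends could be skipped before any layer or locale matching happens. Backend, its summary attribute, and search() are hypothetical names in this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """Hypothetical provider: an egg, a database, or a directory tree."""
    name: str
    resources: list
    # Cheap summary of which (resource, project) pairs this backend holds.
    summary: set = field(init=False)

    def __post_init__(self):
        self.summary = {(r["resource"], r["for_project"]) for r in self.resources}


def search(backends, resource_names, projects, **refine):
    """Prune backends on the leading criteria before checking the rest."""
    wanted = {(r, p) for r in resource_names for p in projects}
    consulted, hits = [], []
    for backend in backends:
        if backend.summary.isdisjoint(wanted):
            continue  # its layers and locales are irrelevant; skip it entirely
        consulted.append(backend.name)
        for res in backend.resources:
            if (res["resource"], res["for_project"]) in wanted and all(
                res.get(k) in v for k, v in refine.items()
            ):
                hits.append(res)
    return consulted, hits
```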