[Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

Nathaniel Smith njs at pobox.com
Wed Nov 11 04:17:53 EST 2015


In case it's useful to make this discussion more concrete, here's a
sketch of what the pip code for dealing with a build system defined by
a Python API might look like:

    https://gist.github.com/njsmith/75818a6debbce9d7ff48

Obviously there's room to build on this to get much fancier, but
AFAICT even this minimal version is already enough to correctly handle
all the important stuff -- schema version checking, error reporting,
full args/kwargs/return values. (It does assume that we'll only use
json-serializable data structures for argument and return values, but
that seems like a good plan anyway. Pickle would probably be a bad
idea because we're crossing between two different python environments
that may have different or incompatible packages/classes available.)
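
In case it helps to see the frontend half of that protocol spelled out,
here's a very rough, untested sketch of the same general shape
(run_build_hook and the worker-script plumbing are invented names for
this example, not taken verbatim from the gist):

    # pip-side sketch: run a small worker script inside the isolated build
    # environment and exchange JSON-serializable args/kwargs/return values.
    import json
    import subprocess

    def run_build_hook(python_exe, worker_script, backend, hook, *args, **kwargs):
        # python_exe: interpreter inside the isolated build environment
        # worker_script: shim that pip copies into that environment; it
        #     imports `backend`, calls the named hook, and prints a JSON reply
        request = {"schema": 1, "backend": backend, "hook": hook,
                   "args": list(args), "kwargs": kwargs}
        proc = subprocess.run([python_exe, worker_script],
                              input=json.dumps(request),
                              capture_output=True, text=True)
        if proc.returncode != 0:
            raise RuntimeError("build backend failed:\n" + proc.stderr)
        reply = json.loads(proc.stdout)
        if reply.get("schema") != 1:
            raise RuntimeError("unsupported reply schema: %r" % reply.get("schema"))
        return reply["return"]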

-n

On Wed, Nov 11, 2015 at 1:04 AM, Nathaniel Smith <njs at pobox.com> wrote:
> On Tue, Nov 10, 2015 at 11:27 PM, Robert Collins
> <robertc at robertcollins.net> wrote:
>> On 11 November 2015 at 19:49, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>> On 11 November 2015 at 16:19, Robert Collins <robertc at robertcollins.net> wrote:
>> ...
>>>> pip is going to be invoking a CLI *no matter what*. That's a hard
>>>> requirement unless Python's very fundamental import behaviour changes.
>>>> Slapping a Python API on things is lipstick on a pig here IMO: we're
>>>> going to have to downgrade any richer interface; and by specifying the
>>>> actual LCD as the interface it is then amenable to direct exploration
>>>> by users without them having to reverse engineer an undocumented thunk
>>>> within pip.
>>>
>>> I'm not opposed to documenting how pip talks to its worker CLI - I
>>> just share Nathaniel's concerns about locking that down in a PEP vs
>>> keeping *that* CLI within pip's boundary of responsibilities, and
>>> having a documented Python interface used for invoking build systems.
>>
>> I'm also very wary of something that would be an attractive nuisance.
>> I've seen nothing suggesting that a Python API would be anything but:
>>  - it won't be usable [it requires the glue to set up an isolated
>> context, which is buried in pip] in the general case
>
> This is exactly as true of a command line API -- in the general case
> it also requires the glue to set up an isolated context. People who go
> ahead and run 'flit' from their global environment instead of in the
> isolated build environment will experience exactly the same problems
> as people who go ahead and import 'flit.build_system_api' in their
> global environment, so I don't see how one is any more of an
> attractive nuisance than the other?
>
> AFAICT the main difference is that "setting up a specified Python
> context and then importing something and exploring its API" is
> literally what I do all day as a Python developer. Either way you have
> to set stuff up, and then once you do, in the Python API case you get
> stuff like tab completion, ipython introspection (? and ??), etc. for
> free.
>
>>  - no matter what we do, pip can't benefit from it beyond the
>> subprocess interface pip needs, because pip *cannot* import and use
>> the build interface
>
> Not sure what you mean by "benefit" here. At best this is an argument
> that the two options have similar capabilities, in which case I would
> argue that we should choose the one that leads to simpler and thus
> more probably bug-free specification language.
>
> But even this isn't really true -- the difference between them is that
> either way you have a subprocess API, but with a Python API, the
> subprocess interface that pip uses has the option of being improved
> incrementally over time -- including, potentially, to take further
> advantage of the underlying richness of the Python semantics. Sure,
> maybe the first release would just take all exceptions and map them
> into some text printed to stderr and a non-zero return code, and
> that's all that pip would get. But if someone had an idea for how pip
> could do better than this by, I dunno, encoding some structured
> metadata about the particular exception that occurred and passing this
> back up to pip to do something intelligent with it, they absolutely
> could write the code and submit a PR to pip, without having to write a
> new PEP.
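>
> A minimal sketch of what I mean, assuming the worker already replies in
> JSON (the field names are invented for the example, not a proposal):
>
>     # worker-side sketch: instead of dumping opaque text to stderr, encode
>     # the failure as JSON so a future pip could do something smarter with it.
>     import json
>     import sys
>     import traceback
>
>     def report_error(exc):
>         # called from the worker's top-level except block
>         payload = {
>             "status": "error",
>             "exception_type": type(exc).__name__,
>             "message": str(exc),
>             "traceback": "".join(traceback.format_exception(
>                 type(exc), exc, exc.__traceback__)),
>         }
>         json.dump(payload, sys.stdout)
>         sys.exit(1)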
>
>> tl;dr - I think making the case that the layer we define should be a
>> Python protocol rather than a subprocess protocol requires some really
>> strong evidence. We're *not* dealing with the same moving parts that
>> typical Python stuff requires.
>
> I'm very confused and honestly do not understand what you find
> attractive about the subprocess protocol approach. Even your arguments
> above aren't really arguments that it's good, just arguments that the
> Python API approach isn't much better. I'm sure
> there is some reason you like it, and you might even have said it but
> I missed it because I disagreed or something :-). But literally the
> only reason I can think of right now for why one would prefer the
> subprocess approach is that it lets one remove 50 lines of "worker
> process" code from pip and move them into the individual build
> backends instead, which I guess is a win if one is focused narrowly on
> pip itself. But surely there is more I'm missing?
>
> (And even this lines-of-code argument is actually pretty dubious --
> right now your draft PEP is importing-by-reference an entire existing
> codebase (!) for shell variable expansion in command lines, which is
> code that simply doesn't need to exist in the Python API approach. I'd
> be willing to bet that your approach requires more code in pip than
> mine :-).)
>
>>> However, I've now realised that we're not constrained even if we start
>>> with the CLI interface, as there's still a migration path to a Python
>>> API based model:
>>>
>>> Now: documented CLI for invoking build systems
>>> Future: documented Python API for invoking build systems, default
>>> fallback invokes the documented CLI
>>
>> Or we just issue an updated bootstrap schema, and there's no fallback
>> or anything needed.
>
> Oh no! But this totally gives up the most brilliant part of your
> original idea! :-)
>
> In my original draft, I had each hook specified separately in the
> bootstrap file, e.g. (super schematically):
>
>   build-requirements = flit-build-requirements
>   do-wheel-build = flit-do-wheel-build
>   do-editable-build = flit-do-editable-build
>
> and you counterproposed that instead there should just be one line like
>
>   build-system = flit-build-system
>
> and this is exactly right, because it means that if some new
> capability is added to the spec (e.g. a new hook -- like
> hypothetically imagine if we ended up deferring the equivalent of
> egg-info or editable-build-mode to v2), then the new capability just
> needs to be implemented in pip and in flit, and then all the projects
> that use flit immediately gain superpowers without anyone having to go
> around and manually change all the bootstrap files in every project
> individually.
>
> But for this to work it's crucial that the pip<->build-system
> interface have some sort of versioning or negotiation beyond the
> bootstrap file's schema version.
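>
> A minimal sketch of the sort of negotiation I mean, assuming the
> bootstrap file names a single backend module (the key names, hook
> names, and file name here are placeholders, not a proposal):
>
>     # pypackage.cfg (or whatever the bootstrap file ends up being called):
>     #
>     #   [bootstrap]
>     #   schema = 1
>     #   build-system = flit.build_system_api
>
>     # flit/build_system_api.py -- backend side of the negotiation
>     api_version = 2   # highest hook-set version this backend implements
>
>     def get_build_requirements(config):
>         ...
>
>     def build_wheel(config, output_dir):
>         ...
>
>     # pip checks `api_version` before calling any hook added after v1,
>     # so new hooks light up as soon as both pip and flit grow them.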
>
>>> So the CLI documented in the PEP isn't *necessarily* going to be the
>>> one used by pip to communicate into the build environment - it may be
>>> invoked locally within the build environment.
>>
>> No, it totally will be. Exactly as setup.py is today. That's
>> deliberate: The *new* thing we're setting out to enable is abstract
>> build systems, not reengineering pip.
>>
>> The future - sure, someone can write a new thing, and the necessary
>> capability we're building in to allow future changes will allow a new
>> PEP to slot in easily and take on that [non-trivial and substantial
>> chunk of work]. (For instance, how do you do compiler- and
>> build-system-specific options when you have a CLI to talk to pip with?)
>
> I dunno, that seems pretty easy? My original draft just suggested that
> the build hook would take a dict of string-valued keys, and then we'd
> add some options to pip like "--project-build-option foo=bar" that
> would set entries in that dict, and that's pretty much sufficient to
> get the job done. To enable backcompat you'd also want to map the old
> --install-option and --build-option switches to add entries to some
> well-known keys in that dict. But none of the details here need to be
> specified, because it's up to individual projects/build-systems to
> assign meaning to this stuff and individual build-frontends like pip
> to provide an interface to it -- at the build-frontend/build-backend
> interface layer we just need some way to pass through the blobs.
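>
> (Schematically, the frontend-side plumbing could be about this small --
> the flag name is the one from my draft, everything else is illustrative:)
>
>     # pip-side sketch: fold repeated --project-build-option KEY=VALUE
>     # flags into a plain dict and pass that dict through to the backend.
>     import argparse
>
>     parser = argparse.ArgumentParser()
>     parser.add_argument("--project-build-option", action="append",
>                         default=[], metavar="KEY=VALUE")
>     ns = parser.parse_args(["--project-build-option", "foo=bar"])
>
>     build_options = {}
>     for item in ns.project_build_option:
>         key, _, value = item.partition("=")
>         build_options[key] = value
>     # build_options is what gets handed to the build hook, unchanged;
>     # the backend decides what "foo" means.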
>
> I admit that this is another case where the Python API approach is
> making things trivial though ;-). If you want to pass arbitrary
> user-specified data through a command-line API, while avoiding things
> like potential namespace collisions between user-defined switches and
> standard-defined switches, then you have to do much more work than
> just say "there's another argument that's a dict".
>
> -n
>
> --
> Nathaniel J. Smith -- http://vorpus.org



-- 
Nathaniel J. Smith -- http://vorpus.org

