[Distutils] [Numpy-discussion] Proposal: stop supporting 'setup.py install'; start requiring 'pip install .' instead

Sat Nov 7 11:33:10 EST 2015

On Sat, Nov 7, 2015 at 3:57 PM, Paul Moore <p.f.moore at gmail.com> wrote:

> On 7 November 2015 at 13:55, Ralf Gommers <ralf.gommers at gmail.com> wrote:
> > On Sat, Nov 7, 2015 at 2:02 PM, Paul Moore <p.f.moore at gmail.com> wrote:
> >>
> >> On 7 November 2015 at 01:26, Chris Barker - NOAA Federal
> >> <chris.barker at noaa.gov> wrote:
> >> > So what IS supposed to be used in the development workflow? The new
> >> > mythical build system?
> >
> > I'd like to point out again that this is not just about development
> > workflow. This is just as much about simply *installing* from a local git
> > repo, or downloaded sources/sdist.
>
> Possibly I'm misunderstanding here.
>

I had an example above of installing into different venvs. Full rebuilds
for that each time are very expensive. And this whole thread is basically
about `pip install .`, not about inplace builds for development.

As another example of why even for a single build/install it's helpful to
just let the build system do what it wants to do instead of first copying
stuff over, here are some timing results. This is for PyWavelets, which
isn't that complicated a build (mostly pure Python, 1 Cython extension):

1. python setup.py install:   40 s
2. pip install . --upgrade --no-deps:   58 s
# OK, (2) is slow due to using shutil, to be fixed to work like (3):
3. python setup.py sdist:  8 s
    pip install dist/PyWavelets-0.4.0.dev0+da1c6b4.tar.gz:  41 s
    # so total time for (3) will be 41 + 8 = 49 s
# and a better alternative to (1)
4. python setup.py bdist_wheel:  34 s
    pip install dist/PyWavelets-xxx.whl:   6 s
    # so total time for (4) will be 34 + 6 = 40 s

Not super-scientific, but the conclusion is clear: what pip does is a lot
slower than what for me is the expected behavior. And note that without the
Cython compile, the difference in timing will get even larger.
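
To make the comparison explicit, here is a small sketch that tabulates the
timings reported above and expresses each total relative to `python setup.py
install`. The numbers are the ones from this mail; the dictionary layout and
variable names are only illustrative.

```python
# Timings in seconds, as reported above for PyWavelets.
# Multi-step workflows list one entry per step.
timings = {
    "setup.py install": [40],
    "pip install .": [58],
    "sdist + pip install sdist": [8, 41],
    "bdist_wheel + pip install wheel": [34, 6],
}

# Use the plain `setup.py install` run as the baseline.
baseline = sum(timings["setup.py install"])

# Total wall time per workflow.
totals = {name: sum(steps) for name, steps in timings.items()}

for name, total in totals.items():
    ratio = total / baseline
    print(f"{name:32s} {total:3d} s  ({ratio:.2f}x setup.py install)")
```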

That expected behavior is:
  a) Just ask the build system to spit out a wheel (without any magic)
  b) Install that wheel (always)
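
A minimal sketch of steps (a) and (b), driving pip through `python -m pip`
rather than touching pip internals. The function name `build_and_install` and
the injectable `run` parameter are hypothetical, and `wheel_dir` is assumed to
be a scratch directory. Whether `pip wheel` builds in place or in a temporary
copy depends on the pip version; this only illustrates the build-then-install
sequence, not how pip would implement it.

```python
import subprocess
import sys
from pathlib import Path


def build_and_install(src_dir, wheel_dir, run=subprocess.check_call):
    """Build a wheel from src_dir, then install exactly that wheel."""
    wheel_dir = Path(wheel_dir)
    wheel_dir.mkdir(parents=True, exist_ok=True)
    before = set(wheel_dir.glob("*.whl"))

    # Step (a): ask the build system for a wheel, without resolving dependencies.
    run([sys.executable, "-m", "pip", "wheel", str(src_dir),
         "--no-deps", "--wheel-dir", str(wheel_dir)])

    # Identify the wheel that the build step just produced.
    new_wheels = sorted(set(wheel_dir.glob("*.whl")) - before)
    if len(new_wheels) != 1:
        raise RuntimeError(
            f"expected exactly one new wheel, found {len(new_wheels)}")

    # Step (b): always install that wheel, even if the version is unchanged.
    run([sys.executable, "-m", "pip", "install", "--no-deps",
         "--force-reinstall", str(new_wheels[0])])
    return new_wheels[0]
```

Calling `build_and_install(".", "build/wheels")` from a source checkout would
run both steps; passing a different `run` callable makes it possible to log or
dry-run the commands instead.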

> > The "pip install . should reinstall" discussion in
> > https://github.com/pypa/pip/issues/536 is also pretty much the same
> > argument.
>
> Well, that one is about pip reinstalling if you install from a local
> directory, and not skipping the install if the local directory version
> is the same as the installed version. As I noted there, I'm OK with
> this, it seems reasonable to me to say that if someone has a directory
> of files, they may have updated something but not (yet) bumped the
> version.
>
> The debate over there has gone on to whether we force reinstall for a
> local *file* (wheel or sdist) which I'm less comfortable with. But
> that's being covered over there.
>
> The discussion *here* is, I thought, about skipping build steps when
> possible because you can reuse build artifacts. That's not "should pip
> do the install?", but rather "*how* should pip do the install?"
> Specifically, to reuse build artifacts it's necessary to *not* do what
> pip currently does for all (non-editable) installs, which is to
> isolate the build in a temporary directory and do a clean build.
> That's a sensible debate to have, but it's very different from the
> issue you referenced.
>
> IMO, the discussions currently are complex enough that isolating
> independent concerns is crucial if anyone is to keep track. (It
> certainly is for me!)
>

Agreed that the discussions are complex now. But imho they're mostly
complex because the basic principles of what pip should be doing are not
completely clear, at least to me. If it's "build a wheel, install the
wheel" then a lot of things become simpler.

> >> Fair question. Unfortunately, the answer is honestly that there's no
> >> simple answer - pip is not a bad option, but it's not its core use
> >> case so there are some rough edges.
> >
> > My impression is that right now pip's core use-case is not "installing",
> > but "installing from PyPI (and similar repos)". There are a lot of rough
> > edges around installing from anything on your own hard drive.
>
> Not true. The rough edges are around installing things where (a) you
> don't want to rely on the invariant that name and version uniquely
> identify an installation (that's issue 536) and (b) where you don't
> want to do a clean build, because building is complex, slow, or
> otherwise something you want to optimise (that's this discussion).
>
> I routinely download wheels and use them to install. I also sometimes
> download sdists and install from them, although 99.99% of the time, I
> download them, build them into wheels and install them from wheels. It
> *always* works exactly as I'd expect. But if I'm doing development, I
> use -e. That seems to be the problem here, there are rough edges if
> you want a development workflow that doesn't rely on editable
> installs. I think that's what I already said :-)
>

It always works as you expect because you're very familiar with how things
work, I suspect. I honestly started working on docs/code to make people use
`pip install .` and immediately ran into 3 issues (start of this thread).
This build caching is #4. And that doesn't even count --upgrade (that was
issue #0).

There is a vast number of users who are used to `setup.py install`.
They'll download a released/dev version or do a git/hg clone, and run
that `setup.py install` command. If we tell them to replace that with `pip
install .`, then at the moment there are a lot of rough edges that they are
going to run into.

Now some of those rough edges are bugs, some are things like "does pip
build from where you run it or in an isolated tmpdir". I'd like to get to
the situation where:
  - the bugs are fixed
  - the behavior/performance is >= `setup.py install`
  - the only remaining difference is some UI tweaks, like hiding the build
    log by default

> >>
> >> I think it would be good to see if we can ensure pip is useful for
> >> this use case as well, all I was pointing out was that people
> >> shouldn't assume that it "should" work right now, and that changing it
> >> to work might involve some trade-offs that we don't want to make, if
> >> it compromises the core functionality of installing packages.
> >
> > It might be helpful to describe the actual trade-offs then, because as far
> > as I can tell no one has actually described how this would either hurt
> > another use-case or make pip internals much more complicated.
>
>
> 2. (For here) Builds are not isolated from what's in the development
> directory. So if you have your sdist definition wrong, what you build
> locally may work, but when you release it it may fail. Obviously that
> can be fixed by proper development and testing practices, but pip is
> designed currently to isolate builds to protect against mistakes like
> this, we'd need to remove that protection for cases where we wanted to
> do in-place builds.
>

Now this is an actual development work feature/choice. Protecting against a
wrong sdist definition may help developers who don't test installing via
sdist in their CI. It doesn't really help end users directly.
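
For developers who do want that protection without relying on pip's isolated
build, a CI check along these lines is one option. This is only a sketch: the
helper name `missing_from_sdist` is hypothetical, it assumes a gzipped tar
sdist with a single top-level `<name>-<version>/` directory, and the list of
expected files has to come from somewhere else (for example a hand-maintained
list or the version control system).

```python
import tarfile
from pathlib import PurePosixPath


def missing_from_sdist(sdist_path, expected_files):
    """Return the expected project-relative files that the sdist lacks."""
    with tarfile.open(sdist_path, "r:gz") as tar:
        # Strip the leading "<name>-<version>/" component so that member
        # names become project-relative paths.
        packaged = {
            str(PurePosixPath(*PurePosixPath(member.name).parts[1:]))
            for member in tar.getmembers()
            if member.isfile() and len(PurePosixPath(member.name).parts) > 1
        }
    return sorted(set(expected_files) - packaged)
```

A CI job could fail when the returned list is non-empty, which catches a wrong
sdist definition before release regardless of how local installs are built.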

> 3. The logic inside pip for doing builds is already pretty tricky.
> Adding code to sometimes build in place and sometimes in a temporary
> directory is going to make it even more complex. That might not be a
> concern for end users, but it makes maintaining pip harder, and risks
> there being subtle bugs in the logic that could bite end users. If you
> want specifics, I can't give them at the moment, because I don't know
> what the code to do the proposed in-place building would look like.
>
> I hope that helps. It's probably not as specific or explicit as you'd
> like, but to be fair, nor is the proposal.
>

It does help, thanks. I don't think I can make the proposal much more
concrete than "build a wheel, install a wheel (without magic)" though. At
least without starting to implement that proposal.

> What we currently have on the table is "If 'pip (install/wheel) .' is
> supposed to become the standard way to build things, then it should
> probably build in-place by default." For my personal use cases, I
> don't actually agree with any of that, but my use cases are not even
> remotely like those of numpy developers, so I don't want to dismiss
> the requirement. But if it's to go anywhere, it needs to be better
> explained.
>
> Just to be clear, *my* position (for projects simpler than numpy and
> friends) is:
>
> 1. The standard way to install should be "pip install <requirement or
> wheel>".
> 2. The standard way to build should be "pip wheel <sdist or
> directory>". The directory should be a clean checkout of something you
> plan to release, with a unique version number.
> 3. The standard way to develop should be "pip install -e ."
>

Agree with all of those.

> 4. Builds (pip wheel) should always unpack to a temporary location and
> build there. When building from a directory, in effect build a sdist
> and unpack it to the temporary location.
>

Here we seem to disagree. Your only concrete argument for it so far is
aimed at developers, and I think it (a) is an extra step that adds
complexity to the implementation, and (b) is inherently slower.

> I hear the message that for things like numpy these rules won't work.
> But I'm completely unclear on why. Sure, builds take ages unless done
> incrementally. That's what pip install -e does, I don't understand why
> that's not acceptable.
>

I hope my replies above make clear why -e isn't too relevant here.

>
> If the discussion needs to go to the next level of detail, maybe that
> applies to the requirements as well as to the objections?
>

Maybe this isn't 100% correct because I'm not that familiar with pip
internals yet, but I'll give it a try:

For `pip install <local_dir>`:
- Avoid using anything in pip/download.py
- Instead, construct the cmdoptions and pass them to WheelBuilder
- Install the built wheel
- Optionally: store a pip log somewhere so it knows what it did in
<local_dir>. Might come in handy for something.

Could be that that adds complexity instead of reduces it, but I don't yet
see it.

> Paul
>
> PS Alternatively, feel free to ignore my comments.

I won't, your detailed reply was quite helpful.

Ralf

> I'm not likely to
> ever have the time to code any of the proposals being discussed here,
> but I won't block other pip developers either doing so or merging
> code, so my comments are not intended as anything more than input from
> someone who knows a bit about how pip is coded, how it's currently
> used, and what issues our users currently encounter. Seriously - I'm
> happy to say my piece and leave it at that if you prefer.
>