[Distutils] PEP 438, pip and --allow-external (was: "pip: cdecimal an externally hosted file and may be unreliable" from python-dev)

Donald Stufft donald at stufft.io
Mon May 12 08:47:01 CEST 2014


On May 12, 2014, at 2:21 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On 12 May 2014 15:39, Donald Stufft <donald at stufft.io> wrote:
>> On May 12, 2014, at 12:50 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>> 
>>> There are some more notable names in the "unsafe" lists, but a few
>>> spot checks on projects like PyGObject, PyGTK, biopython, dbus-python,
>>> django-piston, ipaddr, matplotlib, and mayavi showed that a number of
>>> them *have* switched to PyPI hosting for recent releases, but have
>>> left older releases as externally hosted. (A few notable names, like
>>> wxPython and Spyder, *did* show up as genuinely externally hosted.
>>> Something that would be nice to be able to do, but isn't really
>>> practical without a server side dependency graph, is to be able to
>>> figure out how many packages have an externally hosted dependency
>>> *somewhere in their dependency chain*, and *how many* other projects
>>> are depending on particular externally hosted projects transitively).
>> 
>> I could maybe do it with a mirror and a throw away VM but I think it’d
>> be a decent chunk of effort.
> 
> It's one of the things metadata 2.0 should eventually enable, but I
> think the numbers you already have are indicative enough to justify
> finding a way to kill off the feature.
> 
>>> Regardless, even with those caveats, the numbers are already solid
>>> enough to back up the notion that the only possible reasons to support
>>> enabling verified external hosting support independently of unverified
>>> external hosting are policy and relationship management ones.
>>> Relationship management would just mean providing a deprecation period
>>> before removing the capability, but I want to spend some time
>>> exploring a possible concrete *policy* related rationale for keeping
>>> it.
>>> 
>>> The main legitimate reason I am aware of for wanting to avoid PyPI
>>> hosting is for non-US based individuals and organisations to avoid
>>> having to sign up to the "Any uploads of packages must comply with
>>> United States export controls under the Export Administration
>>> Regulations." requirement that the PSF is obliged to place on uploads
>>> to the PSF controlled US hosted PyPI servers. That rationale certainly
>>> applies in MAL's case, since eGenix is a German company, and I believe
>>> they mostly do business outside the US (for example, their case study
>>> in the Python brochure is for a government project in Ghana).
>> 
>> Yes that is the main reason I can distill from the various threads that
>> have occurred over time.
> 
> So can we agree that's the use case we need to have a solid answer for
> before completely dropping external hosting support from pip?

Yes.

I'm less worried about the specific timeline (as long as it's reasonable) than
I am worried about having **a** timeline and an answer in the interim that I can
point people towards. "I'm sorry but it's getting fixed and here's the plan" is
so much better to tell people than "I'm sorry".

> 
>> I’m not sure the distinction makes much sense for PyPI/pip. You basically
>> have to trust the authors of the packages you’re installing. If a package
>> author is willing to hijack another package with a custom index they could
>> just as easily do something malicious in a setup.py. Even if we get rid of
>> the setup.py there are still endless ways of attacking someone who is
>> installing your package and they are basically impossible to prevent and
>> are just as bad or worse than that.
> 
> Yeah, that's a good point - there's little or nothing a malicious
> index can do that a malicious setup.py couldn't already do.
> 
>> My reasons are:
>> 
>> * It's only somewhat nicer up front than providing a custom index however it
>>  represents an additional command line flag that users have to learn.
> 
> We also have the option of some day providing a general access
> European hosted index server that omits the US export restriction
> requirement from its upload terms. That's a mechanism pip could enable
> by default without introducing the "multiple single points of failure
> in series" problem for complex dependency stacks.

We'd have to figure this out, but I'm not against trying to sort it out.
Rackspace has European DCs and I think Fastly has the ability to select only
European POPs (if that matters?) so it wouldn't even really require a degraded
performance. There are logistics and other considerations of course, but it's
not in and of itself something that I think would be completely off the table.

We'd of course want to make sure there was demand for it because it'd be adding
more work on the Python Infrastructure team (and we'd need buy in there too)
but I don't think it's an outlandish thing.

> 
>> * I hate the idea of a long term --allow-all-verified-external (or any variant
>>  of it). It feels way too much to me like an "unbreak my pip please" flag and
>>  I think that it is how users who need to use it will perceive it. This
>>  will create more animosity and hostility towards the packaging toolchain.
>> 
>>  I went into this on the pip PR, but essentially I see this becoming a turd
>>  that people chuck into their ~/.pip/pip.conf, requirements.txt, environment,
>>  or build scripts. They'll run into a problem where they need it, shove it
>>  into their config and then forget about it until they try to deploy to a
>>  new machine, or service, or whatever and run into that problem again.
> 
> Agreed - it would be better to have a solution that points a way
> towards an eventual "enabled by default" solution, and the multiple
> index server support does indeed seem to better fit that bill.
> 
>> * I don't agree it says to non-US users that they must agree to the US export
>>  rules in order to participate in PyPI at all. They'll still be able to
>>  register their projects with PyPI, provide docs there. They just won't get
>>  as streamlined an install experience. They'll have to provide some installation
>>  instructions.
>> 
>>  There is possibly even something we can do to make this more streamlined.
>>  Like perhaps they can register their custom index with PyPI and PyPI can
>>  advise pip of it and if pip finds that advisory pip can report it to the user
>>  and say "foo bar is hosted on a separate repository and in order to install
>>  it you'll need to add "https://example.com/my-cool-packages/" to your index
>>  URLs."
> 
> An "external index URL" metadata setting on PyPI (or even in metadata
> 2.0?) sounds like a reasonable option to me.

Yea I'm not sure of the exact implementation of that, certainly a short term
solution would/could be something added to PyPI and a longer term one be
adding things to metadata 2.0 (or not, perhaps it should stay a PyPI thing).
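
To make the advisory idea a bit more concrete, here is a minimal sketch in
Python. Everything in it is hypothetical: neither PyPI nor pip has this
feature, and the EXTERNAL_INDEXES mapping, the external_index_advisory()
helper, and the "foo-bar" project name are made up for illustration. The only
real piece is pip's existing --extra-index-url option, which the message
suggests to the user.

```python
# Hypothetical sketch only: PyPI does not publish this data and pip does not
# consume it. It illustrates the "register an external index" idea above.
from typing import Mapping, Optional

# Assumed shape of the per-project advisory PyPI could hand to pip:
# project name -> URL of the index that actually hosts its files.
EXTERNAL_INDEXES = {
    "foo-bar": "https://example.com/my-cool-packages/",
}


def external_index_advisory(
    project: str,
    advisories: Mapping[str, str],
    configured_indexes: frozenset = frozenset(),
) -> Optional[str]:
    """Return a message for the user, or None if no advisory applies."""
    url = advisories.get(project)
    if url is None or url in configured_indexes:
        # Either there is no advisory for this project, or the user has
        # already added the advertised index to their configuration.
        return None
    return (
        f"{project} is hosted on a separate repository. To install it, "
        f"add {url} to your index URLs, for example:\n"
        f"    pip install --extra-index-url {url} {project}"
    )


print(external_index_advisory("foo-bar", EXTERNAL_INDEXES))
```

The point of the sketch is that pip never follows the external URL on its own;
it only tells the user which index to opt in to, so the extra point of failure
stays visible in their configuration.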

> 
>> * We constantly tell people that if you depend on PyPI you need to run a
>>  mirror, however if a file isn't uploaded to PyPI then the user can't rely on
>>  the fact that the file existing on PyPI means they have the right to mirror
>>  and distribute it. This means that we force people who want to isolate
>>  themselves from external dependencies to manually resolve any externally
>>  hosted dependency. Most of them are not lawyers and may or may not have any
>>  idea what all that means or have a good sense if they can do that or not.
>> 
>>  It's true that this problem still exists with an external index, however by
>>  moving to a "stand up your own index" solution it becomes easier for people
>>  to reason about which dependencies they need to figure it out for since there
>>  will be a clear separation of things that came from PyPI vs things that came
>>  from another index.
> 
> This is where I think a PSF managed European hosted index could
> actually be a useful approach - we could still ensure users of their
> freedom to redistribute packages, without requiring uploaders to agree
> to comply with US export restrictions.
> 
> If the PSF's status as a US non-profit still complicates matters, we
> could potentially try to come to an agreement with an entity based in
> Europe without those
> 
>> * Long term I think that both PyPI and pip should disallow external hosting and
>>  require the use of an additional index. However that will require a new PEP
>>  to discuss that. I'm still thinking that through but the more I think about
>>  it, dig into pip's code base, and talk to people, the more convinced I become
>>  that it is the right long term decision.
>> 
>>  That does not mean people will need to upload to PyPI to participate on PyPI
>>  since a large part of what PyPI provides is discoverability and a central
>>  naming authority.
> 
> Yes, you've persuaded me that enhancing PyPI's ability to act as a
> meta-index server (by pointing to subsidiary index servers on a
> per-package basis) is cleaner than having two independent delegation
> mechanisms that need to be configured differently in pip.
> 
> That change would also bring us closer to the Linux distro model
> (which works at the custom repo level), and correctly identify the
> single points of failure in a dependency chain (when you're not
> running your own local mirror).

\o/ PEP time? I have time this week...


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
