[Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

holger krekel holger at merlinux.eu
Sun Mar 10 20:54:05 CET 2013


On Sun, Mar 10, 2013 at 14:29 -0400, Donald Stufft wrote:
> 
> On Mar 10, 2013, at 2:18 PM, holger krekel <holger at merlinux.eu> wrote:
> 
> > On Sun, Mar 10, 2013 at 13:35 -0400, Donald Stufft wrote:
> >> On Mar 10, 2013, at 11:07 AM, holger krekel <holger at merlinux.eu> wrote:
> >>> [...]
> >>> Transitioning to "pypi-cache" mode
> >>> -------------------------------------
> >>> 
> >>> When transitioning from the currently implicit "pypi-ext" mode to
> >>> "pypi-cache" for a given package, a package maintainer should 
> >>> be able to retrieve/verify the historic release files which will 
> >>> be cached from pypi.python.org.  The UI should present this list
> >>> and have the maintainer accept it for completing the transition
> >>> to the "pypi-cache" mode.  Upon future release registration actions,
> >>> pypi.python.org will perform crawling for the homepage/download sites
> >>> and cache release files *before* returning a success return code for
> >>> the release registration.
> >>> [...]
> >> 
> >> Some concerns:
> >> 
> >> 1. We cannot automatically switch people to pypi-cache. We _have_ to get explicit permission from them.
> > 
> > Could you detail how you arrive at this conclusion?
> > (I've seen the claim before but not the underlying reasoning, maybe
> > i just missed it)
> > 
> > There would be prior notifications to the package maintainers.  If they 
> > don't want to have their packages cached at pypi.python.org, they can set
> > the mode to "pypi-only" and leave manual instructions.  I suspect there will
> > be very few people, if any, objecting to pypi-cache mode.  If that is
> > false we might need to prolong pypi-ext mode some more for them and 
> > switch them to pypi-only when we eventually decide to get
> > rid of external hosting.
> 
> I asked VanL. His statement on re-hosting packages was:
> 
>     "We could do it if we had permission. The tricky part would be getting permission for already-existing packages."
> 
> I'm pretty sure that emailing someone and assuming we have permission if they don't opt-out doesn't count as permission.

Hum, i saw Jesse Noller saying a few days ago "let them opt out".
But i guess VanL can trump that :)  If that is true we could change the
notification to maintainers of B packages to say that the hosting mode is
going to change to pypi-only, which would lose their release files unless
they opt in to pypi-cache.  As long as that is a no-brainer for them, we are
not asking for much and can count on most people's good will to not make
other people's installation life harder.

Besides, admins could still set the "pypi-ext" mode if a maintainer can
explain why it's a problem for them to agree to "pypi-cache" or
"pypi-only".  I'd really like to not have too many packages lingering
around in "pypi-ext" mode if it can be avoided.
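
To make the opt-in variant concrete, here is a minimal sketch of how a
package's mode could be decided once the notification period has ended.
The names HostingMode and mode_after_deadline are made up for illustration
and are not part of PyPI.  The precedence (opt-in first, then admin
exception, then pypi-only) is one reading of the paragraphs above, not a
settled rule.

```python
from enum import Enum


class HostingMode(Enum):
    # Hypothetical names for the three modes discussed in this thread.
    PYPI_EXT = "pypi-ext"      # installers crawl external homepage/download links
    PYPI_CACHE = "pypi-cache"  # pypi.python.org re-hosts externally published files
    PYPI_ONLY = "pypi-only"    # only files uploaded to pypi.python.org are served


def mode_after_deadline(current, opted_in_to_cache, admin_exception=False):
    """Return the mode a package would have once the notification period ends.

    Assumption: a maintainer's opt-in wins, then an admin-granted exception
    keeps pypi-ext, and everything else falls back to pypi-only.
    """
    if current is not HostingMode.PYPI_EXT:
        # Packages already in pypi-cache or pypi-only are left alone.
        return current
    if opted_in_to_cache:
        return HostingMode.PYPI_CACHE
    if admin_exception:
        return HostingMode.PYPI_EXT
    return HostingMode.PYPI_ONLY
```

With this ordering no release files are re-hosted unless the maintainer
explicitly agreed, which is the permission point raised above.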

> > 
> >> 2. The cache mechanism is going to be fragile, and in the long term leaves a window open for security issues.
> > 
> > fragility: not sure it's too bad.  Once the mode is activated, release
> > registration ("submit" POST action on "/pypi" http endpoint) will only
> > succeed if the corresponding release files can be found through
> > homepage/download.
> > Changing the mode to pypi-cache in the presence of historic release
> > files hosted elsewhere needs a good pypi.python.org UI interaction and
> > may take several tries if necessary sites cannot be reached.  Nevertheless,
> > this step is potentially fragile [X].
> 
> I see, so pypi-cache would only be triggered once during release creation. Cache makes it sound like we'd continuously monitor the given external urls instead of it actually being a pull based method of getting files.

Right, we need to avoid cache invalidation problems by only allowing
updates at user-chosen points in time (there might also be an explicit 
"update cache" button in case a maintainer pushes an egg/wheel later).  
It's still technically a cache i think but the term "rehost" would 
work as well i guess.
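
A minimal sketch of that pull-once behaviour, under stated assumptions:
register_release, ReleaseRegistrationError, and the fetch/store parameters
are hypothetical names, not existing PyPI code.  The fetcher is passed in
so the logic can be tried without network access.  The point it shows is
that files are pulled only during the registration call, and that the call
fails unless every file could be pulled.

```python
class ReleaseRegistrationError(Exception):
    """Raised when a release cannot be re-hosted completely (hypothetical)."""


def register_release(name, version, file_urls, fetch, store):
    """Pull all release files once and store them, or fail without storing.

    fetch(url) must return bytes and raise OSError if the site is unreachable.
    store is a plain dict standing in for the file storage of the index.
    """
    if not file_urls:
        # Registration only succeeds if release files can be found at all.
        raise ReleaseRegistrationError("no release files found for %s %s" % (name, version))

    fetched = {}
    for url in file_urls:
        try:
            fetched[url] = fetch(url)
        except OSError as exc:
            raise ReleaseRegistrationError("could not fetch %s: %s" % (url, exc)) from exc

    # Commit only after every file was pulled, so a failed attempt leaves
    # nothing half-stored and can simply be retried later.
    store[(name, version)] = fetched
    return sorted(fetched)
```

An explicit "update cache" button could call the same function again for
an existing release.  Nothing here polls the external URLs in between.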

> [...]
> > Back to pypi-cache: it is there to make it super-easy for package
> > maintainers.  There are all kinds of release habits and scripts
> > pushing out things to google/bitbucket/github/other sites.  With
> > "pypi-cache" they don't need to change any of that.  They just need
> > to be fine with pypi.python.org pulling in the packages for caching.
> 
> Yes I understand the goal here. The problem is that there's not really
> a good way to secure this without requiring changes to their workflow. 
> At best they'll have to push information about every file so that PyPI
> is able to verify the files it is downloading, and if we are requiring
> them to push data about those files we might as well require them to
> push the files themselves. 

Is this about protection against package tampering?  If so, I think a
proper solution involves maintainers signing their release files but
this is outside the intended scope of the PEP.

Otherwise, the "re-hosting" process for pypi-cache mode is at least as
secure as the current situation, where every host issuing pip/easy_install
commands visits external sites and can thus be MITM-attacked.  For
pypi-only packages it's safer because no crawling takes place.
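
For reference, the kind of per-file verification Donald describes could
look like the sketch below.  It assumes the maintainer pushes a SHA-256 hex
digest for each file.  The thread does not name a digest algorithm, so that
choice and the function name verify_release_file are assumptions made for
illustration only.

```python
import hashlib
import hmac


def verify_release_file(data, expected_sha256):
    """Return True if the downloaded bytes match the maintainer-supplied digest.

    Assumption: expected_sha256 is a hex string pushed by the maintainer over
    a channel that is trusted more than the external download site.
    """
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected_sha256.strip().lower())
```

This only detects files that changed between the maintainer's machine and
the re-hosted copy.  It does not replace signed releases.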

In any case, asking people to change their release process is not 
a no-brainer.  The PEP tries to avoid this source of friction.
That being said, i think we both agree on recommending that maintainers
(eventually) go for pypi-only and change their release processes
accordingly.  This PEP is not the end of the story of evolving package
hosting and i'd like to be careful about asking maintainers to change 
how they do things.

> This also has the effect we can provide
> immediate feedback when files do not validate on PyPI.

At release registration or switch-to-pypi-rehost time we could also do
package validation but i am inclined to see this as out of scope
for this PEP which tries to focus on the minimal steps to move 
from pypi-ext to everything-hosted-through-pypi.python.org.

cheers,
holger

> 
> > 
> > We might think about phasing out pypi-cache after some longer time
> > frame so that we eventually only have pypi-only and things are
> > simpler and saner.
> > 
> > best,
> > holger
> > 
> > 
> > 
> >> These buttons would be one time and quit. Once your project has been switched to PyPI Only you cannot go back to Legacy mode. All new projects would be already switched to PyPI Only. After some amount of time switch all Projects to PyPI Only but _do not_ re-host their packages as we cannot legally do so without their permission.
> >> 
> >> The above is simpler, still provides people an easy migration path, moves us to remove external hosting, and doesn't entangle us with legal issues.
> >> 
> >> [1] There is still a small window here where someone could MITM PyPI fetching these files, however since it would be a one time and done deal this risk is minimal and is worth it to move to a pypi only solution.
> >> 
> >> -----------------
> >> Donald Stufft
> >> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
> >> 
> > 
> > 
> 
> 
> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
> 



