From richard at python.org Fri Mar 1 00:21:59 2013
From: richard at python.org (Richard Jones)
Date: Fri, 1 Mar 2013 10:21:59 +1100
Subject: [Catalog-sig] remove historic download/homepage links for a project
In-Reply-To:
References: <512E3588.4020305@egenix.com> <20130227183754.GR9677@merlinux.eu> <512E6361.1030108@inaugust.com> <20130228092835.GX9677@merlinux.eu> <20130228134100.GZ9677@merlinux.eu> <3065EDAA-8BCE-4D5F-A59F-D0D4F2B33B25@mac.com>
Message-ID:

On 1 March 2013 04:10, Tres Seaver wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 02/28/2013 11:27 AM, Ronald Oussoren wrote:
>
>> But necessary to have. Or am I the only one that accidentally released a
>> version that had serious bugs?
>
> Nope. The way to address such a version is to release a new, fixed
> version (preferably one with a suitably-PEP-compliant version which
> indicates the version being corrected). The only legitimate reason to
> yank a release is that you are under legal compulsion to do so (a
> takedown notice or equivalent), or you discover that the version released
> has been trojaned in some way.

You may have listed the only reason *you will allow* but the owner of
the package can do whatever they want. You're correct that once the
package is "out in the wild" you can't get all those copies back, but
they can (for whatever reason they have and no, I'm not going to
needlessly speculate) remove it from PyPI. You have no legal or moral
right to compel them to do otherwise.
Richard From pje at telecommunity.com Fri Mar 1 00:31:17 2013 From: pje at telecommunity.com (PJ Eby) Date: Thu, 28 Feb 2013 18:31:17 -0500 Subject: [Catalog-sig] Deprecate External Links In-Reply-To: <528718A2FA614C0288E562FEED8F85A4@gmail.com> References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com> <512E28CB.9080907@egenix.com> <2C0A235BC980420C8632A75D39953B1A@gmail.com> <512E3588.4020305@egenix.com> <20130227183754.GR9677@merlinux.eu> <512E6361.1030108@inaugust.com> <20130228090034.GV9677@merlinux.eu> <528718A2FA614C0288E562FEED8F85A4@gmail.com> Message-ID: On Thu, Feb 28, 2013 at 5:00 PM, Donald Stufft wrote: > SSL checking on upload should be possible, do you want > a patch? If it uses the 'requests' library, yes, I'll accept one. But I don't want to do any direct implementation of SSL cert checking in setuptools, at least in the short run (next few weeks), because: 1. I don't consider myself qualified as yet to write a correct patch or even verify that a contributed patch is correct/safe, and 2. There is a licensing issue with including the Mozilla root certificate set in setuptools under its current license, and I'm not 100% certain I can *change* the license. (I *could* potentially use a platform-provided cert set, but that's not really an option on Windows unless you have Windows expertise above my paygrade for pulling that stuff out of the registry.) So, by delegating to the requests library, I can bypass both of those issues in the short term. In the longer term (>1 month from now), more integrated solutions may be more feasible. Using "requests" is the best I think I can reasonably achieve by PyCon, but I *will* be publicizing a set of instructions for how to "safely" download setuptools and requests (via https in a browser to prevent MITM attacks), as well as how to configure easy_install for more secure default settings. (And easy_install will always use "requests" if present, unless specifically asked not to with a --no-ssl-verify option.) 
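As a rough illustration of the two concerns raised in the message above (verifying server certificates, and relying on a platform-provided CA set rather than bundling the Mozilla roots), here is a minimal sketch using only the standard library. It assumes a Python recent enough to provide ssl.create_default_context, which did not exist when this thread was written, and it is not the setuptools or "requests" implementation. The helper name make_verifying_context is hypothetical.

```python
import ssl


def make_verifying_context(cafile=None):
    """Return an SSLContext that verifies certificates and hostnames.

    With cafile=None the platform/OpenSSL default trust store is used, so
    no CA bundle has to be shipped with the code. (Hypothetical helper.)
    """
    context = ssl.create_default_context(cafile=cafile)
    # create_default_context() already enables both checks; they are set
    # again here only to make the intent explicit.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context


if __name__ == "__main__":
    ctx = make_verifying_context()
    print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

A context built this way can be passed as the context argument of urllib.request.urlopen, so that a download fails instead of silently accepting an untrusted certificate.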
From donald.stufft at gmail.com Fri Mar 1 00:36:02 2013 From: donald.stufft at gmail.com (Donald Stufft) Date: Thu, 28 Feb 2013 18:36:02 -0500 Subject: [Catalog-sig] Deprecate External Links In-Reply-To: References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com> <512E28CB.9080907@egenix.com> <2C0A235BC980420C8632A75D39953B1A@gmail.com> <512E3588.4020305@egenix.com> <20130227183754.GR9677@merlinux.eu> <512E6361.1030108@inaugust.com> <20130228090034.GV9677@merlinux.eu> <528718A2FA614C0288E562FEED8F85A4@gmail.com> Message-ID: On Thursday, February 28, 2013 at 6:31 PM, PJ Eby wrote: > On Thu, Feb 28, 2013 at 5:00 PM, Donald Stufft wrote: > > SSL checking on upload should be possible, do you want > > a patch? > > > > > If it uses the 'requests' library, yes, I'll accept one. But I don't > want to do any direct implementation of SSL cert checking in > setuptools, at least in the short run (next few weeks), because: > > Does setuptools support Python3? (or do you want it to?) > > 1. I don't consider myself qualified as yet to write a correct patch > or even verify that a contributed patch is correct/safe, and > > There's existing implementations out there that add cert checking to urllib, it's fairly short. > > 2. There is a licensing issue with including the Mozilla root > certificate set in setuptools under its current license, and I'm not > 100% certain I can *change* the license. (I *could* potentially use a > platform-provided cert set, but that's not really an option on Windows > unless you have Windows expertise above my paygrade for pulling that > stuff out of the registry.) > > Shouldn't be any issue, the PSF license is very liberal and the MPL works on a per file (as opposed to a per project) basis. So if you include the cert bundle that particular file is MPL licensed while setuptools itself remains PSF. > > So, by delegating to the requests library, I can bypass both of those > issues in the short term. 
In the longer term (>1 month from now),
> more integrated solutions may be more feasible. Using "requests" is
> the best I think I can reasonably achieve by PyCon, but I *will* be
> publicizing a set of instructions for how to "safely" download
> setuptools and requests (via https in a browser to prevent MITM
> attacks), as well as how to configure easy_install for more secure
> default settings. (And easy_install will always use "requests" if
> present, unless specifically asked not to with a --no-ssl-verify
> option.)
>
>

From donald.stufft at gmail.com Fri Mar 1 01:13:00 2013
From: donald.stufft at gmail.com (Donald Stufft)
Date: Thu, 28 Feb 2013 19:13:00 -0500
Subject: [Catalog-sig] Pypi cdn for hosted packages
In-Reply-To: <73b43f9b-ed6d-40aa-ad17-40e1992dd295@email.android.com>
References: <23CB2462-E646-4F51-B3BF-110FA3FB2F21@gmail.com> <73b43f9b-ed6d-40aa-ad17-40e1992dd295@email.android.com>
Message-ID:

On Thursday, February 28, 2013 at 10:13 AM, Noah Kantrowitz wrote:
> Responding from my phone quickly before this gets any further, will
> write more later. Plan is to have pypi move package download links to
> a new hostname (probably pypi-download.python.org
> (http://pypi-download.python.org)) and then throw that behind fastly.
> This sidesteps 100% of issues with dynamic pages, etc. Simple index
> will be handled secondarily.

Just an aside, can we use a pythonhosted.org domain, like
https://packages.pythonhosted.org/ or something? That will prevent
GIFAR-like attacks where someone finds a way to create a file that both
looks like a valid file to PyPI, but that browsers will interpret as
something executable. Or rather it prevents it from being able to
attack *.python.org.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From tjreedy at udel.edu Fri Mar 1 02:31:05 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 28 Feb 2013 20:31:05 -0500 Subject: [Catalog-sig] PyPI terms In-Reply-To: References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com> <512E28CB.9080907@egenix.com> <512E422C.3070001@egenix.com> <4A372726-1248-4E43-AC00-863DA153D42C@coderanger.net> <512F2FE9.9080001@egenix.com> <3D8F33FF-A4FA-45B9-8AF5-97DA91876C1E@coderanger.net> <512F9E8A.1010707@egenix.com> Message-ID: On 2/28/2013 1:19 PM, Noah Kantrowitz wrote: > Because I happen to have YouTube open anyway: > > """ For clarity, you retain all of your ownership rights in your > Content. However, by submitting Content to YouTube, you hereby grant > YouTube a worldwide, non-exclusive, royalty-free, sublicenseable and > transferable license to use, reproduce, distribute, prepare > derivative works of, display, and perform the Content in connection > with the Service and YouTube's (and its successors' and affiliates') > business, including without limitation for promoting and > redistributing part or all of the Service (and derivative works > thereof) in any media formats and through any media channels. You > also hereby grant each user of the Service a non-exclusive license to > access your Content through the Service, and to use, reproduce, > distribute, display and perform such Content as permitted through the > functionality of the Service and under these Terms of Service. The > above licenses granted by you in video Content you submit to the > Service terminate within a commercially reasonable time after you > remove or delete your videos from the Service. You understand and > agree, however, that YouTube may retain, but not display, distribute, > or perform, server copies of your videos that have been removed or > deleted. The above licenses granted by you in user comments you > submit are perpetual and irrevocable. 
""" > > Slightly different wording, Noah, I understand that you desperately do not want to admit that the PSF requirement for uploading to it servers is unusually broad, because you do not want to admit that rational people might have a reason to not upload, but there it is. 1. The uploader only authorizes distribution via the YouTube infrastructure. Indeed, Google want that limitation because it wants to be the one that monetizes distribution. So it only streams videos (free ones, anyway) and does *not* download. Anyone who subverts this and captures the stream as a download has no rights to it. 2. The uploader can terminate the license with Google. Because of #1, such termination stops anyone from legal distribution. Note: Flickr gives uploaders the choice of whether images can be downloaded or only embedded in a flickr web page. It also lets uploaders set the license that applies to flickr users. And it allows deletion of images. > only the license to comments is irrevocable, Irrelevant to this discussion. > for videos they just promise to stop distributing This is the important point. > but not actually remove your content. This is a mostly irrelevant practical issue. Finding and scrubbing every backup copy is difficult and expensive, especially for disk-image backups or serial tape media (if indeed they still use such) or backups stuck down in a deep salt mine. Any repository that does backups has to have this proviso. (I am sure, for instance, that Flickr does now.) My take on the current license is this: the original upload license was rather minimal. The lawyer decided it was insufficient. Rather that craft a broader license with the absolute minimum rights grant necessary, the lawyer took the easy, quick, and cheap-for-psf route of a maximal rights grant. That is okay with me as long as it is not mis-represented and as long as people do not try to bludgeon me or anyone else in signing something we do not agree to. 
Note: when I contribute text and code to the CPython repository, I also give up all control. I know and accept that, and even want that, because it also means that I can re-write *other* people's text and code. But people may reasonably want to keep more control over their independent sole-author work. -- Terry Jan Reedy From tseaver at palladion.com Fri Mar 1 04:08:34 2013 From: tseaver at palladion.com (Tres Seaver) Date: Thu, 28 Feb 2013 22:08:34 -0500 Subject: [Catalog-sig] remove historic download/homepage links for a project In-Reply-To: References: <512E3588.4020305@egenix.com> <20130227183754.GR9677@merlinux.eu> <512E6361.1030108@inaugust.com> <20130228092835.GX9677@merlinux.eu> <20130228134100.GZ9677@merlinux.eu> <3065EDAA-8BCE-4D5F-A59F-D0D4F2B33B25@mac.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 02/28/2013 06:21 PM, Richard Jones wrote: > On 1 March 2013 04:10, Tres Seaver wrote: >> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 >> >> On 02/28/2013 11:27 AM, Ronald Oussoren wrote: >> >>> But necessary to have. Or am the only one that accidently released >>> a version that had serious bugs? >> >> Nope. The way to address such a version is to release a new, fixed >> version (preferably one with a suitably-PEP-compliant version which >> indicates the version being corrected). The only legitimate reason >> to yank a release is that you are under legal compulsion to do so >> (a takedown notice or equivalent), or you discover that the version >> released has been trojaned in some way. > > You may have listed the only reason *you will allow* but the owner of > the package can do whatever they want. You're correct that once the > package is "out in the wild" you can't get all those copies back, but > they can (for whatever reason they have and no, I'm not going to > needlessly speculate) remove it from PyPI. You have no legal or moral > right to compel them to do otherwise. 
I wasn't claiming any right: I was arguing that anybody who shares
software with the community does the community a disservice by removing a
release because it "has serious bugs." Brown-bag releases happen: an
open source community repairs the damage from them by making new
releases, not by covering them up.


Tres.
- --
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with undefined - http://www.enigmail.net/

iEYEARECAAYFAlEwG7IACgkQ+gerLs4ltQ6RCACggZ38+vBTCXGlnwtm/mrmvkCp
370An1S6hQJkmJBVFQ5dkO+XeElkUPuj
=zjAd
-----END PGP SIGNATURE-----

From regebro at gmail.com Fri Mar 1 04:37:14 2013
From: regebro at gmail.com (Lennart Regebro)
Date: Fri, 1 Mar 2013 04:37:14 +0100
Subject: [Catalog-sig] Deprecate External Links
In-Reply-To: <20130228195242.GA9677@merlinux.eu>
References: <20130227183754.GR9677@merlinux.eu> <512E6361.1030108@inaugust.com> <20130227201642.GT9677@merlinux.eu> <674B990052E24AB58FF9614CCD7A9DC2@gmail.com> <20130228195242.GA9677@merlinux.eu>
Message-ID:

On Thu, Feb 28, 2013 at 8:52 PM, holger krekel wrote:
> There are also packages which have some (older) release files on pypi
> and newer ones outside (e.g. "lockfile" with 78256 downloads from
> code.google.com). You didn't include such in your 2651 emails, or did you?

No, I didn't, I assumed they would be quite few. Possibly a better
algorithm is to check if the last release has files on PyPI.
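The check suggested above could be sketched roughly as follows. The data layout (a list of releases in upload order, each with the URLs of its files), the function name and the example URLs are all assumptions made for illustration; this is not PyPI's actual data model or API. A file counts as "on PyPI" here simply when its URL's hostname is pypi.python.org.

```python
from urllib.parse import urlparse

# Assumption: files served from this host count as "hosted on PyPI".
PYPI_HOSTS = {"pypi.python.org"}


def latest_release_hosted_on_pypi(releases):
    """Return True if the most recent release has at least one file on PyPI.

    releases: list of (version, [file URLs]) tuples, oldest first
    (a hypothetical layout chosen for this sketch).
    """
    if not releases:
        return False
    _version, file_urls = releases[-1]
    return any(urlparse(url).hostname in PYPI_HOSTS for url in file_urls)


if __name__ == "__main__":
    # Hypothetical project: old release on PyPI, newest one hosted elsewhere.
    example = [
        ("1.0", ["https://pypi.python.org/packages/source/e/example/example-1.0.tar.gz"]),
        ("1.1", ["http://downloads.example.com/example-1.1.tar.gz"]),
    ]
    print(latest_release_hosted_on_pypi(example))
```

Looking only at the last release means a project with old files on PyPI but newer ones hosted elsewhere is still counted as externally hosted, which is the case raised in the quoted question.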
//Lennart From ronaldoussoren at mac.com Fri Mar 1 08:09:52 2013 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Fri, 1 Mar 2013 08:09:52 +0100 Subject: [Catalog-sig] remove historic download/homepage links for a project In-Reply-To: References: <512E3588.4020305@egenix.com> <20130227183754.GR9677@merlinux.eu> <512E6361.1030108@inaugust.com> <20130228092835.GX9677@merlinux.eu> <20130228134100.GZ9677@merlinux.eu> <3065EDAA-8BCE-4D5F-A59F-D0D4F2B33B25@mac.com> Message-ID: On 1 Mar, 2013, at 4:08, Tres Seaver wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 02/28/2013 06:21 PM, Richard Jones wrote: >> On 1 March 2013 04:10, Tres Seaver wrote: >>> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 >>> >>> On 02/28/2013 11:27 AM, Ronald Oussoren wrote: >>> >>>> But necessary to have. Or am the only one that accidently released >>>> a version that had serious bugs? >>> >>> Nope. The way to address such a version is to release a new, fixed >>> version (preferably one with a suitably-PEP-compliant version which >>> indicates the version being corrected). The only legitimate reason >>> to yank a release is that you are under legal compulsion to do so >>> (a takedown notice or equivalent), or you discover that the version >>> released has been trojaned in some way. >> >> You may have listed the only reason *you will allow* but the owner of >> the package can do whatever they want. You're correct that once the >> package is "out in the wild" you can't get all those copies back, but >> they can (for whatever reason they have and no, I'm not going to >> needlessly speculate) remove it from PyPI. You have no legal or moral >> right to compel them to do otherwise. > > I wasn't claiming any right: I was arguing that anybody who shares > software with the community does the community a disservice by removing a > release because it "has serious bugs." 
Brown-bag releases happen: an
> open source community repairs the damage from them by making new
> releases, not by covering them up.

I luckily haven't run into this with software I release on PyPI yet, but
sometimes pulling back an update while working on a fix is the
responsible thing to do.

You must be living in some other community than I am; I usually get to
fix my own bugs.

Ronald

>
>
> Tres.
> - --
> ===================================================================
> Tres Seaver          +1 540-429-0999          tseaver at palladion.com
> Palladion Software   "Excellence by Design"    http://palladion.com
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
> Comment: Using GnuPG with undefined - http://www.enigmail.net/
>
> iEYEARECAAYFAlEwG7IACgkQ+gerLs4ltQ6RCACggZ38+vBTCXGlnwtm/mrmvkCp
> 370An1S6hQJkmJBVFQ5dkO+XeElkUPuj
> =zjAd
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig

From regebro at gmail.com Fri Mar 1 08:35:23 2013
From: regebro at gmail.com (Lennart Regebro)
Date: Fri, 1 Mar 2013 08:35:23 +0100
Subject: [Catalog-sig] remove historic download/homepage links for a project
In-Reply-To:
References: <512E3588.4020305@egenix.com> <20130227183754.GR9677@merlinux.eu> <512E6361.1030108@inaugust.com> <20130228092835.GX9677@merlinux.eu> <20130228134100.GZ9677@merlinux.eu> <3065EDAA-8BCE-4D5F-A59F-D0D4F2B33B25@mac.com>
Message-ID:

On Fri, Mar 1, 2013 at 8:09 AM, Ronald Oussoren wrote:
> I luckily haven't run into this with software I release on PyPI yet, but sometimes
> pulling back an update while working on a fix is the responsible thing to do.

If the bug leads to data loss or security holes I agree.
//Lennart From reinout at vanrees.org Fri Mar 1 10:02:46 2013 From: reinout at vanrees.org (Reinout van Rees) Date: Fri, 01 Mar 2013 10:02:46 +0100 Subject: [Catalog-sig] Deprecate External Links In-Reply-To: <20130228200848.GB9677@merlinux.eu> References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com> <512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu> <20130228200848.GB9677@merlinux.eu> Message-ID: On 28-02-13 21:08, holger krekel wrote: >> I have seen that position in this discussion ("I have to upload 120 >> >files per release, so I won't do that", for instance). > haven't seen that. Marc-Andre Lemburg said this, which I took to mean 120 uploads per release: """ However, taking our egenix-mx-base package as example, we have 120 distribution files for every single release. Uploading those to PyPI would not only take long, but also ... """ Reinout -- Reinout van Rees http://reinout.vanrees.org/ reinout at vanrees.org http://www.nelen-schuurmans.nl/ "If you're not sure what to do, make something. -- Paul Graham" From holger at merlinux.eu Fri Mar 1 10:20:09 2013 From: holger at merlinux.eu (holger krekel) Date: Fri, 1 Mar 2013 09:20:09 +0000 Subject: [Catalog-sig] Deprecate External Links In-Reply-To: References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com> <512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu> <20130228200848.GB9677@merlinux.eu> Message-ID: <20130301092009.GD9677@merlinux.eu> On Fri, Mar 01, 2013 at 10:02 +0100, Reinout van Rees wrote: > On 28-02-13 21:08, holger krekel wrote: > >>I have seen that position in this discussion ("I have to upload 120 > >>>files per release, so I won't do that", for instance). > > >haven't seen that. > > Marc-Andre Lemburg said this, which I took to mean 120 uploads per release: > > """ > However, taking our egenix-mx-base package as example, we have > 120 distribution files for every single release. Uploading those > to PyPI would not only take long, but also ... > """ Ah ok, thanks. 
Didn't interpret Marc-Andre's post as claiming that downloads/homepage crawling is a good idea, though. Just that there has been reasons not to upload things which need to be addressed, especially the need for enough storage space. best, holger > > > Reinout > > -- > Reinout van Rees http://reinout.vanrees.org/ > reinout at vanrees.org http://www.nelen-schuurmans.nl/ > "If you're not sure what to do, make something. -- Paul Graham" > > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig > From mal at egenix.com Fri Mar 1 10:24:53 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 01 Mar 2013 10:24:53 +0100 Subject: [Catalog-sig] Deprecate External Links In-Reply-To: References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com> <512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu> <20130228200848.GB9677@merlinux.eu> Message-ID: <513073E5.20900@egenix.com> On 01.03.2013 10:02, Reinout van Rees wrote: > On 28-02-13 21:08, holger krekel wrote: >>> I have seen that position in this discussion ("I have to upload 120 >>> >files per release, so I won't do that", for instance). > >> haven't seen that. > > Marc-Andre Lemburg said this, which I took to mean 120 uploads per release: > > """ > However, taking our egenix-mx-base package as example, we have > 120 distribution files for every single release. Uploading those > to PyPI would not only take long, but also ... > """ Correct, with a total of over 100MB per release. However, the above quote is slightly incorrect: I did not say "I won't do that", just that there are issues with doing this: * It currently takes too long uploading that many files to PyPI. This causes a problem, since in order to start the upload, we have to register the release on PyPI, which tools will then immediately find. However, during the upload time, they won't necessarily find the right files to download and then fail. 
  The proposed pull mechanism (see
  http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal)
  would work around this problem: tools would simply go to
  our servers in case they can't find the files on PyPI.

* PyPI doesn't allow us to upload two egg files with the same
  name: we have to provide egg files for UCS2 Python builds and
  UCS4 Python builds, since easy_install/setuptools/pip don't
  differentiate between the two variants. This is the main
  reason why we're hosting our own PyPI-style indexes, one for
  UCS2 and the other for UCS4 builds:
  https://downloads.egenix.com/python/index/ucs2/
  https://downloads.egenix.com/python/index/ucs4/

* I'm not sure whether we want to import our crypto packages
  to the US, so for a subset of the files, we'd probably
  continue to use our servers in Germany.

  Again, with the above proposal, this shouldn't be a problem.

* The PyPI terms are a bummer for us, but this can be fixed,
  I guess.

If we can resolve the issues, we'd have no problem having the
files mirrored on PyPI.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 01 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From holger at merlinux.eu Fri Mar 1 10:46:55 2013 From: holger at merlinux.eu (holger krekel) Date: Fri, 1 Mar 2013 09:46:55 +0000 Subject: [Catalog-sig] Deprecate External Links In-Reply-To: <513073E5.20900@egenix.com> References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com> <512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu> <20130228200848.GB9677@merlinux.eu> <513073E5.20900@egenix.com> Message-ID: <20130301094655.GE9677@merlinux.eu> On Fri, Mar 01, 2013 at 10:24 +0100, M.-A. Lemburg wrote: > On 01.03.2013 10:02, Reinout van Rees wrote: > > On 28-02-13 21:08, holger krekel wrote: > >>> I have seen that position in this discussion ("I have to upload 120 > >>> >files per release, so I won't do that", for instance). > > > >> haven't seen that. > > > > Marc-Andre Lemburg said this, which I took to mean 120 uploads per release: > > > > """ > > However, taking our egenix-mx-base package as example, we have > > 120 distribution files for every single release. Uploading those > > to PyPI would not only take long, but also ... > > """ > > Correct, with a total of over 100MB per release. However, the above > quote is slightly incorrect: I did not say "I won't do that", just > that there are issues with doing this: > > * It currently takes too long uploading that many files to > PyPI. This causes a problem, since in order to start the upload, > we have to register the release on PyPI, which tools will then > immediately find. However, during the upload time, they won't > necessarily find the right files to download and then fail. You can actually skip the register and directly upload, it will create release metadata on the fly. Not sure if it's complete but you can then do a "register" to update it if needed. 
best, holger > The proposed pull mechanism (see > http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal) > would work around this problem: tools would simply go to > our servers in case they can't find the files on PyPI. > > * PyPI doesn't allow us to upload two egg files with the same > name: we have to provide egg files for UCS2 Python builds and > UCS4 Python builds, since easy_install/setuptools/pip don't > differentiate between the two variants. This is the main > reason why we're hosting our own PyPI-style indexes, one for > UCS2 and the other for UCS4 builds: > https://downloads.egenix.com/python/index/ucs2/ > https://downloads.egenix.com/python/index/ucs4/ > > * I'm not sure whether we want to import our crypto packages > to the US, so for a subset of the files, we'd probably > continue to use our servers in Germany. > > Again, with the above proposal, this shouldn't be a problem. > > * Ihe PyPI terms are a bummer for us, but this can be fixed, > I guess. > > If we can resolve the issues, we'd have no problem having the > files mirrored on PyPI. > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, Mar 01 2013) > >>> Python Projects, Consulting and Support ... http://www.egenix.com/ > >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ > >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > > ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig > From richard at python.org Fri Mar 1 10:53:11 2013 From: richard at python.org (Richard Jones) Date: Fri, 1 Mar 2013 20:53:11 +1100 Subject: [Catalog-sig] Deprecate External Links In-Reply-To: <513073E5.20900@egenix.com> References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com> <512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu> <20130228200848.GB9677@merlinux.eu> <513073E5.20900@egenix.com> Message-ID: On 1 March 2013 20:24, M.-A. Lemburg wrote: > * PyPI doesn't allow us to upload two egg files with the same > name: we have to provide egg files for UCS2 Python builds and > UCS4 Python builds, since easy_install/setuptools/pip don't > differentiate between the two variants. This is the main > reason why we're hosting our own PyPI-style indexes, one for > UCS2 and the other for UCS4 builds: > https://downloads.egenix.com/python/index/ucs2/ > https://downloads.egenix.com/python/index/ucs4/ Hm. that's a tricky one. I've assumed that the filename encodes all of the relevant build information. Perhaps that should be addressed (otherwise pity the poor user who downloads one or the other incorrectly and then runs into issues that are probably quite perplexing.) Richard From holger at merlinux.eu Fri Mar 1 11:19:56 2013 From: holger at merlinux.eu (holger krekel) Date: Fri, 1 Mar 2013 10:19:56 +0000 Subject: [Catalog-sig] homepage/download metadata cleaning Message-ID: <20130301101956.GH9677@merlinux.eu> Hi Richard, all, somewhere deep in the threads i mentioned i wrote a little "cleanpypi.py" script which takes a project name as an argument and then goes to pypi.python.org and removes all homepage/download metadata entries for this project. 
This sanitizes/speeds up installation because pip/easy_install don't need to crawl them anymore. I just did this for three of my projects, (pytest, tox and py) and it seems to work fine. Now before i release this as a tool, i wonder: Is it a good idea to remove download/homepage entries? Is there any current machine use (other than the dreaded crawling) for the homepage/download_url per-release metadata fields? For humans the homepage link is nicely discoverable if the long-description doesn't mention it prominently. But i think there also is a "project url" or "bugtrack url" for a project so maybe those could be used to reference these important pages? (i am a bit confused on the exact meaning of those urls, btw). Should we maybe stop advertising "homepage" and "download_url" and instead see to extend project-url/bugtrackurl to be used and shown nicely? The latter are independent of releases which i think makes sense - what use are old probably unreachable/borked homepages anyway. And it's also not too bad having to go once to pypi.python.org to set it, usually it seldomly changes. best, holger From mal at egenix.com Fri Mar 1 12:04:24 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 01 Mar 2013 12:04:24 +0100 Subject: [Catalog-sig] homepage/download metadata cleaning In-Reply-To: <20130301101956.GH9677@merlinux.eu> References: <20130301101956.GH9677@merlinux.eu> Message-ID: <51308B38.9030709@egenix.com> On 01.03.2013 11:19, holger krekel wrote: > Hi Richard, all, > > somewhere deep in the threads i mentioned i wrote a little "cleanpypi.py" > script which takes a project name as an argument and then goes to > pypi.python.org and removes all homepage/download metadata entries for > this project. This sanitizes/speeds up installation because > pip/easy_install don't need to crawl them anymore. I just did this for > three of my projects, (pytest, tox and py) and it seems to work fine. 
Does it also cleanup the links that PyPI adds to the /simple/ by parsing the project description for links ? I think those are far nastier than the homepage and download links, which can be put to some good use to limit the external lookups (see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal) See e.g. https://pypi.python.org/simple/zc.buildout/ for a good example of the mess this generates... even mailto links get listed and "file:///" links open up the installers for all kinds of nasty things (unless they explicitly protect against following these). > Now before i release this as a tool, i wonder: Is it a good idea to remove > download/homepage entries? Is there any current machine use (other than > the dreaded crawling) for the homepage/download_url per-release metadata > fields? > > For humans the homepage link is nicely discoverable if the long-description > doesn't mention it prominently. But i think there also is a "project url" > or "bugtrack url" for a project so maybe those could be used to reference > these important pages? (i am a bit confused on the exact meaning of those > urls, btw). > > Should we maybe stop advertising "homepage" and "download_url" > and instead see to extend project-url/bugtrackurl to be used > and shown nicely? The latter are independent of releases which i think > makes sense - what use are old probably unreachable/borked homepages > anyway. And it's also not too bad having to go once to pypi.python.org > to set it, usually it seldomly changes. I think it would be better to differentiate between showing the fields on the project pages, where they provide useful resources for people, and their use on the /simple/ index pages which are meant for programs to parse. IMO, the homepage and download links on the project pages are indeed very useful for people. On the /simple/ index a homepage link is probably not all that useful (provided a download link is set). 
The download links serve the purpose of directing tools to the right location, so those do belong on the /simple/ index listings. I'd completely remove the links parsed from the descriptions, since those don't really provide a good basis for crawling (the description is meant for humans to parse, not programs).

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Mar 01 2013)
>>> Python Projects, Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From donald.stufft at gmail.com Fri Mar 1 12:09:54 2013
From: donald.stufft at gmail.com (Donald Stufft)
Date: Fri, 1 Mar 2013 06:09:54 -0500
Subject: [Catalog-sig] homepage/download metadata cleaning
In-Reply-To: <51308B38.9030709@egenix.com>
References: <20130301101956.GH9677@merlinux.eu> <51308B38.9030709@egenix.com>
Message-ID: <71AA0F5ADB4E4C33BBB37833733526A0@gmail.com>

On Friday, March 1, 2013 at 6:04 AM, M.-A. Lemburg wrote:
> On 01.03.2013 11:19, holger krekel wrote:
> > Hi Richard, all,
> >
> > somewhere deep in the threads i mentioned i wrote a little "cleanpypi.py"
> > script which takes a project name as an argument and then goes to
> > pypi.python.org and removes all homepage/download metadata entries for
> > this project. This sanitizes/speeds up installation because
> > pip/easy_install don't need to crawl them anymore. I just did this for
> > three of my projects, (pytest, tox and py) and it seems to work fine.
>
> Does it also cleanup the links that PyPI adds to the /simple/ by
> parsing the project description for links ?
>
> I think those are far nastier than the homepage and download links,
> which can be put to some good use to limit the external lookups
> (see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal)
>
> See e.g. https://pypi.python.org/simple/zc.buildout/
> for a good example of the mess this generates... even mailto links
> get listed and "file:///" links open up the installers for all
> kinds of nasty things (unless they explicitly protect against
> following these).

pip at least (and, I assume, the other tools) doesn't spider those links, but it does consider them for download: if a link looks installable it becomes a candidate for installing, but the tool won't fetch it and look for more links, as it does for download_url/home_page.

I believe that's the way it's structured atm.
From donald.stufft at gmail.com Fri Mar 1 12:10:30 2013
From: donald.stufft at gmail.com (Donald Stufft)
Date: Fri, 1 Mar 2013 06:10:30 -0500
Subject: [Catalog-sig] homepage/download metadata cleaning
In-Reply-To: <51308B38.9030709@egenix.com>
References: <20130301101956.GH9677@merlinux.eu> <51308B38.9030709@egenix.com>
Message-ID:

On Friday, March 1, 2013 at 6:04 AM, M.-A. Lemburg wrote:
> On 01.03.2013 11:19, holger krekel wrote:
> > Hi Richard, all,
> >
> > somewhere deep in the threads i mentioned i wrote a little "cleanpypi.py"
> > script which takes a project name as an argument and then goes to
> > pypi.python.org and removes all homepage/download metadata entries for
> > this project. This sanitizes/speeds up installation because
> > pip/easy_install don't need to crawl them anymore. I just did this for
> > three of my projects, (pytest, tox and py) and it seems to work fine.
>
> Does it also cleanup the links that PyPI adds to the /simple/ by
> parsing the project description for links ?
>
> I think those are far nastier than the homepage and download links,
> which can be put to some good use to limit the external lookups
> (see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal)
>
> See e.g. https://pypi.python.org/simple/zc.buildout/
> for a good example of the mess this generates... even mailto links
> get listed and "file:///" links open up the installers for all
> kinds of nasty things (unless they explicitly protect against
> following these).
>
> > Now before i release this as a tool, i wonder: Is it a good idea to remove
> > download/homepage entries? Is there any current machine use (other than
> > the dreaded crawling) for the homepage/download_url per-release metadata
> > fields?
> >
> > For humans the homepage link is nicely discoverable if the long-description
> > doesn't mention it prominently.
> > But i think there also is a "project url"
> > or "bugtrack url" for a project so maybe those could be used to reference
> > these important pages? (i am a bit confused on the exact meaning of those
> > urls, btw).
> >
> > Should we maybe stop advertising "homepage" and "download_url"
> > and instead see to extend project-url/bugtrackurl to be used
> > and shown nicely? The latter are independent of releases which i think
> > makes sense - what use are old probably unreachable/borked homepages
> > anyway. And it's also not too bad having to go once to pypi.python.org
> > to set it, usually it seldomly changes.
>
> I think it would be better to differentiate between showing the
> fields on the project pages, where they provide useful resources
> for people, and their use on the /simple/ index pages which are
> meant for programs to parse.
>
> IMO, the homepage and download links on the project pages are
> indeed very useful for people. On the /simple/ index a homepage
> link is probably not all that useful (provided a download link
> is set). The download links serve the purpose of directing
> tools to the right location, so those do belong on the /simple/
> index listings. I'd completely remove the links parsed from
> the descriptions, since those don't really provide a good
> basis for crawling (the description is meant for humans to
> parse, not programs).

I'd prefer this to eventually be replaced by the project-url metadata, but that's not available yet, and at the moment they are useful.
From holger at merlinux.eu Fri Mar 1 12:17:07 2013
From: holger at merlinux.eu (holger krekel)
Date: Fri, 1 Mar 2013 11:17:07 +0000
Subject: [Catalog-sig] homepage/download metadata cleaning
In-Reply-To: <71AA0F5ADB4E4C33BBB37833733526A0@gmail.com>
References: <20130301101956.GH9677@merlinux.eu> <51308B38.9030709@egenix.com> <71AA0F5ADB4E4C33BBB37833733526A0@gmail.com>
Message-ID: <20130301111707.GI9677@merlinux.eu>

On Fri, Mar 01, 2013 at 06:09 -0500, Donald Stufft wrote:
> On Friday, March 1, 2013 at 6:04 AM, M.-A. Lemburg wrote:
> > On 01.03.2013 11:19, holger krekel wrote:
> > > Hi Richard, all,
> > >
> > > somewhere deep in the threads i mentioned i wrote a little "cleanpypi.py"
> > > script which takes a project name as an argument and then goes to
> > > pypi.python.org and removes all homepage/download metadata entries for
> > > this project. This sanitizes/speeds up installation because
> > > pip/easy_install don't need to crawl them anymore. I just did this for
> > > three of my projects, (pytest, tox and py) and it seems to work fine.
> >
> > Does it also cleanup the links that PyPI adds to the /simple/ by
> > parsing the project description for links ?
> >
> > I think those are far nastier than the homepage and download links,
> > which can be put to some good use to limit the external lookups
> > (see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal)
> >
> > See e.g. https://pypi.python.org/simple/zc.buildout/
> > for a good example of the mess this generates... even mailto links
> > get listed and "file:///" links open up the installers for all
> > kinds of nasty things (unless they explicitly protect against
> > following these).
>
> pip at least, and I assume the other tools don't spider those links, but
> they do consider them for download (e.g. if the link looks installable
> it will be a candidate for installing, but it won't fetch it, and look for
> more links like it will download_url/home_page).
>
> I believe that's the way it's structured atm.

That's right. Even though the links extracted from the long-description look ugly on a simple/PKGNAME page, neither pip nor easy_install does anything with them unless the "href" ends in "#egg=PKGNAME-", in which case they are taken as pointing to a development tarball (e.g. at github or bitbucket). AFAIK a link like "PKGNAME-VER.tar.gz" will not be treated as an installation candidate, just the "#egg=PKGNAME" one.

best,
holger
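The rule holger describes above for links extracted from a project description can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `ALLOWED_SCHEMES`, `egg_fragment` and `is_candidate` are hypothetical and are not taken from pip or easy_install, and the scheme filter reflects M.-A. Lemburg's concern about mailto and "file:///" links rather than any documented installer behaviour.

```python
from urllib.parse import urlparse

# Hypothetical helpers for illustration only; this is NOT pip's or
# easy_install's actual code.

# Assumption: only plain web links are worth considering at all.
ALLOWED_SCHEMES = {"http", "https"}


def egg_fragment(href):
    """Return the value of an '#egg=...' URL fragment, or None if absent."""
    fragment = urlparse(href).fragment
    if fragment.startswith("egg="):
        return fragment[len("egg="):]
    return None


def is_candidate(href, project):
    """Accept a description link only if it is http(s) and carries an
    '#egg=' fragment whose value starts with the project name."""
    if urlparse(href).scheme not in ALLOWED_SCHEMES:
        return False
    egg = egg_fragment(href)
    return egg is not None and egg.lower().startswith(project.lower())
```

Under these assumptions a link such as "https://example.org/pytest-2.3.4.tar.gz" is ignored because it has no "#egg=" fragment, and a "file:///" or mailto link is rejected before its fragment is even inspected.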
From jnoller at gmail.com Fri Mar 1 12:28:00 2013
From: jnoller at gmail.com (Jesse Noller)
Date: Fri, 1 Mar 2013 06:28:00 -0500
Subject: [Catalog-sig] PyPI terms
In-Reply-To:
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com> <512E28CB.9080907@egenix.com> <512E422C.3070001@egenix.com> <4A372726-1248-4E43-AC00-863DA153D42C@coderanger.net> <512F2FE9.9080001@egenix.com> <3D8F33FF-A4FA-45B9-8AF5-97DA91876C1E@coderanger.net> <512F9E8A.1010707@egenix.com>
Message-ID:

Since we're hotly contesting the PyPI terms of service, I thought I'd page Van, who is the chairman and, I'm pretty sure, drafted the terms of service for PyPI for the foundation. He should be able to bludgeon us all!

Jesse

On Feb 28, 2013, at 8:31 PM, Terry Reedy wrote:

> On 2/28/2013 1:19 PM, Noah Kantrowitz wrote:
>
>> Because I happen to have YouTube open anyway:
>>
>> """ For clarity, you retain all of your ownership rights in your
>> Content. However, by submitting Content to YouTube, you hereby grant
>> YouTube a worldwide, non-exclusive, royalty-free, sublicenseable and
>> transferable license to use, reproduce, distribute, prepare
>> derivative works of, display, and perform the Content in connection
>> with the Service and YouTube's (and its successors' and affiliates')
>> business, including without limitation for promoting and
>> redistributing part or all of the Service (and derivative works
>> thereof) in any media formats and through any media channels.
>> You also hereby grant each user of the Service a non-exclusive license to
>> access your Content through the Service, and to use, reproduce,
>> distribute, display and perform such Content as permitted through the
>> functionality of the Service and under these Terms of Service. The
>> above licenses granted by you in video Content you submit to the
>> Service terminate within a commercially reasonable time after you
>> remove or delete your videos from the Service. You understand and
>> agree, however, that YouTube may retain, but not display, distribute,
>> or perform, server copies of your videos that have been removed or
>> deleted. The above licenses granted by you in user comments you
>> submit are perpetual and irrevocable. """
>>
>> Slightly different wording,
>
> Noah, I understand that you desperately do not want to admit that the PSF requirement for uploading to its servers is unusually broad, because you do not want to admit that rational people might have a reason to not upload, but there it is.
>
> 1. The uploader only authorizes distribution via the YouTube infrastructure. Indeed, Google wants that limitation because it wants to be the one that monetizes distribution. So it only streams videos (free ones, anyway) and does *not* download. Anyone who subverts this and captures the stream as a download has no rights to it.
>
> 2. The uploader can terminate the license with Google. Because of #1, such termination stops anyone from legal distribution.
>
> Note: Flickr gives uploaders the choice of whether images can be downloaded or only embedded in a flickr web page. It also lets uploaders set the license that applies to flickr users. And it allows deletion of images.
>
>> only the license to comments is irrevocable,
>
> Irrelevant to this discussion.
>
>> for videos they just promise to stop distributing
>
> This is the important point.
>
>> but not actually remove your content.
>
> This is a mostly irrelevant practical issue.
> Finding and scrubbing every backup copy is difficult and expensive, especially for disk-image backups or serial tape media (if indeed they still use such) or backups stuck down in a deep salt mine. Any repository that does backups has to have this proviso. (I am sure, for instance, that Flickr does now.)
>
> My take on the current license is this: the original upload license was rather minimal. The lawyer decided it was insufficient. Rather than craft a broader license with the absolute minimum rights grant necessary, the lawyer took the easy, quick, and cheap-for-psf route of a maximal rights grant. That is okay with me as long as it is not mis-represented and as long as people do not try to bludgeon me or anyone else into signing something we do not agree to.
>
> Note: when I contribute text and code to the CPython repository, I also give up all control. I know and accept that, and even want that, because it also means that I can re-write *other* people's text and code. But people may reasonably want to keep more control over their independent sole-author work.
>
> --
> Terry Jan Reedy

From jnoller at gmail.com Fri Mar 1 12:30:02 2013
From: jnoller at gmail.com (Jesse Noller)
Date: Fri, 1 Mar 2013 06:30:02 -0500
Subject: [Catalog-sig] Deprecate External Links
In-Reply-To: <513073E5.20900@egenix.com>
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com> <512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu> <20130228200848.GB9677@merlinux.eu> <513073E5.20900@egenix.com>
Message-ID:

Marc Andre: I'm cc'ing Van: can you explain why the pypi terms are a bummer so we can see if there is actually an issue to be resolved or a matter of taste?

We need to protect the foundation while preserving author rights - but I don't want one user / subset dictating how we evolve the technology.
Jesse

On Mar 1, 2013, at 4:24 AM, "M.-A. Lemburg" wrote:

> On 01.03.2013 10:02, Reinout van Rees wrote:
>> On 28-02-13 21:08, holger krekel wrote:
>>>> I have seen that position in this discussion ("I have to upload 120
>>>> files per release, so I won't do that", for instance).
>>
>>> haven't seen that.
>>
>> Marc-Andre Lemburg said this, which I took to mean 120 uploads per release:
>>
>> """
>> However, taking our egenix-mx-base package as example, we have
>> 120 distribution files for every single release. Uploading those
>> to PyPI would not only take long, but also ...
>> """
>
> Correct, with a total of over 100MB per release. However, the above
> quote is slightly incorrect: I did not say "I won't do that", just
> that there are issues with doing this:
>
> * It currently takes too long uploading that many files to
>   PyPI. This causes a problem, since in order to start the upload,
>   we have to register the release on PyPI, which tools will then
>   immediately find. However, during the upload time, they won't
>   necessarily find the right files to download and then fail.
>
>   The proposed pull mechanism (see
>   http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal)
>   would work around this problem: tools would simply go to
>   our servers in case they can't find the files on PyPI.
>
> * PyPI doesn't allow us to upload two egg files with the same
>   name: we have to provide egg files for UCS2 Python builds and
>   UCS4 Python builds, since easy_install/setuptools/pip don't
>   differentiate between the two variants. This is the main
>   reason why we're hosting our own PyPI-style indexes, one for
>   UCS2 and the other for UCS4 builds:
>   https://downloads.egenix.com/python/index/ucs2/
>   https://downloads.egenix.com/python/index/ucs4/
>
> * I'm not sure whether we want to import our crypto packages
>   to the US, so for a subset of the files, we'd probably
>   continue to use our servers in Germany.
>
>   Again, with the above proposal, this shouldn't be a problem.
>
> * The PyPI terms are a bummer for us, but this can be fixed,
>   I guess.
>
> If we can resolve the issues, we'd have no problem having the
> files mirrored on PyPI.

From mal at egenix.com Fri Mar 1 12:47:22 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 01 Mar 2013 12:47:22 +0100
Subject: [Catalog-sig] PyPI terms (was: Deprecate External Links)
In-Reply-To:
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com> <512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu> <20130228200848.GB9677@merlinux.eu> <513073E5.20900@egenix.com>
Message-ID: <5130954A.2050805@egenix.com>

On 01.03.2013 12:30, Jesse Noller wrote:
> Marc Andre: I'm cc'ing Van: can you explain why the pypi terms are a bummer so we can see if there is actually an issue to be resolved or a matter of taste?
>
> We need to protect the foundation while preserving author rights - but I don't want one user / subset dictating how we evolve the technology.
I think we should move this discussion to the python-legal-sig list:

http://mail.python.org/mailman/listinfo/python-legal-sig

Let me know when you've subscribed and then we can hash things out on that list. The catalog sig is not really the suitable place for these discussions.
From jnoller at gmail.com Fri Mar 1 13:18:24 2013
From: jnoller at gmail.com (Jesse Noller)
Date: Fri, 1 Mar 2013 07:18:24 -0500
Subject: [Catalog-sig] PyPI terms (was: Deprecate External Links)
In-Reply-To: <5130954A.2050805@egenix.com>
References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com> <512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu> <20130228200848.GB9677@merlinux.eu> <513073E5.20900@egenix.com> <5130954A.2050805@egenix.com>
Message-ID:

I am subscribed: I made the list. We're both board directors too. Changes to the ToS should come from legal counsel and the board.

On Mar 1, 2013, at 6:47 AM, "M.-A. Lemburg" wrote:

> On 01.03.2013 12:30, Jesse Noller wrote:
>> Marc Andre: I'm cc'ing Van: can you explain why the pypi terms are a bummer so we can see if there is actually an issue to be resolved or a matter of taste?
>>
>> We need to protect the foundation while preserving author rights - but I don't want one user / subset dictating how we evolve the technology.
>
> I think we should move this discussion to the python-legal-sig list:
>
> http://mail.python.org/mailman/listinfo/python-legal-sig
>
> Let me know when you've subscribed and then we can hash things
> out on that list. The catalog sig is not really the suitable
> place for these discussions.
>>> >>> -- >>> Marc-Andre Lemburg >>> eGenix.com >>> >>> Professional Python Services directly from the Source (#1, Mar 01 2013) >>>>>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>>>>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>>>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ >>> ________________________________________________________________________ >>> >>> ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: >>> >>> eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 >>> D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg >>> Registered at Amtsgericht Duesseldorf: HRB 46611 >>> http://www.egenix.com/company/contact/ >>> _______________________________________________ >>> Catalog-SIG mailing list >>> Catalog-SIG at python.org >>> http://mail.python.org/mailman/listinfo/catalog-sig > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, Mar 01 2013) >>>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > > ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ From mal at egenix.com Fri Mar 1 13:20:55 2013 From: mal at egenix.com (M.-A. 
Lemburg) Date: Fri, 01 Mar 2013 13:20:55 +0100 Subject: [Catalog-sig] PyPI terms In-Reply-To: References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com> <512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu> <20130228200848.GB9677@merlinux.eu> <513073E5.20900@egenix.com> <5130954A.2050805@egenix.com> Message-ID: <51309D27.4000204@egenix.com> On 01.03.2013 13:18, Jesse Noller wrote: > I am subscribed: I made the list. We're both board directors too. Changes to the tos should come from legal counsel, and the board Van and all others who are interested as well ? > On Mar 1, 2013, at 6:47 AM, "M.-A. Lemburg" wrote: > >> On 01.03.2013 12:30, Jesse Noller wrote: >>> Marc Andre: I'm cc'ing Van: can you explain why the pypi terms are a bummer so we can see if there is actually an issue to be resolved or a matter of taste? >>> >>> We need to protect the foundation while preserving author rights - but I don't want one user / subset dictating how we evolve the technology. >> >> I think we should move this discussion to the python-legal-sig list: >> >> http://mail.python.org/mailman/listinfo/python-legal-sig >> >> Let me know when you've subscribed and then we can hash things >> out on that list. The catalog sig is not really the suitable >> place for these discussions. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 01 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From jnoller at gmail.com Fri Mar 1 13:23:53 2013 From: jnoller at gmail.com (Jesse Noller) Date: Fri, 1 Mar 2013 07:23:53 -0500 Subject: [Catalog-sig] PyPI terms (was: Deprecate External Links) In-Reply-To: <5130954A.2050805@egenix.com> References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com> <512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu> <20130228200848.GB9677@merlinux.eu> <513073E5.20900@egenix.com> <5130954A.2050805@egenix.com> Message-ID: <2EC9E943-A57F-4444-956D-FA3AB7AE13AD@gmail.com> Either the tos is preventing pypi tech, security and distribution enhancements, or it isn't. If it's the latter, then we can stop trotting it out as a reason not to grossly improve our services and infrastructure. On Mar 1, 2013, at 6:47 AM, "M.-A. Lemburg" wrote: > On 01.03.2013 12:30, Jesse Noller wrote: >> Marc Andre: I'm cc'ing Van: can you explain why the pypi terms are a bummer so we can see if there is actually an issue to be resolved or a matter of taste? >> >> We need to protect the foundation while preserving author rights - but I don't want one user / subset dictating how we evolve the technology. > > I think we should move this discussion to the python-legal-sig list: > > http://mail.python.org/mailman/listinfo/python-legal-sig > > Let me know when you've subscribed and then we can hash things > out on that list. The catalog sig is not really the suitable > place for these discussions. > >> Jesse >> >> On Mar 1, 2013, at 4:24 AM, "M.-A. Lemburg" wrote: >> >>> On 01.03.2013 10:02, Reinout van Rees wrote: >>>> On 28-02-13 21:08, holger krekel wrote: >>>>>> I have seen that position in this discussion ("I have to upload 120 >>>>>>> files per release, so I won't do that", for instance). >>>> >>>>> haven't seen that.
Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ From jnoller at gmail.com Fri Mar 1 13:26:35 2013 From: jnoller at gmail.com (Jesse Noller) Date: Fri, 1 Mar 2013 07:26:35 -0500 Subject: [Catalog-sig] PyPI terms In-Reply-To: <51309D27.4000204@egenix.com> References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com> <512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu> <20130228200848.GB9677@merlinux.eu> <513073E5.20900@egenix.com> <5130954A.2050805@egenix.com> <51309D27.4000204@egenix.com> Message-ID: <68D705AC-BBD0-415D-8AC3-8DEA93F02FE3@gmail.com> On Mar 1, 2013, at 7:20 AM, "M.-A. Lemburg" wrote: > On 01.03.2013 13:18, Jesse Noller wrote: >> I am subscribed: I made the list. We're both board directors too. Changes to the tos should come from legal counsel, and the board > > Van and all others who are interested as well ? I do not see a reason for van to be subscribed unless he really wants more email. Actual issues need to be addressed by the board, and elevated there > >> On Mar 1, 2013, at 6:47 AM, "M.-A. Lemburg" wrote: >> >>> On 01.03.2013 12:30, Jesse Noller wrote: >>>> Marc Andre: I'm cc'ing Van: can you explain why the pypi terms are a bummer so we can see if there is actually an issue to be resolved or a matter of taste? >>>> >>>> We need to protect the foundation while preserving author rights - but I don't want one user / subset dictating how we evolve the technology. >>> >>> I think we should move this discussion to the python-legal-sig list: >>> >>> http://mail.python.org/mailman/listinfo/python-legal-sig >>> >>> Let me know when you've subscribed and then we can hash things >>> out on that list. The catalog sig is not really the suitable >>> place for these discussions. > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, Mar 01 2013) >>>> Python Projects, Consulting and Support ... 
http://www.egenix.com/ >>>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > > ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ From mal at egenix.com Fri Mar 1 14:56:11 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 01 Mar 2013 14:56:11 +0100 Subject: [Catalog-sig] PyPI terms In-Reply-To: <5130954A.2050805@egenix.com> References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com> <512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu> <20130228200848.GB9677@merlinux.eu> <513073E5.20900@egenix.com> <5130954A.2050805@egenix.com> Message-ID: <5130B37B.6050501@egenix.com> On 01.03.2013 12:47, M.-A. Lemburg wrote: > On 01.03.2013 12:30, Jesse Noller wrote: >> Marc Andre: I'm cc'ing Van: can you explain why the pypi terms are a bummer so we can see if there is actually an issue to be resolved or a matter of taste? >> >> We need to protect the foundation while preserving author rights - but I don't want one user / subset dictating how we evolve the technology. > > I think we should move this discussion to the python-legal-sig list: > > http://mail.python.org/mailman/listinfo/python-legal-sig > > Let me know when you've subscribed and then we can hash things > out on that list. The catalog sig is not really the suitable > place for these discussions. I've kicked off the discussion on the other list. See you there. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 01 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... 
http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From jnoller at gmail.com Fri Mar 1 15:02:39 2013 From: jnoller at gmail.com (Jesse Noller) Date: Fri, 1 Mar 2013 09:02:39 -0500 Subject: [Catalog-sig] PyPI terms In-Reply-To: <5130B37B.6050501@egenix.com> References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com> <512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu> <20130228200848.GB9677@merlinux.eu> <513073E5.20900@egenix.com> <5130954A.2050805@egenix.com> <5130B37B.6050501@egenix.com> Message-ID: Okie doke. So we can move on to putting up the CDN and deprecating external links for now? On Fri, Mar 1, 2013 at 8:56 AM, M.-A. Lemburg wrote: > On 01.03.2013 12:47, M.-A. Lemburg wrote: > > On 01.03.2013 12:30, Jesse Noller wrote: > >> Marc Andre: I'm cc'ing Van: can you explain why the pypi terms are a > bummer so we can see if there is actually an issue to be resolved or a > matter of taste? > >> > >> We need to protect the foundation while preserving author rights - but > I don't want one user / subset dictating how we evolve the technology. > > > > I think we should move this discussion to the python-legal-sig list: > > > > http://mail.python.org/mailman/listinfo/python-legal-sig > > > > Let me know when you've subscribed and then we can hash things > > out on that list. The catalog sig is not really the suitable > > place for these discussions. > > I've kicked off the discussion on the other list. See you there. 
From mal at egenix.com Fri Mar 1 15:11:12 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 01 Mar 2013 15:11:12 +0100 Subject: [Catalog-sig] PyPI terms In-Reply-To: References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com> <512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu> <20130228200848.GB9677@merlinux.eu> <513073E5.20900@egenix.com> <5130954A.2050805@egenix.com> <5130B37B.6050501@egenix.com> Message-ID: <5130B700.9070705@egenix.com> On 01.03.2013 15:02, Jesse Noller wrote: > Okie doke. So we can move on to putting up the CDN and deprecating external > links for now? I don't think anyone is against putting up a CDN. It should meet the same security requirements we have for the pypi server itself, ie. HTTPS all the way, proper certificates, operated by the PSF, perhaps run on a different domain, and whatever other goodies Donald can come up with ;-) For the external links we need a migration path... that's in the works.
See http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal for a proposal that allows migrating away from relying on external hosts in a backwards compatible and secure way. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 01 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From van.lindberg at gmail.com Fri Mar 1 15:37:49 2013 From: van.lindberg at gmail.com (VanL) Date: Fri, 1 Mar 2013 08:37:49 -0600 Subject: [Catalog-sig] PyPI terms In-Reply-To: <5130B37B.6050501@egenix.com> References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com> <512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu> <20130228200848.GB9677@merlinux.eu> <513073E5.20900@egenix.com> <5130954A.2050805@egenix.com> <5130B37B.6050501@egenix.com> Message-ID: <948E0503DDAA4FB496104EC483F993F4@gmail.com> Please forward to catalog-sig if this gets bounced. I'm not on that list. I drafted these terms of service. I know they are broad. They were made exactly as broad as was needed. This was not the case that we took the cheap-and-easy route of a maximal rights grant. (And besides, it would have been equally cheap for the PSF either way). What it was is that we investigated and found out all the different ways that people were using PyPI. 
Of particular importance were these: - Automated access from scripts (We can't pass through any license terms - no click through or agreement to use) - Automated mirroring - and re-mirroring of mirrors - without any agreement, both to public and private repositories (We need the right to distribute and to allow others to distribute. We needed to protect our downstream and make sure that their common use cases aren't infringing) These terms were chosen so that our community would have the rights to do these very common things and not be infringing. The only way we could do this was by asking for a broader grant at the time of distribution. Also, what no one gets is that *the license does not allow modification!* So you can distribute far and wide for any purpose - but you can only distribute what the original author uploaded without being liable for infringement. People have also said that this overrules the licenses on their packages. That is not so! The licenses in this case run in parallel, and distribution needs to satisfy both licenses or it cannot be done at all. This was the subject of a lot of thought and a lot of work that a lot of people have not even considered, and it was chosen very deliberately to protect our overall community. Because the protection of the community is a broad purpose, it needed some broad provisions - but it is as tightly crafted as I could get while still not making our known downstream uses infringing. If it gets changed, it will be over my strenuous objections. Van ____________________________ Van Lindberg van.lindberg at gmail.com On Friday, March 1, 2013 at 7:56 AM, M.-A. Lemburg wrote: > On 01.03.2013 12:47, M.-A. Lemburg wrote: > > On 01.03.2013 12:30, Jesse Noller wrote: > > > Marc Andre: I'm cc'ing Van: can you explain why the pypi terms are a bummer so we can see if there is actually an issue to be resolved or a matter of taste?
> > > > > > We need to protect the foundation while preserving author rights - but I don't want one user / subset dictating how we evolve the technology. > > > > I think we should move this discussion to the python-legal-sig list: > > > > http://mail.python.org/mailman/listinfo/python-legal-sig > > > > Let me know when you've subscribed and then we can hash things > > out on that list. The catalog sig is not really the suitable > > place for these discussions. > > > > > I've kicked off the discussion on the other list. See you there. From holger at merlinux.eu Fri Mar 1 16:19:24 2013 From: holger at merlinux.eu (holger krekel) Date: Fri, 1 Mar 2013 15:19:24 +0000 Subject: [Catalog-sig] PyPI terms In-Reply-To: <5130B700.9070705@egenix.com> References: <20130228094343.GY9677@merlinux.eu> <20130228200848.GB9677@merlinux.eu> <513073E5.20900@egenix.com> <5130954A.2050805@egenix.com> <5130B37B.6050501@egenix.com> <5130B700.9070705@egenix.com> Message-ID: <20130301151924.GK9677@merlinux.eu> On Fri, Mar 01, 2013 at 15:11 +0100, M.-A. Lemburg wrote: > On 01.03.2013 15:02, Jesse Noller wrote: > > Okie doke.
So we can move on to putting up the CDN and deprecating external > > links for now? > > I don't think anyone is against putting up a CDN. It should meet > the same security requirements we have for the pypi server itself, > ie. HTTPS all the way, proper certificates, operated by the PSF, > perhaps run on a different domain, and whatever other goodies > Donald can come up with ;-) > > For the external links we need a migration path... that's in the works. > > See http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal for > a proposal that allows migrating away from relying on external > hosts in a backwards compatible and secure way. The page doesn't describe the current "scraping" situation accurately. As mentioned in my last post, pip/easy_install do _not_ visit all links found in simple/PKGNAME. Only the ones with rel="home_page" or rel="download". So the proposal effectively says to not visit "homepage" links by default and use a special format for download ones. The special format i am not sure about - i guess the SHA256 hash there is to make sure the target content is the correct one, right? What about abusing download_url some more and do a multiline-format like this: HASH1 URL-TO-RELEASE-FILE1 HASH2 URL-TO-RELEASE-FILE2 This way we can avoid any additional http-requests on the pip/easy_install client side _and_ allow multiple release files. The simple/PKGNAME metadata would contain all information that is needed (and we could probably introduce a special syntax for #egg github/bitbucket-style tarballs). Those URLs would only be retrieved if the client-side installer determines it needs them because of the user-required version. You wouldn't need to create a special "-download.html" file then, no additional http requests, and it's easy to create this format without much tool support. Can't incorporate this into the wiki right now myself and i'd probably like to structure the page differently. 
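A rough sketch of how an installer might read the multi-line format suggested above, with one "HASH URL" pair per line. The names ReleaseFile, parse_download_field and matches_digest are made up for illustration, and hex SHA256 digests are assumed only because that is the hash mentioned in connection with the wiki proposal. None of this is existing pip, easy_install or PyPI code:

```python
import hashlib
from typing import List, NamedTuple


class ReleaseFile(NamedTuple):
    """One entry of the hypothetical multi-line download_url field."""
    sha256: str  # lowercase hex digest (assumed to be SHA256)
    url: str     # location of the release file


def parse_download_field(text: str) -> List[ReleaseFile]:
    """Parse 'HASH URL' pairs, one per non-empty line."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between entries
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError("line %d: expected 'HASH URL', got %r" % (lineno, line))
        digest, url = parts[0].lower(), parts[1].strip()
        # A SHA256 hex digest is exactly 64 hexadecimal characters.
        if len(digest) != 64 or any(c not in "0123456789abcdef" for c in digest):
            raise ValueError("line %d: not a SHA256 hex digest: %r" % (lineno, digest))
        entries.append(ReleaseFile(digest, url))
    return entries


def matches_digest(content: bytes, entry: ReleaseFile) -> bool:
    """Check already-downloaded bytes against the digest listed for them."""
    return hashlib.sha256(content).hexdigest() == entry.sha256
```

With something like this, the client only needs the simple/PKGNAME metadata to learn every candidate URL and its expected hash, and it fetches a file only after deciding that it wants that version.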
The issue here really is the (future) behaviour of easy_install and pip, not so much distutils or the pypi server (apart from the worthwhile-to-consider idea of pulling/caching things). On a side note i'd rather prefer this to be a github/bitbucket project where i can submit a pull request :) best, holger > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, Mar 01 2013) > >>> Python Projects, Consulting and Support ... http://www.egenix.com/ > >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ > >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > > ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig > From pje at telecommunity.com Fri Mar 1 17:29:17 2013 From: pje at telecommunity.com (PJ Eby) Date: Fri, 1 Mar 2013 11:29:17 -0500 Subject: [Catalog-sig] homepage/download metadata cleaning In-Reply-To: <20130301111707.GI9677@merlinux.eu> References: <20130301101956.GH9677@merlinux.eu> <51308B38.9030709@egenix.com> <71AA0F5ADB4E4C33BBB37833733526A0@gmail.com> <20130301111707.GI9677@merlinux.eu> Message-ID: On Fri, Mar 1, 2013 at 6:17 AM, holger krekel wrote: > On Fri, Mar 01, 2013 at 06:09 -0500, Donald Stufft wrote: >> On Friday, March 1, 2013 at 6:04 AM, M.-A. 
Lemburg wrote: >> > On 01.03.2013 11:19, holger krekel wrote: >> > > Hi Richard, all, >> > > >> > > somewhere deep in the threads i mentioned i wrote a little "cleanpypi.py" >> > > script which takes a project name as an argument and then goes to >> > > pypi.python.org and removes all homepage/download metadata entries for >> > > this project. This sanitizes/speeds up installation because >> > > pip/easy_install don't need to crawl them anymore. I just did this for >> > > three of my projects, (pytest, tox and py) and it seems to work fine. >> > > >> > >> > >> > Does it also cleanup the links that PyPI adds to the /simple/ by >> > parsing the project description for links ? >> > >> > I think those are far nastier than the homepage and download links, >> > which can be put to some good use to limit the external lookups >> > (see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal) >> > >> > See e.g. https://pypi.python.org/simple/zc.buildout/ >> > for a good example of the mess this generates... even mailto links >> > get listed and "file:///" links open up the installers for all >> > kinds of nasty things (unless they explicitly protect against >> > following these). >> > >> > >> >> pip at least, and I assume the other tools don't spider those links, but >> they do consider them for download (e.g. if the link looks installable >> it will be a candidate for installing, but it won't fetch it, and look for >> more links like it will download_url/home_page). >> >> I believe that's the way it's structured atm. > > That's right. Even though the long-description extracted links > look ugly on a simple/PKGNAME page, neither pip nor easy_install do anything > with them except if the "href" ends in "#egg=PKGNAME-" in which case they are > taken as pointing to a development tarball (e.g. at github or bitbucket). > AFAIK a link like "PKGNAME-VER.tar.gz" will not be treated as > an installation candidate, just the "#egg=PKGNAME" one.
Both are considered "primary links". A primary link is a link whose filename portion matches one of the supported distutils or setuptools file formats, or is marked with an #egg tag. Primary links are indexed as to project name and version, so that if that version/format is chosen as the best candidate, it will be downloaded and installed. Links marked with rel="homepage" or rel="download" are "secondary links". Secondary links are actively retrieved and scanned to look for more primary links. No further secondary links are scanned or followed. (Details of all of this can be found at: http://peak.telecommunity.com/DevCenter/setuptools#making-your-package-available-for-easyinstall ) This basically means that MAL's proposal for a download.html file is actually a bit moot: you can just stick direct "primary" download URLs in your PyPI description field, and the tools will pick them up. They can even include #md5 info. (See http://peak.telecommunity.com/DevCenter/EasyInstall#package-index-api - item 4 mentions the description part.) This means, by the way, that you could make an external link cleaner which spiders the external pages and pulls the candidates onto the description for that release, thereby keeping useful primary links and getting rid of the secondary links used to fetch them. From pje at telecommunity.com Fri Mar 1 17:37:56 2013 From: pje at telecommunity.com (PJ Eby) Date: Fri, 1 Mar 2013 11:37:56 -0500 Subject: [Catalog-sig] Deprecate External Links In-Reply-To: <513073E5.20900@egenix.com> References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com> <512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu> <20130228200848.GB9677@merlinux.eu> <513073E5.20900@egenix.com> Message-ID: On Fri, Mar 1, 2013 at 4:24 AM, M.-A. 
Lemburg wrote: > On 01.03.2013 10:02, Reinout van Rees wrote: >> On 28-02-13 21:08, holger krekel wrote: >>>> I have seen that position in this discussion ("I have to upload 120 >>>> >files per release, so I won't do that", for instance). >> >>> haven't seen that. >> >> Marc-Andre Lemburg said this, which I took to mean 120 uploads per release: >> >> """ >> However, taking our egenix-mx-base package as example, we have >> 120 distribution files for every single release. Uploading those >> to PyPI would not only take long, but also ... >> """ > > Correct, with a total of over 100MB per release. However, the above > quote is slightly incorrect: I did not say "I won't do that", just > that there are issues with doing this: > > * It currently takes too long uploading that many files to > PyPI. This causes a problem, since in order to start the upload, > we have to register the release on PyPI, which tools will then > immediately find. However, during the upload time, they won't > necessarily find the right files to download and then fail. Actually, easy_install doesn't pay any attention to what releases are registered. It just looks for primary and secondary links. If there are links for a version that it can use, it uses it. If it does not find links for a version, then that version does not exist, as far as it is concerned. So registering without files is not a problem. > The proposed pull mechanism (see > http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal) > would work around this problem: tools would simply go to > our servers in case they can't find the files on PyPI. That proposal is unnecessary, actually. You could *right now* simply place binary download links (with optional "#md5=...." verification) in your package's description field, and the moment you registered the package, existing tools would find those links and download them from your site. 
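As a rough illustration of the kind of link this relies on, the sketch below checks whether an href looks like a direct download link (an archive filename or an #egg tag) and pulls out an optional #md5 value. It is not setuptools' actual implementation: describe_link is a made-up helper, and the suffix list is an assumption covering only a few common archive endings.

```python
import posixpath
from urllib.parse import parse_qs, urlsplit

# Assumption: a small, incomplete set of archive suffixes, for illustration only.
ARCHIVE_SUFFIXES = (".egg", ".tgz", ".tar.gz", ".zip")


def describe_link(href):
    """Summarize whether href looks like a direct download link."""
    parts = urlsplit(href)
    filename = posixpath.basename(parts.path)
    # Fragments such as "md5=..." or "egg=..." parse like a query string.
    fragment = parse_qs(parts.fragment)
    egg = fragment.get("egg", [None])[0]
    md5 = fragment.get("md5", [None])[0]
    looks_like_download = (
        egg is not None or filename.lower().endswith(ARCHIVE_SUFFIXES)
    )
    return {
        "primary": looks_like_download,
        "filename": filename,
        "egg": egg,
        "md5": md5,
    }
```

For example, describe_link("https://example.com/dl/foo-1.0.tar.gz#md5=...") reports a download candidate with its checksum, while an ordinary home page URL in the description is reported as not being one.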
You could then remove your home page and download URLs from the relevant fields, and place them also in the description. (easy_install does not follow non-download links within the description -- i.e., links that don't end in .egg, .tgz, etc. and don't have an #egg tag.) > * PyPI doesn't allow us to upload two egg files with the same > name: we have to provide egg files for UCS2 Python builds and > UCS4 Python builds, since easy_install/setuptools/pip don't > differentiate between the two variants. They can if it's part of the platform string; the catch is that right now it's not. We'd have to go through an upgrade cycle of the tools to support that. I need to take a look at what PEP 427 is doing (and you should too, if you haven't already) to get this part sorted out. From mal at egenix.com Fri Mar 1 17:50:18 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 01 Mar 2013 17:50:18 +0100 Subject: [Catalog-sig] [Python-legal-sig] PyPI terms In-Reply-To: <948E0503DDAA4FB496104EC483F993F4@gmail.com> References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com> <512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu> <20130228200848.GB9677@merlinux.eu> <513073E5.20900@egenix.com> <5130954A.2050805@egenix.com> <5130B37B.6050501@egenix.com> <948E0503DDAA4FB496104EC483F993F4@gmail.com> Message-ID: <5130DC4A.1030406@egenix.com> Hi Van, please read my long posting to the python-legal list. This explains the concerns and makes suggestions on how to improve things in a way that is compatible with what PyPI is and how it is used today: http://mail.python.org/pipermail/python-legal-sig/2013-March/000000.html PS: I'd prefer if you not cross-post to both lists and keep the discussion to the legal list. Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 01 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... 
http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Fri Mar 1 20:31:28 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 01 Mar 2013 20:31:28 +0100 Subject: [Catalog-sig] homepage/download metadata cleaning In-Reply-To: <20130301111707.GI9677@merlinux.eu> References: <20130301101956.GH9677@merlinux.eu> <51308B38.9030709@egenix.com> <71AA0F5ADB4E4C33BBB37833733526A0@gmail.com> <20130301111707.GI9677@merlinux.eu> Message-ID: <51310210.5050203@egenix.com> On 01.03.2013 12:17, holger krekel wrote: > On Fri, Mar 01, 2013 at 06:09 -0500, Donald Stufft wrote: >> On Friday, March 1, 2013 at 6:04 AM, M.-A. Lemburg wrote: >>> On 01.03.2013 11:19, holger krekel wrote: >>>> Hi Richard, all, >>>> >>>> somewhere deep in the threads i mentioned i wrote a little "cleanpypi.py" >>>> script which takes a project name as an argument and then goes to >>>> pypi.python.org (http://pypi.python.org) and removes all homepage/download metadata entries for >>>> this project. This sanitizes/speeds up installation because >>>> pip/easy_install don't need to crawl them anymore. I just did this for >>>> three of my projects, (pytest, tox and py) and it seems to work fine. >>>> >>> >>> >>> Does it also cleanup the links that PyPI adds to the /simple/ by >>> parsing the project description for links ? >>> >>> I think those are far nastier than the homepage and download links, >>> which can be put to some good use to limit the external lookups >>> (see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal) >>> >>> See e.g. 
https://pypi.python.org/simple/zc.buildout/ >>> for a good example of the mess this generates... even mailto links >>> get listed and "file:///" links open up the installers for all >>> kinds of nasty things (unless they explicitly protect against >>> following these). >>> >>> >> >> pip at least, and I assume the other tools don't spider those links, but >> they do consider them for download (e.g. if the link looks installable >> it will be a candidate for installing, but it won't fetch it, and look for >> more links like it will donwnload_url/home_page). >> >> I believe that's the way it's structured atm. > > That's right. Even though the long-description extracted links > look ugly on a simple/PKGNAME page, neither pip nor easy_install do anything > with them except if the "href" ends in "#egg=PKGNAME-" in which case they are > taken as pointing to a development tarball (e.g. at github or bitbucket). > ASFAIK a link like "PKGNAME-VER.tar.gz" will not be treated as > an installation candidate, just the "#egg=PKGNAME" one. Hmm, then why not remove links that don't match the above from the /simple/ index pages ? Note that it's easily possible to make e.g. file:/// links have a fragment that matches what you described, so I guess the filters would have to be more careful about what to allow (e.g. only http/ftp schemes, perhaps even only https schemes) and what not. BTW: Are those links also shown as-is on the description page ? People could do nasty stuff by adding "javascript:" links which look like normal links to the descriptions. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 01 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From dholth at gmail.com Fri Mar 1 21:25:52 2013 From: dholth at gmail.com (Daniel Holth) Date: Fri, 1 Mar 2013 15:25:52 -0500 Subject: [Catalog-sig] PEP 425 / 427 compatibility tags Message-ID: On Fri, Mar 1, 2013 at 11:37 AM, PJ Eby wrote: > On Fri, Mar 1, 2013 at 4:24 AM, M.-A. Lemburg wrote: >> On 01.03.2013 10:02, Reinout van Rees wrote: >>> On 28-02-13 21:08, holger krekel wrote: >>>>> I have seen that position in this discussion ("I have to upload 120 >>>>> >files per release, so I won't do that", for instance). >>> >>>> haven't seen that. >>> >>> Marc-Andre Lemburg said this, which I took to mean 120 uploads per release: >>> >>> """ >>> However, taking our egenix-mx-base package as example, we have >>> 120 distribution files for every single release. Uploading those >>> to PyPI would not only take long, but also ... >>> """ >> >> Correct, with a total of over 100MB per release. However, the above >> quote is slightly incorrect: I did not say "I won't do that", just >> that there are issues with doing this: >> >> * It currently takes too long uploading that many files to >> PyPI. This causes a problem, since in order to start the upload, >> we have to register the release on PyPI, which tools will then >> immediately find. However, during the upload time, they won't >> necessarily find the right files to download and then fail. > > Actually, easy_install doesn't pay any attention to what releases are > registered. It just looks for primary and secondary links. If there > are links for a version that it can use, it uses it. 
If it does not > find links for a version, then that version does not exist, as far as > it is concerned. So registering without files is not a problem. > > >> The proposed pull mechanism (see >> http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal) >> would work around this problem: tools would simply go to >> our servers in case they can't find the files on PyPI. > > That proposal is unnecessary, actually. You could *right now* simply > place binary download links (with optional "#md5=...." verification) > in your package's description field, and the moment you registered the > package, existing tools would find those links and download them from > your site. You could then remove your home page and download URLs > from the relevant fields, and place them also in the description. > (easy_install does not follow non-download links within the > description -- i.e., links that don't end in .egg, .tgz, etc. and > don't have an #egg tag.) > > >> * PyPI doesn't allow us to upload two egg files with the same >> name: we have to provide egg files for UCS2 Python builds and >> UCS4 Python builds, since easy_install/setuptools/pip don't >> differentiate between the two variants. > > They can if it's part of the platform string; the catch is that right > now it's not. We'd have to go through an upgrade cycle of the tools > to support that. I need to take a look at what PEP 427 is doing (and > you should too, if you haven't already) to get this part sorted out. The compatibility tags are specified in http://www.python.org/dev/peps/pep-0425/ and are first used with PEP 427. The scheme defines a tag which is a combination of implementation, abi, and platform tags, and an algorithm for choosing the "most preferred" among the available builds for a particular release of some distribution. 
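[Editorial sketch, not part of the original message.] The selection step described above can be illustrated roughly as follows. This is a minimal sketch that assumes a tag is written as three hyphen-separated parts and that the installer already holds a list of supported tags ordered from most to least preferred. The names `Tag`, `parse_tag` and `best_build` are hypothetical, and the tag strings in the usage note are only illustrative; PEP 425 itself is the authority on the real algorithm.

```python
from typing import NamedTuple, Optional, Sequence


class Tag(NamedTuple):
    """One compatibility tag, split into its three parts (hypothetical helper type)."""
    python_tag: str    # implementation and version, e.g. "cp32"
    abi_tag: str       # ABI, e.g. "cp32mu", or "none"
    platform_tag: str  # platform, e.g. "linux_x86_64", or "any"


def parse_tag(text: str) -> Tag:
    """Split 'python-abi-platform' into a Tag; platform tags use underscores, not hyphens."""
    python_tag, abi_tag, platform_tag = text.split("-")
    return Tag(python_tag, abi_tag, platform_tag)


def best_build(available: Sequence[str], supported: Sequence[str]) -> Optional[str]:
    """Return the most preferred available tag, or None if nothing is compatible.

    `supported` is assumed to be ordered from most preferred to least preferred.
    """
    offered = {parse_tag(tag) for tag in available}
    for candidate in supported:
        if parse_tag(candidate) in offered:
            return candidate
    return None
```

For example, with a supported list of `["cp32-cp32mu-linux_x86_64", "cp32-none-any", "py3-none-any"]`, a release that offers both a `py3-none-any` build and a `cp32-cp32mu-linux_x86_64` build would resolve to the latter, because it appears earlier in the preference order.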
The ABI tags are basically abbreviated versions of the tags from http://www.python.org/dev/peps/pep-3149/ and look like "cp32dmu" for a debug, malloc, wide unicode build of CPython 3.2, or just "cp32" for a Python 3.2 build with none of those features compiled in. Your package would probably use tags like "cp32-cp32mu-linux_x86_64". Even though PEP 3149 is a Python 3.2 feature, the *PEP 425* ABI tags are supposed to work in the same way with older versions of Python, e.g. "py26u" for a Python 2.6 unicode build. From mal at egenix.com Fri Mar 1 21:27:38 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 01 Mar 2013 21:27:38 +0100 Subject: [Catalog-sig] homepage/download metadata cleaning In-Reply-To: References: <20130301101956.GH9677@merlinux.eu> <51308B38.9030709@egenix.com> <71AA0F5ADB4E4C33BBB37833733526A0@gmail.com> <20130301111707.GI9677@merlinux.eu> Message-ID: <51310F3A.6010100@egenix.com> Thanks for the feedback, Holger and Phillip. I'll bake this into a version 0.2 of the proposal over the weekend. On 01.03.2013 17:29, PJ Eby wrote: > On Fri, Mar 1, 2013 at 6:17 AM, holger krekel wrote: >> On Fri, Mar 01, 2013 at 06:09 -0500, Donald Stufft wrote: >>> On Friday, March 1, 2013 at 6:04 AM, M.-A. Lemburg wrote: >>>> On 01.03.2013 11:19, holger krekel wrote: >>>>> Hi Richard, all, >>>>> >>>>> somewhere deep in the threads i mentioned i wrote a little "cleanpypi.py" >>>>> script which takes a project name as an argument and then goes to >>>>> pypi.python.org (http://pypi.python.org) and removes all homepage/download metadata entries for >>>>> this project. This sanitizes/speeds up installation because >>>>> pip/easy_install don't need to crawl them anymore. I just did this for >>>>> three of my projects, (pytest, tox and py) and it seems to work fine. >>>>> >>>> >>>> >>>> Does it also cleanup the links that PyPI adds to the /simple/ by >>>> parsing the project description for links ? 
>>>> >>>> I think those are far nastier than the homepage and download links, >>>> which can be put to some good use to limit the external lookups >>>> (see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal) >>>> >>>> See e.g. https://pypi.python.org/simple/zc.buildout/ >>>> for a good example of the mess this generates... even mailto links >>>> get listed and "file:///" links open up the installers for all >>>> kinds of nasty things (unless they explicitly protect against >>>> following these). >>>> >>>> >>> >>> pip at least, and I assume the other tools don't spider those links, but >>> they do consider them for download (e.g. if the link looks installable >>> it will be a candidate for installing, but it won't fetch it, and look for >>> more links like it will donwnload_url/home_page). >>> >>> I believe that's the way it's structured atm. >> >> That's right. Even though the long-description extracted links >> look ugly on a simple/PKGNAME page, neither pip nor easy_install do anything >> with them except if the "href" ends in "#egg=PKGNAME-" in which case they are >> taken as pointing to a development tarball (e.g. at github or bitbucket). >> ASFAIK a link like "PKGNAME-VER.tar.gz" will not be treated as >> an installation candidate, just the "#egg=PKGNAME" one. > > Both are considered "primary links". A primary link is a link whose > filename portion matches one of the supported distutils or setuptools > file formats, or is marked with an #egg tag. Primary links are > indexed as to project name and version, so that if that version/format > is chosen as the best candidate, it will be downloaded and installed. > > Links marked with rel="homepage" or rel="download" are "secondary > links". Secondary links are actively retrieved and scanned to look > for more primary links. No further secondary links are scanned or > followed. 
(Details of all of this can be found at: > http://peak.telecommunity.com/DevCenter/setuptools#making-your-package-available-for-easyinstall > ) > > This basically means that MAL's proposal for a download.html file is > actually a bit moot: you can just stick direct "primary" download URLs > in your PyPI description field, and the tools will pick them up. They > can even include #md5 info. (See > http://peak.telecommunity.com/DevCenter/EasyInstall#package-index-api > - item 4 mentions the description part.) > > This means, by the way, that you could make an external link cleaner > which spiders the external pages and pulls the candidates onto the > description for that release, thereby keeping useful primary links and > getting rid of the secondary links used to fetch them. > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig > -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 01 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From donald.stufft at gmail.com Fri Mar 1 23:39:03 2013 From: donald.stufft at gmail.com (Donald Stufft) Date: Fri, 1 Mar 2013 17:39:03 -0500 Subject: [Catalog-sig] homepage/download metadata cleaning In-Reply-To: <51310210.5050203@egenix.com> References: <20130301101956.GH9677@merlinux.eu> <51308B38.9030709@egenix.com> <71AA0F5ADB4E4C33BBB37833733526A0@gmail.com> <20130301111707.GI9677@merlinux.eu> <51310210.5050203@egenix.com> Message-ID: <4BACDE7617A842EF9BC1E155D82CFBB9@gmail.com> On Friday, March 1, 2013 at 2:31 PM, M.-A. Lemburg wrote: > On 01.03.2013 12:17, holger krekel wrote: > > On Fri, Mar 01, 2013 at 06:09 -0500, Donald Stufft wrote: > > > On Friday, March 1, 2013 at 6:04 AM, M.-A. Lemburg wrote: > > > > On 01.03.2013 11:19, holger krekel wrote: > > > > > Hi Richard, all, > > > > > > > > > > somewhere deep in the threads i mentioned i wrote a little "cleanpypi.py" > > > > > script which takes a project name as an argument and then goes to > > > > > pypi.python.org (http://pypi.python.org) and removes all homepage/download metadata entries for > > > > > this project. This sanitizes/speeds up installation because > > > > > pip/easy_install don't need to crawl them anymore. I just did this for > > > > > three of my projects, (pytest, tox and py) and it seems to work fine. > > > > > > > > > > > > > > > > > > > > > Does it also cleanup the links that PyPI adds to the /simple/ by > > > > parsing the project description for links ? > > > > > > > > I think those are far nastier than the homepage and download links, > > > > which can be put to some good use to limit the external lookups > > > > (see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal) > > > > > > > > See e.g. https://pypi.python.org/simple/zc.buildout/ > > > > for a good example of the mess this generates... 
even mailto links > > > > get listed and "file:///" links open up the installers for all > > > > kinds of nasty things (unless they explicitly protect against > > > > following these). > > > > > > > > > > > > > pip at least, and I assume the other tools don't spider those links, but > > > they do consider them for download (e.g. if the link looks installable > > > it will be a candidate for installing, but it won't fetch it, and look for > > > more links like it will donwnload_url/home_page). > > > > > > I believe that's the way it's structured atm. > > > > That's right. Even though the long-description extracted links > > look ugly on a simple/PKGNAME page, neither pip nor easy_install do anything > > with them except if the "href" ends in "#egg=PKGNAME-" in which case they are > > taken as pointing to a development tarball (e.g. at github or bitbucket). > > ASFAIK a link like "PKGNAME-VER.tar.gz" will not be treated as > > an installation candidate, just the "#egg=PKGNAME" one. > > > > > Hmm, then why not remove links that don't match the above from > the /simple/ index pages ? > > Note that it's easily possible to make e.g. file:/// links > have a fragment that matches what you described, so I guess the > filters would have to be more careful about what to allow > (e.g. only http/ftp schemes, perhaps even only https schemes) > and what not. > > BTW: Are those links also shown as-is on the description page ? > People could do nasty stuff by adding "javascript:" links which look > like normal links to the descriptions. > > The descriptions don't allow javascript: urls anymore (I reported that ages ago and Richard fixed it). home_page and probably download_url do though. > > -- > Marc-Andre Lemburg > eGenix.com (http://eGenix.com) > > Professional Python Services directly from the Source (#1, Mar 01 2013) > > > > Python Projects, Consulting and Support ... http://www.egenix.com/ > > > > mxODBC.Zope/Plone.Database.Adapter ... 
http://zope.egenix.com/ > > > > mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > > > > > > > > > > > ________________________________________________________________________ > > ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > > From pje at telecommunity.com Fri Mar 1 23:42:34 2013 From: pje at telecommunity.com (PJ Eby) Date: Fri, 1 Mar 2013 17:42:34 -0500 Subject: [Catalog-sig] homepage/download metadata cleaning In-Reply-To: <51310210.5050203@egenix.com> References: <20130301101956.GH9677@merlinux.eu> <51308B38.9030709@egenix.com> <71AA0F5ADB4E4C33BBB37833733526A0@gmail.com> <20130301111707.GI9677@merlinux.eu> <51310210.5050203@egenix.com> Message-ID: On Fri, Mar 1, 2013 at 2:31 PM, M.-A. Lemburg wrote: > Hmm, then why not remove links that don't match the above from > the /simple/ index pages ? PyPI provides the links uninterpreted since the tools' interpretations have evolved over time. > Note that it's easily possible to make e.g. file:/// links > have a fragment that matches what you described, so I guess the > filters would have to be more careful about what to allow > (e.g. only http/ftp schemes, perhaps even only https schemes) > and what not. file:// URLs are an intentionally supported feature of easy_install; many users have local NFS-based or other shared repositories. But yes, it certainly would be reasonable to not include links to them on PyPI. ;-) > BTW: Are those links also shown as-is on the description page ? > People could do nasty stuff by adding "javascript:" links which look > like normal links to the descriptions. 
That's true, but it is unrelated to the tools, since the tools can't process javascript links. It would probably be best, though, if PyPI filtered such URLs to prevent script injection/CSRF attacks on logged-in PyPI users browsing project descriptions. I don't know if it already does this or not, since I've never tried to inject a CSRF attack on PyPI. ;-) (I guess technically it would be a same-site request forgery rather than a cross-site one, but you know what I mean.) From regebro at gmail.com Fri Mar 1 23:50:10 2013 From: regebro at gmail.com (Lennart Regebro) Date: Fri, 1 Mar 2013 23:50:10 +0100 Subject: [Catalog-sig] homepage/download metadata cleaning In-Reply-To: <51310210.5050203@egenix.com> References: <20130301101956.GH9677@merlinux.eu> <51308B38.9030709@egenix.com> <71AA0F5ADB4E4C33BBB37833733526A0@gmail.com> <20130301111707.GI9677@merlinux.eu> <51310210.5050203@egenix.com> Message-ID: On Fri, Mar 1, 2013 at 8:31 PM, M.-A. Lemburg wrote: > Hmm, then why not remove links that don't match the above from > the /simple/ index pages ? I think we can do that, but if we *start* with that, we will just suddenly, with no warning, break everything. It's better if the installation tools can first warn, then remove their support for this, and *then* we remove these links from /simple/. That way we break things gradually, with warnings so that package managers can react and adapt. From mal at egenix.com Fri Mar 1 23:54:41 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 01 Mar 2013 23:54:41 +0100 Subject: [Catalog-sig] homepage/download metadata cleaning In-Reply-To: References: <20130301101956.GH9677@merlinux.eu> <51308B38.9030709@egenix.com> <71AA0F5ADB4E4C33BBB37833733526A0@gmail.com> <20130301111707.GI9677@merlinux.eu> <51310210.5050203@egenix.com> Message-ID: <513131B1.7090507@egenix.com> On 01.03.2013 23:50, Lennart Regebro wrote: > On Fri, Mar 1, 2013 at 8:31 PM, M.-A. 
Lemburg wrote: >> Hmm, then why not remove links that don't match the above from >> the /simple/ index pages ? > > I think we can do that, but if we *start* with that, we will just > suddenly, with no warning, break everything. > Its' better if the installation tools can first warn, then remove > their support for this, and *then* we remove these links from > /simple/. > > That way we break things gradually, with warnings so that package > managers can react and adapt. As I understood Holger and Phillip, those links are not used by the existing package managers. If there are no users, then nothing should break, right ? Of course, breaking things is a bad idea and I don't want to push for that (migration is much better); I just wondered whether this would be low-hanging fruit for cleaning up the /simple/ index pages a bit. Is there a tool that scans those non-distribution file links from the package descriptions ? -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 01 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From holger at merlinux.eu Sat Mar 2 00:02:20 2013 From: holger at merlinux.eu (holger krekel) Date: Fri, 1 Mar 2013 23:02:20 +0000 Subject: [Catalog-sig] homepage/download metadata cleaning In-Reply-To: References: <20130301101956.GH9677@merlinux.eu> <51308B38.9030709@egenix.com> <71AA0F5ADB4E4C33BBB37833733526A0@gmail.com> <20130301111707.GI9677@merlinux.eu> <51310210.5050203@egenix.com> Message-ID: <20130301230220.GM9677@merlinux.eu> On Fri, Mar 01, 2013 at 23:50 +0100, Lennart Regebro wrote: > On Fri, Mar 1, 2013 at 8:31 PM, M.-A. Lemburg wrote: > > Hmm, then why not remove links that don't match the above from > > the /simple/ index pages ? > > I think we can do that, but if we *start* with that, we will just > suddenly, with no warning, break everything. > Its' better if the installation tools can first warn, then remove > their support for this, and *then* we remove these links from > /simple/. I think Marc-Andre was just referring to the superfluous links from the long-description, namely all links which don't match the #egg format and don't have a rel of download/homepage. Phillip clarified that PyPI served all long-description links at the time to leave it to the tools to interpret them. The interpretation is now pretty clear and so PyPI doesn't need to provide them. Removing those unused long-description links shouldn't break either pip or easy_install. > That way we break things gradually, with warnings so that package > managers can react and adapt. I generally agree with this strategy but would add that we should also consider the life of system admins or other package installers who may not be able to get maintainers to make new releases. For me this mainly means aiming to change the defaults in pip and easy_install, but not removing the crawling abilities completely for the time being. 
best, holger From pje at telecommunity.com Sat Mar 2 06:08:47 2013 From: pje at telecommunity.com (PJ Eby) Date: Sat, 2 Mar 2013 00:08:47 -0500 Subject: [Catalog-sig] homepage/download metadata cleaning In-Reply-To: <20130301230220.GM9677@merlinux.eu> References: <20130301101956.GH9677@merlinux.eu> <51308B38.9030709@egenix.com> <71AA0F5ADB4E4C33BBB37833733526A0@gmail.com> <20130301111707.GI9677@merlinux.eu> <51310210.5050203@egenix.com> <20130301230220.GM9677@merlinux.eu> Message-ID: On Fri, Mar 1, 2013 at 6:02 PM, holger krekel wrote: > On Fri, Mar 01, 2013 at 23:50 +0100, Lennart Regebro wrote: >> On Fri, Mar 1, 2013 at 8:31 PM, M.-A. Lemburg wrote: >> > Hmm, then why not remove links that don't match the above from >> > the /simple/ index pages ? >> >> I think we can do that, but if we *start* with that, we will just >> suddenly, with no warning, break everything. >> Its' better if the installation tools can first warn, then remove >> their support for this, and *then* we remove these links from >> /simple/. > > I think Marc-Andre was just refering to the superflous links > from the long-description, namely all links which don't match > the #egg format and don't have a rel of download/homepage. > > Phillip clarified that pypi served all long-description links at the > time to leave it to the tools to interpret them. The interpretation is > now pretty clear and so pypi doesn't need to provide them. It shouldn't > break neither pip nor easy_install to remove those unused long-description > links. Provided, of course, that PyPI follows the *exact same* interpretation of what is and isn't an unused link. Since unused links do no harm, there is correspondingly no benefit to writing code to remove them, that might introduce bugs. To be clear, what I have proposed is simply removing the rel="" attributes from the special links on hidden releases. 
This will prevent scraping of outdated home pages or download pages, but tools will still be able to use a download or home page link that points to an actual downloadable file or source checkout. What would also be useful to have before that time, would be a tool to let people either update their description links with direct external links, or optionally upload the contents of those links instead... preferably offered via a couple of buttons in PyPI's UI, as well as a standalone tool or setup.py command to initiate the process remotely or as part of a release process. (Preferably, these tools would be offered to authors *before* the date when the rel="" attributes would be pulled from PyPI, of course.) (In principle, we could make it even easier by just automatically scraping the links and adding them to the descriptions (or some new PyPI field for "external download links") of such releases, but I think some kind of affirmative consent is probably in order, just to avoid ruffling any feathers.) Anyway, if the direct external links carry #md5 hashes, they'll be slightly more secure and the "expired domain supplying fake links" issue won't apply. The final step in the process would be to drop the rel="" attributes from *all* releases, not just hidden ones. At that point, it wouldn't be possible to download from an external site unless the author has provided a direct download link, rather than a link to a page containing download links. We could then look at uptake on the use of the pull-uploader, and feedback from package authors, to see whether dropping the remaining external links and serving everything from PyPI is a viable option. 
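[Editorial sketch, not part of the original message.] The link handling discussed in this thread can be pictured roughly as below: a link counts as "primary" when its filename looks like a distribution archive or it carries an #egg tag, as "secondary" when it is marked rel="homepage" or rel="download", and an #md5 fragment is checked against the downloaded bytes. This is only an illustration under stated assumptions. The suffix list is an assumed subset, and `classify_link`, `expected_md5` and `verify_download` are hypothetical names, not functions from setuptools or pip.

```python
import hashlib
from urllib.parse import urlsplit

# Assumed subset of archive suffixes; the real tools define their own list.
ARCHIVE_SUFFIXES = (".egg", ".tar.gz", ".tgz", ".zip")


def classify_link(href, rel=None):
    """Return 'primary', 'secondary' or 'ignored' for one link on an index page."""
    parts = urlsplit(href)
    filename = parts.path.rsplit("/", 1)[-1].lower()
    # A direct archive link or an #egg tag is a candidate download,
    # even when the link also happens to be the home page or download URL.
    if filename.endswith(ARCHIVE_SUFFIXES) or parts.fragment.startswith("egg="):
        return "primary"
    # rel="homepage" / rel="download" pages are fetched and scanned for primary links.
    if rel in ("homepage", "download"):
        return "secondary"
    return "ignored"


def expected_md5(href):
    """Return the digest from an '#md5=...' fragment, or None if there is none."""
    fragment = urlsplit(href).fragment
    if fragment.startswith("md5="):
        return fragment[len("md5="):].lower()
    return None


def verify_download(href, payload):
    """Check downloaded bytes against the link's md5 fragment, if it has one."""
    digest = expected_md5(href)
    if digest is None:
        return True  # no hash was supplied, so there is nothing to verify
    return hashlib.md5(payload).hexdigest() == digest
```

Under this reading, dropping the rel="" attributes turns former secondary links into ignored ones, while a home page or download URL that points straight at an archive remains a primary link.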
From donald.stufft at gmail.com Tue Mar 5 10:01:20 2013 From: donald.stufft at gmail.com (Donald Stufft) Date: Tue, 5 Mar 2013 04:01:20 -0500 Subject: [Catalog-sig] Deprecate External Links In-Reply-To: References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com> <512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu> Message-ID: <774ED93EA7CF45BFB894B8BC47DB7F8B@gmail.com> On Thursday, February 28, 2013 at 8:35 AM, Donald Stufft wrote: > > > > > > > > https://crate.io/externally-hosted/ A list of things that have no files hosted on > PyPI but have a release. This doesn't include things that uploads sometimes > but not everytime (argparse for example the latest releases have not been > uploaded to PyPI). I sorted out a better way of seeing what would be affected by this change. Here is a list of all versions that are currently installable via pip but are not hosted on PyPI (and thus could no longer be installed if all external links were removed). This filters out projects that never existed or are no longer installable due to issues with the external hosting. I've also included the script I used to generate it. https://gist.github.com/dstufft/5088915 From donald.stufft at gmail.com Tue Mar 5 10:10:08 2013 From: donald.stufft at gmail.com (Donald Stufft) Date: Tue, 5 Mar 2013 04:10:08 -0500 Subject: [Catalog-sig] Deprecate External Links In-Reply-To: <774ED93EA7CF45BFB894B8BC47DB7F8B@gmail.com> References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com> <512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu> <774ED93EA7CF45BFB894B8BC47DB7F8B@gmail.com> Message-ID: <7CA6F385DB0B49D598903CF34287F015@gmail.com> On Tuesday, March 5, 2013 at 4:01 AM, Donald Stufft wrote: > On Thursday, February 28, 2013 at 8:35 AM, Donald Stufft wrote: > > > > > > > https://crate.io/externally-hosted/ A list of things that have no files hosted on > > PyPI but have a release. 
This doesn't include things that uploads sometimes > > but not everytime (argparse for example the latest releases have not been > > uploaded to PyPI). > > > > Sorted out a better way of seeing what would be effected by this change. > > Here is a list of all versions that are currently installable via pip that > are not hosted on PyPI (and thus would be uninstallable if all external > links would be removed). This filters out projects that never existed > or are no longer installable due to issues with the external hosting. > > I've also included the script I used to generate it. > > https://gist.github.com/dstufft/5088915 Here are some numbers fetched from that data. 928 projects w/ 2750 total versions have versions not installable directly from PyPI. 721 projects w/ 2543 total versions have versions not installable directly from PyPI if we don't consider the `dev` version. This change would affect 2-3% of the projects on PyPI, and from scanning down the list, some of these appear to be merely forgotten uploads rather than a conscious choice not to host the packages on PyPI (for example, Django has only 1 version not installable directly from PyPI). From donald.stufft at gmail.com Tue Mar 5 10:19:49 2013 From: donald.stufft at gmail.com (Donald Stufft) Date: Tue, 5 Mar 2013 04:19:49 -0500 Subject: [Catalog-sig] Fw: Deprecate External Links In-Reply-To: References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com> <512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu> <774ED93EA7CF45BFB894B8BC47DB7F8B@gmail.com> <5135B6E7.6010301@egenix.com> Message-ID: Forwarding this since I assume it was accidentally sent to only me, and it's important to note that there is some sort of miscounting bug going on. Forwarded message: > From: Donald Stufft > To: M.-A. 
Lemburg > Date: Tuesday, March 5, 2013 4:16:53 AM > Subject: Re: [Catalog-sig] Deprecate External Links > > On Tuesday, March 5, 2013 at 4:12 AM, M.-A. Lemburg wrote: > > Perhaps I'm misunderstanding, but if the list contains packages that: > > > > * are installable via pip > > > > * are not hosted on PyPI > > > > then why isn't e.g. egenix-mx-base included in that list ? > Unsure, must be a bug in the script. I saw some BadStatusLine errors > during the processing but I just assumed they were issues with the server > pip was trying to fetch from. I'll see if I can't sort out the reasoning that > egenix-mx-base doesn't show up. > From chris at simplistix.co.uk Tue Mar 5 10:51:41 2013 From: chris at simplistix.co.uk (Chris Withers) Date: Tue, 05 Mar 2013 09:51:41 +0000 Subject: [Catalog-sig] revoked certificate error on chrome from PyPI? Message-ID: <5135C02D.3080808@simplistix.co.uk> Hi All, When I go to PyPI on an older Chrome, I get a certificate revoked error and can't view the site. What's going on here? It works fine in newer Chromes, but I'm interested to know why the older Chrome sees a revoked cert... Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From holger at merlinux.eu Tue Mar 5 11:07:34 2013 From: holger at merlinux.eu (holger krekel) Date: Tue, 5 Mar 2013 10:07:34 +0000 Subject: [Catalog-sig] Fw: Deprecate External Links In-Reply-To: References: <813CA10EF6554A019B6FC98A2C9AC2EF@gmail.com> <512EED5E.1080700@zopyx.com> <20130228094343.GY9677@merlinux.eu> <774ED93EA7CF45BFB894B8BC47DB7F8B@gmail.com> <5135B6E7.6010301@egenix.com> Message-ID: <20130305100734.GZ9677@merlinux.eu> On Tue, Mar 05, 2013 at 04:19 -0500, Donald Stufft wrote: > Forwarding this since I assume it was accidently sent to only me, > and it's important to note that there is some sort of miscounting bug > going on. 
> > > Forwarded message: > > > From: Donald Stufft > > To: M.-A. Lemburg > > Date: Tuesday, March 5, 2013 4:16:53 AM > > Subject: Re: [Catalog-sig] Deprecate External Links > > > > On Tuesday, March 5, 2013 at 4:12 AM, M.-A. Lemburg wrote: > > > Perhaps I'm misunderstanding, but if the list contains packages that: > > > > > > * are installable via pip > > > > > > * are not hosted on PyPI > > > > > > then why isn't e.g. egenix-mx-base included in that list ? > > Unsure, must be a bug in the script. I saw some BadStatusLine errors > > during the processing but I just assumed they were issues with the server > > pip was trying to fetch from. I'll see if I can't sort out the reasoning that > > egenix-mx-base doesn't show up. FYI "lockfile" is also not in your list, and it only had lockfile-0.2 at Pypi, the rest up to 0.9.1 is all at code.google (latest is lockfile-0.9.1.tar.gz). best, holger > > > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig From donald.stufft at gmail.com Tue Mar 5 11:18:29 2013 From: donald.stufft at gmail.com (Donald Stufft) Date: Tue, 5 Mar 2013 05:18:29 -0500 Subject: [Catalog-sig] revoked certificate error on chrome from PyPI? In-Reply-To: <5135C02D.3080808@simplistix.co.uk> References: <5135C02D.3080808@simplistix.co.uk> Message-ID: <1610D0657D644BAC8BF80BDEBB7FCF2F@gmail.com> On Tuesday, March 5, 2013 at 4:51 AM, Chris Withers wrote: > Hi All, > > When I go to PyPI on an older Chrome, I get a certificate revoked error > and can't view the site. > > What's going on here? > > Works fine in newer chromes, but interested to know why older chrome > sees a revoked cert... > > Chris What version of Chrome? 
v25 sees http://d.stufft.io/image/1J3W01473s42 > > -- > Simplistix - Content Management, Batch Processing & Python Consulting > - http://www.simplistix.co.uk > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org (mailto:Catalog-SIG at python.org) > http://mail.python.org/mailman/listinfo/catalog-sig > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris at simplistix.co.uk Tue Mar 5 11:19:46 2013 From: chris at simplistix.co.uk (Chris Withers) Date: Tue, 05 Mar 2013 10:19:46 +0000 Subject: [Catalog-sig] revoked certificate error on chrome from PyPI? In-Reply-To: <1610D0657D644BAC8BF80BDEBB7FCF2F@gmail.com> References: <5135C02D.3080808@simplistix.co.uk> <1610D0657D644BAC8BF80BDEBB7FCF2F@gmail.com> Message-ID: <5135C6C2.7060907@simplistix.co.uk> On 05/03/2013 10:18, Donald Stufft wrote: > On Tuesday, March 5, 2013 at 4:51 AM, Chris Withers wrote: >> When I go to PyPI on an older Chrome, I get a certificate revoked error >> and can't view the site. >> > What version of Chrome? v25 sees http://d.stufft.io/image/1J3W01473s42 12.0.742.112. Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From rasky at develer.com Tue Mar 5 12:09:23 2013 From: rasky at develer.com (Giovanni Bajo) Date: Tue, 5 Mar 2013 12:09:23 +0100 Subject: [Catalog-sig] revoked certificate error on chrome from PyPI? In-Reply-To: <5135C6C2.7060907@simplistix.co.uk> References: <5135C02D.3080808@simplistix.co.uk> <1610D0657D644BAC8BF80BDEBB7FCF2F@gmail.com> <5135C6C2.7060907@simplistix.co.uk> Message-ID: <30D06076-2343-4172-B438-2831F137CC6E@develer.com> Il giorno 05/mar/2013, alle ore 11:19, Chris Withers ha scritto: > On 05/03/2013 10:18, Donald Stufft wrote: >> On Tuesday, March 5, 2013 at 4:51 AM, Chris Withers wrote: >>> When I go to PyPI on an older Chrome, I get a certificate revoked error >>> and can't view the site. >>> >> What version of Chrome? 
v25 sees http://d.stufft.io/image/1J3W01473s42 > > 12.0.742.112. Do you manage to see any specific error message? Can you attach a screenshot? -- Giovanni Bajo :: rasky at develer.com Develer S.r.l. :: http://www.develer.com My Blog: http://giovanni.bajo.it From chris at simplistix.co.uk Tue Mar 5 12:10:18 2013 From: chris at simplistix.co.uk (Chris Withers) Date: Tue, 05 Mar 2013 11:10:18 +0000 Subject: [Catalog-sig] revoked certificate error on chrome from PyPI? In-Reply-To: <30D06076-2343-4172-B438-2831F137CC6E@develer.com> References: <5135C02D.3080808@simplistix.co.uk> <1610D0657D644BAC8BF80BDEBB7FCF2F@gmail.com> <5135C6C2.7060907@simplistix.co.uk> <30D06076-2343-4172-B438-2831F137CC6E@develer.com> Message-ID: <5135D29A.5050103@simplistix.co.uk> On 05/03/2013 11:09, Giovanni Bajo wrote: > On 05 Mar 2013, at 11:19, Chris Withers wrote: > >> On 05/03/2013 10:18, Donald Stufft wrote: >>> On Tuesday, March 5, 2013 at 4:51 AM, Chris Withers wrote: >>>> When I go to PyPI on an older Chrome, I get a certificate revoked error >>>> and can't view the site. >>>> >>> What version of Chrome? v25 sees http://d.stufft.io/image/1J3W01473s42 >> >> 12.0.742.112. > > Do you manage to see any specific error message? Can you attach a screenshot? It's the standard "this certificate has been revoked" page from Chrome. Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From mal at egenix.com Tue Mar 5 12:28:29 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 05 Mar 2013 12:28:29 +0100 Subject: [Catalog-sig] revoked certificate error on chrome from PyPI?
In-Reply-To: <5135D29A.5050103@simplistix.co.uk> References: <5135C02D.3080808@simplistix.co.uk> <1610D0657D644BAC8BF80BDEBB7FCF2F@gmail.com> <5135C6C2.7060907@simplistix.co.uk> <30D06076-2343-4172-B438-2831F137CC6E@develer.com> <5135D29A.5050103@simplistix.co.uk> Message-ID: <5135D6DD.5030404@egenix.com> On 05.03.2013 12:10, Chris Withers wrote: > On 05/03/2013 11:09, Giovanni Bajo wrote: >> Il giorno 05/mar/2013, alle ore 11:19, Chris Withers ha scritto: >> >>> On 05/03/2013 10:18, Donald Stufft wrote: >>>> On Tuesday, March 5, 2013 at 4:51 AM, Chris Withers wrote: >>>>> When I go to PyPI on an older Chrome, I get a certificate revoked error >>>>> and can't view the site. >>>>> >>>> What version of Chrome? v25 sees http://d.stufft.io/image/1J3W01473s42 >>> >>> 12.0.742.112. >> >> Do you manage to see any specific error message? Can you attache a screenshot? > > It's the standard "this certificate has been revoked" page from Chrome. Hmm... wget http://crl.startssl.com/crt2-crl.crl openssl crl -inform DER -in crt2-crl.crl -text | fgrep 013A4D doesn't return anything (013A4D is the PyPI cert serial). A bug in Chrome ? -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 05 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From chris at simplistix.co.uk Tue Mar 5 12:31:23 2013 From: chris at simplistix.co.uk (Chris Withers) Date: Tue, 05 Mar 2013 11:31:23 +0000 Subject: [Catalog-sig] revoked certificate error on chrome from PyPI? In-Reply-To: <5135D6DD.5030404@egenix.com> References: <5135C02D.3080808@simplistix.co.uk> <1610D0657D644BAC8BF80BDEBB7FCF2F@gmail.com> <5135C6C2.7060907@simplistix.co.uk> <30D06076-2343-4172-B438-2831F137CC6E@develer.com> <5135D29A.5050103@simplistix.co.uk> <5135D6DD.5030404@egenix.com> Message-ID: <5135D78B.9000000@simplistix.co.uk> On 05/03/2013 11:28, M.-A. Lemburg wrote: > wget http://crl.startssl.com/crt2-crl.crl > openssl crl -inform DER -in crt2-crl.crl -text | fgrep 013A4D > > doesn't return anything (013A4D is the PyPI cert serial). > > A bug in Chrome ? Might be a bug in my head... My machine's time is currently deliberately set to 7 hrs in the past, debugging some weird time of day tests failures that CI has thrown up... How would that cause the cert to appear revoked? Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From rasky at develer.com Tue Mar 5 12:37:51 2013 From: rasky at develer.com (Giovanni Bajo) Date: Tue, 5 Mar 2013 12:37:51 +0100 Subject: [Catalog-sig] revoked certificate error on chrome from PyPI? In-Reply-To: <5135D78B.9000000@simplistix.co.uk> References: <5135C02D.3080808@simplistix.co.uk> <1610D0657D644BAC8BF80BDEBB7FCF2F@gmail.com> <5135C6C2.7060907@simplistix.co.uk> <30D06076-2343-4172-B438-2831F137CC6E@develer.com> <5135D29A.5050103@simplistix.co.uk> <5135D6DD.5030404@egenix.com> <5135D78B.9000000@simplistix.co.uk> Message-ID: Il giorno 05/mar/2013, alle ore 12:31, Chris Withers ha scritto: > On 05/03/2013 11:28, M.-A. 
Lemburg wrote: >> wget http://crl.startssl.com/crt2-crl.crl >> openssl crl -inform DER -in crt2-crl.crl -text | fgrep 013A4D >> >> doesn't return anything (013A4D is the PyPI cert serial). >> >> A bug in Chrome ? > > Might be a bug in my head... > > My machine's time is currently deliberately set to 7 hrs in the past, debugging some weird time of day tests failures that CI has thrown up... > > How would that cause the cert to appear revoked? it might confuse the CRL code within Chrome 12 due to a bug. I don't think we should worry much. -- Giovanni Bajo :: rasky at develer.com Develer S.r.l. :: http://www.develer.com My Blog: http://giovanni.bajo.it -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 4346 bytes Desc: not available URL: From chris at simplistix.co.uk Tue Mar 5 12:41:09 2013 From: chris at simplistix.co.uk (Chris Withers) Date: Tue, 05 Mar 2013 11:41:09 +0000 Subject: [Catalog-sig] revoked certificate error on chrome from PyPI? In-Reply-To: References: <5135C02D.3080808@simplistix.co.uk> <1610D0657D644BAC8BF80BDEBB7FCF2F@gmail.com> <5135C6C2.7060907@simplistix.co.uk> <30D06076-2343-4172-B438-2831F137CC6E@develer.com> <5135D29A.5050103@simplistix.co.uk> <5135D6DD.5030404@egenix.com> <5135D78B.9000000@simplistix.co.uk> Message-ID: <5135D9D5.909@simplistix.co.uk> On 05/03/2013 11:37, Giovanni Bajo wrote: >> Might be a bug in my head... >> >> My machine's time is currently deliberately set to 7 hrs in the past, debugging some weird time of day tests failures that CI has thrown up... >> >> How would that cause the cert to appear revoked? > > it might confuse the CRL code within Chrome 12 due to a bug. I don't think we should worry much. Indeed, sorry for the noise. If I still see anything when I'm finished messing with my system time, I'll let you know... 
Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From ct at gocept.com Tue Mar 5 16:34:37 2013 From: ct at gocept.com (Christian Theune) Date: Tue, 5 Mar 2013 16:34:37 +0100 Subject: [Catalog-sig] Inconsistency on f.pypi.python.org with Products.PluggableAuthService Message-ID: Hi, it seems my fight to keep f.pypi.python.org is at least keeping the pypi-mirrors.org page happy. Unfortunately one of our users detected another inconsistency that the mirror script doesn't find or clean up by itself. I also don't know how to get this back in line. If you compare those pages: http://f.pypi.python.org/packages/source/P/Products.PluggableAuthService/ http://f.pypi.python.org/simple/Products.PluggableAuthService http://pypi.python.org/simple/Products.PluggableAuthService There's definitely something wrong. Suggestions? Christian From donald.stufft at gmail.com Tue Mar 5 17:08:44 2013 From: donald.stufft at gmail.com (Donald Stufft) Date: Tue, 5 Mar 2013 11:08:44 -0500 Subject: [Catalog-sig] Inconsistency on f.pypi.python.org with Products.PluggableAuthService In-Reply-To: References: Message-ID: On Tuesday, March 5, 2013 at 10:34 AM, Christian Theune wrote: > Hi, > > it seems my fight to keep f.pypi.python.org is at least keeping the pypi-mirrors.org page happy. > > Unfortunately one of our users detected another inconsistency that the mirror script doesn't find or clean up by itself. I also don't know how to get this back in line. > > If you compare those pages: > > http://f.pypi.python.org/packages/source/P/Products.PluggableAuthService/ > http://f.pypi.python.org/simple/Products.PluggableAuthService > http://pypi.python.org/simple/Products.PluggableAuthService > > There's definitely something wrong. > > Suggestions?
Looks like when something gets deleted the files don't properly get cleaned up, look at: http://a.pypi.python.org/packages/source/P/Products.PluggableAuthService/ > > Christian > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org (mailto:Catalog-SIG at python.org) > http://mail.python.org/mailman/listinfo/catalog-sig > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From donald at stufft.io Fri Mar 8 02:40:20 2013 From: donald at stufft.io (Donald Stufft) Date: Thu, 7 Mar 2013 20:40:20 -0500 Subject: [Catalog-sig] Deprecation of External Urls, Statistics Message-ID: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> So I updated my script (had to remove eventlet) and I believe it's now accurate. The total time was ~54 hours so this is hardly scientific but it should give a good idea what sort of impact we are talking about. This is a list of versions that pip's PackageFinder (what it uses to locate packages to install) could find that were not available on PyPI. The results and script is available at: https://gist.github.com/dstufft/5088915 Some statistics: Projects affected (with dev): 2269 Versions affected (with dev): 8006 Projects affected (without dev): 1880 Versions affected (without dev): 7586 These numbers are if all external urls were immediately removed from PyPI, so this would be the total affected. This does not test if the actual package is installable, just if pip is able to locate an url that it thinks represents a version for that project. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From mal at egenix.com Fri Mar 8 12:49:51 2013 From: mal at egenix.com (M.-A. 
Lemburg) Date: Fri, 08 Mar 2013 12:49:51 +0100 Subject: [Catalog-sig] Deprecation of External Urls, Statistics In-Reply-To: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> Message-ID: <5139D05F.6030404@egenix.com> On 08.03.2013 02:40, Donald Stufft wrote: > So I updated my script (had to remove eventlet) and I believe it's now accurate. The total time was ~54 hours so this is hardly scientific but it should give a good idea what sort of impact we are talking about. > > This is a list of versions that pip's PackageFinder (what it uses to locate packages to install) could find that were not available on PyPI. > > The results and script is available at: https://gist.github.com/dstufft/5088915 > > Some statistics: > > Projects affected (with dev): 2269 > Versions affected (with dev): 8006 > > Projects affected (without dev): 1880 > Versions affected (without dev): 7586 > > These numbers are if all external urls were immediately removed from PyPI, so this would be the total affected. This does not test if the actual package is installable, just if pip is able to locate an url that it thinks represents a version for that project. Thanks for running the test. About 10% of all packages. The numbers are already impressive, but if you factor in the popularity of some of those packages, the situation becomes worse. I'm beginning to wonder whether caching the external link content on the PyPI CDN wouldn't be a better idea. We'd have to make that legally waterproof and also have an opt-out mechanism, but it would get us from here to there a lot faster. Together with the added hash tag on the download file URLs (*), this would solve the availability and the security aspects. Instead of deprecating external links altogether, we could then deprecate non-compliant download links and get an overall very flexible system for Python package distribution. 
(*) Yes, I know, I still have to deliver the updated proposal - been working on getting our indexes ready to serve as example :-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 07 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From christian at python.org Fri Mar 8 13:15:23 2013 From: christian at python.org (Christian Heimes) Date: Fri, 08 Mar 2013 13:15:23 +0100 Subject: [Catalog-sig] hash tags (was: Deprecation of External Urls, Statistics) In-Reply-To: <5139D05F.6030404@egenix.com> References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> Message-ID: <5139D65B.3070907@python.org> Am 08.03.2013 12:49, schrieb M.-A. Lemburg: > Together with the added hash tag on the download file URLs (*), > this would solve the availability and the security aspects. > Instead of deprecating external links altogether, we could then > deprecate non-compliant download links and get an overall > very flexible system for Python package distribution. > > (*) Yes, I know, I still have to deliver the updated proposal - > been working on getting our indexes ready to serve as example :-) How does your proposal look like? I like to propose query string-like key/value pairs. key/value pairs are more flexible and allow us to add/remove new information in the future. I also propose that we add the file size in octets (bytes with 8bits in each byte) to the fragment identifier. 
File size validation prohibits e.g. length extension attacks. It is useful to download tools. I know that HTTP servers usually set a Content-Length header for static files. But the header is set by the CDN while the information in the fragment identifier shall come from PyPI's internal database. Example: defusedxml-0.4.tar.gz#md5=09873c31ce773d48b8a4759571655a2c&sha1=33821e6891e3fc3829f5a238a93490f939533d62&octets=48324 Christian From mal at egenix.com Fri Mar 8 13:50:33 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 08 Mar 2013 13:50:33 +0100 Subject: [Catalog-sig] hash tags In-Reply-To: <5139D65B.3070907@python.org> References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> Message-ID: <5139DE99.9020005@egenix.com> On 08.03.2013 13:15, Christian Heimes wrote: > Am 08.03.2013 12:49, schrieb M.-A. Lemburg: >> Together with the added hash tag on the download file URLs (*), >> this would solve the availability and the security aspects. >> Instead of deprecating external links altogether, we could then >> deprecate non-compliant download links and get an overall >> very flexible system for Python package distribution. >> >> (*) Yes, I know, I still have to deliver the updated proposal - >> been working on getting our indexes ready to serve as example :-) > > How does your proposal look like? Here's the first version with the basic idea: http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal After the feedback I got from Holger and Phillip, I'm currently writing a new version, which drops some of the unneeded requirements and spells out a few more things. Here's a very short version... Installers are modified: * to only follow rel="download" links from the /simple/ index page, which have a hash tag (e.g. #md5=...) 
* will only use the fetched download page if its contents match the hash tag * scan that page for rel="download" links, which again have to have a hash tag to be taken into account * only install files for which the hash tag matches the downloaded content This should provide a good way to make sure that the downloaded files are indeed under control of the package maintainer. So far the only practical problem I've found with the approach is that the download page may not contain dynamic data, e.g. a date or timestamp, since that causes the hash tag not to verify. The package maintainer will also have to reregister the package whenever changes to the download page are made - but that's actually intended :-) > I like to propose query string-like > key/value pairs. key/value pairs are more flexible and allow us to > add/remove new information in the future. Good idea. I'll add that as extension mechanism. > I also propose that we add the file size in octets (bytes with 8bits in > each byte) to the fragment identifier. File size validation prohibits > e.g. length extension attacks. It is useful to download tools. I know > that HTTP servers usually set a Content-Length header for static files. > But the header is set by the CDN while the information in the fragment > identifier shall come from PyPI's internal database. > > Example: > > defusedxml-0.4.tar.gz#md5=09873c31ce773d48b8a4759571655a2c&sha1=33821e6891e3fc3829f5a238a93490f939533d62&octets=48324 Minor nit: s/octets/size We could probably even add GPG sigs to the link. The only problem with the extension mechanism is that the currently available installers only support "#md5=...". Perhaps there's some way to trick them into still working with the query-style fragment links ?! -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 07 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... 
http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From jnoller at gmail.com Fri Mar 8 14:07:51 2013 From: jnoller at gmail.com (Jesse Noller) Date: Fri, 8 Mar 2013 08:07:51 -0500 Subject: [Catalog-sig] Deprecation of External Urls, Statistics In-Reply-To: <5139D05F.6030404@egenix.com> References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> Message-ID: <396F795E-8B6D-4EF3-8B45-08527A04C60E@gmail.com> As long as external URLs eventually are completely removed I'm okay with caching things On Mar 8, 2013, at 6:49 AM, "M.-A. Lemburg" wrote: > On 08.03.2013 02:40, Donald Stufft wrote: >> So I updated my script (had to remove eventlet) and I believe it's now accurate. The total time was ~54 hours so this is hardly scientific but it should give a good idea what sort of impact we are talking about. >> >> This is a list of versions that pip's PackageFinder (what it uses to locate packages to install) could find that were not available on PyPI. >> >> The results and script is available at: https://gist.github.com/dstufft/5088915 >> >> Some statistics: >> >> Projects affected (with dev): 2269 >> Versions affected (with dev): 8006 >> >> Projects affected (without dev): 1880 >> Versions affected (without dev): 7586 >> >> These numbers are if all external urls were immediately removed from PyPI, so this would be the total affected. This does not test if the actual package is installable, just if pip is able to locate an url that it thinks represents a version for that project. > > Thanks for running the test. > > About 10% of all packages. 
The numbers are already impressive, > but if you factor in the popularity of some of those > packages, the situation becomes worse. > > I'm beginning to wonder whether caching the external link content > on the PyPI CDN wouldn't be a better idea. > > We'd have to make that legally waterproof and also have an opt-out > mechanism, but it would get us from here to there a lot faster. > > Together with the added hash tag on the download file URLs (*), > this would solve the availability and the security aspects. > Instead of deprecating external links altogether, we could then > deprecate non-compliant download links and get an overall > very flexible system for Python package distribution. > > (*) Yes, I know, I still have to deliver the updated proposal - > been working on getting our indexes ready to serve as example :-) > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, Mar 07 2013) >>>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > > ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig From donald at stufft.io Fri Mar 8 14:09:04 2013 From: donald at stufft.io (Donald Stufft) Date: Fri, 8 Mar 2013 08:09:04 -0500 Subject: [Catalog-sig] hash tags In-Reply-To: <5139DE99.9020005@egenix.com> References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> Message-ID: Accidentally sent this to only MAL so resending! On Mar 8, 2013, at 7:50 AM, "M.-A. Lemburg" wrote: > On 08.03.2013 13:15, Christian Heimes wrote: >> On 08.03.2013 12:49, M.-A. Lemburg wrote: >>> Together with the added hash tag on the download file URLs (*), >>> this would solve the availability and the security aspects. >>> Instead of deprecating external links altogether, we could then >>> deprecate non-compliant download links and get an overall >>> very flexible system for Python package distribution. >>> >>> (*) Yes, I know, I still have to deliver the updated proposal - >>> been working on getting our indexes ready to serve as example :-) >> >> How does your proposal look like? > > Here's the first version with the basic idea: > > http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal > > After the feedback I got from Holger and Phillip, I'm currently > writing a new version, which drops some of the unneeded > requirements and spells out a few more things. > > Here's a very short version... > > Installers are modified: > > * to only follow rel="download" links from the /simple/ index page, > which have a hash tag (e.g. #md5=...) Sounds like a pretty serious break in backwards compat. Only 29 releases out of 144493 currently have a #md5= in their download_url.
Either PyPI will be expected to download url and compute a hash (DoS vector, will need to be coded properly) which is error prone and is likely to break in non obvious ways for maintainers. While I'm obviously not against breaking backwards compatibility, I think if we're going to do that we might as well go whole hog and kill external links completely. > * will only use the fetched download page if its contents match > the hash tag > * scan that page for rel="download" links, which again have to > have a hash tag to be taken into account > * only install files for which the hash tag matches the > downloaded content > > This should provide a good way to make sure that the downloaded > files are indeed under control of the package maintainer. > > So far the only practical problem I've found with the approach > is that the download page may not contain dynamic data, e.g. > a date or timestamp, since that causes the hash tag not to > verify. > > The package maintainer will also have to reregister the > package whenever changes to the download page are made - > but that's actually intended :-) > >> I like to propose query string-like >> key/value pairs. key/value pairs are more flexible and allow us to >> add/remove new information in the future. > > Good idea. I'll add that as extension mechanism. > >> I also propose that we add the file size in octets (bytes with 8bits in >> each byte) to the fragment identifier. File size validation prohibits >> e.g. length extension attacks. It is useful to download tools. I know >> that HTTP servers usually set a Content-Length header for static files. >> But the header is set by the CDN while the information in the fragment >> identifier shall come from PyPI's internal database. >> >> Example: >> >> defusedxml-0.4.tar.gz#md5=09873c31ce773d48b8a4759571655a2c&sha1=33821e6891e3fc3829f5a238a93490f939533d62&octets=48324 > > Minor nit: s/octets/size > > We could probably even add GPG sigs to the link. 
> > The only problem with the extension mechanism is that the currently > available installers only support "#md5=...". pip works just fine with any of the algorithms from hashlib. The installers all also support #egg=, and there might be some others I can't recall offhand. > > Perhaps there's some way to trick them into still working with > the query-style fragment links ?! > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, Mar 07 2013) >>>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > > ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From donald at stufft.io Fri Mar 8 14:13:25 2013 From: donald at stufft.io (Donald Stufft) Date: Fri, 8 Mar 2013 08:13:25 -0500 Subject: [Catalog-sig] Deprecation of External Urls, Statistics In-Reply-To: <396F795E-8B6D-4EF3-8B45-08527A04C60E@gmail.com> References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <396F795E-8B6D-4EF3-8B45-08527A04C60E@gmail.com> Message-ID: <5A30A698-71A8-445D-9565-07D5769951BD@stufft.io> On Mar 8, 2013, at 8:07 AM, Jesse Noller wrote: > As long as external URLs eventually are completely removed I'm okay with caching things So I have mixed feelings on caching the urls. I'm not completely against it however it does present a problem of "Well how do we know if the url we are fetching is the accurate url for that package". Downloading and caching them and presenting them the same as if someone uploaded them directly to PyPI loses a point of distinction between "PyPI can verify this is the package that the author intended to release" and "This is something we think that the author releases, maybe, probably?". It does solve the backwards compatibility issue of killing external urls immediately so I'm not flat out against it, but there may be legal issues involved too? > > On Mar 8, 2013, at 6:49 AM, "M.-A. Lemburg" wrote: > >> On 08.03.2013 02:40, Donald Stufft wrote: >>> So I updated my script (had to remove eventlet) and I believe it's now accurate. The total time was ~54 hours so this is hardly scientific but it should give a good idea what sort of impact we are talking about. >>> >>> This is a list of versions that pip's PackageFinder (what it uses to locate packages to install) could find that were not available on PyPI. 
>>> >>> The results and script is available at: https://gist.github.com/dstufft/5088915 >>> >>> Some statistics: >>> >>> Projects affected (with dev): 2269 >>> Versions affected (with dev): 8006 >>> >>> Projects affected (without dev): 1880 >>> Versions affected (without dev): 7586 >>> >>> These numbers are if all external urls were immediately removed from PyPI, so this would be the total affected. This does not test if the actual package is installable, just if pip is able to locate an url that it thinks represents a version for that project. >> >> Thanks for running the test. >> >> About 10% of all packages. The numbers are already impressive, >> but if you factor in the popularity of some of those >> packages, the situation becomes worse. >> >> I'm beginning to wonder whether caching the external link content >> on the PyPI CDN wouldn't be a better idea. >> >> We'd have to make that legally waterproof and also have an opt-out >> mechanism, but it would get us from here to there a lot faster. >> >> Together with the added hash tag on the download file URLs (*), >> this would solve the availability and the security aspects. >> Instead of deprecating external links altogether, we could then >> deprecate non-compliant download links and get an overall >> very flexible system for Python package distribution. >> >> (*) Yes, I know, I still have to deliver the updated proposal - >> been working on getting our indexes ready to serve as example :-) >> >> -- >> Marc-Andre Lemburg >> eGenix.com >> >> Professional Python Services directly from the Source (#1, Mar 07 2013) >>>>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>>>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ >> ________________________________________________________________________ >> >> ::::: Try our mxODBC.Connect Python Database Interface for free ! 
:::::: >> >> eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 >> D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg >> Registered at Amtsgericht Duesseldorf: HRB 46611 >> http://www.egenix.com/company/contact/ >> _______________________________________________ >> Catalog-SIG mailing list >> Catalog-SIG at python.org >> http://mail.python.org/mailman/listinfo/catalog-sig ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From donald at stufft.io Fri Mar 8 14:18:44 2013 From: donald at stufft.io (Donald Stufft) Date: Fri, 8 Mar 2013 08:18:44 -0500 Subject: [Catalog-sig] Deprecation of External Urls, Statistics In-Reply-To: <5A30A698-71A8-445D-9565-07D5769951BD@stufft.io> References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <396F795E-8B6D-4EF3-8B45-08527A04C60E@gmail.com> <5A30A698-71A8-445D-9565-07D5769951BD@stufft.io> Message-ID: <814AE930-BFD3-4507-BFA2-21BC07C4C07A@stufft.io> On Mar 8, 2013, at 8:13 AM, Donald Stufft wrote: > > On Mar 8, 2013, at 8:07 AM, Jesse Noller wrote: > >> As long as external URLs eventually are completely removed I'm okay with caching things > > So I have mixed feelings on caching the urls. I'm not completely against it however it does present a problem of "Well how do we know if the url we are fetching is the accurate url for that package". Downloading and caching them and presenting them the same as if someone uploaded them directly to PyPI loses a point of distinction between "PyPI can verify this is the package that the author intended to release" and "This is something we think that the author releases, maybe, probably?". The distinction can be fixed with a rel="external" or rel="cached" or whatever. 
I believe all the tools will still find them as downloadable targets and can be adapted to print a warning if that's desired. We *might* be caching a package that has already been replaced by an attacker but by caching and centralizing it we have a better way of removing it once it's found. The legal issues is something we'd probably need to ask VanL? So that's an Ok, Neutral, and Unknown for my 3 major complaints. > > It does solve the backwards compatibility issue of killing external urls immediately so I'm not flat out against it, but there may be legal issues involved too? > >> >> On Mar 8, 2013, at 6:49 AM, "M.-A. Lemburg" wrote: >> >>> On 08.03.2013 02:40, Donald Stufft wrote: >>>> So I updated my script (had to remove eventlet) and I believe it's now accurate. The total time was ~54 hours so this is hardly scientific but it should give a good idea what sort of impact we are talking about. >>>> >>>> This is a list of versions that pip's PackageFinder (what it uses to locate packages to install) could find that were not available on PyPI. >>>> >>>> The results and script is available at: https://gist.github.com/dstufft/5088915 >>>> >>>> Some statistics: >>>> >>>> Projects affected (with dev): 2269 >>>> Versions affected (with dev): 8006 >>>> >>>> Projects affected (without dev): 1880 >>>> Versions affected (without dev): 7586 >>>> >>>> These numbers are if all external urls were immediately removed from PyPI, so this would be the total affected. This does not test if the actual package is installable, just if pip is able to locate an url that it thinks represents a version for that project. >>> >>> Thanks for running the test. >>> >>> About 10% of all packages. The numbers are already impressive, >>> but if you factor in the popularity of some of those >>> packages, the situation becomes worse. >>> >>> I'm beginning to wonder whether caching the external link content >>> on the PyPI CDN wouldn't be a better idea. 
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From jnoller at gmail.com Fri Mar 8 14:19:07 2013 From: jnoller at gmail.com (Jesse Noller) Date: Fri, 8 Mar 2013 08:19:07 -0500 Subject: [Catalog-sig] Deprecation of External Urls, Statistics In-Reply-To: <5A30A698-71A8-445D-9565-07D5769951BD@stufft.io> References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <396F795E-8B6D-4EF3-8B45-08527A04C60E@gmail.com> <5A30A698-71A8-445D-9565-07D5769951BD@stufft.io> Message-ID: On Mar 8, 2013, at 8:13 AM, Donald Stufft wrote: > > On Mar 8, 2013, at 8:07 AM, Jesse Noller wrote: > >> As long as external URLs eventually are completely removed I'm okay with caching things > > So I have mixed feelings on caching the urls. I'm not completely against it however it does present a problem of "Well how do we know if the url we are fetching is the accurate url for that package". Downloading and caching them and presenting them the same as if someone uploaded them directly to PyPI loses a point of distinction between "PyPI can verify this is the package that the author intended to release" and "This is something we think that the author releases, maybe, probably?". > > It does solve the backwards compatibility issue of killing external urls immediately so I'm not flat out against it, but there may be legal issues involved too? Let them opt out. > >> >> On Mar 8, 2013, at 6:49 AM, "M.-A. Lemburg" wrote: >> >>> On 08.03.2013 02:40, Donald Stufft wrote: >>>> So I updated my script (had to remove eventlet) and I believe it's now accurate. The total time was ~54 hours so this is hardly scientific but it should give a good idea what sort of impact we are talking about. >>>> >>>> This is a list of versions that pip's PackageFinder (what it uses to locate packages to install) could find that were not available on PyPI. 
>>>> >>>> The results and script is available at: https://gist.github.com/dstufft/5088915 >>>> >>>> Some statistics: >>>> >>>> Projects affected (with dev): 2269 >>>> Versions affected (with dev): 8006 >>>> >>>> Projects affected (without dev): 1880 >>>> Versions affected (without dev): 7586 >>>> >>>> These numbers are if all external urls were immediately removed from PyPI, so this would be the total affected. This does not test if the actual package is installable, just if pip is able to locate an url that it thinks represents a version for that project. >>> >>> Thanks for running the test. >>> >>> About 10% of all packages. The numbers are already impressive, >>> but if you factor in the popularity of some of those >>> packages, the situation becomes worse. >>> >>> I'm beginning to wonder whether caching the external link content >>> on the PyPI CDN wouldn't be a better idea. >>> >>> We'd have to make that legally waterproof and also have an opt-out >>> mechanism, but it would get us from here to there a lot faster. >>> >>> Together with the added hash tag on the download file URLs (*), >>> this would solve the availability and the security aspects. >>> Instead of deprecating external links altogether, we could then >>> deprecate non-compliant download links and get an overall >>> very flexible system for Python package distribution. >>> >>> (*) Yes, I know, I still have to deliver the updated proposal - >>> been working on getting our indexes ready to serve as example :-) >>> >>> -- >>> Marc-Andre Lemburg >>> eGenix.com >>> >>> Professional Python Services directly from the Source (#1, Mar 07 2013) >>>>>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>>>>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>>>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ >>> ________________________________________________________________________ >>> >>> ::::: Try our mxODBC.Connect Python Database Interface for free ! 
:::::: >>> >>> eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 >>> D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg >>> Registered at Amtsgericht Duesseldorf: HRB 46611 >>> http://www.egenix.com/company/contact/ >>> _______________________________________________ >>> Catalog-SIG mailing list >>> Catalog-SIG at python.org >>> http://mail.python.org/mailman/listinfo/catalog-sig > > > ----------------- > Donald Stufft > PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA > From mal at egenix.com Fri Mar 8 14:32:20 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 08 Mar 2013 14:32:20 +0100 Subject: [Catalog-sig] hash tags In-Reply-To: References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> Message-ID: <5139E864.8010507@egenix.com> On 08.03.2013 14:09, Donald Stufft wrote: > Accidentally sent this to only MAL so resending! > > On Mar 8, 2013, at 7:50 AM, "M.-A. Lemburg" wrote: > >> On 08.03.2013 13:15, Christian Heimes wrote: >>> Am 08.03.2013 12:49, schrieb M.-A. Lemburg: >>>> Together with the added hash tag on the download file URLs (*), >>>> this would solve the availability and the security aspects. >>>> Instead of deprecating external links altogether, we could then >>>> deprecate non-compliant download links and get an overall >>>> very flexible system for Python package distribution. >>>> >>>> (*) Yes, I know, I still have to deliver the updated proposal - >>>> been working on getting our indexes ready to serve as example :-) >>> >>> How does your proposal look like? >> >> Here's the first version with the basic idea: >> >> http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal >> >> After the feedback I got from Holger and Phillip, I'm currently >> writing a new version, which drops some of the unneeded >> requirements and spells out a few more things. >> >> Here's a very short version... 
>> >> Installers are modified: >> >> * to only follow rel="download" links from the /simple/ index page, >> which have a hash tag (e.g. #md5=...) > > Sounds like a pretty serious break in backwards compat. Only 29 releases out of 144493 currently have a #md5= in their download_url. Either PyPI will be expected to download the url and compute a hash (DoS vector, will need to be coded properly) which is error-prone and is likely to break in non-obvious ways for maintainers. > > While I'm obviously not against breaking backwards compatibility, I think if we're going to do that we might as well go whole hog and kill external links completely. This was just the main new download theme. If the new scheme doesn't work, they should revert back to the old scheme, after a BIG warning to the user. Later on they could switch to requiring users to use an option to reenable the old scheme. In any case, I'll have to put all this into proper words and will then post it for another review cycle. >> * will only use the fetched download page if its contents match >> the hash tag >> * scan that page for rel="download" links, which again have to >> have a hash tag to be taken into account >> * only install files for which the hash tag matches the >> downloaded content >> >> This should provide a good way to make sure that the downloaded >> files are indeed under control of the package maintainer. >> >> So far the only practical problem I've found with the approach >> is that the download page may not contain dynamic data, e.g. >> a date or timestamp, since that causes the hash tag not to >> verify. >> >> The package maintainer will also have to reregister the >> package whenever changes to the download page are made - >> but that's actually intended :-) >> >>> I like to propose query string-like >>> key/value pairs. key/value pairs are more flexible and allow us to >>> add/remove new information in the future. >> >> Good idea. I'll add that as extension mechanism.
>> >>> I also propose that we add the file size in octets (bytes with 8bits in >>> each byte) to the fragment identifier. File size validation prohibits >>> e.g. length extension attacks. It is useful to download tools. I know >>> that HTTP servers usually set a Content-Length header for static files. >>> But the header is set by the CDN while the information in the fragment >>> identifier shall come from PyPI's internal database. >>> >>> Example: >>> >>> defusedxml-0.4.tar.gz#md5=09873c31ce773d48b8a4759571655a2c&sha1=33821e6891e3fc3829f5a238a93490f939533d62&octets=48324 >> >> Minor nit: s/octets/size >> >> We could probably even add GPG sigs to the link. >> >> The only problem with the extension mechanism is that the currently >> available installers only support "#md5=...". > > pip works just fine with any of the algorithms from hashlib. The installers all > also support #egg=, and there might be some others I can't recall offhand. Ah, good to know. Thanks. >> >> Perhaps there's some way to trick them into still working with >> the query-style fragment links ?! -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 07 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Fri Mar 8 14:47:14 2013 From: mal at egenix.com (M.-A.
Lemburg)
Date: Fri, 08 Mar 2013 14:47:14 +0100
Subject: [Catalog-sig] hash tags
In-Reply-To: <5139DE99.9020005@egenix.com>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com>
Message-ID: <5139EBE2.9020500@egenix.com>

On 08.03.2013 13:50, M.-A. Lemburg wrote:
> On 08.03.2013 13:15, Christian Heimes wrote:
>> I like to propose query string-like
>> key/value pairs. key/value pairs are more flexible and allow us to
>> add/remove new information in the future.
>
> Good idea. I'll add that as extension mechanism.
>
>> I also propose that we add the file size in octets (bytes with 8bits in
>> each byte) to the fragment identifier. File size validation prohibits
>> e.g. length extension attacks. It is useful to download tools. I know
>> that HTTP servers usually set a Content-Length header for static files.
>> But the header is set by the CDN while the information in the fragment
>> identifier shall come from PyPI's internal database.
>>
>> Example:
>>
>> defusedxml-0.4.tar.gz#md5=09873c31ce773d48b8a4759571655a2c&sha1=33821e6891e3fc3829f5a238a93490f939533d62&octets=48324
>
> Minor nit: s/octets/size
>
> We could probably even add GPG sigs to the link.
>
> The only problem with the extension mechanism is that the currently
> available installers only support "#md5=...".
>
> Perhaps there's some way to trick them into still working with
> the query-style fragment links ?!

Too bad... at least distribute/setuptools enforces this:

    def check_md5(self, cs, info, filename, tfp):
        if re.match('md5=[0-9a-f]{32}$', info):
            ...

If it weren't for that '$', we'd have no problem.

At least distribute currently doesn't check the download links from
the /simple/ page at all, so we can use the extension mechanism
there without breaking older versions of the tools.
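
To make the '$' problem concrete, here is a minimal sketch in Python 3. The pattern is the one quoted from distribute/setuptools above, and the fragment strings are taken from Christian's example. The helper parse_fragment is hypothetical and is not part of any installer; it only illustrates how a query-style fragment could be split into key/value pairs.

```python
import re
from urllib.parse import parse_qsl

# Pattern as quoted above from distribute/setuptools' check_md5.
MD5_FRAGMENT = re.compile(r'md5=[0-9a-f]{32}$')

# Fragment in the form current installers expect.
old_style = "md5=09873c31ce773d48b8a4759571655a2c"

# Query-style fragment from the proposal (md5, sha1 and size together).
new_style = (
    "md5=09873c31ce773d48b8a4759571655a2c"
    "&sha1=33821e6891e3fc3829f5a238a93490f939533d62"
    "&octets=48324"
)


def parse_fragment(fragment):
    """Hypothetical helper: split a query-style fragment into a dict."""
    return dict(parse_qsl(fragment))


# The trailing '$' requires the fragment to end right after the 32 hex
# digits, so the extended fragment is rejected by the old check.
old_ok = MD5_FRAGMENT.match(old_style) is not None
new_ok = MD5_FRAGMENT.match(new_style) is not None

fields = parse_fragment(new_style)
print(old_ok, new_ok, sorted(fields))
```

Without the '$', re.match would accept the extended fragment as well, since it only anchors at the start of the string.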
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 07 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From solipsis at pitrou.net Fri Mar 8 15:00:40 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 8 Mar 2013 14:00:40 +0000 (UTC) Subject: [Catalog-sig] Search engine relevance Message-ID: Hello, It seems the PyPI search engine is quite crude and doesn't try to make the results relevant at all. For example, if I'm trying to search "agi" in the hope of finding modules relevant to the Asterisk Gateway Interface (nicknamed "AGI"), I get the following results: https://pypi.python.org/pypi?%3Aaction=search&term=agi&submit=search As you can see, a large number of results pop up simply because they contain the word "magic", which apparently is considered to match the "agi" request. Clearly either the selection or the weighting algorithm isn't very efficient here. Regards Antoine. From jacob at jacobian.org Fri Mar 8 15:51:05 2013 From: jacob at jacobian.org (Jacob Kaplan-Moss) Date: Fri, 8 Mar 2013 08:51:05 -0600 Subject: [Catalog-sig] Search engine relevance In-Reply-To: References: Message-ID: Hi Antoine - Yes, PyPI's search engine is rather simplistic, I think that's a pretty well-known problem. For the time being you might try Crate instead (crate.io); I've found its search engine to be much much better. 
Jacob On Fri, Mar 8, 2013 at 8:00 AM, Antoine Pitrou wrote: > > Hello, > > It seems the PyPI search engine is quite crude and doesn't try to make the > results relevant at all. > For example, if I'm trying to search "agi" in the hope of finding modules > relevant to the Asterisk Gateway Interface (nicknamed "AGI"), I get the > following results: > > https://pypi.python.org/pypi?%3Aaction=search&term=agi&submit=search > > As you can see, a large number of results pop up simply because they contain > the word "magic", which apparently is considered to match the "agi" request. > Clearly either the selection or the weighting algorithm isn't very efficient > here. > > Regards > > Antoine. > > > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig From ubershmekel at gmail.com Fri Mar 8 16:03:32 2013 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Fri, 8 Mar 2013 07:03:32 -0800 Subject: [Catalog-sig] Search engine relevance In-Reply-To: References: Message-ID: https://crate.io/?has_releases=on&q=agi No results found. On Fri, Mar 8, 2013 at 6:51 AM, Jacob Kaplan-Moss wrote: > Hi Antoine - > > Yes, PyPI's search engine is rather simplistic, I think that's a > pretty well-known problem. > > For the time being you might try Crate instead (crate.io); I've found > its search engine to be much much better. > > Jacob > > On Fri, Mar 8, 2013 at 8:00 AM, Antoine Pitrou > wrote: > > > > Hello, > > > > It seems the PyPI search engine is quite crude and doesn't try to make > the > > results relevant at all. 
> > For example, if I'm trying to search "agi" in the hope of finding modules > > relevant to the Asterisk Gateway Interface (nicknamed "AGI"), I get the > > following results: > > > > https://pypi.python.org/pypi?%3Aaction=search&term=agi&submit=search > > > > As you can see, a large number of results pop up simply because they > contain > > the word "magic", which apparently is considered to match the "agi" > request. > > Clearly either the selection or the weighting algorithm isn't very > efficient > > here. > > > > Regards > > > > Antoine. > > > > > > _______________________________________________ > > Catalog-SIG mailing list > > Catalog-SIG at python.org > > http://mail.python.org/mailman/listinfo/catalog-sig > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig > -------------- next part -------------- An HTML attachment was scrubbed... URL: From donald at stufft.io Fri Mar 8 16:22:39 2013 From: donald at stufft.io (Donald Stufft) Date: Fri, 8 Mar 2013 10:22:39 -0500 Subject: [Catalog-sig] Search engine relevance In-Reply-To: References: Message-ID: <77B5D8F8-7334-441E-A987-17FAD068D90C@stufft.io> On Mar 8, 2013, at 9:51 AM, Jacob Kaplan-Moss wrote: > Hi Antoine - > > Yes, PyPI's search engine is rather simplistic, I think that's a > pretty well-known problem. > > For the time being you might try Crate instead (crate.io); I've found > its search engine to be much much better. Crate's search uses ElasticSearch whereas I believe PyPI is just using SQL against the DB. That being said Crate's search could be a lot better still :/ But I'm not an expert on how to get the best search results. > > Jacob > > On Fri, Mar 8, 2013 at 8:00 AM, Antoine Pitrou wrote: >> >> Hello, >> >> It seems the PyPI search engine is quite crude and doesn't try to make the >> results relevant at all. 
>> For example, if I'm trying to search "agi" in the hope of finding modules >> relevant to the Asterisk Gateway Interface (nicknamed "AGI"), I get the >> following results: >> >> https://pypi.python.org/pypi?%3Aaction=search&term=agi&submit=search >> >> As you can see, a large number of results pop up simply because they contain >> the word "magic", which apparently is considered to match the "agi" request. >> Clearly either the selection or the weighting algorithm isn't very efficient >> here. >> >> Regards >> >> Antoine. >> >> >> _______________________________________________ >> Catalog-SIG mailing list >> Catalog-SIG at python.org >> http://mail.python.org/mailman/listinfo/catalog-sig > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From solipsis at pitrou.net Fri Mar 8 16:24:00 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 8 Mar 2013 15:24:00 +0000 (UTC) Subject: [Catalog-sig] Search engine relevance References: Message-ID: Yuval Greenfield gmail.com> writes: > > https://crate.io/?has_releases=on&q=agi > > No results found. Thanks for the answers. Yes, crate.io is at least missing pyst2 which does mention AGI in its description: https://crate.io/packages/pyst2/ (pyst2 is rather unmaintained, but that shouldn't matter a lot here :-)) Regards Antoine. 
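
The "magic" hits are what plain substring matching would produce. The sketch below contrasts substring matching with whole-word matching. PyPI's actual search code is not shown in this thread, so the matching strategy is an assumption, and the package names and summaries are made up for illustration.

```python
import re

# Made-up package summaries for illustration; not real PyPI data.
summaries = {
    "example-agi-client": "Client for the Asterisk Gateway Interface (AGI)",
    "example-magic-lib": "Adds magic helpers for file type detection",
}


def substring_search(term, docs):
    """Match the term anywhere in the text, including inside other words."""
    term = term.lower()
    return sorted(name for name, text in docs.items() if term in text.lower())


def whole_word_search(term, docs):
    """Match the term only when it stands as a separate word."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return sorted(name for name, text in docs.items() if pattern.search(text))


print(substring_search("agi", summaries))
print(whole_word_search("agi", summaries))
```

Whole-word matching is only one possible fix; a real search engine would more likely tokenize and rank results, which this sketch does not attempt.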
From donald at stufft.io Fri Mar 8 16:28:06 2013 From: donald at stufft.io (Donald Stufft) Date: Fri, 8 Mar 2013 10:28:06 -0500 Subject: [Catalog-sig] Search engine relevance In-Reply-To: References: Message-ID: <6EE76A9E-FF3B-48E5-9370-3153A7C51561@stufft.io> On Mar 8, 2013, at 10:24 AM, Antoine Pitrou wrote: > Yuval Greenfield gmail.com> writes: >> >> https://crate.io/?has_releases=on&q=agi >> >> No results found. > > Thanks for the answers. > Yes, crate.io is at least missing pyst2 which does mention AGI in its > description: > https://crate.io/packages/pyst2/ So it comes up when you search for "asterisk" https://crate.io/?q=asterisk&has_releases=on however that is less than optimal. Basically the long_description isn't currently included in indexing for Crate because it trashed the search relevancy and I was unable to (with my limited experience in searches) come up with a method here that didn't trash the overall relevancy. > > (pyst2 is rather unmaintained, but that shouldn't matter a lot here :-)) > > Regards > > Antoine. > > > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From ubershmekel at gmail.com Fri Mar 8 16:29:44 2013 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Fri, 8 Mar 2013 07:29:44 -0800 Subject: [Catalog-sig] Search engine relevance In-Reply-To: References: Message-ID: On Fri, Mar 8, 2013 at 7:24 AM, Antoine Pitrou wrote: > Yes, crate.io is at least missing pyst2 which does mention AGI in its > description: > https://crate.io/packages/pyst2/ > > > I agree. There's only one effective search engine for pypi I know of, e.g. 
https://www.google.com/search?q=site%3Apypi.python.org+agi -------------- next part -------------- An HTML attachment was scrubbed... URL: From pje at telecommunity.com Fri Mar 8 20:16:57 2013 From: pje at telecommunity.com (PJ Eby) Date: Fri, 8 Mar 2013 14:16:57 -0500 Subject: [Catalog-sig] hash tags In-Reply-To: <5139DE99.9020005@egenix.com> References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> Message-ID: On Fri, Mar 8, 2013 at 7:50 AM, M.-A. Lemburg wrote: > After the feedback I got from Holger and Phillip, I'm currently > writing a new version, which drops some of the unneeded > requirements and spells out a few more things. > > Here's a very short version... > > Installers are modified: > > * to only follow rel="download" links from the /simple/ index page, > which have a hash tag (e.g. #md5=...) > * will only use the fetched download page if its contents match > the hash tag > * scan that page for rel="download" links, which again have to > have a hash tag to be taken into account > * only install files for which the hash tag matches the > downloaded content > > This should provide a good way to make sure that the downloaded > files are indeed under control of the package maintainer. There is, as I said before, a MUCH simpler way to do this, that works right now: put direct #md5 download links in your description, and phase out the rel="" attributes altogether. The key to making this transition isn't creating elaborate new standards for the tools, it's *creating new tools for the standards*. Specifically, *migration tools*. A migration tool could be made that scans existing external links and converts found links to #md5 links or alternately uploads the files themselves to PyPI. You can do that without changing pip or distribute or anything else but PyPI, so there's no need to wait out update cycles to take advantage. 
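
A rough sketch of the link-conversion step such a migration tool might perform is shown below. The helper name add_hash_fragment and the example.com URL are hypothetical; fetching the file and updating the PyPI record are left out. The algorithm is a parameter because pip is said elsewhere in this thread to accept any hashlib algorithm.

```python
import hashlib


def add_hash_fragment(url, content, algorithm="md5"):
    """Hypothetical helper: return url with '#<algorithm>=<hexdigest>' appended.

    'content' is the already-downloaded file as bytes. Any existing
    fragment on the URL is dropped before the new one is added.
    """
    digest = hashlib.new(algorithm, content).hexdigest()
    base = url.split("#", 1)[0]
    return "%s#%s=%s" % (base, algorithm, digest)


# Example with placeholder content standing in for the real file bytes.
link = add_hash_fragment(
    "http://example.com/dist/pkg-1.0.tar.gz", b"example file content"
)
print(link)
```

Passing algorithm="sha256" produces a "#sha256=..." fragment instead; whether a given installer honours it depends on the installer.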
Once a project/version has switched to either #md5 links or PyPI copies, you can just drop the rel="" attributes and you're done. Alternately, if using the description for download links is considered a bad idea, add a new field to PyPI for them. Point is, this entire thing can be done correctly at the PyPI end and work with the existing API of the download tools. > So far the only practical problem I've found with the approach > is that the download page may not contain dynamic data, e.g. > a date or timestamp, since that causes the hash tag not to > verify. Which is completely unnecessary if one simply exposes the *actual* download links directly on PyPI. The download page is redundant, in a couple different ways. First, since it can't change, there's no point in re-fetching it all the time. Second, since it's only going to be read by tools anyway, there's no point to it containing anything besides the link. So, since the page only contains links, might as well put the links straight on PyPI, or at most have an option/tool to load the links from an external source. Again, the key to making this work is going to be somebody putting buttons in the PyPI interface (and making setuptools/distutils commands or similar CLI tools) to migrate their files (or links to the files) to PyPI hosting. A new API for such tools is entirely unnecessary -- at most there might need to be a new field made available/accessible. (Personally I don't care if your download links have to be in the description field if you're hosting off-site, but that's just me.) From noah at coderanger.net Fri Mar 8 20:52:33 2013 From: noah at coderanger.net (Noah Kantrowitz) Date: Fri, 8 Mar 2013 11:52:33 -0800 Subject: [Catalog-sig] hash tags In-Reply-To: <5139DE99.9020005@egenix.com> References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> Message-ID: On Mar 8, 2013, at 4:50 AM, M.-A. 
Lemburg wrote: > On 08.03.2013 13:15, Christian Heimes wrote: >> Am 08.03.2013 12:49, schrieb M.-A. Lemburg: >>> Together with the added hash tag on the download file URLs (*), >>> this would solve the availability and the security aspects. >>> Instead of deprecating external links altogether, we could then >>> deprecate non-compliant download links and get an overall >>> very flexible system for Python package distribution. >>> >>> (*) Yes, I know, I still have to deliver the updated proposal - >>> been working on getting our indexes ready to serve as example :-) >> >> How does your proposal look like? > > Here's the first version with the basic idea: > > http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal > > After the feedback I got from Holger and Phillip, I'm currently > writing a new version, which drops some of the unneeded > requirements and spells out a few more things. > > Here's a very short version... > > Installers are modified: > > * to only follow rel="download" links from the /simple/ index page, > which have a hash tag (e.g. #md5=...) > * will only use the fetched download page if its contents match > the hash tag > * scan that page for rel="download" links, which again have to > have a hash tag to be taken into account > * only install files for which the hash tag matches the > downloaded content > > This should provide a good way to make sure that the downloaded > files are indeed under control of the package maintainer. MD5 is _not_ acceptable for anything security related and we shouldn't be adding anything that increases our dependence on it. MD5's only use in the packaging world is to make people who forget that TCP has its own checksums feel all warm and fuzzy that there hasn't been _accidental_ download corruption. --Noah -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 203 bytes Desc: Message signed with OpenPGP using GPGMail URL: From pje at telecommunity.com Fri Mar 8 20:54:28 2013 From: pje at telecommunity.com (PJ Eby) Date: Fri, 8 Mar 2013 14:54:28 -0500 Subject: [Catalog-sig] Deprecation of External Urls, Statistics In-Reply-To: <5A30A698-71A8-445D-9565-07D5769951BD@stufft.io> References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <396F795E-8B6D-4EF3-8B45-08527A04C60E@gmail.com> <5A30A698-71A8-445D-9565-07D5769951BD@stufft.io> Message-ID: On Fri, Mar 8, 2013 at 8:13 AM, Donald Stufft wrote: > It does solve the backwards compatibility issue of killing external urls immediately so I'm not flat out against it, but there may be legal issues involved too? I've mentioned this in the other thread as well, but the best way to actually ensure this stuff gets moved over to PyPI is to make it *easy*. Give developers a button to click on PyPI that fetches all their external links (requiring first that you give matching MD5 or other checksums) and uploads them to PyPI, and a whole bunch of those projects are likely to be okay with clicking it a few times. A command-line tool to do it (especially as a distutils/setuptools command) would be a good idea, too. Of the tiny minority of remaining people who object to PyPI hosting for some reason other than convenience/familiarity (e.g. MAL's licensing objection), it will likely be sufficient to provide an option to add #md5 links to their description, in lieu of actual rehosting. FWIW, it's hard to get people to change behavior when one condemns that behavior as unlikeable or socially undesirable, because it means one is less likely to consider the other person's motivations, needs, etc., and on top of that, the other person's resistance and rebellion are stirred up by being the subject of one's disapproval. 
So please, let's all stop talking about ways to work around the package authors and project maintainers, or how to force them into doing our bidding, and start talking instead about how to make it *easy* and *obvious* for them to do what we want. (And people who think it's already easy and obvious enough, so those 10% of projects must be stupid, will obviously not have anything positive to contribute.) So let me kick off that discussion with a list of known-so-far use cases for external hosting, in descending order of my extremely rough guesstimate of frequency: * Always did it that way, never saw a reason to change, or didn't know you could upload to PyPI * Lots of files that are currently generated on the system where they're hosted, or in an automated system that would need significant rework to support PyPI * Development snapshots (which may in fact be depended upon by other in-development projects, so manual URL specification doesn't help here) * Had an issue w/PyPI availability in the past * Objectors to PyPI's licensing requirements Automation is aimed at the first two: make it easy enough, w/a carrot and a stick ("external link spidering is going away, you have to put either the links or the files on PyPI directly if you want them found"), and a lot of people will move (assuming they're actually still maintaining their project). Development snapshots are an interesting case, because one of the reasons they're valuable is that PyPI's existing multi-release behavior is a major PITA. You can't upload a new version of something without PyPI creating a new release for it... and automatically hiding all your previous releases, including your stable release. There's a lot that would have to be done to PyPI's release management before it would actually be sane to track such releases there. So the obvious fix is to do nothing; such links being external doesn't hurt availability for people that don't depend on them (unlike rel=homepage/download links). 
The last two issues are education/persuasion problems that won't be affected by technology changes. Does anybody know of any other use cases for the thousands of projects and releases relying on external link discovery spidering? (Disparaging remarks about why a particular use case is bad, no good, makes you go blind, etc. need not apply: they serve only to show that the person providing the opinion lacks sufficient empathy with the target audience to be *useful* in a discussion of how to persuade that target audience to behave differently.) From donald at stufft.io Fri Mar 8 21:06:17 2013 From: donald at stufft.io (Donald Stufft) Date: Fri, 8 Mar 2013 15:06:17 -0500 Subject: [Catalog-sig] Deprecation of External Urls, Statistics In-Reply-To: References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <396F795E-8B6D-4EF3-8B45-08527A04C60E@gmail.com> <5A30A698-71A8-445D-9565-07D5769951BD@stufft.io> Message-ID: <83DDC321-9809-4E08-B0DC-3159A15130DA@stufft.io> On Mar 8, 2013, at 2:54 PM, PJ Eby wrote: > On Fri, Mar 8, 2013 at 8:13 AM, Donald Stufft wrote: >> It does solve the backwards compatibility issue of killing external urls immediately so I'm not flat out against it, but there may be legal issues involved too? > > I've mentioned this in the other thread as well, but the best way to > actually ensure this stuff gets moved over to PyPI is to make it > *easy*. Give developers a button to click on PyPI that fetches all > their external links (requiring first that you give matching MD5 or > other checksums) and uploads them to PyPI, and a whole bunch of those > projects are likely to be okay with clicking it a few times. A > command-line tool to do it (especially as a distutils/setuptools > command) would be a good idea, too. Tooling is the easy part. I've already volunteered to write a PR to add this functionality to PyPI, maybe with a mail out for maximal conversion. 
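[Editorial sketch, not part of the original message.] The migration step discussed in this thread (take an existing external download link, fetch the file once, and republish the link with a hash fragment) could look roughly like the following. This is a minimal sketch under stated assumptions: the function name add_hash_fragment is hypothetical, sha256 is chosen here only because later replies object to md5, and fetching the file and writing the link back to PyPI are left out entirely.

```python
import hashlib
from urllib.parse import urldefrag


def add_hash_fragment(url, content, algorithm="sha256"):
    """Return url with a '#<algorithm>=<hexdigest>' fragment for content.

    Hypothetical helper for illustration; 'content' is the already
    downloaded file as bytes.
    """
    # Drop any fragment the link already carries (e.g. an old #md5=...).
    base, _old_fragment = urldefrag(url)
    digest = hashlib.new(algorithm, content).hexdigest()
    return "{}#{}={}".format(base, algorithm, digest)
```

A tool built around this would still need the maintainer to confirm that the fetched file is the one they intended to publish; the hash only pins whatever was downloaded at migration time.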
> > Of the tiny minority of remaining people who object to PyPI hosting > for some reason other than convenience/familiarity (e.g. MAL's > licensing objection), it will likely be sufficient to provide an > option to add #md5 links to their description, in lieu of actual > rehosting. Keeping the ability to include external links lowers the overall effectiveness of the service in uptime and privacy. MD5 hashes are also unacceptable as a secure hash but that's another argument. > > FWIW, it's hard to get people to change behavior when one condemns > that behavior as unlikeable or socially undesirable, because it means > one is less likely to consider the other person's motivations, needs, > etc., and on top of that, the other person's resistance and rebellion > are stirred up by being the subject of one's disapproval. > > So please, let's all stop talking about ways to work around the > package authors and project maintainers, or how to force them into > doing our bidding, and start talking instead about how to make it > *easy* and *obvious* for them to do what we want. > > (And people who think it's already easy and obvious enough, so those > 10% of projects must be stupid, will obviously not have anything > positive to contribute.) 
> > So let me kick off that discussion with a list of known-so-far use > cases for external hosting, in descending order of my extremely rough > guesstimate of frequency: > > * Always did it that way, never saw a reason to change, or didn't know > you could upload to PyPI > * Lots of files that are currently generated on the system where > they're hosted, or in an automated system that would need significant > rework to support PyPI > * Development snapshots (which may in fact be depended upon by other > in-development projects, so manual URL specification doesn't help > here) > * Had an issue w/PyPI availability in the past > * Objectors to PyPI's licensing requirements > > Automation is aimed at the first two: make it easy enough, w/a carrot > and a stick ("external link spidering is going away, you have to put > either the links or the files on PyPI directly if you want them > found"), and a lot of people will move (assuming they're actually > still maintaining their project). > > Development snapshots are an interesting case, because one of the > reasons they're valuable is that PyPI's existing multi-release > behavior is a major PITA. You can't upload a new version of something > without PyPI creating a new release for it... and automatically > hiding all your previous releases, including your stable release. > There's a lot that would have to be done to PyPI's release management > before it would actually be sane to track such releases there. So the > obvious fix is to do nothing; such links being external doesn't hurt > availability for people that don't depend on them (unlike > rel=homepage/download links). This is false: PyPI has a toggle to turn off the automatic hiding by default. However, PyPI does need an option to prefer the stable release as the default release shown when you visit a page in the Web UI. If you're going to release a snapshot to PyPI you _should_ need to create a new release for it.
> > The last two issues are education/persuasion problems that won't be > affected by technology changes. > > Does anybody know of any other use cases for the thousands of projects > and releases relying on external link discovery spidering? > > (Disparaging remarks about why a particular use case is bad, no good, > makes you go blind, etc. need not apply: they serve only to show that > the person providing the opinion lacks sufficient empathy with the > target audience to be *useful* in a discussion of how to persuade that > target audience to behave differently.) ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From mal at egenix.com Fri Mar 8 22:11:38 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 08 Mar 2013 22:11:38 +0100 Subject: [Catalog-sig] hash tags In-Reply-To: References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> Message-ID: <513A540A.1010703@egenix.com> On 08.03.2013 20:52, Noah Kantrowitz wrote: > > On Mar 8, 2013, at 4:50 AM, M.-A. Lemburg wrote: > >> On 08.03.2013 13:15, Christian Heimes wrote: >>> Am 08.03.2013 12:49, schrieb M.-A. Lemburg: >>>> Together with the added hash tag on the download file URLs (*), >>>> this would solve the availability and the security aspects. >>>> Instead of deprecating external links altogether, we could then >>>> deprecate non-compliant download links and get an overall >>>> very flexible system for Python package distribution. >>>> >>>> (*) Yes, I know, I still have to deliver the updated proposal - >>>> been working on getting our indexes ready to serve as example :-) >>> >>> How does your proposal look like? 
>> >> Here's the first version with the basic idea: >> >> http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal >> >> After the feedback I got from Holger and Phillip, I'm currently >> writing a new version, which drops some of the unneeded >> requirements and spells out a few more things. >> >> Here's a very short version... >> >> Installers are modified: >> >> * to only follow rel="download" links from the /simple/ index page, >> which have a hash tag (e.g. #md5=...) >> * will only use the fetched download page if its contents match >> the hash tag >> * scan that page for rel="download" links, which again have to >> have a hash tag to be taken into account >> * only install files for which the hash tag matches the >> downloaded content >> >> This should provide a good way to make sure that the downloaded >> files are indeed under control of the package maintainer. > > MD5 is _not_ acceptable for anything security related and we shouldn't be adding anything that increases our dependence on it. MD5's only use in the packaging world is to make people who forget that TCP has its own checksums feel all warm and fuzzy that there hasn't been _accidental_ download corruption. I was only using the existing md5 hash tags as example. Tools should migrate to support all hashlib algorithms (pip already does), so the hash tag can be e.g. #sha1=... or #sha256=... For Python 2.4 only md5 and sha1 would work, since it didn't come with a hashlib module. With the extension mechanism Christian proposed, we can also add all sorts of other things as well, e.g. size indications, GPG key ID, GPG sigs, etc. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 07 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From pje at telecommunity.com Fri Mar 8 22:12:05 2013 From: pje at telecommunity.com (PJ Eby) Date: Fri, 8 Mar 2013 16:12:05 -0500 Subject: [Catalog-sig] hash tags In-Reply-To: References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> Message-ID: On Fri, Mar 8, 2013 at 2:52 PM, Noah Kantrowitz wrote: > MD5 is _not_ acceptable for anything security related and we shouldn't be adding anything that increases our dependence on it. MD5's only use in the packaging world is to make people who forget that TCP has its own checksums feel all warm and fuzzy that there hasn't been _accidental_ download corruption. So, you're saying that someone has found a second-preimage attack against MD5 that's more efficient than the current 2**127 threshold established in 2009? "Anything security related" is pretty broad. Out of the many classes of attacks on hashes, AFAIK the only class that's relevant to PyPI is second preimage attacks, i.e. one where the attacker has the original file and the hash, and must construct a new file that produces the same hash value. Did you have some other type of hash attack in mind? And in either case, do you have a referent for the attack complexity? From mal at egenix.com Fri Mar 8 22:17:41 2013 From: mal at egenix.com (M.-A. 
Lemburg) Date: Fri, 08 Mar 2013 22:17:41 +0100 Subject: [Catalog-sig] hash tags In-Reply-To: References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> Message-ID: <513A5575.5000200@egenix.com> On 08.03.2013 20:16, PJ Eby wrote: > On Fri, Mar 8, 2013 at 7:50 AM, M.-A. Lemburg wrote: >> After the feedback I got from Holger and Phillip, I'm currently >> writing a new version, which drops some of the unneeded >> requirements and spells out a few more things. >> >> Here's a very short version... >> >> Installers are modified: >> >> * to only follow rel="download" links from the /simple/ index page, >> which have a hash tag (e.g. #md5=...) >> * will only use the fetched download page if its contents match >> the hash tag >> * scan that page for rel="download" links, which again have to >> have a hash tag to be taken into account >> * only install files for which the hash tag matches the >> downloaded content >> >> This should provide a good way to make sure that the downloaded >> files are indeed under control of the package maintainer. > > There is, as I said before, a MUCH simpler way to do this, that works > right now: put direct #md5 download links in your description, and > phase out the rel="" attributes altogether. No, that would be a pretty poor design :-) The rel="" attributes are good design, since they were meant for exactly this purpose (machine reading and understanding relations between origin and target). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 07 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! 
:::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From donald at stufft.io Fri Mar 8 22:26:07 2013 From: donald at stufft.io (Donald Stufft) Date: Fri, 8 Mar 2013 16:26:07 -0500 Subject: [Catalog-sig] hash tags In-Reply-To: References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> Message-ID: On Mar 8, 2013, at 4:12 PM, PJ Eby wrote: > On Fri, Mar 8, 2013 at 2:52 PM, Noah Kantrowitz wrote: >> MD5 is _not_ acceptable for anything security related and we shouldn't be adding anything that increases our dependence on it. MD5's only use in the packaging world is to make people who forget that TCP has its own checksums feel all warm and fuzzy that there hasn't been _accidental_ download corruption. > > So, you're saying that someone has found a second-preimage attack > against MD5 that's more efficient than the current 2**127 threshold > established in 2009? > > "Anything security related" is pretty broad. Out of the many classes > of attacks on hashes, AFAIK the only class that's relevant to PyPI is > second preimage attacks, i.e. one where the attacker has the original > file and the hash, and must construct a new file that produces the > same hash value. "Relevant to PyPI" is pretty broad, and when you're developing a secure system you need to look past what is ok *today* and design for the next 5, 10, or 20 years. So even if there's no attack that can directly allow replacing the target file with a new one, continuing to utilize it is bad. It has a number of weaknesses which do not instill confidence in its future security; meanwhile, there are a number of other hashes which _do_. Unless you'd rather be trying to replace hashes everywhere once it's already completely broken.
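[Editorial sketch, not part of the original message.] One way to read "design for the next 5, 10, or 20 years" is that the check on the installer side should not hard-code a single algorithm. The sketch below assumes links of the form URL#<algorithm>=<hexdigest>, as used in this thread; the name verify_download and the ACCEPTED_ALGORITHMS policy tuple are hypothetical and do not describe how pip or setuptools actually behave.

```python
import hashlib
from urllib.parse import urldefrag

# Assumption for illustration: which algorithms to accept is a policy
# decision. md5 is left out here to mirror the objections in this thread.
ACCEPTED_ALGORITHMS = ("sha256", "sha512")


def verify_download(url, content, accepted=ACCEPTED_ALGORITHMS):
    """Return True only if content matches the hash fragment of url.

    Links without a fragment, or with an algorithm outside 'accepted',
    are rejected rather than silently trusted.
    """
    _base, fragment = urldefrag(url)
    algorithm, separator, expected = fragment.partition("=")
    if not separator or algorithm not in accepted:
        return False
    actual = hashlib.new(algorithm, content).hexdigest()
    return actual == expected.lower()
```

Retiring an algorithm then means removing one entry from the policy tuple instead of changing the link format, which is the kind of migration path the reply above argues for.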
> > Did you have some other type of hash attack in mind? And in either > case, do you have a referent for the attack complexity? > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From r1chardj0n3s at gmail.com Fri Mar 8 22:26:54 2013 From: r1chardj0n3s at gmail.com (Richard Jones) Date: Sat, 9 Mar 2013 08:26:54 +1100 Subject: [Catalog-sig] Search engine relevance In-Reply-To: References: Message-ID: That *was* the original search engine :-) Then after user complaints we devised a better solution... Always happy to take criticism of it and improve it! :-) Sent from my portable device, please excuse the brevity. On Mar 9, 2013 2:29 AM, "Yuval Greenfield" wrote: > On Fri, Mar 8, 2013 at 7:24 AM, Antoine Pitrou wrote: > >> Yes, crate.io is at least missing pyst2 which does mention AGI in its >> description: >> https://crate.io/packages/pyst2/ >> >> >> > I agree. There's only one effective search engine for pypi I know of, e.g. > > https://www.google.com/search?q=site%3Apypi.python.org+agi > > > > > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Fri Mar 8 22:28:31 2013 From: mal at egenix.com (M.-A. 
Lemburg) Date: Fri, 08 Mar 2013 22:28:31 +0100 Subject: [Catalog-sig] hash tags In-Reply-To: References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> Message-ID: <513A57FF.6000905@egenix.com> On 08.03.2013 20:16, PJ Eby wrote: > On Fri, Mar 8, 2013 at 7:50 AM, M.-A. Lemburg wrote: >> So far the only practical problem I've found with the approach >> is that the download page may not contain dynamic data, e.g. >> a date or timestamp, since that causes the hash tag not to >> verify. > > Which is completely unnecessary if one simply exposes the *actual* > download links directly on PyPI. The download page is redundant, in a > couple different ways. First, since it can't change, there's no point > in re-fetching it all the time. Second, since it's only going to be > read by tools anyway, there's no point to it containing anything > besides the link. > > So, since the page only contains links, might as well put the links > straight on PyPI, or at most have an option/tool to load the links > from an external source. I don't follow you. We only have a single download_url field available to store a download link. We'd need to modify the meta data format to allow for more than one such field, which doesn't work if you want to stay backwards compatible. BTW: If we go with the CDN caching model for external files, we'd pull the download page links directly on the /simple/ index page - as files, not external links. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 07 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! 
:::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From donald at stufft.io Fri Mar 8 22:32:21 2013 From: donald at stufft.io (Donald Stufft) Date: Fri, 8 Mar 2013 16:32:21 -0500 Subject: [Catalog-sig] hash tags In-Reply-To: References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> Message-ID: <8A3002A9-5E2B-4D38-BABD-9253A027E7F6@stufft.io> On Mar 8, 2013, at 4:12 PM, PJ Eby wrote: > On Fri, Mar 8, 2013 at 2:52 PM, Noah Kantrowitz wrote: >> MD5 is _not_ acceptable for anything security related and we shouldn't be adding anything that increases our dependence on it. MD5's only use in the packaging world is to make people who forget that TCP has its own checksums feel all warm and fuzzy that there hasn't been _accidental_ download corruption. > > So, you're saying that someone has found a second-preimage attack > against MD5 that's more efficient than the current 2**127 threshold > established in 2009? > > "Anything security related" is pretty broad. Out of the many classes > of attacks on hashes, AFAIK the only class that's relevant to PyPI is > second preimage attacks, i.e. one where the attacker has the original > file and the hash, and must construct a new file that produces the > same hash value. > > Did you have some other type of hash attack in mind? And in either > case, do you have a referent for the attack complexity? 
> _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig Here's some more information pulled straight from Wikipedia: However, it has since been shown that MD5 is not collision resistant;[3] as such, MD5 is not suitable for applications like SSL certificates or digital signatures that rely on this property. In 1996, a flaw was found with the design of MD5, and while it was not a clearly fatal weakness, cryptographers began recommending the use of other algorithms, such as SHA-1, which has since been found to be vulnerable as well. In 2004, more serious flaws were discovered in MD5, making further use of the algorithm for security purposes questionable: specifically, a group of researchers described how to create a pair of files that share the same MD5 checksum.[4][5] Further advances were made in breaking MD5 in 2005, 2006, and 2007.[6] In December 2008, a group of researchers used this technique to fake SSL certificate validity,[7][8] and CMU Software Engineering Institute now says that MD5 "should be considered cryptographically broken and unsuitable for further use",[9] and most U.S. government applications now require the SHA-2 family of hash functions.[10] Here are the important highlights: - specifically, a group of researchers described how to create a pair of files that share the same MD5 checksum - MD5 "should be considered cryptographically broken and unsuitable for further use" ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From donald at stufft.io Fri Mar 8 22:33:44 2013 From: donald at stufft.io (Donald Stufft) Date: Fri, 8 Mar 2013 16:33:44 -0500 Subject: [Catalog-sig] hash tags In-Reply-To: <513A57FF.6000905@egenix.com> References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> <513A57FF.6000905@egenix.com> Message-ID: <539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io> On Mar 8, 2013, at 4:28 PM, "M.-A. Lemburg" wrote: > BTW: If we go with the CDN caching model for external files, we'd > pull the download page links directly on the /simple/ index > page - as files, not external links. We cannot download and rehost (even if we call it a cache) external files without getting permission from their owners to do so. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From noah at coderanger.net Fri Mar 8 22:35:50 2013 From: noah at coderanger.net (Noah Kantrowitz) Date: Fri, 8 Mar 2013 13:35:50 -0800 Subject: [Catalog-sig] hash tags In-Reply-To: <539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io> References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> <513A57FF.6000905@egenix.com> <539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io> Message-ID: On Mar 8, 2013, at 1:33 PM, Donald Stufft wrote: > On Mar 8, 2013, at 4:28 PM, "M.-A. Lemburg" wrote: > >> BTW: If we go with the CDN caching model for external files, we'd >> pull the download page links directly on the /simple/ index >> page - as files, not external links. 
> > We cannot download and rehost (even if we call it a cache) external files without getting permission from their owners to do so. At which point, they can just upload them the normal way. --Noah -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 203 bytes Desc: Message signed with OpenPGP using GPGMail URL: From dholth at gmail.com Fri Mar 8 22:43:59 2013 From: dholth at gmail.com (Daniel Holth) Date: Fri, 8 Mar 2013 16:43:59 -0500 Subject: [Catalog-sig] hash tags In-Reply-To: <539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io> References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> <513A57FF.6000905@egenix.com> <539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io> Message-ID: Check out https://blake2.net/ ; it is both faster and more secure than md5. md5 does have to go, no matter how secure it is in this particular application. SHA2 is the only choice that doesn't require a long explanation. When this came up a little less than a year ago we talked about maybe including the SHA2 hash in one of the link attributes for the benefit of old clients. From mal at egenix.com Fri Mar 8 22:45:14 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 08 Mar 2013 22:45:14 +0100 Subject: [Catalog-sig] hash tags In-Reply-To: <539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io> References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> <513A57FF.6000905@egenix.com> <539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io> Message-ID: <513A5BEA.1090603@egenix.com> On 08.03.2013 22:33, Donald Stufft wrote: > On Mar 8, 2013, at 4:28 PM, "M.-A. Lemburg" wrote: > >> BTW: If we go with the CDN caching model for external files, we'd >> pull the download page links directly on the /simple/ index >> page - as files, not external links. 
> > We cannot download and rehost (even if we call it a cache) external files without getting permission from their owners to do so. Well, in the CDN version of the /simple/ dir, they would look like files hosted on the CDN. The download pages would still be feeding the CDN, though. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 07 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From donald at stufft.io Fri Mar 8 22:47:27 2013 From: donald at stufft.io (Donald Stufft) Date: Fri, 8 Mar 2013 16:47:27 -0500 Subject: [Catalog-sig] hash tags In-Reply-To: <513A5BEA.1090603@egenix.com> References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> <513A57FF.6000905@egenix.com> <539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io> <513A5BEA.1090603@egenix.com> Message-ID: <9963F1EB-A9DF-4405-B1E4-86ADCB2A1040@stufft.io> On Mar 8, 2013, at 4:45 PM, "M.-A. Lemburg" wrote: > On 08.03.2013 22:33, Donald Stufft wrote: >> On Mar 8, 2013, at 4:28 PM, "M.-A. Lemburg" wrote: >> >>> BTW: If we go with the CDN caching model for external files, we'd >>> pull the download page links directly on the /simple/ index >>> page - as files, not external links. >> >> We cannot download and rehost (even if we call it a cache) external files without getting permission from their owners to do so. 
> > Well, in the CDN version of the /simple/ dir, they would look > like files hosted on the CDN. The download pages would still > be feeding the CDN, though. I'm unsure what you're saying here. If it involves downloading files hosted outside of PyPI and putting it on a PSF controlled CDN it's a non starter. > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, Mar 07 2013) >>>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > > ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From christian at python.org Fri Mar 8 22:50:55 2013 From: christian at python.org (Christian Heimes) Date: Fri, 08 Mar 2013 22:50:55 +0100 Subject: [Catalog-sig] hash tags In-Reply-To: <539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io> References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> <513A57FF.6000905@egenix.com> <539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io> Message-ID: <513A5D3F.9020202@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 Am 08.03.2013 22:33, schrieb Donald Stufft: > On Mar 8, 2013, at 4:28 PM, "M.-A. 
Lemburg" > wrote: > >> BTW: If we go with the CDN caching model for external files, >> we'd pull the download page links directly on the /simple/ index >> page - as files, not external links. > > We cannot download and rehost (even if we call it a cache) external > files without getting permission from their owners to do so. (CC to Van as this is a legal matter) Would it be sufficient to add a checkbox to the administration section of PyPI packages that say something like "I'm an owner of this package and I grant PyPI the permission to rehost my stuff"? Christian -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with undefined - http://www.enigmail.net/ iQIcBAEBCAAGBQJROl07AAoJEMeIxMHUVQ1Fl38P/j5FKyg9C/QLODkOhzJlNeln MkxUYYMx6iVc8GW1tU6eOw5NIChlgjXmFvL97VAJWLGcw+Crs9ChUyivABH4KPNm nOxr/hXGTOlFrWahcvMvLthIRofNjTVNqphZNFDYApbdD8zGvilDxG0kvuPPom9K RER4FIzk7KbkqSTQA7/Wg5Ekd1Cnw3mChkqwGcVfmYn/5ROWwa9h4bBwD0EiCCAn RsmMWtfWIeP+94KroOKOHIdgnGhIvGyN5bkvixSeNkA1HZsxxdpzpF9ZQ5MhLavN bxZbySXdaJfG9pyMQ2HtPOWnBfPWU0ywwDX+Q514Tjs68Jxpz5nUs3yPfFzuPdov rONt9BAHyHQsbNpSNOfs6kULdfcNvrDoWiCKXoceUobQfSy5hpEkC7W8VwIU9Hp2 T0k4H63O3uk2pTTbQQM1fL5yiNcyhUSZEchnCadPRYTkxcifUZN6z3v3yLmGMYsL HSns8aH1b21MVCn7mFQiQZcPl9gHUS97yAArrDfWtPw4UmMpfGcjJlriXsTRGN22 ZPyzts66ZupXR1eoKWPBTzFXVP337z0kyqUGE2VJDyuAGSM0NaNT38RJCiOd6RKz CKGdIfwCUDj0c6PdXaVQH+SMefvL7/AnqJrGAB8FDNHx9Hr2reZF4qSuGx66VM8k 6vHvtXX8yuKkByOzhDQj =I46J -----END PGP SIGNATURE----- From donald at stufft.io Fri Mar 8 22:59:15 2013 From: donald at stufft.io (Donald Stufft) Date: Fri, 8 Mar 2013 16:59:15 -0500 Subject: [Catalog-sig] hash tags In-Reply-To: <513A5D3F.9020202@python.org> References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> <513A57FF.6000905@egenix.com> <539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io> <513A5D3F.9020202@python.org> Message-ID: <435A6200-BE8B-4DA5-884F-B193EF5984DF@stufft.io> On Mar 
8, 2013, at 4:50 PM, Christian Heimes wrote: > Am 08.03.2013 22:33, schrieb Donald Stufft: > > On Mar 8, 2013, at 4:28 PM, "M.-A. Lemburg" > > wrote: > > > >> BTW: If we go with the CDN caching model for external files, > >> we'd pull the download page links directly on the /simple/ index > >> page - as files, not external links. > > > > We cannot download and rehost (even if we call it a cache) external > > files without getting permission from their owners to do so. > > (CC to Van as this is a legal matter) > > Would it be sufficient to add a checkbox to the administration section > of PyPI packages that say something like "I'm an owner of this package > and I grant PyPI the permission to rehost my stuff"? > > Christian > If we have permission to rehost we might as well just kill the external list and rehost it. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From christian at python.org Fri Mar 8 23:02:11 2013 From: christian at python.org (Christian Heimes) Date: Fri, 08 Mar 2013 23:02:11 +0100 Subject: [Catalog-sig] hash tags In-Reply-To: References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> <513A57FF.6000905@egenix.com> <539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io> Message-ID: <513A5FE3.2010604@python.org> Am 08.03.2013 22:43, schrieb Daniel Holth: > Check out https://blake2.net/ ; it is both faster and more secure than > md5. md5 does have to go, no matter how secure it is in this > particular application. SHA2 is the only choice that doesn't require a > long explanation. 
When this came up a little less than a year ago we > talked about maybe including the SHA2 hash in one of the link > attributes for the benefit of old clients. Let's not add yet another crypto hash algorithm. :) We have SHA-1 and SHA-2, that ought to be enough. SHA-3 is available for Python 3.4 and I provide stand-alone sources and binaries for 2.6 to 3.3. Blake2 looks nice but we should stick to NIST-approved algorithms. The combination of file size, MD5 (for legacy reasons), SHA-1 and perhaps SHA-256 is more than sufficient. Don't forget that files have to be valid tar.gz, tar.bz2, zip or Windows binaries, too ... Christian From donald at stufft.io Fri Mar 8 23:03:30 2013 From: donald at stufft.io (Donald Stufft) Date: Fri, 8 Mar 2013 17:03:30 -0500 Subject: [Catalog-sig] hash tags In-Reply-To: <513A5FE3.2010604@python.org> References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> <513A57FF.6000905@egenix.com> <539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io> <513A5FE3.2010604@python.org> Message-ID: On Mar 8, 2013, at 5:02 PM, Christian Heimes wrote: > On 08.03.2013 22:43, Daniel Holth wrote: >> Check out https://blake2.net/ ; it is both faster and more secure than >> md5. md5 does have to go, no matter how secure it is in this >> particular application. SHA2 is the only choice that doesn't require a >> long explanation. When this came up a little less than a year ago we >> talked about maybe including the SHA2 hash in one of the link >> attributes for the benefit of old clients. > > Let's not add yet another crypto hash algorithm. :) > > We have SHA-1 and SHA-2, that ought to be enough. SHA-3 is available > for Python 3.4 and I provide stand-alone sources and binaries for 2.6 to > 3.3. Blake2 looks nice but we should stick to NIST-approved algorithms. > > The combination of file size, MD5 (for legacy reasons), SHA-1 and > perhaps SHA-256 is more than sufficient.
Don't forget that files have to > be valid tar.gz, tar.bz2, zip or Windows binaries, too ... SHA-1 is broken. SHA-2 or better is the only really acceptable one in the stdlib. > > Christian ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From mal at egenix.com Fri Mar 8 23:04:24 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 08 Mar 2013 23:04:24 +0100 Subject: [Catalog-sig] hash tags In-Reply-To: <9963F1EB-A9DF-4405-B1E4-86ADCB2A1040@stufft.io> References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> <513A57FF.6000905@egenix.com> <539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io> <513A5BEA.1090603@egenix.com> <9963F1EB-A9DF-4405-B1E4-86ADCB2A1040@stufft.io> Message-ID: <513A6068.6090207@egenix.com> On 08.03.2013 22:47, Donald Stufft wrote: > On Mar 8, 2013, at 4:45 PM, "M.-A. Lemburg" wrote: > >> On 08.03.2013 22:33, Donald Stufft wrote: >>> On Mar 8, 2013, at 4:28 PM, "M.-A. Lemburg" wrote: >>> >>>> BTW: If we go with the CDN caching model for external files, we'd >>>> pull the download page links directly on the /simple/ index >>>> page - as files, not external links. >>> >>> We cannot download and rehost (even if we call it a cache) external files without getting permission from their owners to do so. >> >> Well, in the CDN version of the /simple/ dir, they would look >> like files hosted on the CDN. The download pages would still >> be feeding the CDN, though. > > I'm unsure what you're saying here. If it involves downloading files hosted outside of PyPI and putting it on a PSF controlled CDN it's a non starter.
My idea was to have PyPI send a redirect to the external URL when getting a request for the file, so we could avoid hosting the files and instead just have the CDN cache them for a certain time period. However, I've now read up on the CloudFront docs, which point out that the CDN won't follow the redirect, but simply forward it to the user, bypassing the CDN: http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RequestAndResponseBehaviorCustomOrigin.html#ResponseCustomRedirects I suspect other CDNs to work in the same way, so the redirect idea doesn't work. We'd have to use a proxy solution on the PyPI server to make the caching CDN work, but that will likely cause more legal problems than the plain caching of content on the way to the user. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 07 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From pje at telecommunity.com Fri Mar 8 23:06:28 2013 From: pje at telecommunity.com (PJ Eby) Date: Fri, 8 Mar 2013 17:06:28 -0500 Subject: [Catalog-sig] hash tags In-Reply-To: <513A5575.5000200@egenix.com> References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> <513A5575.5000200@egenix.com> Message-ID: On Fri, Mar 8, 2013 at 4:17 PM, M.-A. 
Lemburg wrote: > On 08.03.2013 20:16, PJ Eby wrote: >> There is, as I said before, a MUCH simpler way to do this, that works >> right now: put direct #md5 download links in your description, and >> phase out the rel="" attributes altogether. > > No, that would be a pretty poor design :-) > > The rel="" attributes are good design, since they were meant for > exactly this purpose (machine reading and understanding relations > between origin and target). That depends on the goal of your design. If the goal is to phase out offsite spidering by downloader tools in a reasonably easy and low-cost way, introducing new API is not a good way to do it. The simple way to do it is to replace download-time end-user unsupervised spidering with upload-time or registration-time author-supervised spidering, which requires only that the tools exist and people be informed of them (and encouraged to use them). From pje at telecommunity.com Fri Mar 8 23:08:52 2013 From: pje at telecommunity.com (PJ Eby) Date: Fri, 8 Mar 2013 17:08:52 -0500 Subject: [Catalog-sig] hash tags In-Reply-To: References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> Message-ID: On Fri, Mar 8, 2013 at 4:26 PM, Donald Stufft wrote: > On Mar 8, 2013, at 4:12 PM, PJ Eby wrote: > >> On Fri, Mar 8, 2013 at 2:52 PM, Noah Kantrowitz wrote: >>> MD5 is _not_ acceptable for anything security related and we shouldn't be adding anything that increases our dependence on it. MD5's only use in the packaging world is to make people who forget that TCP has its own checksums feel all warm and fuzzy that there hasn't been _accidental_ download corruption. >> >> So, you're saying that someone has found a second-preimage attack >> against MD5 that's more efficient than the current 2**127 threshold >> established in 2009? >> >> "Anything security related" is pretty broad. 
Out of the many classes >> of attacks on hashes, AFAIK the only class that's relevant to PyPI is >> second preimage attacks, i.e. one where the attacker has the original >> file and the hash, and must construct a new file that produces the >> same hash value. > > Relevant to PyPI is pretty broad, and when you're developing a secure system you need to look past what is ok *today* and design for the next 5, 10, or 20 years. So even if there's no attack that can directly allow replacing the target file with a new one, continuing to utilize it is bad. It has a number of weaknesses which do not instill confidence in its future security; meanwhile, there are a number of other hashes which _do_. > > Unless you'd rather be trying to replace hashes everywhere once it's already completely broken. We can replace it completely in a lot less than that many years, if the new PEP-based tools can be brought to pass. Using new protocols (e.g. the embedded signatures in wheel files) will make most of this moot. What I'm against is trying to patch over the existing protocol when what we really want is to replace it altogether. Adding hashes and filesizes and whatnot is just gilding the existing lily, or more like gilding the pond scum, actually. ;-) From christian at python.org Fri Mar 8 23:09:58 2013 From: christian at python.org (Christian Heimes) Date: Fri, 08 Mar 2013 23:09:58 +0100 Subject: [Catalog-sig] hash tags In-Reply-To: References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> <513A57FF.6000905@egenix.com> <539534FF-1199-4AAF-9D8E-5160D67FD16B@stufft.io> <513A5FE3.2010604@python.org> Message-ID: <513A61B6.8020003@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 On 08.03.2013 23:03, Donald Stufft wrote: > SHA-1 is broken. SHA-2 or better is the only really acceptable one > in the stdlib. Well, then SHA-384 it is.
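As a rough illustration of the digests being compared in this thread, the sketch below computes MD5, SHA-1, SHA-256 and SHA-384 over the same bytes with Python's standard hashlib module. The helper name `digests` and the sample payload are made up for illustration; nothing here reflects how PyPI itself stores or serves hashes.

```python
import hashlib


def digests(data, algorithms=("md5", "sha1", "sha256", "sha384")):
    """Return a dict mapping each algorithm name to the hex digest of data.

    `digests` is a hypothetical helper, not part of any packaging tool.
    """
    return {name: hashlib.new(name, data).hexdigest() for name in algorithms}


if __name__ == "__main__":
    # Hypothetical payload standing in for a downloaded archive.
    for name, value in digests(b"example sdist contents").items():
        # Each hex character encodes 4 bits of the digest.
        print("%-7s %3d bits  %s" % (name, len(value) * 4, value))
```

The digest lengths (128, 160, 256 and 384 bits) are the only thing the loop demonstrates; which of them is acceptable is the question the thread is arguing about.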
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with undefined - http://www.enigmail.net/ iQIcBAEBCAAGBQJROmGyAAoJEMeIxMHUVQ1FmiMQAIqRskGY53GFclfE1TDUGkBk KsmatFXfenMYkvJ2w1m5GGqj0AKeeLEBHub/+efgynzd1TVzx0CZUwGJt+XTzB7Y jUeqbGOxlqPcOujvI880Yh4npYzxJvmLbhiUSx3/6PEOje4TIhlRW8iiLjHKSNt7 Ky0jHA5c3I/I0WaOG+KlgvYGr7McOVoSfRyqKO8IjiLqxeRi757OzLOHCtbyuuEj N/zWt8dzoXn56D1WNaeV50qvJBejfu+OtCSfvohL2uCmEWFTNulgGy9W4um7U/L2 RHClqchO1aSKUTDzwEKNiDFQK7FAkk1YlehCZva5Er43dTQJFINYm+WDKLEFamWO 7KoNah9ToyIoKENTJJr/Oe/3wsBVh82bcl4pKlP7heOtRQx3bDn1z4ktWWYDSEcr 3MgJOKeu+NyebnOr3DfwQPeNfxPa1qpfX3+UvmMgstvWFEOxJ828SBTZDIIr8LGq Fb/9IrCVxbXUo5F8qS8klAXbnPGrTGyktYkwi9wMHEoMOrrrNKPqiqpSt0/cpTJV Kj16JpgT+zvHyJx3hgtu+iynvRSnQ7G4SzI29t1eLhLhNG2RbSNYDafk3yjs8UGA tUDS+PIxRKEgDMH5stdlAKJJWSDYfpMqf+06TC8FoUKhZQHPwbajsp/anRihidFm TJ8hCbAGsh3iaR0k8dA9 =W+vN -----END PGP SIGNATURE----- From pje at telecommunity.com Fri Mar 8 23:11:34 2013 From: pje at telecommunity.com (PJ Eby) Date: Fri, 8 Mar 2013 17:11:34 -0500 Subject: [Catalog-sig] hash tags In-Reply-To: <513A57FF.6000905@egenix.com> References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> <513A57FF.6000905@egenix.com> Message-ID: On Fri, Mar 8, 2013 at 4:28 PM, M.-A. Lemburg wrote: > On 08.03.2013 20:16, PJ Eby wrote: >> So, since the page only contains links, might as well put the links >> straight on PyPI, or at most have an option/tool to load the links >> from an external source. > > I don't follow you. We only have a single download_url field > available to store a download link. > > We'd need to modify the meta data format to allow for more than > one such field, which doesn't work if you want to stay backwards > compatible. Links included in the long description field are placed on the /simple index of links. So you can just edit your standard metadata right this minute if you want to offer more download links. 
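A minimal sketch of how a download tool might check a link that carries an #md5 tag, assuming the digest is attached as a URL fragment of the form `#md5=<hexdigest>` (the form seen on the /simple index). The function names and the example.invalid URL are hypothetical, and this is not the setuptools implementation.

```python
import hashlib
from urllib.parse import urldefrag


def split_hash_fragment(link):
    """Split 'url#algo=hexdigest' into (url, algo, hexdigest).

    Returns (url, None, None) when the link has no usable hash fragment.
    """
    url, fragment = urldefrag(link)
    algo, sep, expected = fragment.partition("=")
    if not sep or algo not in hashlib.algorithms_available:
        return url, None, None
    return url, algo, expected.lower()


def matches(link, payload):
    """Return True if payload hashes to the digest named in the link."""
    _, algo, expected = split_hash_fragment(link)
    if algo is None:
        return False  # nothing to verify against; the caller decides policy
    return hashlib.new(algo, payload).hexdigest() == expected


if __name__ == "__main__":
    payload = b"pretend this is pkg-1.0.tar.gz"  # hypothetical file contents
    link = ("https://example.invalid/pkg-1.0.tar.gz#md5="
            + hashlib.md5(payload).hexdigest())
    print(matches(link, payload))      # digest in the link matches the bytes
    print(matches(link, b"tampered"))  # different bytes, so the check fails
```

Because the algorithm name is read from the fragment, the same check would work unchanged for a `#sha256=` tag, which is the kind of substitution discussed elsewhere in the thread.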
And you can put #md5 tags on them if you want the tools to check that. From donald at stufft.io Fri Mar 8 23:12:46 2013 From: donald at stufft.io (Donald Stufft) Date: Fri, 8 Mar 2013 17:12:46 -0500 Subject: [Catalog-sig] hash tags In-Reply-To: References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> Message-ID: On Mar 8, 2013, at 5:08 PM, PJ Eby wrote: > On Fri, Mar 8, 2013 at 4:26 PM, Donald Stufft wrote: >> On Mar 8, 2013, at 4:12 PM, PJ Eby wrote: >> >>> On Fri, Mar 8, 2013 at 2:52 PM, Noah Kantrowitz wrote: >>>> MD5 is _not_ acceptable for anything security related and we shouldn't be adding anything that increases our dependence on it. MD5's only use in the packaging world is to make people who forget that TCP has its own checksums feel all warm and fuzzy that there hasn't been _accidental_ download corruption. >>> >>> So, you're saying that someone has found a second-preimage attack >>> against MD5 that's more efficient than the current 2**127 threshold >>> established in 2009? >>> >>> "Anything security related" is pretty broad. Out of the many classes >>> of attacks on hashes, AFAIK the only class that's relevant to PyPI is >>> second preimage attacks, i.e. one where the attacker has the original >>> file and the hash, and must construct a new file that produces the >>> same hash value. >> >> Relevant to PyPI is pretty broad, and when you're developing a secure system you need to look past what is ok *today* and design for the next 5, 10, or 20 years. So even if there's no attack that can directly allow replacing the target file with a new one, continuing to utilize it is bad. It has a number of weaknesses which do not install confidence in its future security meanwhile there are a number of other hashes which _do_. >> >> Unless you'd rather be trying to replace hashes everywhere once it's already completely broken. 
> > We can replace it completely in a lot less than that many years, if > the new PEP-based tools can be brought to pass. Using new protocols > (e.g. the embedded signatures in wheel files) will make most of this > moot. > > What I'm against is trying to patch over the existing protocol when > what we really want is to replace it altogether. Adding hashes and > filesizes and whatnot is just gilding the existing lily, or more like > gilding the pond scum, actually. ;-) Unless we are planning on removing the existing tooling this still matters even with the new system in place. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From pje at telecommunity.com Fri Mar 8 23:50:37 2013 From: pje at telecommunity.com (PJ Eby) Date: Fri, 8 Mar 2013 17:50:37 -0500 Subject: [Catalog-sig] hash tags In-Reply-To: <8A3002A9-5E2B-4D38-BABD-9253A027E7F6@stufft.io> References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> <8A3002A9-5E2B-4D38-BABD-9253A027E7F6@stufft.io> Message-ID: On Fri, Mar 8, 2013 at 4:32 PM, Donald Stufft wrote: > Here's some more information pulled straight from Wikiepdia: Trust me, I've read a LOT of Wikipedia (and even more from other sites, including at least the conclusions of a number of cryptography papers) about hashing attacks recently, because I was seeing inconsistencies in what people are saying about hashes and their weaknesses and so forth. 99.9% of the discussion about attacks on hashes have to do with collision attacks, prefix attacks, and length extension attacks, all of which are extremely relevant for *cryptographic* purposes. 
Specifically, the use of hashes to verify identity, authority, repudiability, etc... which emphatically do *not* apply to the use of an MD5 as a checksum to verify a correct download. All of these attacks depend on *something else* being at stake besides the integrity of the original message. For example length-extension attacks bypass the need to know a "secret" used in a naive hash-based signature scheme (which is why you're supposed to use HMAC for such things), while collision attacks let you trick a signer into signing something that you can later replace with something altered. The current use of #md5 tags isn't subject to either of these kinds of attack, because: 1. There is no "secret" to be revealed, and 2. The author and signer are the same person So the only type of attack I've found out about thus far, in my (admittedly few) hours of study on the subject, that is relevant to the way we use MD5 on PyPI at present is the so-called "second pre-image" attack, which is when you're given an existing message and a hash, and have to create a new message with the same hash... while also incorporating something useful in the new message. The most recent report I saw on second pre-image attacks against full MD5 estimated a 2**127 strength, meaning that even if you could process a great many billion tries per second, it would take you thousands of years to come up with a file that could masquerade as an existing download. (And most people's computers and/or internet connections would choke on the massive file sizes needed for the still-theoretical Kelsey-Schneier generalized preimage attack, which in any case would apply equally to just about any other hash we could currently put out in the field. i.e., it's not specific to a particular hash algorithm, it just relies on certain properties of the algorithm.) So, yeah, MD5 is *cryptographically* broken, sure. But it's not broken for *data integrity*. 
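To put the 2**127 figure in perspective, here is a back-of-the-envelope calculation. The rate of 10**12 tries per second is an arbitrary assumption chosen for illustration, and the work factor is simply the one quoted in the message, not an independent estimate.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_to_exhaust(work_bits, tries_per_second):
    """Years needed to perform 2**work_bits hash evaluations at a fixed rate."""
    return 2 ** work_bits / tries_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # 2**127 work at an assumed 1e12 tries per second.
    print("%.2e years" % years_to_exhaust(127, 1e12))
```

Under these assumptions the result is on the order of 10**18 years, so "thousands of years" is, if anything, an understatement for a brute-force second-preimage search.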
And in the PyPI use case, the "cryptographic" part is all in the SSL being used to fetch the MD5 link in the first place. > Here's the important highlights: > > - specifically, a group of researchers described how to create a pair of files that share the same MD5 checksum Right, that's what's called a "collision attack". It means that you can go out *ahead of time*, and make two files with the same checksum, one good, one evil. It does *not* mean you get to take an existing file, and then make a second file with the same checksum. (The latter is a "second preimage" attack, which is *not* broken.) Hash collision attacks in PyPI would basically require an author to upload a special version of their package that looked innocent, and then they could later switch that version out with one that's harmful. And the *way* that this works is that you specially generate *both* files, in advance. Which means that the author themselves is compromised, so the threat is moot. The author can already upload compromised code (either through being evil or having their PC hijacked), and what #md5 it has is 100% irrelevant. That is, there's nothing stopping an evil author or an author with a compromised PC from simply uploading a new file with a new MD5, because PyPI will pass it along in exactly the same way. Changing hash algorithms will not affect this threat vector in the slightest. Given these facts, it makes no sense to fuss over the hash algorithm in current use, since a concurrent goal here is to switch to file formats that can be directly signed using, you know, *actual* cryptography. ;-) The new .wheel format makes provisions for modern signature techniques. It'd be good if sdists also did. Then the #md5 tag can die a natural death, hopefully within the year, replaced by a hashtag that, say, fingerprints the author's public key as registered with PyPI, or something of that sort.
In the meantime, there's no actual threat here, so bikeshedding what to replace it with *while keeping the current system* is like rearranging office furniture in a building that's about to have demolition charges set underneath it. ;-) From donald at stufft.io Sat Mar 9 00:15:13 2013 From: donald at stufft.io (Donald Stufft) Date: Fri, 8 Mar 2013 18:15:13 -0500 Subject: [Catalog-sig] hash tags In-Reply-To: References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> <8A3002A9-5E2B-4D38-BABD-9253A027E7F6@stufft.io> Message-ID: <9C6FA1F5-694D-4C24-9E92-39A7C18B80D6@stufft.io> On Mar 8, 2013, at 5:50 PM, PJ Eby wrote: > On Fri, Mar 8, 2013 at 4:32 PM, Donald Stufft wrote: >> Here's some more information pulled straight from Wikiepdia: > > Trust me, I've read a LOT of Wikipedia (and even more from other > sites, including at least the conclusions of a number of cryptography > papers) about hashing attacks recently, because I was seeing > inconsistencies in what people are saying about hashes and their > weaknesses and so forth. 99.9% of the discussion about attacks on > hashes have to do with collision attacks, prefix attacks, and length > extension attacks, all of which are extremely relevant for > *cryptographic* purposes. Specifically, the use of hashes to verify > identity, authority, repudiability, etc... which emphatically do > *not* apply to the use of an MD5 as a checksum to verify a correct > download. > > All of these attacks depend on *something else* being at stake besides > the integrity of the original message. For example length-extension > attacks bypass the need to know a "secret" used in a naive hash-based > signature scheme (which is why you're supposed to use HMAC for such > things), while collision attacks let you trick a signer into signing > something that you can later replace with something altered. 
> > The current use of #md5 tags isn't subject to either of these kinds of > attack, because: > > 1. There is no "secret" to be revealed, and > 2. The author and signer are the same person > > So the only type of attack I've found out about thus far, in my > (admittedly few) hours of study on the subject, that is relevant to > the way we use MD5 on PyPI at present is the so-called "second > pre-image" attack, which is when you're given an existing message and > a hash, and have to create a new message with the same hash... while > also incorporating something useful in the new message. > > The most recent report I saw on second pre-image attacks against full > MD5 estimated a 2**127 strength, meaning that even if you could > process a great many billion tries per second, it would take you > thousands of years to come up with a file that could masquerade as an > existing download. (And most people's computers and/or internet > connections would choke on the massive file sizes needed for the > still-theoretical Kelsey-Schneier generalized preimage attack, which > in any case would apply equally to just about any other hash we could > currently put out in the field. i.e., it's not specific to a > particular hash algorithm, it just relies on certain properties of the > algorithm.) > > So, yeah, MD5 is *cryptographically* broken, sure. But it's not > broken for *data integrity*. And in the PyPI use case, the > "cryptographic" part is all in the SSL being used to fetch the MD5 > link in the first place. > > >> Here's the important highlights: >> >> - specifically, a group of researchers described how to create a pair of files that share the same MD5 checksum > > Right, that's what's called a "collision attack". It means that you > can go out *ahead of time*, and make two files with the same checksum, > one good, one evil. It does *not* mean you get to take an existing > file, and then make a second file with the same checksum. 
(The latter > is a "second preimage" attack, which is *not* broken > > Hash collision attacks in PyPI would basically require an author to > upload a special version of their package that looked innocent, and > then they could later switch that version out with one that's harmful. > And the *way* that this works is that you specially generate *both* > files, in advance. Which means that the author themselves is > compromised, so the threat is moot. The author can already upload > compromised code (either through being evil or having their PC > hijacked), and what #md5 it has is 100% irrelevant. > > That is, there's nothing stopping an evil author or an author with a > compromised PC from simply uploading a new file with a new MD5, > because PyPI will pass it along in exactly the same way. Changing > hash algorithms will not affect this threat vector in the slightest. > > Given these facts, it makes no sense to fuss over the hash algorithm > in current use, since a concurrent goal here is to switch to file > formats that can be directly signed using, you know, *actual* > cryptography. ;-) > > The new .wheel format makes provisions for modern signature > techniques. It'd be good if sdists also did. Then the #md5 tag can > die a natural death, hopefully within the year replaced by a hashtag > that say, fingerprints the author's public key as registered with > PyPI, or something of that sort. In the meantime, there's no actual > threat here, so bikeshedding what to replace it with *while keeping > the current system* is like rearranging office furniture in a building > that's about to have demolition charges set underneath it. ;-) http://i.imgur.com/wq6GH17.gif There's an old saying inside the NSA: "Attacks always get better; they never get worse." [1] Even if you accept the premise that for this one tiny little segment MD5 is still theoretically ok MD5 isn't going to get any better. The simple API is not going anywhere. 
Waving your hands and saying the stuff will obsolete all of this is great, but it won't. This stuff is going to be around for a long time and we need to look towards the future, not shove our heads in the sand and point towards a toolchain that may or may not happen in the near future. [1] Stolen from Bruce Schneier ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From rasky at develer.com Sat Mar 9 02:06:48 2013 From: rasky at develer.com (Giovanni Bajo) Date: Sat, 9 Mar 2013 02:06:48 +0100 Subject: [Catalog-sig] hash tags In-Reply-To: <9C6FA1F5-694D-4C24-9E92-39A7C18B80D6@stufft.io> References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> <8A3002A9-5E2B-4D38-BABD-9253A027E7F6@stufft.io> <9C6FA1F5-694D-4C24-9E92-39A7C18B80D6@stufft.io> Message-ID: On 09 Mar 2013, at 00:15, Donald Stufft wrote: > > On Mar 8, 2013, at 5:50 PM, PJ Eby wrote: > >> On Fri, Mar 8, 2013 at 4:32 PM, Donald Stufft wrote: >>> Here's some more information pulled straight from Wikipedia: >> >> Trust me, I've read a LOT of Wikipedia (and even more from other >> sites, including at least the conclusions of a number of cryptography >> papers) about hashing attacks recently, because I was seeing >> inconsistencies in what people are saying about hashes and their >> weaknesses and so forth. 99.9% of the discussion about attacks on >> hashes have to do with collision attacks, prefix attacks, and length >> extension attacks, all of which are extremely relevant for >> *cryptographic* purposes. Specifically, the use of hashes to verify >> identity, authority, repudiability, etc...
which emphatically do >> *not* apply to the use of an MD5 as a checksum to verify a correct >> download. >> >> All of these attacks depend on *something else* being at stake besides >> the integrity of the original message. For example length-extension >> attacks bypass the need to know a "secret" used in a naive hash-based >> signature scheme (which is why you're supposed to use HMAC for such >> things), while collision attacks let you trick a signer into signing >> something that you can later replace with something altered. >> >> The current use of #md5 tags isn't subject to either of these kinds of >> attack, because: >> >> 1. There is no "secret" to be revealed, and >> 2. The author and signer are the same person >> >> So the only type of attack I've found out about thus far, in my >> (admittedly few) hours of study on the subject, that is relevant to >> the way we use MD5 on PyPI at present is the so-called "second >> pre-image" attack, which is when you're given an existing message and >> a hash, and have to create a new message with the same hash... while >> also incorporating something useful in the new message. >> >> The most recent report I saw on second pre-image attacks against full >> MD5 estimated a 2**127 strength, meaning that even if you could >> process a great many billion tries per second, it would take you >> thousands of years to come up with a file that could masquerade as an >> existing download. (And most people's computers and/or internet >> connections would choke on the massive file sizes needed for the >> still-theoretical Kelsey-Schneier generalized preimage attack, which >> in any case would apply equally to just about any other hash we could >> currently put out in the field. i.e., it's not specific to a >> particular hash algorithm, it just relies on certain properties of the >> algorithm.) >> >> So, yeah, MD5 is *cryptographically* broken, sure. But it's not >> broken for *data integrity*. 
And in the PyPI use case, the >> "cryptographic" part is all in the SSL being used to fetch the MD5 >> link in the first place. >> >> >>> Here's the important highlights: >>> >>> - specifically, a group of researchers described how to create a pair of files that share the same MD5 checksum >> >> Right, that's what's called a "collision attack". It means that you >> can go out *ahead of time*, and make two files with the same checksum, >> one good, one evil. It does *not* mean you get to take an existing >> file, and then make a second file with the same checksum. (The latter >> is a "second preimage" attack, which is *not* broken >> >> Hash collision attacks in PyPI would basically require an author to >> upload a special version of their package that looked innocent, and >> then they could later switch that version out with one that's harmful. >> And the *way* that this works is that you specially generate *both* >> files, in advance. Which means that the author themselves is >> compromised, so the threat is moot. The author can already upload >> compromised code (either through being evil or having their PC >> hijacked), and what #md5 it has is 100% irrelevant. >> >> That is, there's nothing stopping an evil author or an author with a >> compromised PC from simply uploading a new file with a new MD5, >> because PyPI will pass it along in exactly the same way. Changing >> hash algorithms will not affect this threat vector in the slightest. >> >> Given these facts, it makes no sense to fuss over the hash algorithm >> in current use, since a concurrent goal here is to switch to file >> formats that can be directly signed using, you know, *actual* >> cryptography. ;-) >> >> The new .wheel format makes provisions for modern signature >> techniques. It'd be good if sdists also did. 
>> Then the #md5 tag can
>> die a natural death, hopefully within the year replaced by a hashtag
>> that say, fingerprints the author's public key as registered with
>> PyPI, or something of that sort. In the meantime, there's no actual
>> threat here, so bikeshedding what to replace it with *while keeping
>> the current system* is like rearranging office furniture in a building
>> that's about to have demolition charges set underneath it. ;-)
>
> http://i.imgur.com/wq6GH17.gif
>
> There's an old saying inside the NSA: "Attacks always get better; they never get worse." [1]
>
> Even if you accept the premise that for this one tiny little segment MD5 is still theoretically OK, MD5 isn't going to get any better. The simple API is not going anywhere. Waving your hands and saying the new stuff will obsolete all of this is great, but it won't. This stuff is going to be around for a long time, and we need to look towards the future, not shove our heads in the sand and point towards a toolchain that may or may not happen in the near future.

Exactly. PJ, the point is that even if MD5 is not currently broken for first or second pre-image attacks, there is absolutely no confidence that the security margin of such a broken algorithm is enough to keep it safe in that scenario for even a short time. That is to say, it is not unrealistic that a first or second pre-image attack could be published as soon as tomorrow. You should expect it *any day* at this point.

It's good practice to avoid crypto algorithms whose foundations are known to be broken. This is one of those cases. If we ever touch code that uses MD5, we should drop it immediately. There is no reason to keep it and wait for someone to release an attack, so that the world can point fingers at us and laugh.

-- 
Giovanni Bajo :: rasky at develer.com
Develer S.r.l. :: http://www.develer.com

My Blog: http://giovanni.bajo.it

From holger at merlinux.eu Sat Mar 9 07:51:03 2013
From: holger at merlinux.eu (holger krekel)
Date: Sat, 9 Mar 2013 06:51:03 +0000
Subject: [Catalog-sig] hash tags
In-Reply-To:
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com>
Message-ID: <20130309065103.GW9677@merlinux.eu>

Hi Philip, all,

On Fri, Mar 08, 2013 at 14:16 -0500, PJ Eby wrote:
> The key to making this transition isn't creating elaborate new
> standards for the tools, it's *creating new tools for the standards*.

If we can find a way to improve PyPI and not require the world to change first, that's a big plus in my book as well.

> Point is, this entire thing can be done correctly at the PyPI end and
> work with the existing API of the download tools.

I think so as well. I will suggest a transition model in a new top-level thread, trying to follow this idea.

best,
holger

From holger at merlinux.eu Sat Mar 9 08:22:22 2013
From: holger at merlinux.eu (holger krekel)
Date: Sat, 9 Mar 2013 07:22:22 +0000
Subject: [Catalog-sig] transition to pypi-hosting through server-side changes
Message-ID: <20130309072222.GX9677@merlinux.eu>

Hi all,

I think Philip Eby brought up a very worthwhile idea to consider: if we can transition to a no-external-hosting situation by making pypi-server changes, without requiring client-side installers or release processes to change, that would be great. We would have one place to implement things, and less friction in the probably millions of places where pip/easy_install and CI/release processes are used today.

Basically it all revolves around the question of which links are served on the simple/* pages. What about adding a "hosting mode" field to a package which affects all historic and future releases, i.e. the mode is not specific to a particular release but to all releases.
This field could have these values and meanings:

- "pypi-only": homepage/download links are not added to simple/ pages
  unless they are #egg ones. Release registration with a non-empty and
  non-#egg download url is rejected. Client-side tools will not need to
  crawl or download anything externally unless requiring an #egg
  development tarball.

- "pypi-cache": homepage/download pages are crawled at the pypi server
  side exactly once, at release registration time, or once at
  "transition" time when an author chooses to have his externally hosted
  release files be served from pypi.

- "pypi-linkext": homepage/download urls are crawled at the pypi server
  side for release files, and the simple/ page serves links to them
  without requiring client-side tools to crawl external sites for
  determining the set of candidate release files. Legally, this should
  not pose a problem because the files are still hosted externally, so we
  could at some point automatically switch projects to this mode.

- "pypi-ext": like it is today: homepage/download urls are presented in
  simple/ pages and client-side tools need to crawl them themselves to
  find release file links.

Now it is a matter of choosing good defaults and designing friendly user interactions to allow package maintainers to move to at least "pypi-cache" or, at best, "pypi-only" mode. My current thoughts on this:

- 90% of the projects could directly get the "pypi-only" mode as a
  default, according to Donald's statistics. They'd still receive a mail
  with a link to a page where they can change the mode, if needed, and of
  course the friendly information that "pypi-only" provides the fastest
  and most reliable way for users to install their package.

- 10% of the projects have external release files:

  - if they have their newest releases on pypi already, they could get
    "linkext" mode, so that client-side tools will not need to crawl and
    not need to download from external sites if they only look for the
    newest release

  - if they do not have their newest release on pypi, they could get
    "ext" mode as default

  In either case, maintainers/authors get a mail with a link to the page
  where they can change the mode, along with information about the time
  frame for phasing out particular modes:

  - pypi-ext: in N months we automatically switch this mode to
    pypi-linkext

  - in N+M months only "pypi-only" and "pypi-cache" are allowed. With the
    latter you can still host your files externally, but you need to
    accept that pypi caches release files at release registration time
    and serves them itself afterwards. If you do not agree, your release
    files will not be automatically discoverable anymore and you need to
    tell your users how to install things manually through the
    description of your package.

  - (and maybe: in N+M+X months only pypi-hosted is allowed as a mode)

I think this (or a variation/refinement of this scheme) would offer a smooth transition where nobody needs to get upset and people would clearly see we are doing everything we can to make it easy to transition.

cheers,
holger

From ncoghlan at gmail.com Sat Mar 9 09:05:44 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 9 Mar 2013 18:05:44 +1000
Subject: [Catalog-sig] transition to pypi-hosting through server-side changes
In-Reply-To: <20130309072222.GX9677@merlinux.eu>
References: <20130309072222.GX9677@merlinux.eu>
Message-ID:

On Sat, Mar 9, 2013 at 5:22 PM, holger krekel wrote:
> I think this (or a variation/refinements of this scheme) would offer a
> smooth transition where nobody needs to get upset and people would clearly
> see we are doing everything we can to make it easy to transition.
It sounds good to me, too (says the guy not writing the new code who already hosts his releases on PyPI...)

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From mal at egenix.com Sat Mar 9 15:56:17 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Sat, 09 Mar 2013 15:56:17 +0100
Subject: [Catalog-sig] hash tags
In-Reply-To:
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> <8A3002A9-5E2B-4D38-BABD-9253A027E7F6@stufft.io> <9C6FA1F5-694D-4C24-9E92-39A7C18B80D6@stufft.io>
Message-ID: <513B4D91.80005@egenix.com>

[Discussion about MD5]

I think there's not much point in discussing MD5 in this context. When creating new designs, you should always use the current best and most widely deployed algorithm, IMO.

For Python, this is the SHA-2 family at the moment, since SHA-3 is not supported by Python's hashlib. MD5 is only needed to support older software. SHA-1 is also supported by Python versions older than Python 2.5.

It seems that SHA-256 and SHA-512, both from the SHA-2 family, are the most popular at the moment, so I guess SHA-256 is a good candidate to move forward and satisfy the 80/20 rule.

Agreed?

FWIW, I'm pretty sure SHA-256 will be broken 10 years from now as well :-)

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Mar 09 2013)
>>> Python Projects, Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From donald at stufft.io Sat Mar 9 15:59:21 2013 From: donald at stufft.io (Donald Stufft) Date: Sat, 9 Mar 2013 09:59:21 -0500 Subject: [Catalog-sig] hash tags In-Reply-To: <513B4D91.80005@egenix.com> References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> <8A3002A9-5E2B-4D38-BABD-9253A027E7F6@stufft.io> <9C6FA1F5-694D-4C24-9E92-39A7C18B80D6@stufft.io> <513B4D91.80005@egenix.com> Message-ID: <70844BDB-9538-4C1A-B853-3D6E60E749C1@stufft.io> On Mar 9, 2013, at 9:56 AM, "M.-A. Lemburg" wrote: > [Discussion about MD5] > > I think there's not much point in discussing MD5 in this context. > When creating new designs, you should always use the current > best and most widely deployed algorithm, IMO. > > For Python, this is the SHA-2 family at the moment, since SHA-3 is > not supported by Python's hashlib. MD5 is only needed to support older > software. SHA-1 is also support by Python versions older than Python 2.5. > > It seems that SHA-256 and SHA-512, both from the SHA-2 family, > are the most popular at the moment, so I guess SHA-256 is a good > candidate to move forward and satisfy the 80/20 rule. Sha256 and Sha512 are generally considered equivalent in a security context and either would be a perfectly fine candidate. > > Agreed ? > > FWIW, I'm pretty sure, SHA-256 will be broken in 10 years from > now as well :-) > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, Mar 09 2013) >>>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ > ________________________________________________________________________ > > ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From christian at python.org Sat Mar 9 19:09:37 2013 From: christian at python.org (Christian Heimes) Date: Sat, 09 Mar 2013 19:09:37 +0100 Subject: [Catalog-sig] hash tags In-Reply-To: References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> <8A3002A9-5E2B-4D38-BABD-9253A027E7F6@stufft.io> <9C6FA1F5-694D-4C24-9E92-39A7C18B80D6@stufft.io> Message-ID: <513B7AE1.6060002@python.org> Am 09.03.2013 02:06, schrieb Giovanni Bajo: > It's a good practice to avoid crypto algorithms whose foundations are known to be broken. This is one of those cases. If we ever touch code that uses MD5, we should drop it immediately. There is no reason to keep it and wait for someone to release an attack, so that the world can point fingers at us and laugh. Relax, MD5 is still fine to detect broken or partial downloads. Trust me, this still happens a lot with broken proxy servers and unstable network connections. I have seen my fair share of broken files during deployments at works. If we are going to remove MD5 *now*, then we are going to remove the last bit of security from old tools. I agree that MD5 doesn't provide strong cryptographic security. But it's still better than no checksum. 
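
[Editorial sketch] The integrity check being debated in this thread (a file size plus a legacy MD5 digest plus a SHA-2 digest) could look roughly like the following. This is only a minimal illustration under stated assumptions: the function names `file_digests` and `verify_download` are hypothetical and are not part of PyPI, setuptools, or pip; only the standard-library `hashlib` calls are real.

```python
import hashlib


def file_digests(path, algorithms=("md5", "sha256"), chunk_size=64 * 1024):
    """Return (size_in_bytes, {algorithm: hexdigest}) for the file at path.

    Hypothetical helper: the file is read once, in chunks, and every
    requested hash object is fed the same data.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    size = 0
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            size += len(chunk)
            for hasher in hashers.values():
                hasher.update(chunk)
    return size, {name: h.hexdigest() for name, h in hashers.items()}


def verify_download(path, expected_size, expected_digests):
    """Return True only if the size and every expected digest match.

    expected_digests maps an algorithm name (e.g. "sha256") to a hex digest.
    """
    size, digests = file_digests(path, algorithms=tuple(expected_digests))
    if size != expected_size:
        # A truncated or padded download is rejected before comparing hashes.
        return False
    return all(
        digests[name] == value.lower()
        for name, value in expected_digests.items()
    )
```

A caller would pass whatever digests the index advertises, for example `verify_download("pkg-1.0.tar.gz", 12345, {"md5": "...", "sha256": "..."})`. Checking the size first catches partial downloads cheaply; the MD5 entry only serves older tools, while the SHA-256 entry carries the stronger guarantee.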
I also agree that we should no longer endorse MD5 and should move to a strong hash algorithm for checksums. People will point their fingers at us and laugh about Python when somebody abuses MD5 for an attack on PyPI.

File size + MD5 (for legacy) + SHA-2 looks good to me.

Christian

From rasky at develer.com Sat Mar 9 20:20:17 2013
From: rasky at develer.com (Giovanni Bajo)
Date: Sat, 9 Mar 2013 20:20:17 +0100
Subject: [Catalog-sig] hash tags
In-Reply-To: <513B7AE1.6060002@python.org>
References: <5B9DAC56-1654-4F46-A185-B0A144D5E29D@stufft.io> <5139D05F.6030404@egenix.com> <5139D65B.3070907@python.org> <5139DE99.9020005@egenix.com> <8A3002A9-5E2B-4D38-BABD-9253A027E7F6@stufft.io> <9C6FA1F5-694D-4C24-9E92-39A7C18B80D6@stufft.io> <513B7AE1.6060002@python.org>
Message-ID:

On 9 Mar 2013, at 19:09, Christian Heimes wrote:

> On 09.03.2013 02:06, Giovanni Bajo wrote:
>> It's a good practice to avoid crypto algorithms whose foundations are known to be broken. This is one of those cases. If we ever touch code that uses MD5, we should drop it immediately. There is no reason to keep it and wait for someone to release an attack, so that the world can point fingers at us and laugh.
>
> Relax, MD5 is still fine to detect broken or partial downloads. Trust
> me, this still happens a lot with broken proxy servers and unstable
> network connections. I have seen my fair share of broken files during
> deployments at work.
>
> If we are going to remove MD5 *now*, then we are going to remove the
> last bit of security from old tools. I agree that MD5 doesn't provide
> strong cryptographic security. But it's still better than no checksum.

When I said "we should drop it", I obviously meant "replace it with a different algorithm". The post was intended to make sure that we migrate away from it, since we're touching that code. I certainly wasn't advocating against using any checksum algorithm.

-- 
Giovanni Bajo :: rasky at develer.com
Develer S.r.l. :: http://www.develer.com

My Blog: http://giovanni.bajo.it

From ubershmekel at gmail.com Sun Mar 10 09:05:57 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Sun, 10 Mar 2013 10:05:57 +0200
Subject: [Catalog-sig] Search engine relevance
In-Reply-To:
References:
Message-ID:

On Fri, Mar 8, 2013 at 11:26 PM, Richard Jones wrote:

> That *was* the original search engine :-)
>
> Then after user complaints we devised a better solution...
>
> Always happy to take criticism of it and improve it! :-)
>
> Sent from my portable device, please excuse the brevity.

We can go in a few directions:

Easy & python.org styled
* google's JS search API to get, parse and display results. $5 per 1K queries.
* bing's JS search API. $5 per 2.5K queries.

Easy but external
* textbox links to a google/bing search with site:pypi.python.org

Hard to get good results, but perhaps easy to try:
* Change/improve the internal search engine, and invent a good ranking algorithm.

Though I wouldn't say this is high priority at all. I personally never use pypi search, just site:pypi.python.org on google.

Yuval

From r1chardj0n3s at gmail.com Sun Mar 10 09:23:43 2013
From: r1chardj0n3s at gmail.com (Richard Jones)
Date: Sun, 10 Mar 2013 19:23:43 +1100
Subject: [Catalog-sig] Search engine relevance
In-Reply-To:
References:
Message-ID:

On 10 March 2013 19:05, Yuval Greenfield wrote:
> On Fri, Mar 8, 2013 at 11:26 PM, Richard Jones wrote:
>>
>> That *was* the original search engine :-)
>>
>> Then after user complaints we devised a better solution...
>>
>> Always happy to take criticism of it and improve it! :-)
>>
>> Sent from my portable device, please excuse the brevity.
>> >> > > We can go a few directions: > > Easy & python.org styled > * google's JS search API to get, parse and display results. $5 per 1K > queries. > * bing's JS search API. 5$ per 2.5K queries. Would be worth investigating if we can reasonably format the results. Figuring out the billing will be something to discuss with the PSF admin. > Easy but external > * textbox links to a google/bing search with site:pypi.python.org As I said, this is how it was done, but there were complaints. > Hard to get good results, but perhaps easy to try: > * Change/improve internal search engine, and invent a good ranking > algorithm. We could probably just use the text search stuff built into postgres, rather than the current naive LIKE searching. There is a ranking algorithm in place and it does strongly prefer matching the name you've entered; it doubly prefers an exact package name match. This might solve the AGI problem and could probably produce good results using the current ranking algorithm. Not sure. Google's search algorithms are far advanced ;-) > Though I wouldn't say this is high priority at all. I personally never use > pypi search, just site:pypi.python.org on google. I also often use google - but I don't even bother with the site: bit. My go-to search is usually just "python ". I note though that unless I add "site:pypi.python.org" to the search even google struggles to suggest something on PyPI (try "python agi"...) Richard From robertc at robertcollins.net Sun Mar 10 09:52:27 2013 From: robertc at robertcollins.net (Robert Collins) Date: Sun, 10 Mar 2013 21:52:27 +1300 Subject: [Catalog-sig] Search engine relevance In-Reply-To: References: Message-ID: On 10 March 2013 21:23, Richard Jones wrote: > We could probably just use the text search stuff built into postgres, > rather than the current naive LIKE searching. There is a ranking > algorithm in place and it does strongly prefer matching the name > you've entered; it doubly prefers an exact package name match. 
This > might solve the AGI problem and could probably produce good results > using the current ranking algorithm. Not sure. Google's search > algorithms are far advanced ;-) tsearch2 is hard to get good results with - we had issues with that when I was working on Launchpad. -Rob -- Robert Collins Distinguished Technologist HP Cloud Services From holger at merlinux.eu Sun Mar 10 16:07:40 2013 From: holger at merlinux.eu (holger krekel) Date: Sun, 10 Mar 2013 15:07:40 +0000 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site Message-ID: <20130310150740.GE9677@merlinux.eu> Hi Donald, Richard, Nick, Philip, Marc-Andre, all, after some more thinking i wrote a simplified PEP draft for transitioning hosting of release files to pypi.python.org. A PEP is warranted IMO because the according changes will affect all python package maintainers and the Python packaging ecology in general. See the current draft (pre-submit-v1) further below in this mail. I also created a bitbucket repository, see "PEP-PYPI-DRAFT.txt" at https://bitbucket.org/hpk42/pep-pypi/src Donald, i'd be happy if you join as a co-author and contribute your statistics script and possibly more implementation stuff (PRs to pypi software etc.). Philip, Marc-Andre, Richard (Jones), Nick and catalog-sig/distutils-sig: scrutiny and feedback welcome. Nick: if you could collect feedback on the PEP (draft) around the packaging and distribution mini-summit at Pycon US (15th March), that'd be very useful. Richard: I may ask you to become BDFL-delegate for this PEP especially since you will need to integrate any resulting changes :) I'd like to formally submit this PEP soon but not before i got some feedback. I am not subscribed to distutils-sig and i think distutils is not much affected, but it probably still would help if someone cross-posts there (please put me in CC). 
cheers,
holger


PEP-draft: transition to release file hosting at pypi.python.org
=================================================================

Status
------

PRE-SUBMIT-v1

Abstract
--------

This PEP proposes to move hosting of all release files to
pypi.python.org itself. To ease transition and minimize client-side
friction, **no changes to distutils or installers** are required.
Rather, the transition is implemented through changes to the
pypi.python.org implementation and by interactions with package
maintainers.

Problem
-------

Today, python package installers (pip and easy_install) need to
query multiple sites to discover release files. Apart from querying
pypi.python.org's simple index pages, an installer also needs to crawl
all homepages and download pages ever specified with any release of a
package. The need for installers to crawl 3rd party sites slows down
installation and makes for a brittle, unreliable installation process.

As of March 2013, about 10% of packages have release files which
are not hosted directly on pypi.python.org but rather at places
referenced by download/homepage sites.

Conversely, roughly 90% of packages are hosted directly on
pypi.python.org [1]_. Even for them, installers still need to crawl the
homepage(s) of a package. In particular, many package uploaders are not
aware that specifying the "homepage" will slow down the installation
process.

Solution
--------

Each package is going to get a "hosting mode" field which affects
all historic and future releases of a package and its release files.
The field has these values and meanings:

- "pypi-ext" (transitional) encodes exactly the current mode of
  operation: homepage/download urls are presented in simple/ pages and
  client-side tools need to crawl them themselves to find release file
  links.

- "pypi-cache": Release files located on remote sites will be downloaded
  and cached by pypi.python.org by crawling homepage/download metadata
  sites. The resulting simple index contains links to release files
  hosted by pypi.python.org. The original homepage/download links are
  added as links without a ``rel`` attribute if they have the ``#egg``
  format.

- "pypi-only": homepage/download links are served on simple indexes
  but without a ``rel`` attribute. Installation tools will thus not
  crawl those pages anymore. Use this option if you commit to always
  uploading your release files to pypi.python.org.

Phases of transition
--------------------

1. At the outset, we set hosting-mode to "pypi-ext" for all packages.
   This will not change any link served via the simple index and thus
   no bad effects are expected. Early adopters and testers may now
   change the mode to either pypi-only or pypi-cache to help with
   streamlining issues. After implementation and UI issues are
   streamlined, the next phase can start.

2. We perform automatic analysis for each package to determine if it is
   a package with externally hosted release files. Packages which only
   have release files on pypi.python.org are put in group "A"; those
   which have at least some release files hosted elsewhere are put in
   group "B".

   We then send a mail to all maintainers of packages in A, telling
   them that their hosting-mode is going to be switched automatically to
   "pypi-only" after N weeks, unless they visit their package
   administration page earlier and set it to either pypi-cache or
   pypi-only themselves.

   We then send a mail to all maintainers of packages in B, telling
   them that their hosting-mode is going to be switched automatically to
   "pypi-cache" after N weeks, unless they visit their package
   administration page earlier and set it to either pypi-only or
   pypi-cache themselves.

3. All packages will have a hosting mode of either "pypi-cache"
   or "pypi-only", so that installers only query release files
   hosted through pypi.python.org.

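
[Editorial sketch] To make the three hosting modes and the phase-2 defaults above more concrete, here is a rough Python model of which links a simple index page would serve in each mode. It is a sketch under assumptions, not PyPI's implementation: the names `HostingMode`, `Package`, `simple_index_links` and `phase2_default_mode` are hypothetical, the ``rel`` values "homepage" and "download" and the ``#egg=`` substring test are assumed conventions, and the crawling/caching step of "pypi-cache" is not modelled (cached files are assumed to already be listed in `hosted_files`).

```python
from dataclasses import dataclass, field
from enum import Enum
from html import escape


class HostingMode(Enum):
    PYPI_EXT = "pypi-ext"      # transitional: installers crawl external pages
    PYPI_CACHE = "pypi-cache"  # external files are cached and served by the index
    PYPI_ONLY = "pypi-only"    # only files uploaded to the index are installable


@dataclass
class Package:
    name: str
    mode: HostingMode
    # URLs of release files served by the index itself (uploaded or cached).
    hosted_files: list = field(default_factory=list)
    # (rel, url) pairs taken from release metadata, rel being
    # "homepage" or "download" (assumed values).
    external_links: list = field(default_factory=list)


def simple_index_links(pkg):
    """Return the <a> tags a simple index page would list for pkg."""
    links = [
        '<a href="%s">%s</a>' % (escape(url), escape(url.rsplit("/", 1)[-1]))
        for url in pkg.hosted_files
    ]
    for rel, url in pkg.external_links:
        if pkg.mode is HostingMode.PYPI_EXT:
            # Current behaviour: the rel attribute invites installers to crawl.
            links.append('<a href="%s" rel="%s">%s</a>' % (escape(url), rel, escape(url)))
        elif pkg.mode is HostingMode.PYPI_ONLY:
            # Link is still shown, but without rel, so it is not crawled.
            links.append('<a href="%s">%s</a>' % (escape(url), escape(url)))
        elif "#egg=" in url:
            # pypi-cache: only #egg-style links are kept, also without rel.
            links.append('<a href="%s">%s</a>' % (escape(url), escape(url)))
    return links


def phase2_default_mode(has_external_release_files):
    """Phase 2 default: group B gets pypi-cache, group A gets pypi-only."""
    if has_external_release_files:
        return HostingMode.PYPI_CACHE
    return HostingMode.PYPI_ONLY
```

The point of the sketch is that the whole transition hinges on one server-side decision per link: whether to emit it at all, and whether to attach a ``rel`` attribute. Installers keep parsing the same page format and simply find fewer pages to crawl.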
Transitioning to "pypi-cache" mode ------------------------------------- When transitioning from the currently implicit "pypi-ext" mode to "pypi-cache" for a given package, a package maintainer should be able to retrieve/verify the historic release files which will be cached from pypi.python.org. The UI should present this list and have the maintainer accept it for completing the transition to the "pypi-cache" mode. Upon future release registration actions, pypi.python.org will perform crawling for the homepage/download sites and cache release files *before* returning a success return code for the release registration. References ------------ .. [1] ratio of externally hosted versus pypi-hosted http://mail.python.org/pipermail/catalog-sig/2013-March/005549.html Acknowledgments ---------------------- Donald Stufft for pushing away from external hosting and doing the 90/10 % statistics script and offering to implement a PR. Philip Eby for precise information and the basic idea to implement the transition via server-side changes only. Marc-Andre Lemburg, Nick Coghlan and catalog-sig for thinking through issues regarding getting rid of "external hosting". Copyright ----------------- This document has been placed in the public domain. From donald at stufft.io Sun Mar 10 18:35:00 2013 From: donald at stufft.io (Donald Stufft) Date: Sun, 10 Mar 2013 13:35:00 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: <20130310150740.GE9677@merlinux.eu> References: <20130310150740.GE9677@merlinux.eu> Message-ID: <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> On Mar 10, 2013, at 11:07 AM, holger krekel wrote: > Hi Donald, Richard, Nick, Philip, Marc-Andre, all, > > after some more thinking i wrote a simplified PEP draft for > transitioning hosting of release files to pypi.python.org. A PEP is > warranted IMO because the according changes will affect all python > package maintainers and the Python packaging ecology in general. 
> See the current draft (pre-submit-v1) further below in this mail.
> I also created a bitbucket repository, see "PEP-PYPI-DRAFT.txt" at
>
> https://bitbucket.org/hpk42/pep-pypi/src
>
> Donald, i'd be happy if you join as a co-author and contribute
> your statistics script and possibly more implementation stuff (PRs
> to pypi software etc.).
>
> Philip, Marc-Andre, Richard (Jones), Nick and catalog-sig/distutils-sig:
> scrutiny and feedback welcome.
>
> Nick: if you could collect feedback on the PEP (draft) around the
> packaging and distribution mini-summit at Pycon US (15th March), that'd
> be very useful.
>
> Richard: I may ask you to become BDFL-delegate for this PEP especially
> since you will need to integrate any resulting changes :)
>
> I'd like to formally submit this PEP soon but not before i got some
> feedback.
>
> I am not subscribed to distutils-sig and i think distutils is not much
> affected, but it probably still would help if someone cross-posts there
> (please put me in CC).
>
> cheers,
> holger
>
>
> PEP-draft: transition to release file hosting at pypi.python.org
> =================================================================
>
> Status
> -----------
>
> PRE-SUBMIT-v1
>
> Abstract
> ------------
>
> This PEP proposes to move hosting of all release files to
> pypi.python.org itself. To ease transition and minimize client-side
> friction, **no changes to distutils or installers** are required.
> Rather, the transition is implemented through changes to the pypi.python.org
> implementation and by interactions with package maintainers.
>
> Problem
> ---------------
>
> Today, python package installers (pip and easy_install) need to
> query multiple sites to discover release files. Apart from querying
> pypi.python.org's simple index pages, also all homepages and
> download pages ever specified with any release of a package need to
> be crawled by an installer. The need for installers to crawl 3rd party
> sites slows down installation and makes for a brittle, unreliable
> installation process.
>
> As of March 2013, about 10% of packages have release files which
> are not hosted directly from pypi.python.org but rather from places
> referenced by download/homepage sites.
>
> Conversely, roughly 90% of packages are hosted directly on
> pypi.python.org [1]_. Even for them installers still need to crawl the
> homepage(s) of a package. In particular, many package uploaders are not
> aware that specifying the "homepage" will slow down the installation
> process.
>
>
> Solution
> -----------
>
> Each package is going to get a "hosting mode" field which affects
> all historic and future releases of a package and its release files.
> The field has these values and meanings:
>
> - "pypi-ext" (transitional) encodes exactly the current mode of operations:
>   homepage/download urls are presented in simple/ pages and client-side
>   tools need to crawl them themselves to find release file links.
>
> - "pypi-cache": Release files located on remote sites will be downloaded
>   and cached by pypi.python.org by crawling homepage/download metadata sites.
>   The resulting simple index contains links to release files hosted by
>   pypi.python.org. The original homepage/download links are added as
>   links without a ``rel`` attribute if they have the ``#egg`` format.
>
> - "pypi-only": homepage/download links are served on simple indexes
>   but without a ``rel`` attribute. Installation tools will thus not
>   crawl those pages anymore. Use this option if you commit to always
>   uploading your release files to pypi.python.org.
>
>
> Phases of transition
> -------------------------
>
> 1. At the outset, we set hosting-mode to "pypi-ext" for all packages.
>    This will not change any link served via the simple index and thus
>    no bad effects are expected. Early adopters and testers may now
>    change the mode to either pypi-only or pypi-cache to help with
>    streamlining issues. After implementation and UI issues are
>    streamlined, the next phase can start.
>
> 2. We perform automatic analysis for each package to determine if it is
>    a package with externally hosted release files. Packages which only
>    have release files on pypi.python.org are put in the group "A",
>    those which have at least some packages outside are put in the group "B".
>
>    We then send a mail to all maintainers of packages in A
>    that their hosting-mode is going to be switched automatically to
>    "pypi-only" after N weeks, unless they visit their package
>    administration page and set it to either pypi-cache or
>    pypi-only earlier.
>
>    We then send a mail to all maintainers of packages in B
>    that their hosting-mode is going to be switched automatically to
>    "pypi-cache" after N weeks, unless they visit their package
>    administration page and set it to either pypi-only or
>    pypi-cache earlier.
>
> 3. All packages will have a hosting mode of either "pypi-cache"
>    or "pypi-only", so that installers only query
>    packages hosted through pypi.python.org.
>
>
> Transitioning to "pypi-cache" mode
> -------------------------------------
>
> When transitioning from the currently implicit "pypi-ext" mode to
> "pypi-cache" for a given package, a package maintainer should
> be able to retrieve/verify the historic release files which will
> be cached from pypi.python.org. The UI should present this list
> and have the maintainer accept it for completing the transition
> to the "pypi-cache" mode. Upon future release registration actions,
> pypi.python.org will perform crawling for the homepage/download sites
> and cache release files *before* returning a success return code for
> the release registration.
>
>
> References
> ------------
>
> .. [1] ratio of externally hosted versus pypi-hosted http://mail.python.org/pipermail/catalog-sig/2013-March/005549.html
>
> Acknowledgments
> ----------------------
>
> Donald Stufft for pushing away from external hosting and doing
> the 90/10 % statistics script and offering to implement a PR.
>
> Philip Eby for precise information and the basic idea to
> implement the transition via server-side changes only.
>
> Marc-Andre Lemburg, Nick Coghlan and catalog-sig for thinking
> through issues regarding getting rid of "external hosting".
>
>
> Copyright
> -----------------
>
> This document has been placed in the public domain.
>
>
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig

Some concerns:

1. We cannot automatically switch people to pypi-cache. We _have_ to get explicit permission from them.
2. The cache mechanism is going to be fragile, and in the long term leaves a window open for security issues.

If we're going to do a phased in per project solution like this I think it would work much better to have 2 modes.

1. Legacy - Current behavior, new external links are accepted, existing ones are displayed
2. PyPI Only - New behavior, no new external links are accepted, existing ones are removed

Present the project owners with 2 one way buttons:
- Switch to PyPI Only and re-host external files [1]
- Switch to PyPI Only and do NOT re-host external files

These buttons would be one time and quit. Once your project has been switched to PyPI Only you cannot go back to Legacy mode. All new projects would be already switched to PyPI Only. After some amount of time switch all Projects to PyPI Only but _do not_ re-host their packages as we cannot legally do so without their permission.

The above is simpler, still provides people an easy migration path, moves us to remove external hosting, and doesn't entangle us with legal issues.
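
To make the two-mode idea concrete, here is a minimal Python sketch of the one-way switch described above. It is only an illustration under stated assumptions: `HostingMode`, `Project`, `switch_to_pypi_only` and the `fetch` callable are hypothetical names, not anything in the PyPI code base, and re-hosting is modelled as a plain function call.

```python
from dataclasses import dataclass, field
from enum import Enum


class HostingMode(Enum):
    LEGACY = "legacy"        # current behavior: external links accepted and displayed
    PYPI_ONLY = "pypi-only"  # new behavior: no external links at all


@dataclass
class Project:
    """Hypothetical, simplified stand-in for a project record."""
    name: str
    mode: HostingMode = HostingMode.LEGACY
    external_links: list = field(default_factory=list)
    hosted_files: list = field(default_factory=list)


def switch_to_pypi_only(project, rehost, fetch=None):
    """One-way switch to PyPI Only; re-host external files only if the owner asks."""
    if project.mode is HostingMode.PYPI_ONLY:
        raise ValueError("already PyPI Only; the switch is one-way")
    if rehost:
        if fetch is None:
            raise ValueError("re-hosting needs a fetch callable")
        # Fetch everything before changing any state, so that a failed
        # download leaves the project untouched in Legacy mode.
        fetched = [fetch(url) for url in project.external_links]
        project.hosted_files.extend(fetched)
    project.external_links.clear()
    project.mode = HostingMode.PYPI_ONLY
    return project
```

Fetching all files before changing any state means a failed download leaves the project in Legacy mode, so the owner can simply try again.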

[1] There is still a small window here where someone could MITM PyPI fetching these files, however since it would be a one-time-and-done deal this risk is minimal and is worth it to move to a PyPI-only solution.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From jnoller at gmail.com Sun Mar 10 18:46:32 2013
From: jnoller at gmail.com (Jesse Noller)
Date: Sun, 10 Mar 2013 13:46:32 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
In-Reply-To: <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
Message-ID:

+1

On Mar 10, 2013, at 1:35 PM, Donald Stufft wrote:

>
> On Mar 10, 2013, at 11:07 AM, holger krekel wrote:
>
>> Hi Donald, Richard, Nick, Philip, Marc-Andre, all,
>>
>> after some more thinking i wrote a simplified PEP draft for
>> transitioning hosting of release files to pypi.python.org. A PEP is
>> warranted IMO because the according changes will affect all python
>> package maintainers and the Python packaging ecology in general. See
>> the current draft (pre-submit-v1) further below in this mail.
>> I also created a bitbucket repository, see "PEP-PYPI-DRAFT.txt" at
>>
>> https://bitbucket.org/hpk42/pep-pypi/src
>>
>> Donald, i'd be happy if you join as a co-author and contribute
>> your statistics script and possibly more implementation stuff (PRs
>> to pypi software etc.).
>>
>> Philip, Marc-Andre, Richard (Jones), Nick and catalog-sig/distutils-sig:
>> scrutiny and feedback welcome.
>>
>> Nick: if you could collect feedback on the PEP (draft) around the
>> packaging and distribution mini-summit at Pycon US (15th March), that'd
>> be very useful.
>> [...]
>
> Some concerns:
>
> 1. We cannot automatically switch people to pypi-cache. We _have_ to get explicit permission from them.
> 2. The cache mechanism is going to be fragile, and in the long term leaves a window open for security issues.
>
> If we're going to do a phased in per project solution like this I think it would work much better to have 2 modes.
>
> 1. Legacy - Current behavior, new external links are accepted, existing ones are displayed
> 2. PyPI Only - New behavior, no new external links are accepted, existing ones are removed
>
> Present the project owners with 2 one way buttons:
> - Switch to PyPI Only and re-host external files [1]
> - Switch to PyPI Only and do NOT re-host external files
>
> These buttons would be one time and quit. Once your project has been switched to PyPI Only you cannot go back to Legacy mode. All new projects would be already switched to PyPI Only. After some amount of time switch all Projects to PyPI Only but _do not_ re-host their packages as we cannot legally do so without their permission.
>
> The above is simpler, still provides people an easy migration path, moves us to remove external hosting, and doesn't entangle us with legal issues.
>
> [1] There is still a small window here where someone could MITM PyPI fetching these files, however since it would be a one-time-and-done deal this risk is minimal and is worth it to move to a PyPI-only solution.
>
> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig

From holger at merlinux.eu Sun Mar 10 19:18:28 2013
From: holger at merlinux.eu (holger krekel)
Date: Sun, 10 Mar 2013 18:18:28 +0000
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
In-Reply-To: <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io>
Message-ID: <20130310181828.GH9677@merlinux.eu>

On Sun, Mar 10, 2013 at 13:35 -0400, Donald Stufft wrote:
> On Mar 10, 2013, at 11:07 AM, holger krekel wrote:
> > [...]
> > Transitioning to "pypi-cache" mode
> > -------------------------------------
> >
> > When transitioning from the currently implicit "pypi-ext" mode to
> > "pypi-cache" for a given package, a package maintainer should
> > be able to retrieve/verify the historic release files which will
> > be cached from pypi.python.org. The UI should present this list
> > and have the maintainer accept it for completing the transition
> > to the "pypi-cache" mode. Upon future release registration actions,
> > pypi.python.org will perform crawling for the homepage/download sites
> > and cache release files *before* returning a success return code for
> > the release registration.
> > [...]
>
> Some concerns:
>
> 1. We cannot automatically switch people to pypi-cache. We _have_ to get explicit permission from them.

Could you detail how you arrive at this conclusion?
(I've seen the claim before but not the underlying reasoning, maybe
i just missed it)

There would be prior notifications to the package maintainers. If they
don't want to have their packages cached at pypi.python.org, they can set
the mode to "pypi-only" and leave manual instructions.
I suspect there will be very few people, if any, objecting to
pypi-cache mode. If that is false we might need to prolong pypi-ext
mode some more for them and eventually switch them to pypi-only when
we eventually decide to get rid of external hosting.

> 2. The cache mechanism is going to be fragile, and in the long term leaves a window open for security issues.

fragility: not sure it's too bad. Once the mode is activated, release
registration ("submit" POST action on "/pypi" http endpoint) will only
succeed if the corresponding releases can be found through homepage/download.
Changing the mode to pypi-cache in the presence of historic release
files hosted elsewhere needs a good pypi.python.org UI interaction and
may take several tries if necessary sites cannot be reached. Nevertheless,
this step is potentially fragile [X].

Security: the PEP does not try to prevent package tampering. MITM attacks
between pypi.python.org and the download sites may occur as much as they
can happen today between installers and the download sites.
I think we should consider protection against package tampering
in a separate discussion/PEP.

> If we're going to do a phased in per project solution like this I think it would work much better to have 2 modes.
>
> 1. Legacy - Current behavior, new external links are accepted, existing ones are displayed
> 2. PyPI Only - New behavior, no new external links are accepted, existing ones are removed
>
> Present the project owners with 2 one way buttons:
> - Switch to PyPI Only and re-host external files [1]

Doesn't this have the same fragility problem as [X] above?

> - Switch to PyPI Only and do NOT re-host external files

Are there any problems for doing this automatically (with a prior
notification to maintainers) for all the projects where we don't
find externally hosted packages? I'd expect very few false negatives
and they can be quickly switched back.

Back to pypi-cache: it is there to make it super-easy for package
maintainers. There are all kinds of release habits and scripts pushing out
things to google/bitbucket/github/other sites. With "pypi-cache" they
don't need to change any of that. They just need to be fine with
pypi.python.org pulling in the packages for caching.

We might think about phasing out pypi-cache after some longer time
frame so that we eventually only have pypi-only and things are
simpler and saner.

best,
holger

> These buttons would be one time and quit. Once your project has been switched to PyPI Only you cannot go back to Legacy mode. All new projects would be already switched to PyPI Only. After some amount of time switch all Projects to PyPI Only but _do not_ re-host their packages as we cannot legally do so without their permission.
>
> The above is simpler, still provides people an easy migration path, moves us to remove external hosting, and doesn't entangle us with legal issues.
>
> [1] There is still a small window here where someone could MITM PyPI fetching these files, however since it would be a one-time-and-done deal this risk is minimal and is worth it to move to a PyPI-only solution.
>
> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>

From donald at stufft.io Sun Mar 10 19:29:34 2013
From: donald at stufft.io (Donald Stufft)
Date: Sun, 10 Mar 2013 14:29:34 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
In-Reply-To: <20130310181828.GH9677@merlinux.eu>
References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu>
Message-ID:

On Mar 10, 2013, at 2:18 PM, holger krekel wrote:

> On Sun, Mar 10, 2013 at 13:35 -0400, Donald Stufft wrote:
>> On Mar 10, 2013, at 11:07 AM, holger krekel wrote:
>>> [...]
>>> Transitioning to "pypi-cache" mode
>>> -------------------------------------
>>>
>>> When transitioning from the currently implicit "pypi-ext" mode to
>>> "pypi-cache" for a given package, a package maintainer should
>>> be able to retrieve/verify the historic release files which will
>>> be cached from pypi.python.org. The UI should present this list
>>> and have the maintainer accept it for completing the transition
>>> to the "pypi-cache" mode. Upon future release registration actions,
>>> pypi.python.org will perform crawling for the homepage/download sites
>>> and cache release files *before* returning a success return code for
>>> the release registration.
>>> [...]
>>
>> Some concerns:
>>
>> 1. We cannot automatically switch people to pypi-cache. We _have_ to get explicit permission from them.
>
> Could you detail how you arrive at this conclusion?
> (I've seen the claim before but not the underlying reasoning, maybe
> i just missed it)
>
> There would be prior notifications to the package maintainers. If they
> don't want to have their packages cached at pypi.python.org, they can set
> the mode to "pypi-only" and leave manual instructions. I suspect there will
> be very few people, if any, objecting to pypi-cache mode. If that is
> false we might need to prolong pypi-ext mode some more for them and
> eventually switch them to pypi-only when we eventually decide to get
> rid of external hosting.

I asked VanL. His statement on re-hosting packages was:

"We could do it if we had permission. The tricky part would be getting permission for already-existing packages."

I'm pretty sure that emailing someone and assuming we have permission if they don't opt out doesn't count as permission.

>
>> 2. The cache mechanism is going to be fragile, and in the long term leaves a window open for security issues.
>
> fragility: not sure it's too bad.
> Once the mode is activated, release
> registration ("submit" POST action on "/pypi" http endpoint) will only
> succeed if the corresponding releases can be found through homepage/download.
> Changing the mode to pypi-cache in the presence of historic release
> files hosted elsewhere needs a good pypi.python.org UI interaction and
> may take several tries if necessary sites cannot be reached. Nevertheless,
> this step is potentially fragile [X].

I see, so pypi-cache would only be triggered once during release creation. Cache makes it sound like we'd continuously monitor the given external urls instead of it actually being a pull-based method of getting files.

>
> Security: the PEP does not try to prevent package tampering. MITM attacks
> between pypi.python.org and the download sites may occur as much as they
> can happen today between installers and the download sites.
> I think we should consider protection against package tampering
> in a separate discussion/PEP.
>
>> If we're going to do a phased in per project solution like this I think it would work much better to have 2 modes.
>>
>> 1. Legacy - Current behavior, new external links are accepted, existing ones are displayed
>
>> 2. PyPI Only - New behavior, no new external links are accepted, existing ones are removed
>>
>> Present the project owners with 2 one way buttons:
>> - Switch to PyPI Only and re-host external files [1]
>
> Doesn't this have the same fragility problem as [X] above?

Yes, and any pull-based solution will. The difference is with a one-time-and-done solution we can live with a little bit more fragility.

>
>> - Switch to PyPI Only and do NOT re-host external files
>
> Are there any problems for doing this automatically (with a prior
> notification to maintainers) for all the projects where we don't
> find externally hosted packages? I'd expect very few false negatives
> and they can be quickly switched back.

Only thing I could think of is a host being temporarily down being counted as a false positive.

>
> Back to pypi-cache: it is there to make it super-easy for package
> maintainers. There are all kinds of release habits and scripts pushing out
> things to google/bitbucket/github/other sites. With "pypi-cache" they
> don't need to change any of that. They just need to be fine with
> pypi.python.org pulling in the packages for caching.

Yes, I understand the goal here. The problem is that there's not really a good way to secure this without requiring changes to their workflow. At best they'll have to push information about every file so that PyPI is able to verify the files it is downloading, and if we are requiring them to push data about those files we might as well require them to push the files themselves. This also has the effect that we can provide immediate feedback when files do not validate on PyPI.
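
A minimal sketch of the kind of check implied here, assuming the maintainer pushes a SHA-256 hex digest for each file. The digest algorithm and the function name `verify_release_file` are assumptions made for illustration; nothing in this thread fixes either.

```python
import hashlib
import hmac


def verify_release_file(data: bytes, expected_sha256: str) -> bool:
    """Return True if the downloaded bytes match the digest the maintainer pushed.

    Hypothetical helper: the index would call this after pulling a file
    from an external site and refuse to re-host it on a mismatch.
    """
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest gives a constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, expected_sha256.lower())
```

A pull that fails this check can be reported back to the maintainer right away, but note that the maintainer still has to push one digest per file, which is the workflow change discussed above.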

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From asmeurer at gmail.com Sun Mar 10 19:51:15 2013
From: asmeurer at gmail.com (Aaron Meurer)
Date: Sun, 10 Mar 2013 12:51:15 -0600
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
In-Reply-To:
References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu>
Message-ID: <-8601578799313976966@unknownmsgid>

On Mar 10, 2013, at 12:29 PM, Donald Stufft wrote:

>
> On Mar 10, 2013, at 2:18 PM, holger krekel wrote:
>
>> On Sun, Mar 10, 2013 at 13:35 -0400, Donald Stufft wrote:
>>> On Mar 10, 2013, at 11:07 AM, holger krekel wrote:
>>>> [...]
>>>> Transitioning to "pypi-cache" mode
>>>> -------------------------------------
>>>>
>>>> When transitioning from the currently implicit "pypi-ext" mode to
>>>> "pypi-cache" for a given package, a package maintainer should
>>>> be able to retrieve/verify the historic release files which will
>>>> be cached from pypi.python.org. The UI should present this list
>>>> and have the maintainer accept it for completing the transition
>>>> to the "pypi-cache" mode. Upon future release registration actions,
>>>> pypi.python.org will perform crawling for the homepage/download sites
>>>> and cache release files *before* returning a success return code for
>>>> the release registration.
>>>> [...]
>>>
>>> Some concerns:
>>>
>>> 1. We cannot automatically switch people to pypi-cache. We _have_ to get explicit permission from them.
>>
>> Could you detail how you arrive at this conclusion?
>> (I've seen the claim before but not the underlying reasoning, maybe
>> i just missed it)
>>
>> There would be prior notifications to the package maintainers. If they
>> don't want to have their packages cached at pypi.python.org, they can set
>> the mode to "pypi-only" and leave manual instructions. I suspect there will
>> be very few people, if any, objecting to pypi-cache mode. If that is
>> false we might need to prolong pypi-ext mode some more for them and
>> eventually switch them to pypi-only when we eventually decide to get
>> rid of external hosting.
>
> I asked VanL. His statement on re-hosting packages was:
>
> "We could do it if we had permission. The tricky part would be getting permission for already-existing packages."
>
> I'm pretty sure that emailing someone and assuming we have permission if they don't opt out doesn't count as permission.
>
>>
>>> 2. The cache mechanism is going to be fragile, and in the long term leaves a window open for security issues.
>>
>> fragility: not sure it's too bad. Once the mode is activated, release
>> registration ("submit" POST action on "/pypi" http endpoint) will only
>> succeed if the corresponding releases can be found through homepage/download.
>> Changing the mode to pypi-cache in the presence of historic release
>> files hosted elsewhere needs a good pypi.python.org UI interaction and
>> may take several tries if necessary sites cannot be reached. Nevertheless,
>> this step is potentially fragile [X].
>
> I see, so pypi-cache would only be triggered once during release creation. Cache makes it sound like we'd continuously monitor the given external urls instead of it actually being a pull-based method of getting files.

I think the term "mirror" is more accurate than "cache" here.

Aaron Meurer

>
>>
>> Security: the PEP does not try to prevent package tampering. MITM attacks
>> between pypi.python.org and the download sites may occur as much as they
>> can happen today between installers and the download sites.
> [...]
> >> >> We might think about phasing out pypi-cache after some larger time >> frame so that we eventually only have pypi-only and things are eventually >> simple and saner. >> >> best, >> holger >> >> >> >>> These buttons would be one time and quit. Once your project has been switched to PyPI Only you cannot go back to Legacy mode. All new projects would be already switched to PyPI Only. After some amount of time switch all Projects to PyPI Only but _do not_ re-host their packages as we cannot legally do so without their permission. >>> >>> The above is simpler, still provides people an easy migration path, moves us to remove external hosting, and doesn't entangle us with legal issues. >>> >>> [1] There is still a small window here where someone could MITM PyPI fetching these files, however since it would be a one time and down deal this risk is minimal and is worth it to move to an pypi only solution. >>> >>> ----------------- >>> Donald Stufft >>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA > > > ----------------- > Donald Stufft > PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA > > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig From pje at telecommunity.com Sun Mar 10 20:41:50 2013 From: pje at telecommunity.com (PJ Eby) Date: Sun, 10 Mar 2013 15:41:50 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: <20130310150740.GE9677@merlinux.eu> References: <20130310150740.GE9677@merlinux.eu> Message-ID: On Sun, Mar 10, 2013 at 11:07 AM, holger krekel wrote: > Philip, Marc-Andre, Richard (Jones), Nick and catalog-sig/distutils-sig: > scrutiny and feedback welcome. Hi Holger. I'm having some difficulty interpreting your proposal because it is leaving out some things, and in other places contradicting what I know of how the tools work. 
It is also a bit at odds with itself in some places.

For instance, at the beginning, the PEP states its proposed solution is to host all release files on PyPI, but then the problem section describes the problems that arise from crawling external pages: problems that can be solved without actually hosting the files on PyPI.

To me, it needs a clearer explanation of why the actual hosting part also needs to be on PyPI, not just the links. In the threads to date, people have argued about uptime, security, etc., and these points are not covered by the PEP or even really touched on for the most part.

(Actually, thinking about that makes me wonder.... Donald: did your analysis collect any stats on *where* those externally hosted files were hosted? My intuition says that the bulk of the files (by *file count*) will come from a handful of highly-available domains, i.e. sourceforge, github, that sort of thing, with actual self-hosting being relatively rare *for the files themselves*, vs. a much wider range of domains for the homepage/download URLs (especially because those change from one release to the next.) If that's true, then most complaints about availability are being caused by crawling multiple not-highly-available HTML pages, *not* by the downloading of the actual files. If my intuition about the distribution is wrong, OTOH, it would provide a stronger argument for moving the files themselves to PyPI as well.)

Digression aside, this is one of the things that needs to be clearer so that there's a better explanation for package authors as to why they're being asked to change. And although the base argument is good ("specifying the "homepage" will slow down the installation process"), it could be amplified further with an example of some project that has had multiple homepages over its lifetime, listing all the URLs that currently must be crawled before an installer can be sure it has found all available versions, platforms, and formats of that project.
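
To make the crawling cost concrete, here is a minimal sketch (an illustration only, not setuptools' actual code) of how one might list the pages an installer has to visit for a project: it scans a /simple-style page for links marked rel="homepage" or rel="download". The sample HTML, the example.* hosts, the name "someproject", and the helpers RelLinkFinder and crawl_targets are all made up for the example.

```python
from html.parser import HTMLParser


class RelLinkFinder(HTMLParser):
    """Collect href values of <a> tags marked rel="homepage" or rel="download"."""

    CRAWLED_RELS = {"homepage", "download"}

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # rel may be missing or valueless; treat both as "no rel"
        rels = set((attr_map.get("rel") or "").lower().split())
        href = attr_map.get("href")
        if href and rels & self.CRAWLED_RELS:
            self.found.append(href)


def crawl_targets(simple_page_html):
    """Return the external pages an installer would have to crawl."""
    finder = RelLinkFinder()
    finder.feed(simple_page_html)
    return finder.found


# Hypothetical index page for a project whose homepage moved between releases.
SAMPLE_PAGE = """
<a href="http://old.example.org/someproject/" rel="homepage">1.0 home_page</a>
<a href="http://new.example.com/someproject/" rel="homepage">1.1 home_page</a>
<a href="http://downloads.example.net/someproject/" rel="download">1.1 download_url</a>
<a href="../../packages/source/s/someproject/someproject-1.1.tar.gz">someproject-1.1.tar.gz</a>
"""

for url in crawl_targets(SAMPLE_PAGE):
    print(url)
```

Each URL printed is an extra HTML page that has to be fetched and scanned before the installer knows it has seen every candidate file; the last link in the sample has no rel attribute, so it is a plain file link and triggers no crawling.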
Okay, on to the Solution section. Again, your stated problem is to fix crawling, but the solution is all about file hosting. Regardless of which of these three "hosting modes" is selected, it remains an option for the developer to host files elsewhere, and provide the links in their description... unless of course you intended to rule that out and forgot to mention it. (Or, I suppose, if you did *not* intend to rule it out and intentionally omitted mention of that so the rabid anti-externalists would think you were on their side and not create further controversy... in which case I've now spoiled things. Darn. ;-) ) Some technical details are also either incorrect or confusing. For example, you state that "The original homepage/download links are added as links without a ``rel`` attribute if they have the ``#egg`` format". But if they are added without a rel attribute, it doesn't *matter* whether they have an #egg marker or not. It is quite possible for a PyPI package to have a download_url of say, "http://sourceforge.net/download/someproject-1.2.tgz". Thus, I would suggest simply stating that changing hosting mode does not actually remove any links from the /simple index, it just removes the rel="" attributes from the "Home page" and "Download" links, thus preventing them from being crawled in search of additional file links. With that out of the way, that brings me to the larger scope issue with the modes as presented. Notice now that with this clarification, there is no real difference in *state* between the "pypi-cache" and "pypi-only" modes. There is only a *functional* difference... and that function is underspecified in the PEP. What I mean is, in both pypi-cache and pypi-only, the *state* of things is that rel="" attributes are gone, and there are links to files on PyPI. The only difference is in *how* the files get there. And for the pypi-cache mode, this function is *really* under-specified. 
Arguably, this is the meat of the proposal, but it is entirely missing. There is nothing here about the frequency of crawling, the methods used to select or validate files, whether there is any expiration... it is all just magically assumed to happen somehow.

My suggestion would be to do two things:

First, make the state a boolean: crawl external links, with the current state yes and the future state no, with "no" simply meaning that the rel="" attribute is removed from the links that currently have it.

Second, propose to offer tools in the PyPI interface (and command line) to assist authors in making the transition, rather than proposing a completely unspecified caching mechanism. Better to have some vaguely specified tools than a completely unspecified caching mechanism, and better still to spell out very precisely what those tools do.

Okay, on to the "Phases of transition". This section gets a lot simpler if there are only two stages. Specifically, we let everyone know the change is going to happen, and how long they have, give 'em links to migration tools. Done. ;-)

(Okay, so analysis still makes sense: the people who don't have any externally hosted files can get a different message, i.e., "Hey, we notice that installing your package is slow because you have these links that don't go anywhere. Click here if you'd like PyPI to stop sending people on wild goose chases". The people who have external hosted files will need a more involved message.)

Whew. Okay, that ends my critique of the PEP as it sits. Now for an outside-the-box suggestion.

If you'd like to be able to transition people away from spidered links in the fewest possible steps, with the least user action, no legal issues, and in a completely automated way, note that this can be done with a one-time spidering of the existing links to find the download links, then adding those links directly to the /simple index, and switching off the rel="" attributes.
This can be done without explicit user consent, though they can be given the chance to do it manually, sooner.

To implement this you'd need two project-level (*not* release-level) fields: one to indicate whether the project is using rel="" or not, and one to contain the list of external download links, which would be user-editable.

This overall approach I'm proposing can be extended to also support mirroring, since it provides an explicit place to list what it is you're mirroring. (At any rate, it's more explicitly specified than any such place in the current PEP.)

That field can also be fairly easily populated for any given project in just a few lines of code:

    from pkg_resources import Requirement
    from setuptools.package_index import PackageIndex

    pr = Requirement.parse('Projectname')
    pi = PackageIndex(search_path=[], python=None, platform=None)
    pi.find_packages(pr)  # scan the index for this project's distributions
    all_urls = [dist.location for dist in pi[pr.key]]
    external_urls = [url for url in all_urls if '//pypi.python.org' not in url]

(Although if you want more information, like what kind of link each one is, the dist objects actually know a bit more than just the URL.)

Anyway, I hope you found at least some of all this helpful. ;-)

From holger at merlinux.eu Sun Mar 10 20:54:05 2013
From: holger at merlinux.eu (holger krekel)
Date: Sun, 10 Mar 2013 19:54:05 +0000
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
In-Reply-To:
References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu>
Message-ID: <20130310195405.GI9677@merlinux.eu>

On Sun, Mar 10, 2013 at 14:29 -0400, Donald Stufft wrote:
>
> On Mar 10, 2013, at 2:18 PM, holger krekel wrote:
>
> > On Sun, Mar 10, 2013 at 13:35 -0400, Donald Stufft wrote:
> >> On Mar 10, 2013, at 11:07 AM, holger krekel wrote:
> >>> [...]
> >>> Transitioning to "pypi-cache" mode
> >>> -------------------------------------
> >>>
> >>> When transitioning from the currently implicit "pypi-ext" mode to
> >>> "pypi-cache" for a given package, a package maintainer should
> >>> be able to retrieve/verify the historic release files which will
> >>> be cached from pypi.python.org. The UI should present this list
> >>> and have the maintainer accept it for completing the transition
> >>> to the "pypi-cache" mode. Upon future release registration actions,
> >>> pypi.python.org will perform crawling for the homepage/download sites
> >>> and cache release files *before* returning a success return code for
> >>> the release registration.
> >>> [...]
> >>
> >> Some concerns:
> >>
> >> 1. We cannot automatically switch people to pypi-cache. We _have_ to get explicit permission from them.
> >
> > Could you detail how you arrive at this conclusion?
> > (I've seen the claim before but not the underlying reasoning, maybe
> > i just missed it)
> >
> > There would be prior notifications to the package maintainers. If they
> > don't want to have their packages cached at pypi.python.org, they can set
> > the mode to "pypi-only" and leave manual instructions. I suspect there will
> > be very few people if anyone, objecting to pypi-cache mode. If that is
> > false we might need to prolong pypi-ext mode some more for them and
> > eventually switch them to pypi-only when we eventually decide to get
> > rid of external hosting.
>
> I asked VanL. His statement on re-hosting packages was:
>
> "We could do it if we had permission. The tricky part would be getting permission for already-existing packages."
>
> I'm pretty sure that emailing someone and assuming we have permission if they don't opt-out doesn't count as permission.

Hum, I saw Jesse Noller saying a few days ago "let them opt out".
But i guess VanL can trump that :) If that is true we could change the notification to maintainers of B packages that hosting mode is going to change to pypi-only, which would lose their release files unless they opt-in to pypi-cache. As long as that is a no-brainer for them, we are not asking for much and can count on most people's good will to not make other people's installation life harder.

Besides, admins could still set the "pypi-ext" mode if a maintainer can explain why it's a problem for them to agree to "pypi-cache" or "pypi-only". I'd really like to not have too many packages lingering around in "pypi-ext" mode if it can be avoided.

> >
> >> 2. The cache mechanism is going to be fragile, and in the long term leaves a window open for security issues.
> >
> > fragility: not sure it's too bad. Once the mode is activated release
> > registration ("submit" POST action on "/pypi" http endpoint) will only
> > succeed if according releases can be found through homepage/download.
> > Changing the mode to pypi-cache in the presence of historic release
> > files hosted elsewhere needs a good pypi.python.org UI interaction and
> > may take several tries if necessary sites cannot be reached. Nevertheless,
> > this step is potentially fragile [X].
>
> I see, so pypi-cache would only be triggered once during release creation. Cache makes it sound like we'd continuously monitor the given external urls instead of it actually being a pull based method of getting files.

Right, we need to avoid cache invalidation problems by only allowing updates at user-chosen points in time (there might also be an explicit "update cache" button in case a maintainer pushes an egg/wheel later). It's still technically a cache i think but the term "rehost" would work as well i guess.

> [...]
> > Back to pypi-cache: it is there to make it super-easy for package
> > maintainers. There are all kinds of release habits and scripts
> > pushing out things to google/bitbucket/github/other sites.
With > > "pypi-cache" they don't need to change any of that. They just need > > to be fine with pypi.python.org pulling in the packages for caching. > > Yes I understand the goal here. The problem is that there's not really > a good way to secure this without requiring changes to their workflow. > At best they'll have to push information about every file so that PyPI > is able to verify the files it is downloading, and if we are requiring > them to push data about those files we might as well require them to > push the files themselves. Is this about protection against package tampering? If so, I think a proper solution involves maintainers signing their release files but this is outside the intended scope of the PEP. Otherwise, the "re-hosting" process for pypi-cache mode is at least as secure as currently where all hosts issuing pip/easy_install commands visit external sites and can thus be MITM-attacked. For pypi-only server packages it's safer because no crawling takes place. In any case, asking people to change their release process is not a no-brainer. The PEP tries to avoid this source of friction. That being said, i think we both agree to recommend maintainers to (eventually) go for pypi-only and change their release processes accordingly. This PEP is not the end of the story of evolving package hosting and i'd like to be careful about asking maintainers to change how they do things. > This also has the effect we can provide > immediate feedback when files do not validate on PyPI. At release registration or switch-to-pypi-rehost time we could also do package validation but i am inclined to see this as out of scope for this PEP which tries to focus on the minimal steps to move from pypi-ext to everything-hosted-through-pypi.python.org. cheers, holger > > > > > We might think about phasing out pypi-cache after some larger time > > frame so that we eventually only have pypi-only and things are eventually > > simple and saner. 
> > > > best, > > holger > > > > > > > >> These buttons would be one time and quit. Once your project has been switched to PyPI Only you cannot go back to Legacy mode. All new projects would be already switched to PyPI Only. After some amount of time switch all Projects to PyPI Only but _do not_ re-host their packages as we cannot legally do so without their permission. > >> > >> The above is simpler, still provides people an easy migration path, moves us to remove external hosting, and doesn't entangle us with legal issues. > >> > >> [1] There is still a small window here where someone could MITM PyPI fetching these files, however since it would be a one time and down deal this risk is minimal and is worth it to move to an pypi only solution. > >> > >> ----------------- > >> Donald Stufft > >> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA > >> > > > > > > > ----------------- > Donald Stufft > PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA > From pje at telecommunity.com Sun Mar 10 20:59:43 2013 From: pje at telecommunity.com (PJ Eby) Date: Sun, 10 Mar 2013 15:59:43 -0400 Subject: [Catalog-sig] Search engine relevance In-Reply-To: References: Message-ID: On Sun, Mar 10, 2013 at 4:23 AM, Richard Jones wrote: > This might solve the AGI problem and could probably produce good results > using the current ranking algorithm. Not sure. Google's search > algorithms are far advanced ;-) Heh. This just gave me a bit of a chuckle, taken out of context. "AGI", you see, is also an acronym for "artificial general intelligence", so for a moment there I thought you were suggesting that using Postgres rankings properly could bring about the Singularity. 
;-) From jnoller at gmail.com Sun Mar 10 21:03:42 2013 From: jnoller at gmail.com (Jesse Noller) Date: Sun, 10 Mar 2013 16:03:42 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: <20130310195405.GI9677@merlinux.eu> References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> Message-ID: <1639F813-A646-4ECF-BDF1-A5C581A64CE0@gmail.com> I said that before we talked to a lawyer On Mar 10, 2013, at 3:54 PM, holger krekel wrote: > On Sun, Mar 10, 2013 at 14:29 -0400, Donald Stufft wrote: >> >> On Mar 10, 2013, at 2:18 PM, holger krekel wrote: >> >>> On Sun, Mar 10, 2013 at 13:35 -0400, Donald Stufft wrote: >>>> On Mar 10, 2013, at 11:07 AM, holger krekel wrote: >>>>> [...] >>>>> Transitioning to "pypi-cache" mode >>>>> ------------------------------------- >>>>> >>>>> When transitioning from the currently implicit "pypi-ext" mode to >>>>> "pypi-cache" for a given package, a package maintainer should >>>>> be able to retrieve/verify the historic release files which will >>>>> be cached from pypi.python.org. The UI should present this list >>>>> and have the maintainer accept it for completing the transition >>>>> to the "pypi-cache" mode. Upon future release registration actions, >>>>> pypi.python.org will perform crawling for the homepage/download sites >>>>> and cache release files *before* returning a success return code for >>>>> the release registration. >>>>> [...] >>>> >>>> Some concerns: >>>> >>>> 1. We cannot automatically switch people to pypi-cache. We _have_ to get explicit permission from them. >>> >>> Could you detail how you arrive at this conclusion? >>> (I've seen the claim before but not the underlying reasoning, maybe >>> i just missed it) >>> >>> There would be prior notifications to the package maintainers. 
If they >>> don't want to have their packages cached at pypi.python.org, they can set >>> the mode to "pypi-only" and leave manual instructions. I suspect there will >>> be very few people if anyone, objecting to pypi-cache mode. If that is >>> false we might need to prolong pypi-ext mode some more for them and >>> eventually switch them to pypi-only when we eventually decide to get >>> rid of external hosting. >> >> I asked VanL. His statement on re-hosting packages was: >> >> "We could do it if we had permission. The tricky part would be getting permission for already-existing packages." >> >> I'm pretty sure that emailing someone and assuming we have permission if they don't opt-out doesn't count as permission. > > Hum, i I saw Jesse Noller saying a few days ago "let them opt out". > But i guess VanL can trump that :) If that is true we could change the > notification to maintainers of B packages that hosting mode is going to > change to pypi-only, which would loose their release files unless they > opt-in to pypi-cache. As long as that is a no-brainer for them, we are > not asking for much and can count on most people's good will to not make > other people's installation life harder. > > Besides, admins could still set the "pypi-ext" mode if a maintainer can > explain why it's a problem for them to agree to "pypi-cache" or > "pypi-only". I'd really like to not have too many packages lingering > around in "pypi-ext" mode if it can be avoided. > >>> >>>> 2. The cache mechanism is going to be fragile, and in the long term leaves a window open for security issues. >>> >>> fragility: not sure it's too bad. Once the mode is activited release >>> registration ("submit" POST action on "/pypi" http endpoint) will only >>> succeed if according releases can be found through homepage/download. 
>>> Changing the mode to pypi-cache in the presence of historic release >>> files hosted elsewhere needs a good pypi.python.org UI interaction and >>> may take several tries if neccessary sites cannot be reached. Nevertheless, >>> this step is potentially fragile [X]. >> >> I see, so pypi-cache would only be triggered once during release creation. Cache makes it sound like we'd continuously monitor the given external urls instead of it actually being a pull based method of getting files. > > Right, we need to avoid cache invalidation problems by only allowing > updates at user-chosen point in times (there might also be an explicit > "update cache" button in case a maintainer pushes a egg/wheel later). > It's still technically a cache i think but the term "rehost" would > work as well i guess. > >> [...] >>> Back to pypi-cache: it is there to make it super-easy for package >>> maintainers. There are all kinds of release habits and scripts >>> pushing out things to google/bitbucket/github/other sites. With >>> "pypi-cache" they don't need to change any of that. They just need >>> to be fine with pypi.python.org pulling in the packages for caching. >> >> Yes I understand the goal here. The problem is that there's not really >> a good way to secure this without requiring changes to their workflow. >> At best they'll have to push information about every file so that PyPI >> is able to verify the files it is downloading, and if we are requiring >> them to push data about those files we might as well require them to >> push the files themselves. > > Is this about protection against package tampering? If so, I think a > proper solution involves maintainers signing their release files but > this is outside the intended scope of the PEP. > > Otherwise, the "re-hosting" process for pypi-cache mode is at least as > secure as currently where all hosts issuing pip/easy_install commands > visit external sites and can thus be MITM-attacked. 
For pypi-only > server packages it's safer because no crawling takes place. > > In any case, asking people to change their release process is not > a no-brainer. The PEP tries to avoid this source of friction. > That being said, i think we both agree to recommend maintainers to > (eventually) go for pypi-only and change their release processes > accordingly. This PEP is not the end of the story of evolving package > hosting and i'd like to be careful about asking maintainers to change > how they do things. > >> This also has the effect we can provide >> immediate feedback when files do not validate on PyPI. > > At release registration or switch-to-pypi-rehost time we could also do > package validation but i am inclined to see this as out of scope > for this PEP which tries to focus on the minimal steps to move > from pypi-ext to everything-hosted-through-pypi.python.org. > > cheers, > holger > >> >>> >>> We might think about phasing out pypi-cache after some larger time >>> frame so that we eventually only have pypi-only and things are eventually >>> simple and saner. >>> >>> best, >>> holger >>> >>> >>> >>>> These buttons would be one time and quit. Once your project has been switched to PyPI Only you cannot go back to Legacy mode. All new projects would be already switched to PyPI Only. After some amount of time switch all Projects to PyPI Only but _do not_ re-host their packages as we cannot legally do so without their permission. >>>> >>>> The above is simpler, still provides people an easy migration path, moves us to remove external hosting, and doesn't entangle us with legal issues. >>>> >>>> [1] There is still a small window here where someone could MITM PyPI fetching these files, however since it would be a one time and down deal this risk is minimal and is worth it to move to an pypi only solution. 
>>>> >>>> ----------------- >>>> Donald Stufft >>>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA >> >> >> ----------------- >> Donald Stufft >> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA > > > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig From donald at stufft.io Sun Mar 10 21:59:14 2013 From: donald at stufft.io (Donald Stufft) Date: Sun, 10 Mar 2013 16:59:14 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> Message-ID: <73570046-C5E3-429B-B390-29C6578721A3@stufft.io> On Mar 10, 2013, at 3:41 PM, PJ Eby wrote: > On Sun, Mar 10, 2013 at 11:07 AM, holger krekel wrote: >> Philip, Marc-Andre, Richard (Jones), Nick and catalog-sig/distutils-sig: >> scrutiny and feedback welcome. > > Hi Holger. I'm having some difficulty interpreting your proposal > because it is leaving out some things, and in other places > contradicting what I know of how the tools work. It is also a bit at > odds with itself in some places. > > For instance, at the beginning, the PEP states its proposed solution > is to host all release files on PyPI, but then the problem section > describes the problems that arise from crawling external pages: > problems that can be solved without actually hosting the files on > PyPI. > > To me, it needs a clearer explanation of why the actual hosting part > also needs to be on PyPI, not just the links. In the threads to date, > people have argued about uptime, security, etc., and these points are > not covered by the PEP or even really touched on for the most part. > > (Actually, thinking about that makes me wonder.... Donald: did your > analysis collect any stats on *where* those externally hosted files > were hosted? 
My intuition says that the bulk of the files (by *file
> count*) will come from a handful of highly-available domains, i.e.
> sourceforge, github, that sort of thing, with actual self-hosting
> being relatively rare *for the files themselves*, vs. a much wider
> range of domains for the homepage/download URLs (especially because
> those change from one release to the next.) If that's true, then most
> complaints about availability are being caused by crawling multiple
> not-highly-available HTML pages, *not* by the downloading of the
> actual files. If my intuition about the distribution is wrong, OTOH,
> it would provide a stronger argument for moving the files themselves
> to PyPI as well.)

No, but it wouldn't be difficult to take the list of packages I generated and run another script to see where the files that aren't available on PyPI are actually located. I'd like to emphasize again though that it doesn't really matter how good their uptime is: the best case scenario is that it doesn't hurt uptime, and the worst case (and typical case) is that it decreases it. A high uptime host will just decrease it _less_ than a low uptime host.

>
> Digression aside, this is one of the things that needs to be clearer so
> that there's a better explanation for package authors as to why
> they're being asked to change. And although the base argument is good
> ("specifying the "homepage" will slow down the installation process"),
> it could be amplified further with an example of some project that has
> had multiple homepages over its lifetime, listing all the URLs that
> currently must be crawled before an installer can be sure it has found
> all available versions, platforms, and formats of that project.
>
> Okay, on to the Solution section. Again, your stated problem is to
> fix crawling, but the solution is all about file hosting.
Regardless > of which of these three "hosting modes" is selected, it remains an > option for the developer to host files elsewhere, and provide the > links in their description... unless of course you intended to rule > that out and forgot to mention it. (Or, I suppose, if you did *not* > intend to rule it out and intentionally omitted mention of that so the > rabid anti-externalists would think you were on their side and not > create further controversy... in which case I've now spoiled things. > Darn. ;-) ) > > Some technical details are also either incorrect or confusing. For > example, you state that "The original homepage/download links are > added as links without a ``rel`` attribute if they have the ``#egg`` > format". But if they are added without a rel attribute, it doesn't > *matter* whether they have an #egg marker or not. It is quite > possible for a PyPI package to have a download_url of say, > "http://sourceforge.net/download/someproject-1.2.tgz". > > Thus, I would suggest simply stating that changing hosting mode does > not actually remove any links from the /simple index, it just removes > the rel="" attributes from the "Home page" and "Download" links, thus > preventing them from being crawled in search of additional file links. In my opinion the final, PyPI only mode needs to remove all external links from the /simple/ index. > > With that out of the way, that brings me to the larger scope issue > with the modes as presented. Notice now that with this clarification, > there is no real difference in *state* between the "pypi-cache" and > "pypi-only" modes. There is only a *functional* difference... and > that function is underspecified in the PEP. > > What I mean is, in both pypi-cache and pypi-only, the *state* of > things is that rel="" attributes are gone, and there are links to > files on PyPI. The only difference is in *how* the files get there. > > And for the pypi-cache mode, this function is *really* > under-specified. 
Arguably, this is the meat of the proposal, but it > is entirely missing. There is nothing here about the frequency of > crawling, the methods used to select or validate files, whether there > is any expiration... it is all just magically assumed to happen > somehow. > > My suggestion would be to do two things: > > First, make the state a boolean: crawl external links, with the > current state yes and the future state no, with "no" simply meaning > that the rel="" attribute is removed from the links that currently > have it. > > Second, propose to offer tools in the PyPI interface (and command > line) to assist authors in making the transition, rather than > proposing a completely unspecified caching mechanism. Better to have > some vaguely specified tools than a completely unspecified caching > mechanism, and better still to spell out very precisely what those > tools do. > > Okay, on to the "Phases of transtion". This section gets a lot > simpler if there are only two stages. Specifically, we let everyone > know the change is going to happen, and how long they have, give 'em > links to migration tools. Done. ;-) This is my opinion as well. Though I think we differ in what the final stage should look like. > > (Okay, so analysis still makes sense: the people who don't have any > externally hosted files can get a different message, i.e., "Hey, we > notice that installing your package is slow because you have these > links that don't go anywhere. Click here if you'd like PyPI to stop > sending people on wild goose chases". The people who have external > hosted files will need a more involved message.) > > Whew. Okay, that ends my critique of the PEP as it sits. Now for an > outside-the-box suggestion. 
> > If you'd like to be able to transition people away from spidered links > in the fewest possible steps, with the least user action, no legal > issues, and in a completely automated way, note that this can be done > with a one-time spidering of the existing links to find the download > links, then adding those links directly to the /simple index, and > switching off the rel="" attributes. This can be done without > explicit user consent, though they can be given the chance to do it > manually, sooner. > > To implement this you'd need two project-level (*not* release-level) > fields: one to indicate whether the project is using rel="" or not, > and one to contain the list of external download links, which would be > user-editable. > > This overall approach I'm proposing can be extended to also support > mirroring, since it provides an explicit place to list what it is > you're mirroring. (At any rate, it's more explicitly specified than > any such place in the current PEP.) > > That field can also be fairly easily populated for any given project > in just a few lines of code: >
>     from pkg_resources import Requirement
>     from setuptools.package_index import PackageIndex
>     pr = Requirement.parse('Projectname')
>     pi = PackageIndex(search_path=[], python=None, platform=None)
>     pi.find_packages(pr)  # scan the index pages for this project
>     # every distribution found for the project, wherever it is hosted
>     all_urls = [dist.location for dist in pi[pr.key]]
>     # keep only the URLs that are not served from PyPI itself
>     external_urls = [url for url in all_urls if '//pypi.python.org' not in url]
>
> (Although if you want more information, like what kind of link each > one is, the dist objects actually know a bit more than just the URL.) > > Anyway, I hope you found at least some of all this helpful. ;-) > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig I'm still against any off-PyPI hosting of files. I call it "External links" a lot but in reality it's the requirement to contact any host other than PyPI to install a package.
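For readers who want to see the distinction being argued over, here is a minimal standard-library sketch, not anything from the PEP or from PyPI's code. It assumes a /simple page is plain HTML with one anchor per link and that the crawlable "Home page" and "Download" links carry rel="homepage" and rel="download"; the sample page, the example.com and example.org URLs, and the names LinkCollector and classify_links are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

PYPI_HOST = "pypi.python.org"  # index host named in the thread


class LinkCollector(HTMLParser):
    """Collect (href, rel) pairs from every anchor tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href:
            self.links.append((href, attrs.get("rel")))


def classify_links(html, index_host=PYPI_HOST):
    """Return (external, crawlable) link lists for one index page.

    external:  links whose host is not the index host
    crawlable: links an installer would spider (assumed rel values)
    """
    parser = LinkCollector()
    parser.feed(html)
    external, crawlable = [], []
    for href, rel in parser.links:
        host = urlparse(href).netloc
        # relative links have no host and count as index-hosted
        if host and host != index_host:
            external.append(href)
        if rel in ("homepage", "download"):
            crawlable.append(href)
    return external, crawlable


# Hypothetical page: one PyPI-hosted file, two rel links, one direct external file.
SAMPLE_PAGE = """
<a href="http://pypi.python.org/packages/source/s/someproject/someproject-1.2.tar.gz">someproject-1.2.tar.gz</a>
<a href="http://example.com/someproject" rel="homepage">Home page</a>
<a href="http://sourceforge.net/download/someproject-1.2.tgz" rel="download">Download</a>
<a href="http://example.org/files/someproject-1.1.tgz#egg=someproject-1.1">someproject-1.1</a>
"""

external, crawlable = classify_links(SAMPLE_PAGE)
```

In this sketch, dropping the rel attributes would empty the crawlable list while leaving all three external links on the page, which is the difference between "stop crawling" and "remove all external links" that the two positions above turn on.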
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From donald at stufft.io Sun Mar 10 22:16:24 2013 From: donald at stufft.io (Donald Stufft) Date: Sun, 10 Mar 2013 17:16:24 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: <20130310195405.GI9677@merlinux.eu> References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> Message-ID: On Mar 10, 2013, at 3:54 PM, holger krekel wrote: > On Sun, Mar 10, 2013 at 14:29 -0400, Donald Stufft wrote: >> >> On Mar 10, 2013, at 2:18 PM, holger krekel wrote: >> >>> On Sun, Mar 10, 2013 at 13:35 -0400, Donald Stufft wrote: >>>> On Mar 10, 2013, at 11:07 AM, holger krekel wrote: >>>>> [...] >>>>> Transitioning to "pypi-cache" mode >>>>> ------------------------------------- >>>>> >>>>> When transitioning from the currently implicit "pypi-ext" mode to >>>>> "pypi-cache" for a given package, a package maintainer should >>>>> be able to retrieve/verify the historic release files which will >>>>> be cached from pypi.python.org. The UI should present this list >>>>> and have the maintainer accept it for completing the transition >>>>> to the "pypi-cache" mode. Upon future release registration actions, >>>>> pypi.python.org will perform crawling for the homepage/download sites >>>>> and cache release files *before* returning a success return code for >>>>> the release registration. >>>>> [...] >>>> >>>> Some concerns: >>>> >>>> 1. We cannot automatically switch people to pypi-cache. We _have_ to get explicit permission from them. >>> >>> Could you detail how you arrive at this conclusion?
>>> (I've seen the claim before but not the underlying reasoning, maybe >>> i just missed it) >>> >>> There would be prior notifications to the package maintainers. If they >>> don't want to have their packages cached at pypi.python.org, they can set >>> the mode to "pypi-only" and leave manual instructions. I suspect there will >>> be very few people, if anyone, objecting to pypi-cache mode. If that is >>> false we might need to prolong pypi-ext mode some more for them and >>> eventually switch them to pypi-only when we eventually decide to get >>> rid of external hosting. >> >> I asked VanL. His statement on re-hosting packages was: >> >> "We could do it if we had permission. The tricky part would be getting permission for already-existing packages." >> >> I'm pretty sure that emailing someone and assuming we have permission if they don't opt-out doesn't count as permission. > > Hum, i saw Jesse Noller saying a few days ago "let them opt out". > But i guess VanL can trump that :) If that is true we could change the > notification to maintainers of B packages that hosting mode is going to > change to pypi-only, which would lose their release files unless they > opt-in to pypi-cache. As long as that is a no-brainer for them, we are > not asking for much and can count on most people's good will to not make > other people's installation life harder. > > Besides, admins could still set the "pypi-ext" mode if a maintainer can > explain why it's a problem for them to agree to "pypi-cache" or > "pypi-only". I'd really like to not have too many packages lingering > around in "pypi-ext" mode if it can be avoided. 0 packages allowing external links is the only useful end goal. > >>> >>>> 2. The cache mechanism is going to be fragile, and in the long term leaves a window open for security issues. >>> >>> fragility: not sure it's too bad.
Once the mode is activated release >>> registration ("submit" POST action on "/pypi" http endpoint) will only >>> succeed if corresponding releases can be found through homepage/download. >>> Changing the mode to pypi-cache in the presence of historic release >>> files hosted elsewhere needs a good pypi.python.org UI interaction and >>> may take several tries if necessary sites cannot be reached. Nevertheless, >>> this step is potentially fragile [X]. >> >> I see, so pypi-cache would only be triggered once during release creation. Cache makes it sound like we'd continuously monitor the given external urls instead of it actually being a pull based method of getting files. > > Right, we need to avoid cache invalidation problems by only allowing > updates at user-chosen points in time (there might also be an explicit > "update cache" button in case a maintainer pushes an egg/wheel later). > It's still technically a cache i think but the term "rehost" would > work as well i guess. > >> [...] >>> Back to pypi-cache: it is there to make it super-easy for package >>> maintainers. There are all kinds of release habits and scripts >>> pushing out things to google/bitbucket/github/other sites. With >>> "pypi-cache" they don't need to change any of that. They just need >>> to be fine with pypi.python.org pulling in the packages for caching. >> >> Yes I understand the goal here. The problem is that there's not really >> a good way to secure this without requiring changes to their workflow. >> At best they'll have to push information about every file so that PyPI >> is able to verify the files it is downloading, and if we are requiring >> them to push data about those files we might as well require them to >> push the files themselves. > > Is this about protection against package tampering? If so, I think a > proper solution involves maintainers signing their release files but > this is outside the intended scope of the PEP.
This part of it is, yes; it's also about accidentally mirroring an unreleased file. We're going to need to require certain information to be pushed to PyPI; requiring files to be pushed to PyPI in addition to that is not a big deal. Furthermore, if people really really want this pull based behavior they can easily set it up outside of PyPI. > > Otherwise, the "re-hosting" process for pypi-cache mode is at least as > secure as currently where all hosts issuing pip/easy_install commands > visit external sites and can thus be MITM-attacked. For pypi-only > server packages it's safer because no crawling takes place. It's at least as secure as a completely insecure process. That's not setting a very high bar. > > In any case, asking people to change their release process is not > a no-brainer. The PEP tries to avoid this source of friction. > That being said, i think we both agree to recommend maintainers to > (eventually) go for pypi-only and change their release processes > accordingly. This PEP is not the end of the story of evolving package > hosting and i'd like to be careful about asking maintainers to change > how they do things. If someone's release process forces PyPI to have security, uptime, and privacy issues then I'm very sorry but their release process is going to need to change. It's not fun, it's a shitty situation, but trying to bend over backwards to enable their current release processes is like trying to bend over backwards to enable people to still walk into their bank's vault and grab a stack of currency. This isn't a case of "I don't like the way your process works, and I want you to change it". This is a case of "Your process actively causes the greater Python community to be vulnerable to a host of issues". > >> This also has the effect we can provide >> immediate feedback when files do not validate on PyPI.
> > At release registration or switch-to-pypi-rehost time we could also do > package validation but i am inclined to see this as out of scope > for this PEP which tries to focus on the minimal steps to move > from pypi-ext to everything-hosted-through-pypi.python.org. > > cheers, > holger > >> >>> >>> We might think about phasing out pypi-cache after some larger time >>> frame so that we eventually only have pypi-only and things are eventually >>> simple and saner. >>> >>> best, >>> holger >>> >>> >>> >>>> These buttons would be one time and quit. Once your project has been switched to PyPI Only you cannot go back to Legacy mode. All new projects would be already switched to PyPI Only. After some amount of time switch all Projects to PyPI Only but _do not_ re-host their packages as we cannot legally do so without their permission. >>>> >>>> The above is simpler, still provides people an easy migration path, moves us to remove external hosting, and doesn't entangle us with legal issues. >>>> >>>> [1] There is still a small window here where someone could MITM PyPI fetching these files, however since it would be a one time and down deal this risk is minimal and is worth it to move to an pypi only solution. >>>> >>>> ----------------- >>>> Donald Stufft >>>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA >>>> >>> >>> >> >> >> ----------------- >> Donald Stufft >> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA >> > > There isn't a good middle ground here, any externally hosted or spidered file leads us back to at least 2 of the 3 major issues I outlined. The end goal *needs* to be that all external links are removed from PyPI's simple page, and only files hosted on PyPI are accepted there. The only real useful discussion is how do we get from where we are now, to the zero external links/files situation. At some point this is going to *require* breaking things for anyone who hasn't put their files on PyPI. 
Adding more steps only draws out the pain; like a bandaid, it's best if it's ripped off quickly. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From pje at telecommunity.com Sun Mar 10 23:41:22 2013 From: pje at telecommunity.com (PJ Eby) Date: Sun, 10 Mar 2013 18:41:22 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> Message-ID: On Sun, Mar 10, 2013 at 5:16 PM, Donald Stufft wrote: > If someone's release process forces PyPI to have security, uptime, and privacy issues then I'm very sorry but their release process is going to need to change. It's not fun, it's a shitty situation, but trying to bend over backwards to enable their current release processes is like trying to bend over backwards to enable people to still walk into their bank's vault and grab a stack of currency. When people in group 1 express disapproval of people in group 2, this creates a rallying effect among members of group 1, and a *negative* counter-reaction in members of group 2. This is effective if, and *only* if, the people in group 2 have less power in the situation than the people in group 1. For example, if co-operation from the people in group 2 is not needed in order to carry out the wishes of group 1. However, in the situation under discussion, such co-operation is required, which means an alternative motivational strategy is indicated. That strategy involves giving persons in group 2 a better reason to care than "because we in group 1 think you group 2 people are thieves."
And by better, I mean, a reason that *benefits group 2*, and more specifically, each individual in group 2 who chooses to co-operate. And ideally, you work also to lower the cost of that co-operation. That's what *this* thread was originally about (lowering the cost of co-operation), before these "burn the witch" sentiments started up again. So, why not just step aside and let the adults go back to working on the actual problem? Just kidding, of course. ;-) That's an example of me using the same type of communication style, in the opposite direction: spewing disapproval at something I don't like, instead of giving you a reason that benefits *you*, to do what I want. See how it feels, going the other direction? Did it motivate you to be helpful? I'm guessing not. ;-) Anyway, my point is this: people don't like it one bit when you tell them what to do. If you tell them, "you must do X", you get resistance. But if you offer them a choice, "Are you going to do X or Y?", there's much less resistance. And if one choice is less convenient than the other, most will pick the easier choice. So, would you rather fight with developers to make them do it your way, or have most of them do exactly what you want and most of the rest get pretty close, but not have to fight with them about it? Right now, the impression you and certain other people are giving me is that it is more important that whatever action we take be seen as censuring the practice of off-PyPI hosting, than that we actually fix the problems! And it's difficult to take such a position seriously, because the post-hoc rationalization of harms is, well, unconvincing at best to a neutral party. When PyPI was first built, it didn't *have* hosting, so there was nothing morally wrong about off-site hosting then. And when hosting was first added, automated downloading didn't exist yet, either. So it still wasn't wrong. 
And when I added automated downloading, I made the choice to encourage people to collaborate by making it as easy as possible. So offsite hosting still wasn't wrong, in fact it was a documented alternative. And that's been the case for, oh, 8 years now? So what you're actually doing isn't crusading against evil-doers, it's more like saying that every restaurant that isn't McDonalds should be immediately remodeled, because you have just noticed the shocking trend that hardly any of those restaurants will serve you food as quickly! And that of course, the restaurant owners should undertake the remodeling and procedure changes, retraining, retooling, etc. at *their* expense, on *your* timeline. Just so that *you*, who *chose to visit those restaurants in the first place*, can get your food a bit more quickly. Sure, I know that's not how *you* see it. But surely you can see that's how the *restaurant owners* are going to see it. And if you want them to co-operate, it's probably going to be in your interest to focus your attention on their side of the equation, rather than on yours. You already agree with your point of view. They don't. I realize that can be difficult to do when you have strong feelings about a subject. For example, as I write this I keep backing up and deleting all sorts of unhelpful things I find myself wanting to say. ;-) And I'm doing that because I'm consciously reminding myself that *getting to a solution* is more important to me than *making you feel bad* for being "wrong on the internet". What's more important to you? The *actual* state of PyPI, or the state of who is to be considered right or wrong? If it's the former, you would probably find it useful to your goals, to please refrain from calling me and that other 10% of PyPI thieves. Or really any other names whatsoever, explicitly OR implicitly. Thanks. 
From donald at stufft.io Mon Mar 11 01:25:16 2013 From: donald at stufft.io (Donald Stufft) Date: Sun, 10 Mar 2013 20:25:16 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> Message-ID: <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> On Mar 10, 2013, at 6:41 PM, PJ Eby wrote: > On Sun, Mar 10, 2013 at 5:16 PM, Donald Stufft wrote: >> If someones release process forces PyPI to have security, uptime, and privacy issues then I'm very sorry but their release process is going to need to change. It's not fun, it's a shitty situation, but trying to bend over backwards to enable their current release processes is like trying to bend over backwards to enable people to still walk into their banks vault and grab a stack of currency. > > When people in group 1 express disapproval of people in group 2, this > creates a rallying effect among members of group 1, and a *negative* > counter-reaction in members of group 2. > > This is effective if, and *only* if, the people in group 2 have less > power in the situation than the people in group 1. For example, if > co-operation from the people in group 2 are not needed in order to > carry out the wishes of group 1. > > However, in the situation under discussion, such co-operation is > required, which means an alternative motivational strategy is > indicated. > > That strategy involves giving persons in group 2 a better reason to > care than "because we in group 1 think you group 2 people are > thieves." > > And by better, I mean, a reason that *benefits group 2*, and more > specifically, each individual in group 2 who chooses to co-operate. > > And ideally, you work also to lower the cost of that co-operation. 
> > That's what *this* thread was originally about (lowering the cost of > co-operation), before these "burn the witch" sentiments started up > again. So, why not just step aside and let the adults go back to > working on the actual problem? > > Just kidding, of course. ;-) That's an example of me using the same > type of communication style, in the opposite direction: spewing > disapproval at something I don't like, instead of giving you a reason > that benefits *you*, to do what I want. See how it feels, going the > other direction? Did it motivate you to be helpful? I'm guessing > not. ;-) > > Anyway, my point is this: people don't like it one bit when you tell > them what to do. > > If you tell them, "you must do X", you get resistance. > > But if you offer them a choice, "Are you going to do X or Y?", there's > much less resistance. > > And if one choice is less convenient than the other, most will pick > the easier choice. > > So, would you rather fight with developers to make them do it your > way, or have most of them do exactly what you want and most of the > rest get pretty close, but not have to fight with them about it? > > Right now, the impression you and certain other people are giving me > is that it is more important that whatever action we take be seen as > censuring the practice of off-PyPI hosting, than that we actually fix > the problems! > > And it's difficult to take such a position seriously, because the > post-hoc rationalization of harms is, well, unconvincing at best to a > neutral party. When PyPI was first built, it didn't *have* hosting, > so there was nothing morally wrong about off-site hosting then. > > And when hosting was first added, automated downloading didn't exist > yet, either. So it still wasn't wrong. > > And when I added automated downloading, I made the choice to encourage > people to collaborate by making it as easy as possible. So offsite > hosting still wasn't wrong, in fact it was a documented alternative. 
> > And that's been the case for, oh, 8 years now? > > So what you're actually doing isn't crusading against evil-doers, it's > more like saying that every restaurant that isn't McDonalds should be > immediately remodeled, because you have just noticed the shocking > trend that hardly any of those restaurants will serve you food as > quickly! > > And that of course, the restaurant owners should undertake the > remodeling and procedure changes, retraining, retooling, etc. at > *their* expense, on *your* timeline. > > Just so that *you*, who *chose to visit those restaurants in the first > place*, can get your food a bit more quickly. > > Sure, I know that's not how *you* see it. > > But surely you can see that's how the *restaurant owners* are going to see it. > > And if you want them to co-operate, it's probably going to be in your > interest to focus your attention on their side of the equation, rather > than on yours. You already agree with your point of view. They > don't. > > I realize that can be difficult to do when you have strong feelings > about a subject. For example, as I write this I keep backing up and > deleting all sorts of unhelpful things I find myself wanting to say. > ;-) > > And I'm doing that because I'm consciously reminding myself that > *getting to a solution* is more important to me than *making you feel > bad* for being "wrong on the internet". > > What's more important to you? The *actual* state of PyPI, or the > state of who is to be considered right or wrong? > > If it's the former, you would probably find it useful to your goals, > to please refrain from calling me and that other 10% of PyPI thieves. > Or really any other names whatsoever, explicitly OR implicitly. > > Thanks. I don't think anyone is bad here, nor am I arguing against any particular person or group of people. I'm arguing against a practice and a system. You're going out of your way to find excuses to throw all sorts of stop energy here. 
All I said was that their process needed to change; I even expressed sympathy with the fact it did need to change. I've never called *anyone* on this list, or on PyPI, a thief. My analogy served only to put into light that the system that I'm trying to change is insecure, just like allowing anyone to walk into a bank vault and pick up money would be insecure. I fully believe that the people using such a system are completely trustworthy people. But just because *they* are trustworthy doesn't mean that a system which allows *anyone* to attack other Python developers is *ok*. When discussing security of a system it's necessary to divorce yourself from the implementations of things. When you get wrapped up in the implementation you turn things into a Us vs Them game (as evidenced by several of your messages) instead of discussing the merits of the various systems and which ones serve the greatest needs of the community the best. I believe I've said it before, but if not here it is again: I will donate *my free time* to help ANYONE who is using a release process which this change would break to engineer a new release process that has as little impact on their actual process as possible and not have all these issues for the greater Python community. And let's just be clear, I'm offering to put aside a massive list of things I need to be doing to help the very folks you're saying I'm disparaging. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
From pje at telecommunity.com Mon Mar 11 07:09:16 2013 From: pje at telecommunity.com (PJ Eby) Date: Mon, 11 Mar 2013 02:09:16 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> Message-ID: On Sun, Mar 10, 2013 at 8:25 PM, Donald Stufft wrote: > I don't think anyone is bad here, nor am I arguing against any particular person or group of people. I'm arguing against a practice and a system. You're going out of your way to find excuses to throw all sorts of stop energy here. Calling a legitimate disagreement with your point of view "stop energy" seems inappropriate to me, since my issue is with you derailing the topic of how to get people to *voluntarily* migrate to a better situation than the present one, and to develop tools for that process. The only thing I wish you to stop is the repeated assertion without proof that 1) external links must go *and* 2) this must be an enforced directive rather than a (highly-encouraged) option. I have even gone so far as to suggest, earlier in this thread, what evidence I would find at least suggestive of your POV. But your response to that and prior challenges to those assertions, has been simply to move your goalpost. E.g. from "current uptime is bad" to "any uptime lower than PyPI's is totally unacceptable". I, on the other hand, have moved in the direction of *your* proposals repeatedly, making adjustments as I find actually-convincing evidence and/or reasoning, or find ways to deal with the issues. I have compromised quite a bit.
(And have already spent a fair amount of time writing setuptools code to lay a foundation for these changes.) You, as far as I can tell, have not moved your position in the slightest. Which of these is "stop energy"? It is not the case that external links must be removed from PyPI in order to ensure security, or uptime. And it is *especially* not the case that you are the BDFL of uptime. You're definitely not the BDFL of uptime for any given project hosted on PyPI, that you *voluntarily choose* to make a part of your build process. If your primary argument is that project X must host its files on PyPI because of your build process, then I think you misunderstand open source, and also the part where you *chose* to make it part of your build process. It certainly doesn't give you the right to force projects Y, Z, and Q -- that you don't even use! -- to also host their projects on PyPI, because project X -- the one you do use -- has a slow or unreliable file host! It seems disingenuous to then shift the argument back to security when challenged on uptime, and back to uptime when challenged on security. We've looped back and forth over those for some time: when I point out that wheels have signatures which will make off-site hosting relatively unimportant to the security picture, you jump back to talking about uptime. When I point out that uptime is a consensual factor that in no way justifies legislating what other people can do with their projects, you go back to talking about security. Make up your mind. What problem are you actually trying to solve? (I expect your response on wheels to be that wheels aren't there yet, etc., but that isn't actually a response to the objection unless you're going to change your position to, "okay, external links to file formats that can be signed can stay," or something of that sort.
Otherwise, you're not actually compromising, just using the fact that wheels aren't in common use yet as an argument to keep the position you started with.) > My analogy served only to put into light that the system that I'm trying to change is insecure, just like allowing anyone to walk into a bank vault and pick up money would be insecure. I fully believe that the people using such a system are completely trustworthy people. But just because *they* are trustworthy doesn't mean that a system which allows *anyone* to attack other Python developers is *ok*. And my analogy served only to put into light the part where you're insisting that one group of people change for the benefit of a group which is already benefiting from their pre-existing generosity. That being said, I do see that I could have misinterpreted the intent of your analogy -- it sounded like you were saying that the developers who host off-PyPI were thieves walking into your bank and taking your money (i.e., analogizing theft with inconveniencing you by making your builds fail or run slowly). Though to be honest, I still don't comprehend how else to make any kind of sense to that analogy in its original context. Who is the bank? Whose money is being taken? The whole thing is utterly confusing to me if I try to take it any other way than the way I did, because it doesn't seem to have any other simple 1:1 mapping to the situation, as far as I can see. Your explanation seems terribly abstract and tortured to me, as far as analogies go. > When discussing security of a system it's necessary to divorce yourself from the implementations of things. When you get wrapped up in the implementation you turn things into a Us vs Them game (as evidenced by several of your messages) instead of discussing the merits of the various systems and which ones serve the greatest needs of the community the best. I think you've got things backwards here. 
It's you who's been arguing that the solution to the problem of "improved uptime and security" is best implemented by "ban all non-PyPI hosting". It is I who has been arguing that this is a premature judgment and rush to implementation, without considering all of the design angles. And I am the one asking you to stop insisting on this one implementation and state your *actual* problem with external links. (By which I mean, a problem stated such that, if you're given a solution that *doesn't* involve banning them from PyPI, you aren't going to rejigger the problem statement so that it once again requires banning. That's moving the goalposts, and that's what keeps happening in this discussion, at least as far as I can see. I, on the other hand, have given you my actual problem with your proposal, and I have not moved *my* goalposts. Instead, I've moved towards your position, more than once. But I've moved as far towards it as I can go at this time, without you providing any additional evidence or explanation or *some* kind of engagement with the points that I've raised above that you've previously ignored, in this thread and others.) From regebro at gmail.com Mon Mar 11 07:23:35 2013 From: regebro at gmail.com (Lennart Regebro) Date: Mon, 11 Mar 2013 07:23:35 +0100 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> Message-ID: On Mon, Mar 11, 2013 at 7:09 AM, PJ Eby wrote: > I think you've got things backwards here. It's you who's been arguing > that the solution to the problem of "improved uptime and security" is > best implemented by "ban all non-PyPI hosting". The uptime problem is *only* solvable by minimizing the number of hosts involved. The minimum number of hosts is one. 
That means we should get all releases onto PyPI. This has been obvious for years, and I'm overjoyed to see that work is finally being done to make that happen. Discussion should be about how to best do that, not if we should do that or not. We can also discuss wordings. Nobody is for example trying to strictly speaking ban hosting on other hosts than PyPI. But if you do host on another server, your package will not be a part of the Python ecosystem, and it will not be installable by easy_install or pip or buildout, etc. You can call that a "ban" if you want, but maybe that causes negative connotations that are best avoided. But what ever you call it the end goal and result is the same. Packages not hosted on PyPI will not be easily installable. This is, and must be, the end goal. Now let's discuss how to get there instead. //Lennart From ronaldoussoren at mac.com Mon Mar 11 09:06:21 2013 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Mon, 11 Mar 2013 09:06:21 +0100 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> Message-ID: <212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com> On 11 Mar, 2013, at 7:23, Lennart Regebro wrote: > On Mon, Mar 11, 2013 at 7:09 AM, PJ Eby wrote: >> I think you've got things backwards here. It's you who's been arguing >> that the solution to the problem of "improved uptime and security" is >> best implemented by "ban all non-PyPI hosting". > > The uptime problem is *only* solvable by minimizing the number of > hosts involved. The minimum number of hosts is one. I mostly agree when you change hosts to websites ;-). > That means we > should get all releases onto PyPI. But this isn't necessarily true, there is another solution: mirror your requirements locally. 
That way you don't have problems when the remote PyPI server is unreachable for some reason, and you can be sure that the exact version you tested with is available and used. > This has been obvious for years, > and I'm overjoyed to see that work is finally being done to make that > happen. Discussion should be about how to best do that, not if we > should do that or not. > > We can also discuss wordings. Nobody is for example trying to strictly > speaking ban hosting on other hosts than PyPI. But if you do host on > another server, your package will not be a part of the Python > ecosystem, and it will not be installable by easy_install or pip or > buildout, etc. You can call that a "ban" if you want, but maybe that > causes negative connotations that are best avoided. But what ever you > call it the end goal and result is the same. Packages not hosted on > PyPI will not be easily installable. This is, and must be, the end > goal. The end goal is to make it easy and safe to install packages. > > Now let's discuss how to get there instead. Is it even clear why numerous archives aren't hosted on PyPI? IMHO it would be better to remove barriers than force projects to host files on PyPI. Ronald > > //Lennart > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig From ronaldoussoren at mac.com Mon Mar 11 09:14:11 2013 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Mon, 11 Mar 2013 09:14:11 +0100 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> Message-ID: On 10 Mar, 2013, at 22:16, Donald Stufft wrote: > > There isn't a good middle ground here, any externally hosted or spidered file leads us back to at least 2 of the 3 major issues I outlined. 
The end goal *needs* to be that all external links are removed from PyPI's simple page, and only files hosted on PyPI are accepted there. Why is that? Is there something in the proposed package signing solution that won't work when files aren't on PyPI? If so, will it still be possible to run in-house package repositories (partial PyPI mirrors and/or repositories with non-public software)? Ronald From regebro at gmail.com Mon Mar 11 09:18:28 2013 From: regebro at gmail.com (Lennart Regebro) Date: Mon, 11 Mar 2013 09:18:28 +0100 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: <212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com> References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com> Message-ID: On Mon, Mar 11, 2013 at 9:06 AM, Ronald Oussoren wrote: > But this isn't necessarily true, there is another solution: mirror your requirements locally. I do that. This is not a solution, because your requirements yesterday is not your requirements tomorrow. > Is it even clear why numerous archives aren't hosted on PyPI? No, the only one that has mentioned why is Marc-André, I think, whose eGenix packages are distributed as binary packages for loads of different platforms. It's unclear to me if all these binary packages should be uploaded to PyPI, and it is also unclear to me why they can't be, it seems to be mostly a case of it being too much work. He also mentioned the big Python distributions eGenix does as being too large for PyPI, but I don't really see the point of uploading Python distributions to PyPI, they can't be installed with Python installers anyway. > IMHO it would be better to remove barriers than force projects to host files on PyPI.
Nobody has really been able to point out any real barriers, so we don't know what they are or if they exist. //Lennart From ronaldoussoren at mac.com Mon Mar 11 09:33:51 2013 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Mon, 11 Mar 2013 09:33:51 +0100 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com> Message-ID: <826D31AF-BE1C-4FC3-8FF9-EAC3B7D6EA54@mac.com> On 11 Mar, 2013, at 9:18, Lennart Regebro wrote: > On Mon, Mar 11, 2013 at 9:06 AM, Ronald Oussoren wrote: >> But this isn't necessarily true, there is another solution: mirror your requirements locally. > > I do that. This is not a solution, because your requirements yesterday > is not your requirements tomorrow. So? When your requirements change you change the local mirror. > >> Is it even clear why numerous archives aren't hosted on PyPI? > > No, the only one that has mentioned why is Marc-André, I think, whose > eGenix packages are distributed as binary packages for loads of > different platforms. It's unclear to me if all these binary packages > should be uploaded to PyPI, and it is also unclear to me why they > can't be, it seems to be mostly a case of it being too much work. > > He also mentioned the big Python distributions eGenix does as being > too large for PyPI, but I don't really see the point of uploading > Python distributions to PyPI, they can't be installed with Python > installers anyway.
Some reasons I've seen mentioned in the past: * In some big companies it might be easier to publish archives on the company webserver than on PyPI due to truckloads of red tape on their part (not something we can fix) * It is easier to publish all related archives in the same place for projects where the Python package is just one component (for example client libraries for a network server) * Authors might not know it is possible to upload archives to PyPI > >> IMHO it would be better to remove barriers than force projects to host files on PyPI. > > Nobody has really been able to point out any real barriers, so we > don't know what they are or if they exist. It may be as simple as lack of knowledge (e.g. "I didn't know I could host files on PyPI"), or unnecessary friction in the release process. I guess the only way we will know why some authors don't upload archives to PyPI is to ask (some of) them. Ronald From mal at egenix.com Mon Mar 11 10:23:03 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 11 Mar 2013 10:23:03 +0100 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com> Message-ID: <513DA277.2010809@egenix.com> On 11.03.2013 09:18, Lennart Regebro wrote: > On Mon, Mar 11, 2013 at 9:06 AM, Ronald Oussoren wrote: >> But this isn't necessarily true, there is another solution: mirror your requirements locally. > > I do that. This is not a solution, because your requirements yesterday > is not your requirements tomorrow. > >> Is it even clear why numerous archives aren't hosted on PyPI? > > No, the only one that has mentioned why is Marc-André, I think, whose > eGenix packages are distributed as binary packages for loads of > different platforms.
It's unclear to me if all these binary packages > should be uploaded to PyPI, and it is also unclear to me why they > can't be, it seems to be mostly a case of it being too much work. I've listed all the reasons in one of the previous emails: http://mail.python.org/pipermail/catalog-sig/2013-March/005502.html Others will likely have additional reasons, e.g. * the PyPI uploads not being compatible with their release process * not knowing that it's possible to host files on PyPI - after all it's an *index*, not a repository :-) * still believing that PyPI is an unreliable hosting provider due to the many downtimes and problems it had in the past - which is no longer true today * not wanting to host and maintain files in several different places * not wanting to host release files at all, i.e. have people check out the version from a repository instead of doing the download, unzip, install dance * not wanting to separate associated library or product code from the Python wrapper code (think e.g. the Python interface for Subversion) * not being allowed to upload files to external servers by company policy, or having to deal with a company policy that makes this difficult/unattractive * having issues with the added latency of PyPI downloads compared to a simple file-based index hosted on a company web server * having a strong need to know the number of downloads per package and associated statistics such as downloads per country, per year/month/day/hour * not wanting to give up access to the download log files * having a requirement to restrict downloads on a per-country basis, e.g. for export-controlled software or software which may not be imported/used in certain countries * having PyPI not provide the technical means to host the release files, e.g. due to the releases using a format which is not supported by PyPI (e.g.
all the ActiveState packages - http://code.activestate.com/pypm/) * user experience/support issues: if the package has external dependencies, or needs special setup, it may provide a better user experience to host the Python wrapper on the same page as the dependencies and instructions on how to install them; rather than having them on PyPI which lets people believe that a simple "pip install something" will get them a working setup Those are just a few things that come to mind. I'm sure there are more issues that keep authors from uploading their packages to PyPI. Overall, I think we should encourage people to make their code available through PyPI and make it attractive to them, but keep the possibility to continue using external hosting platforms, should they run into issues that PyPI cannot solve for them. > He also mentioned the big Python distributions eGenix does as being > too large for PyPI, but I don't really see the point of uploading > Python distributions to PyPI, they can't be installed with Python > installers anyway. Not sure what you mean here. PyPI is also used to index Python projects which are not Python packages to be installed by pip/easy_install/etc. Some of those may also want to >> IMHO it would be better to remove barriers than force projects to host files on PyPI. > > Nobody has really been able to point out any real barriers, so we > don't know what they are or if they exist. Again, please see the email where I listed the ones affecting at least eGenix. Most of those can be addressed in one way or another, e.g. by having PyPI cache the files, provide access to the download counts by country, provide a way to host separate indexes for UCS2/UCS4 egg files, etc. The only issues that need more investigation are the PyPI license terms and the general issue of not being able to host export regulated files on PyPI. 
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 11 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From regebro at gmail.com Mon Mar 11 10:31:45 2013 From: regebro at gmail.com (Lennart Regebro) Date: Mon, 11 Mar 2013 10:31:45 +0100 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: <826D31AF-BE1C-4FC3-8FF9-EAC3B7D6EA54@mac.com> References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com> <826D31AF-BE1C-4FC3-8FF9-EAC3B7D6EA54@mac.com> Message-ID: On Mon, Mar 11, 2013 at 9:33 AM, Ronald Oussoren wrote: > > On 11 Mar, 2013, at 9:18, Lennart Regebro wrote: > >> On Mon, Mar 11, 2013 at 9:06 AM, Ronald Oussoren wrote: >>> But this isn't necessarily true, there is another solution: mirror your requirements locally. >> >> I do that. This is not a solution, because your requirements yesterday >> is not your requirements tomorrow. > > So? When your requirements change you change the local mirror. How? You can't mirror something that you can't reach. The only local solution to this is to mirror every file that is reachable via PyPI, in advance. That is obviously *not* a feasible solution. 
> I guess the only way we will know why some authors don't upload archives to > PyPI is to ask (some of) them. Right. I don't think it's feasible to discuss speculative reasons, and in any case I strongly believe that whatever reason people have, we still should not let the Python tools install packages from third-party hosts by default. If you have your own index (like Plone currently does, largely because of the problems caused by having packages on several different servers) that should of course be allowed. I have a list of emails already, if somebody wants to ask people. :-) It's 2651 emails though, and I think most of those people have registered packages that don't actually have *any* distributions. :-) I didn't check for that. //Lennart From ronaldoussoren at mac.com Mon Mar 11 10:56:23 2013 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Mon, 11 Mar 2013 10:56:23 +0100 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com> <826D31AF-BE1C-4FC3-8FF9-EAC3B7D6EA54@mac.com> Message-ID: On 11 Mar, 2013, at 10:31, Lennart Regebro wrote: > On Mon, Mar 11, 2013 at 9:33 AM, Ronald Oussoren wrote: >> >> On 11 Mar, 2013, at 9:18, Lennart Regebro wrote: >> >>> On Mon, Mar 11, 2013 at 9:06 AM, Ronald Oussoren wrote: >>>> But this isn't necessarily true, there is another solution: mirror your requirements locally. >>> >>> I do that. This is not a solution, because your requirements yesterday >>> is not your requirements tomorrow. >> >> So? When your requirements change you change the local mirror. > > How? You can't mirror something that you can't reach. Now I'm confused. You want to change a dependency without testing it beforehand?
I'm probably getting old, but for production software I tend to download and archive all versions used instead of assuming that all software can at all times easily be downloaded. When I want to update a dependency (new version, new external package) I first download and test, then add it to the local archive. Part of the reason for this is that the production site doesn't have a fast, always-on internet connection; another part is that the local archive ensures I can reproduce the exact installation on another server without cloning the first one. > The only local solution to this is to mirror every file that is > reachable via PyPI, in advance. That is obviously *not* a feasible > solution. > >> I guess the only way we will know why some authors don't upload archives to >> PyPI is to ask (some of) them. > > Right. I don't think it's feasible to discuss speculative reasons, and > in any case I strongly believe that whatever reason people have, we > still should not let the Python tools install packages from > third-party hosts by default. I don't have problems with installing from third-party hosts; as someone noted earlier, some of those third-party hosts have very high uptimes themselves (GitHub, Bitbucket, ...). The current way to get to those third-party hosts is hacky and could be changed, for example by adding a PyPI API for registering download links and other metadata for specific files (that is, a way to add items to the file list on PyPI that aren't hosted on PyPI). I don't know how feasible this would be when packages are signed using TUF, but it could work with Giovanni's proposal using PGP signatures. A problem with adding such an API is that there is no reason to assume that it would actually be used: using that API would be about as much work as using the upload API in the first place.
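The download, test, then archive workflow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the manifest file name and the helper names (`add_to_archive`, `verify_archive`) are hypothetical and not part of any existing tool; the sketch records a SHA-256 hash for each archived release file so the same set of files can be checked before being reused on another server.

```python
import hashlib
import json
import os
import shutil

MANIFEST = "manifest.json"  # hypothetical file name, chosen for this sketch


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_manifest(archive_dir):
    """Load the name -> hash mapping, or an empty one if none exists yet."""
    path = os.path.join(archive_dir, MANIFEST)
    if not os.path.exists(path):
        return {}
    with open(path) as fh:
        return json.load(fh)


def add_to_archive(archive_dir, tested_file):
    """Copy an already-tested release file into the archive and record its hash."""
    os.makedirs(archive_dir, exist_ok=True)
    name = os.path.basename(tested_file)
    target = os.path.join(archive_dir, name)
    shutil.copy2(tested_file, target)
    manifest = load_manifest(archive_dir)
    manifest[name] = sha256_of(target)
    with open(os.path.join(archive_dir, MANIFEST), "w") as fh:
        json.dump(manifest, fh, indent=2, sort_keys=True)
    return manifest[name]


def verify_archive(archive_dir):
    """Return the names of archived files that are missing or have changed."""
    bad = []
    for name, expected in sorted(load_manifest(archive_dir).items()):
        path = os.path.join(archive_dir, name)
        if not os.path.exists(path) or sha256_of(path) != expected:
            bad.append(name)
    return bad
```

An archive directory like this can then serve as the only source for an installer, for example with pip's `--no-index` and `--find-links` options, so that no remote host is contacted at install time.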
> If you have your own index (like Plone > currently does, largely because of the problems caused by having > packages on several different servers) that should of course be > allowed. > > I have a list of emails already, if somebody wants to ask people. :-) That won't be me, I don't have enough time available to act upon the results. Ronald From holger at merlinux.eu Mon Mar 11 11:02:25 2013 From: holger at merlinux.eu (holger krekel) Date: Mon, 11 Mar 2013 10:02:25 +0000 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> Message-ID: <20130311100225.GL9677@merlinux.eu> Hi Philip, thanks for your helpful review, almost all makes sense to me ... some more inlined comments below. Up front, i am open to you co-authoring the PEP if you like and share the goal to find a minimum viable approach to speed up and simplify the interactions for installers. On Sun, Mar 10, 2013 at 15:41 -0400, PJ Eby wrote: > On Sun, Mar 10, 2013 at 11:07 AM, holger krekel wrote: > > Philip, Marc-Andre, Richard (Jones), Nick and catalog-sig/distutils-sig: > > scrutiny and feedback welcome. > > Hi Holger. I'm having some difficulty interpreting your proposal > because it is leaving out some things, and in other places > contradicting what I know of how the tools work. It is also a bit at > odds with itself in some places. Certainly, it was a quick draft to get the process going and useful feedback which worked so far :) > For instance, at the beginning, the PEP states its proposed solution > is to host all release files on PyPI, but then the problem section > describes the problems that arise from crawling external pages: > problems that can be solved without actually hosting the files on > PyPI. > > To me, it needs a clearer explanation of why the actual hosting part > also needs to be on PyPI, not just the links. 
In the threads to date, > people have argued about uptime, security, etc., and these points are > not covered by the PEP or even really touched on for the most part. Makes sense to clarify this more. > (Actually, thinking about that makes me wonder.... Donald: did your > analysis collect any stats on *where* those externally hosted files > were hosted? My intuition says that the bulk of the files (by *file > count*) will come from a handful of highly-available domains, i.e. > sourceforge, github, that sort of thing, with actual self-hosting > being relatively rare *for the files themselves*, vs. a much wider > range of domains for the homepage/download URLs (especially because > those change from one release to the next.) If that's true, then most > complaints about availability are being caused by crawling multiple > not-highly-available HTML pages, *not* by the downloading of the > actual files. If my intuition about the distribution is wrong, OTOH, > it would provide a stronger argument for moving the files themselves > to PyPI as well.) > > Digression aside, this is one of things that needs to be clearer so > that there's a better explanation for package authors as to why > they're being asked to change. And although the base argument is good > ("specifying the "homepage" will slow down the installation process"), > it could be amplified further with an example of some project that has > had multiple homepages over its lifetime, listing all the URLs that > currently must be crawled before an installer can be sure it has found > all available versions, platforms, and formats of the that project. Right, an example makes sense. > Okay, on to the Solution section. Again, your stated problem is to > fix crawling, but the solution is all about file hosting. Regardless > of which of these three "hosting modes" is selected, it remains an > option for the developer to host files elsewhere, and provide the > links in their description... 
unless of course you intended to rule > that out and forgot to mention it. (Or, I suppose, if you did *not* > intend to rule it out and intentionally omitted mention of that so the > rabid anti-externalists would think you were on their side and not > create further controversy... in which case I've now spoiled things. > Darn. ;-) ) To be honest, while drafting i forgot about the fact that the long_description can contain package links as well. > Some technical details are also either incorrect or confusing. For > example, you state that "The original homepage/download links are > added as links without a ``rel`` attribute if they have the ``#egg`` > format". But if they are added without a rel attribute, it doesn't > *matter* whether they have an #egg marker or not. It is quite > possible for a PyPI package to have a download_url of say, > "http://sourceforge.net/download/someproject-1.2.tgz". Right. I just wanted to clarify that the distutils metadata "download_url" can contain an #egg format link and that this link should still be served (without a rel). > Thus, I would suggest simply stating that changing hosting mode does > not actually remove any links from the /simple index, it just removes > the rel="" attributes from the "Home page" and "Download" links, thus > preventing them from being crawled in search of additional file links. That's certainly a better description of what effectively happens and avoids the special mentioning of #egg. > With that out of the way, that brings me to the larger scope issue > with the modes as presented. Notice now that with this clarification, > there is no real difference in *state* between the "pypi-cache" and > "pypi-only" modes. There is only a *functional* difference... and > that function is underspecified in the PEP. Agreed. > What I mean is, in both pypi-cache and pypi-only, the *state* of > things is that rel="" attributes are gone, and there are links to > files on PyPI. 
The only difference is in *how* the files get there. Yes. > And for the pypi-cache mode, this function is *really* > under-specified. Arguably, this is the meat of the proposal, but it > is entirely missing. There is nothing here about the frequency of > crawling, the methods used to select or validate files, whether there > is any expiration... it is all just magically assumed to happen > somehow. I'd like to avoid cache-invalidation issues by only performing cache updates upon three user actions: - when a release is registered for a package which is in "pypi-cache" hosting mode. - when a maintainer chooses to set "pypi-cache" - when a maintainer explicitly triggers a "cache" update All actions allow pypi.python.org to provide feedback / error codes so there is nothing hidden going on in regular intervals or so. > My suggestion would be to do two things: > > First, make the state a boolean: crawl external links, with the > current state yes and the future state no, with "no" simply meaning > that the rel="" attribute is removed from the links that currently > have it. > > Second, propose to offer tools in the PyPI interface (and command > line) to assist authors in making the transition, rather than > proposing a completely unspecified caching mechanism. Better to have > some vaguely specified tools than a completely unspecified caching > mechanism, and better still to spell out very precisely what those > tools do. This structure makes sense to me except that i see the need to start off with "pypi-ext", i.e. a third state which encodes the current behaviour. The thing is that pypi.python.org doesn't have an extensive test suite and we will thus need to rely on a few early adopters using the tools/state-changes before starting phase 2 (mass mailings etc.). Also in case of problems we can always switch back packages to the safe "pypi-ext" mode. IOW, the motivation for this third state comes from considering the actual implementation process.
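As an illustration of how the three hosting modes discussed above could affect the links on a project's /simple page, here is a minimal sketch. It is not PyPI's actual implementation: the names `HOSTING_MODES` and `render_simple_links` are hypothetical, and it assumes the crawlable links are the ones carrying rel="homepage" and rel="download". In "pypi-ext" mode those links keep their rel attribute (so installers crawl them); in the other two modes the links are still served, but without rel.

```python
from html import escape

# The three states named in the thread; "pypi-ext" encodes current behaviour.
HOSTING_MODES = ("pypi-ext", "pypi-cache", "pypi-only")


def render_simple_links(mode, files, homepage=None, download_url=None):
    """Return the <a> tags for one project's /simple page.

    files is a list of (filename, url) pairs for release files hosted on the
    index itself. homepage and download_url are the optional external links
    taken from the project metadata.
    """
    if mode not in HOSTING_MODES:
        raise ValueError("unknown hosting mode: %r" % (mode,))

    links = [
        '<a href="%s">%s</a>' % (escape(url, quote=True), escape(name))
        for name, url in files
    ]

    # Only the legacy mode marks external links as crawlable.
    crawl = mode == "pypi-ext"
    for rel, url in (("homepage", homepage), ("download", download_url)):
        if not url:
            continue
        rel_attr = ' rel="%s"' % rel if crawl else ""
        links.append(
            '<a href="%s"%s>%s</a>' % (escape(url, quote=True), rel_attr, escape(url))
        )
    return links
```

Under this sketch, switching a project from "pypi-ext" to either other mode removes nothing from the page; it only stops installers from treating the home page and download URL as pages to spider.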
> Okay, on to the "Phases of transtion". This section gets a lot > simpler if there are only two stages. Specifically, we let everyone > know the change is going to happen, and how long they have, give 'em > links to migration tools. Done. ;-) > > (Okay, so analysis still makes sense: the people who don't have any > externally hosted files can get a different message, i.e., "Hey, we > notice that installing your package is slow because you have these > links that don't go anywhere. Click here if you'd like PyPI to stop > sending people on wild goose chases". The people who have external > hosted files will need a more involved message.) > > Whew. Okay, that ends my critique of the PEP as it sits. Now for an > outside-the-box suggestion. > > If you'd like to be able to transition people away from spidered links > in the fewest possible steps, with the least user action, no legal > issues, and in a completely automated way, note that this can be done > with a one-time spidering of the existing links to find the download > links, then adding those links directly to the /simple index, and > switching off the rel="" attributes. This can be done without > explicit user consent, though they can be given the chance to do it > manually, sooner. Right, my mail preceding the "pre-pep" one contained a "linkext" state which spidered the links and offered them directly. It's certainly possible and indeed would likely not have legal issues. It might have cache-invalidation issues and probably makes the pypi-side implementation more complex. Also it goes a bit against the current intention of the PEP to have pypi.python.org control all hosting of release files. > To implement this you'd need two project-level (*not* release-level) > fields: one to indicate whether the project is using rel="" or not, > and one to contain the list of external download links, which would be > user-editable. 
> > This overall approach I'm proposing can be extended to also support > mirroring, since it provides an explicit place to list what it is > you're mirroring. (At any rate, it's more explicitly specified than > any such place in the current PEP.) > > That field can also be fairly easily populated for any given project > in just a few lines of code: > > from pkg_resources import Requirement > pr = Requirement.parse('Projectname') > from setuptools.package_index import PackageIndex > pi = PackageIndex(search_path=[], python=None, platform=None) > pi.find_packages(pr) > all_urls = [dist.location for dist in pi[pr.key]] > external_urls = [url for url in all_urls if '//pypi.python.org' not in url] > > (Although if you want more information, like what kind of link each > one is, the dist objects actually know a bit more than just the URL.) > > Anyway, I hope you found at least some of all this helpful. ;-) Certainly! Will try to do an update incorporating your suggestions in the next few days. best, holger From regebro at gmail.com Mon Mar 11 11:44:31 2013 From: regebro at gmail.com (Lennart Regebro) Date: Mon, 11 Mar 2013 11:44:31 +0100 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com> <826D31AF-BE1C-4FC3-8FF9-EAC3B7D6EA54@mac.com> Message-ID: On Mon, Mar 11, 2013 at 10:56 AM, Ronald Oussoren wrote: > Now I'm confused. You want to change a dependency without testing it beforehand? How do you test a dependency without changing it? How do you test a dependency that is unreachable? It seems to me you are arbitrarily limiting this discussion to problems installing software on production servers. This is not a reasonable real-life limitation.
If a server is unreachable, it is unreachable even if you aren't installing production software on a server. It's equally unreachable if I need to download something for testing on my local machine. That's now all the energy I'm willing to spend on discussing this topic. Third-party hosting needs to go. I believe there is a broad consensus on this. Let's instead discuss *how* to implement it. //Lennart From donald at stufft.io Mon Mar 11 12:14:14 2013 From: donald at stufft.io (Donald Stufft) Date: Mon, 11 Mar 2013 07:14:14 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> Message-ID: <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> On Mar 11, 2013, at 2:09 AM, PJ Eby wrote: > On Sun, Mar 10, 2013 at 8:25 PM, Donald Stufft wrote: >> I don't think anyone is bad here, nor am I arguing against any particular person or group of people. I'm arguing against a practice and a system. You're going out of your way to find excuses to throw all sorts of stop energy here. > > Calling a legitimate disagreement with your point of view "stop > energy" seems inappropriate to me, since my issue is with you > derailing the topic of how to get people to *voluntarily* migrate to a > better situation than the present one, and to develop tools for that > process. The only thing I wish you to stop is the repeated assertion > without proof that 1) external links must go *and* 2) this must be an > enforced directive rather than a (highly-encouraged) option. 1) Proof of what? That it's insecure? That it harms uptime? That it violates people's privacy? I don't understand what you want here, do you want me to go and find insecure hosts and start boosting malware onto peoples machines? 
2) Even a single project remaining causes the entire thing to cascade, Weakest Link Theory. > > I have even gone so far as to suggest, earlier in this thread, what > evidence I would find at least suggestive of your POV. But your > response to that and prior challenges to those assertions, has been > simply to move your goalpost. E.g. from "current uptime is bad" to > "any uptime lower than PyPI's is totally unacceptable". I outlined all 3 of the major reasons in my very first email. I've never changed them. > > I, on the other hand, have moved in the direction of *your* proposals > repeatedly, making adjustments as I find actually-convincing evidence > and/or reasoning, or find ways to deal with the issues. I have > compromised quite a bit. (And have already spent a fair amount of > time writing setuptools code to lay a foundation for these changes.) > > You, as far as I can tell, have not moved your position in the slightest. > > Which of these is "stop energy"? I've not been willing to compromise because none of the solutions presented solves all the actual issues. They just rearrange deck chairs on the titanic. > > It is not the case that external links must be removed from PyPI in > order to ensure security, or uptime. And it is *especially* not the > case that you are the BDFL of uptime. You're definitely not the BDFL > of uptime for any given project hosted on PyPI, that you *voluntarily > choose* to make a part of your build process. If your primary > argument is that project X must host its files on PyPI because of your > build process, then I think you misunderstand open source, and also > the part where you *chose* to make it part of your build process. It > certainly doesn't give you the right to force projects Y, Z, and Q -- > that you don't even use! -- to also host their projects on PyPI, > because project X -- the one you do use -- has a slow or unreliable > file host! 
> > It seems disingenuous to then shfit the argument back to security when > challenged on uptime, and back to uptime when challenged on security. > We've looped back and forth over those for some time: when I point out > that wheels have signatures which will make off-site hosting > relatively unimportant to the security picture, you jump back to > talking about uptime. When I point out that uptime is a consensual > factor that in no way justifies legislating what other people can do > with their projects, you go back to talking about security. > > Make up your mind. What problem are you actually trying to solve? All of them, as outlined in my original email. > > (I expect your response on wheels to be that wheels aren't there yet, > etc., but that isn't actually a response to the objection unless > you're going to change your position to, "okay, external links to file > formats that can be signed can stay," or something of that sort. > Otherwise, you're not actually compromising, just using the fact that > wheels aren't in common use yet as an argument to keep the position > you started with.) Signed releases solve 1/3 of the original issues and bring with them their own. How do you transmit the signatures? How do you decide which signatures are valid for any given file? There's a pretty complicated system written called TUF which handles some of these issues (but again it only solves 1/3 of them) and until we get that transmission of the signatures in a sane way is unlikely. > > >> My analogy served only to put into light that the system that I'm trying to change is insecure, just like allowing anyone to walk into a bank vault and pick up money would be insecure. I fully believe that the people using such a system are completely trustworthy people. But just because *they* are trustworthy doesn't mean that a system which allows *anyone* to attack other Python developers is *ok*. 
> > And my analogy served only to put into light the part where you're > insisting that one group of people change for the benefit of a group > which is already benefiting from their pre-existing generosity. > > That being said, I do see that I could have misinterpreted the intent > of your analogy -- it sounded like you were saying that the developers > who host off-PyPI were thieves walking into your bank and taking your > money (i.e., analogizing theft with inconveniencing you by making your > builds fail or run slowly). > > Though to be honest, I still don't comprehend how else to make any > kind of sense to that analogy in its original context. Who is the > bank? Whose money is being taken? The whole thing is utterly > confusing to me if I try to take it any other way than the way I did, > because it doesn't seem to have any other simple 1:1 mapping to the > situation, as far as I can see. Your explanation seems terribly > abstract and tortured to me, as far as analogies go. Bank == PyPI, People insisting that the bank vault remain open so they can walk in and grab their own money because it's easier == folks arguing for the existing solution because they don't want to change their release process. Combined this leaves the bank (and in the actual situation, PyPI) open to a number of issues. > > >> When discussing security of a system it's necessary to divorce yourself from the implementations of things. When you get wrapped up in the implementation you turn things into a Us vs Them game (as evidenced by several of your messages) instead of discussing the merits of the various systems and which ones serve the greatest needs of the community the best. > > I think you've got things backwards here. It's you who's been arguing > that the solution to the problem of "improved uptime and security" is > best implemented by "ban all non-PyPI hosting". 
It is I who has been > arguing that this is a premature judgment and rush to implementation, > without considering all of the design angles. And I am the one asking > you to stop insisting on this one implementation and state your > *actual* problem with external links. Read my first email. Security, uptime, privacy. Note security isn't just about changing out files either, there's a whole host of possible problems most of them documented here: https://www.updateframework.com/wiki/Docs/Security . It's true that his won't solve all of those issues immediately but it moves us to a position where we can start trying. > > (By which I mean, a problem stated such that, if you're given a > solution that *doesn't* involve banning them from PyPI, you aren't > going to rejigger the problem statement so that it once again requires > banning. That's moving the goalposts, and that's what keeps happening > in this discussion, at least as far as I can see. I, on the other > hand, have given you my actual problem with your proposal, and I have > not moved *my* goalposts. Instead, I've moved towards your position, > more than once. But I've moved as far towards it as I can go at this > time, without you providing any additional evidence or explanation or > *some* kind of engagement with the points that I've raised above that > you've previously ignored, in this thread and others.) ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From donald at stufft.io Mon Mar 11 12:18:30 2013 From: donald at stufft.io (Donald Stufft) Date: Mon, 11 Mar 2013 07:18:30 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: <826D31AF-BE1C-4FC3-8FF9-EAC3B7D6EA54@mac.com> References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com> <826D31AF-BE1C-4FC3-8FF9-EAC3B7D6EA54@mac.com> Message-ID: <259AAEEB-CCAB-438A-9A64-AFF2450AA7DF@stufft.io> On Mar 11, 2013, at 4:33 AM, Ronald Oussoren wrote: > > On 11 Mar, 2013, at 9:18, Lennart Regebro wrote: > >> On Mon, Mar 11, 2013 at 9:06 AM, Ronald Oussoren wrote: >>> But this isn't necessarily true, there is another solution: mirror your requirements locally. >> >> I do that. This is not a solution, because your requirements yesterday >> is not your requirements tomorrow. > > So? When your requirements change you change the local mirror. > >> >>> Is it even clear why numerous archives aren't hosted on PyPI? >> >> No, the only one that has mentioned why is Marc-Andr?, I think, whose >> eGenix packages are distributed as binary packages for loads of >> different platforms. It's unclear to me if all these binary packages >> should be uploaded to PyPI, and it is also unclear to me why they >> can't be, it seems to be mostly a case of it being too much work. >> >> He also mentioned the big Python distributions eGenix does as being >> too large for PyPI, but I don't really see the point of uploading >> Python distributions to PyPI, they can't be installed with Python >> installers anyway. 
>
> Some reasons I've seen mentioned in the past:
>
> * In some big companies it might be easier to publish archives on the company webserver than on PyPI due to truckloads of red tape on their part (not something we can fix)

Publish your own PyPI. It's easy to do. You can even list your project on PyPI with instructions on how to add your company-wide PyPI to someone's deployment process. People just won't be able to automatically install your software from PyPI.

>
> * It is easier to publish all related archives in the same place for projects where the python package is just one component (for example client libraries for a network server)

If it's too hard for the hypothetical you to push just the Python parts to PyPI, then the same answer as above applies.

>
> * Authors might not know it is possible to upload archives to PyPI
>
>>
>>> IMHO it would be better to remove barriers than force projects to host files on PyPI.
>>
>> Nobody has really been able to point out any real barriers, so we
>> don't know what they are or if they exist.
>
> It may be as simple as lack of knowledge (e.g. "I didn't know I could host files on PyPI"),
> or unnecessary friction in the release process.
>
> I guess the only way we will know why some authors don't upload archives to
> PyPI is to ask (some of) them.
>
> Ronald
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From donald at stufft.io Mon Mar 11 12:32:25 2013
From: donald at stufft.io (Donald Stufft)
Date: Mon, 11 Mar 2013 07:32:25 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
In-Reply-To: <513DA277.2010809@egenix.com>
References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com> <513DA277.2010809@egenix.com>
Message-ID:

On Mar 11, 2013, at 5:23 AM, "M.-A. Lemburg" wrote:

> On 11.03.2013 09:18, Lennart Regebro wrote:
>> On Mon, Mar 11, 2013 at 9:06 AM, Ronald Oussoren wrote:
>>> But this isn't necessarily true, there is another solution: mirror your requirements locally.
>>
>> I do that. This is not a solution, because your requirements yesterday
>> is not your requirements tomorrow.
>>
>>> Is it even clear why numerous archives aren't hosted on PyPI?
>>
>> No, the only one that has mentioned why is Marc-André, I think, whose
>> eGenix packages are distributed as binary packages for loads of
>> different platforms. It's unclear to me if all these binary packages
>> should be uploaded to PyPI, and it is also unclear to me why they
>> can't be, it seems to be mostly a case of it being too much work.
>
> I've listed all the reasons in one of the previous emails:
>
> http://mail.python.org/pipermail/catalog-sig/2013-March/005502.html
>
> Others will likely have additional reasons, like e.g.
>
> * the PyPI uploads not being compatible to their release process

I've offered to donate my free time to anyone whose release process actually dictates they can't upload to PyPI to move to one that is as close as possible to their current solution while also not requiring external hosting.
>
> * not knowing that it's possible to host files on PyPI - after
> all it's an *index*, not a repository :-)

I know you're joking but if this is an actual limiting factor my next proposal will be to change the name >:].

>
> * still believing that PyPI is an unreliable hosting provider
> due to the many downtimes and problems it had in the past - which
> is no longer true today
>
> * not wanting to host and maintain files in several different
> places

Publish their own repository, put instructions on their PyPI page on how to add it to a potential user's deployment.

>
> * not wanting to host release files at all, i.e. have people
> check out the version from a repository instead of doing
> the download, unzip, install dance

So don't? Include instructions. No one's proposal prevents people from listing projects on PyPI that aren't hosted there. It just means that if you don't want to host your things on PyPI you'll need to provide instructions for getting your files.

>
> * not wanting to separate associated library or product
> code from the Python wrapper code (think e.g. the
> Python interface for subversion)

Same answer as before, either separate it or provide instructions on your PyPI index page.

>
> * not being allowed to upload files to external servers
> by company policy, or having to deal with a company
> policy that makes this difficult/unattractive

Again, include instructions in the PyPI description.

>
> * having issues with the added latency of PyPI downloads compared
> to a simple file based index hosted on a company web server

This seems backwards. If they are upset with the latency why aren't they just installing directly from the index on the company web server? Why are they hitting PyPI at all?

>
> * having a strong need to know the number of downloads per
> package and associated statistics such as downloads per
> country, per year/month/day/hour

Daily stats are published per filename. Doesn't include breakdowns per country though.
I will fight for any statistic people actually want that doesn't expose sensitive information. (No IP addresses etc. Countries are fine etc.). > > * not wanting to give up access to the download log files That runs counter to privacy concerns. If this is an actual blocker then I suggest they run their own index again. > > * having a requirement to restrict downloads on a per country > basis, e.g. for export controlled software or software which > may not be imported/used in certain countries Don't host the files on PyPI, publish instructions for installing your software on PyPI. > > * having PyPI not provide the technical means to host the > release files, e.g. due to the releases using a format > which is not supported by PyPI (e.g. all the ActiveState > packages - http://code.activestate.com/pypm/) Open a discussion here about including your format, open a ticket tracker about including your format, submit a PR about including your format, host your own repository if it makes sense for your format (See active state again). > > * user experience/support issues: > if the package has external dependencies, > or needs special setup, it may provide a better user experience > to host the Python wrapper on the same page as the dependencies > and instructions on how to install them; rather than having > them on PyPI which lets people believe that a simple > "pip install something" will get them a working setup So don't host the files on PyPI, include your instructions on PyPI. > > Those are just a few things that come to mind. I'm sure there > are more issues that keep authors from uploading their > packages to PyPI. > > Overall, I think we should encourage people to make their > code available through PyPI and make it attractive to them, > but keep the possibility to continue using external hosting > platforms, should they run into issues that PyPI cannot > solve for them. This is a nice thought, but it doesn't work in practice because of the "Weakest Link Theory". 
Basically you're only as strong as your weakest link. The weakest link is any external package. > >> He also mentioned the big Python distributions eGenix does as being >> too large for PyPI, but I don't really see the point of uploading >> Python distributions to PyPI, they can't be installed with Python >> installers anyway. > > Not sure what you mean here. > > PyPI is also used to index Python projects which are not Python > packages to be installed by pip/easy_install/etc. That's fine. No one's saying you can't list a package on PyPI that doesn't include files. Just the external links won't be available on /simple/. > > Some of those may also want to > >>> IMHO it would be better to remove barriers than force projects to host files on PyPI. >> >> Nobody has really been able to point out any real barriers, so we >> don't know what they are or if they exist. > > Again, please see the email where I listed the ones affecting > at least eGenix. > > Most of those can be addressed in one way or another, e.g. > by having PyPI cache the files, provide access to the download > counts by country, provide a way to host separate indexes for > UCS2/UCS4 egg files, etc. > > The only issues that need more investigation are the PyPI license > terms and the general issue of not being able to host export > regulated files on PyPI. > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, Mar 11 2013) >>>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > > ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg
> Registered at Amtsgericht Duesseldorf: HRB 46611
> http://www.egenix.com/company/contact/
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From jnoller at gmail.com Mon Mar 11 12:33:18 2013
From: jnoller at gmail.com (Jesse Noller)
Date: Mon, 11 Mar 2013 07:33:18 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
In-Reply-To: <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
Message-ID:

Couldn't have said it better Donald. +1

From ncoghlan at gmail.com Mon Mar 11 12:55:38 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 11 Mar 2013 21:55:38 +1000
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
In-Reply-To:
References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com> <513DA277.2010809@egenix.com>
Message-ID:

On Mon, Mar 11, 2013 at 9:32 PM, Donald Stufft wrote:
> I know you're joking but if this is an actual limiting factor my next proposal will be to change the name >:].

PyPR would not only be more accurate, it would actually get rid of the confusion with PyPy. We'd get a new pronunciation argument (Pie-pee-arr vs Pie-per) to correspond with the existing one, though (Pie-pee-eye vs Pie-pie)

Hell, the next generation of PyPI is going to have a different enough architecture for metadata distribution that a name change may be entirely appropriate :)

Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From holger at merlinux.eu Mon Mar 11 12:57:57 2013
From: holger at merlinux.eu (holger krekel)
Date: Mon, 11 Mar 2013 11:57:57 +0000
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
In-Reply-To: <20130311100225.GL9677@merlinux.eu>
References: <20130310150740.GE9677@merlinux.eu> <20130311100225.GL9677@merlinux.eu>
Message-ID: <20130311115757.GP9677@merlinux.eu>

Hi again,

A correction on one point of my last mail to you,

On Mon, Mar 11, 2013 at 10:02 +0000, holger krekel wrote:
> > My suggestion would be to do two things:
> >
> > First, make the state a boolean: crawl external links, with the
> > current state yes and the future state no, with "no" simply meaning
> > that the rel="" attribute is removed from the links that currently
> > have it.
> >
> > Second, propose to offer tools in the PyPI interface (and command
> > line) to assist authors in making the transition, rather than
> > proposing a completely unspecified caching mechanism. Better to have
> > some vaguely specified tools than a completely unspecified caching
> > mechanism, and better still to spell out very precisely what those
> > tools do.
>
> This structure makes sense to me except that I see the need to start off with
> "pypi-ext", i.e. a third state which encodes the current behaviour.

Wait, your suggestion of a boolean "crawl external" set to yes would encode the current behaviour, so my "except" is invalid.

> Thing is that pypi.python.org doesn't have an extensive test
> suite and we will thus need to rely on a few early adopters
> using the tools/state-changes before starting phase 2 (mass mailings etc.).
> Also in case of problems we can always switch back packages to the safe
> "pypi-ext" mode. IOW, the motivation for this third state is considering
> the actual implementation process.

This can also be done with your two-state suggestion (switching back to crawl=yes).
So no disagreement on that either.

best,
holger

From regebro at gmail.com Mon Mar 11 13:33:07 2013
From: regebro at gmail.com (Lennart Regebro)
Date: Mon, 11 Mar 2013 13:33:07 +0100
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
In-Reply-To:
References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com> <513DA277.2010809@egenix.com>
Message-ID:

On Mon, Mar 11, 2013 at 12:55 PM, Nick Coghlan wrote:
> On Mon, Mar 11, 2013 at 9:32 PM, Donald Stufft wrote:
>> I know you're joking but if this is an actual limiting factor my next proposal will be to change the name >:].
>
> PyPR would not only be more accurate, it would actually get rid of the
> confusion with PyPy. We'd get a new pronunciation argument
> (Pie-pee-arr vs Pie-per) to correspond with the existing one,
> though (Pie-pee-eye vs Pie-pie)

Hey! Are you a piper that is trying to lure us poor rats away from the cheeseshop? :-)

//Lennart

From rasky at develer.com Mon Mar 11 14:34:48 2013
From: rasky at develer.com (Giovanni Bajo)
Date: Mon, 11 Mar 2013 14:34:48 +0100
Subject: [Catalog-sig] PyPI/pip security: waiting for input
Message-ID: <5BB62E84-97E1-4C35-97D5-8F52095A348B@develer.com>

Hi Justin,

just a quick reminder that we are still waiting for you guys to move over and start actually doing something. Are you going to draft a document on how exactly we can use TUF within the context of pip + PyPI, with all the different concerns and threat models handled in my document?

Thanks!
--
Giovanni Bajo :: rasky at develer.com
Develer S.r.l. :: http://www.develer.com

My Blog: http://giovanni.bajo.it

From dholth at gmail.com Mon Mar 11 15:06:34 2013
From: dholth at gmail.com (Daniel Holth)
Date: Mon, 11 Mar 2013 10:06:34 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
In-Reply-To:
References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com> <513DA277.2010809@egenix.com>
Message-ID:

It will probably wind up working more like every other package manager I'm familiar with, where you have a "sources.d" that lists the repositories you would like to search. If you use Plone, you add their repository to the list.

We also seem to be making good progress on "contact the central repository much less often" by keeping local copies of the packages you actually need. The most frustrating thing about pypi being down was that you already had a virtualenv with all the packages you actually needed, but maybe you couldn't re-install them elsewhere without contacting pypi again.

Wheel signatures are handy because they travel with the archive but the eventual security system will probably look very different, at most taking advantage of the feature when available but doing something else for sdists. The trust chain is the tricky part.

From jcappos at poly.edu Mon Mar 11 15:17:36 2013
From: jcappos at poly.edu (Justin Cappos)
Date: Mon, 11 Mar 2013 10:17:36 -0400
Subject: [Catalog-sig] PyPI/pip security: waiting for input
In-Reply-To: <5BB62E84-97E1-4C35-97D5-8F52095A348B@develer.com>
References: <5BB62E84-97E1-4C35-97D5-8F52095A348B@develer.com>
Message-ID:

Yes, we're finishing this up now. We have a working demo with TUF signing PyPI metadata and pip (integrated with TUF) correctly checking signatures, etc.

Trishank: when do you plan to share this?
Does Kon still have some integration tests to write to show we meet the
use cases from Giovanni's document?

Thanks,
Justin

On Mon, Mar 11, 2013 at 9:34 AM, Giovanni Bajo wrote:
> Hi Justin,
>
> just a quick reminder that we are still waiting for you guys to move over
> and start actually doing something. Are you going to draft a document on
> how exactly we can use TUF within the context of pip + PyPI, with all the
> different concerns and threat models handled in my document?
>
> Thanks!
> --
> Giovanni Bajo :: rasky at develer.com
> Develer S.r.l. :: http://www.develer.com
>
> My Blog: http://giovanni.bajo.it
>
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig
>

From rasky at develer.com Mon Mar 11 15:31:00 2013
From: rasky at develer.com (Giovanni Bajo)
Date: Mon, 11 Mar 2013 15:31:00 +0100
Subject: [Catalog-sig] PyPI/pip security: waiting for input
In-Reply-To:
References: <5BB62E84-97E1-4C35-97D5-8F52095A348B@develer.com>
Message-ID: <80DFF1BA-A6E1-48D6-AB51-FEAE07E20B6A@develer.com>

On 11 Mar 2013, at 15:17, Justin Cappos wrote:

> Yes, we're finishing this up now. We have a working demo with TUF signing PyPI metadata and pip (integrated with TUF) correctly checking signatures, etc.
>
> Trishank: when do you plan to share this? Does Kon still have some integration tests to write to show we meet the use cases from Giovanni's document?

While the code is great, I'm mainly concerned with documenting the
workflow and making sure it matches the proposed requirements: how to
create a key, how to revoke it, how to use an offline list of authorized
keys for installation of packages, etc.
As I mentioned before, my proposal would only take me a few days to prototype (repeating this in case someone thinks that my proposal requires millions of man hours for any reason); I held it off waiting for a discussion with you. Relink to my proposal: https://docs.google.com/a/develer.com/document/d/1DgQdDCZY5LiTY5mvfxVVE4MTWiaqIGccK3QCUI8np4k/edit -- Giovanni Bajo :: rasky at develer.com Develer S.r.l. :: http://www.develer.com My Blog: http://giovanni.bajo.it -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 4346 bytes Desc: not available URL: From jcappos at poly.edu Mon Mar 11 15:33:42 2013 From: jcappos at poly.edu (Justin Cappos) Date: Mon, 11 Mar 2013 10:33:42 -0400 Subject: [Catalog-sig] PyPI/pip security: waiting for input In-Reply-To: <80DFF1BA-A6E1-48D6-AB51-FEAE07E20B6A@develer.com> References: <5BB62E84-97E1-4C35-97D5-8F52095A348B@develer.com> <80DFF1BA-A6E1-48D6-AB51-FEAE07E20B6A@develer.com> Message-ID: Yep, we have the doc mostly together and are finishing it up / polishing it. We'll have something to you soon. We have a lightning talk set up at PyCon and will post all then at the latest. We do want to announce / share before then though. Justin On Mon, Mar 11, 2013 at 10:31 AM, Giovanni Bajo wrote: > Il giorno 11/mar/2013, alle ore 15:17, Justin Cappos > ha scritto: > > Yes, we're finishing this up now. We have a working demo with TUF > signing PyPI metadata and pip (integrated with TUF) correctly checking > signatures, etc. > > Trishank: when do you plan to share this? Does Kon still have some > integration tests to write to show we meet the use cases from Giovanni's > document? 
> > > While the code is great, I'm mainly concerned with documenting the > workflow and making sure it matches the proposed requirements: how to > create a key, how to revoke it, how to use an offline list of authorized > keys for installation of packages, etc. > > As I mentioned before, my proposal would only take me a few days to > prototype (repeating this in case someone thinks that my proposal requires > millions of man hours for any reason); I held it off waiting for a > discussion with you. > > Relink to my proposal: > > https://docs.google.com/a/develer.com/document/d/1DgQdDCZY5LiTY5mvfxVVE4MTWiaqIGccK3QCUI8np4k/edit > -- > Giovanni Bajo :: rasky at develer.com > Develer S.r.l. :: http://www.develer.com > > My Blog: http://giovanni.bajo.it > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dholth at gmail.com Mon Mar 11 15:52:46 2013 From: dholth at gmail.com (Daniel Holth) Date: Mon, 11 Mar 2013 10:52:46 -0400 Subject: [Catalog-sig] PyPI/pip security: waiting for input In-Reply-To: <80DFF1BA-A6E1-48D6-AB51-FEAE07E20B6A@develer.com> References: <5BB62E84-97E1-4C35-97D5-8F52095A348B@develer.com> <80DFF1BA-A6E1-48D6-AB51-FEAE07E20B6A@develer.com> Message-ID: Super impressed after reading all the TUF papers and comparing it to my own feeble proposal, they had addressed a whole bevy of problems that I hadn't even thought of - infinite-length download attacks, server-asserted timestamps, quorum signatures, sophisticated trust delegation, consistency of all the metadata all the time ... 
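One item in that list, the infinite-length download attack, is easy to picture: a hostile or compromised server keeps sending bytes forever, and a client that reads until EOF never finishes. A minimal sketch of the usual countermeasure follows, assuming the client already knows an upper bound on the file size from trusted metadata. The names `read_capped` and `DownloadTooLarge` are hypothetical and are not part of TUF's or pip's API.

```python
import io


class DownloadTooLarge(Exception):
    """Raised when a stream yields more bytes than the trusted limit."""


def read_capped(stream, max_bytes, chunk_size=8192):
    """Read a binary stream to EOF, refusing more than max_bytes in total.

    `max_bytes` is assumed to come from metadata the client already
    trusts (for example a signed file length), never from the stream.
    """
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # EOF reached within the limit
            break
        total += len(chunk)
        if total > max_bytes:
            # Stop immediately instead of buffering unbounded data.
            raise DownloadTooLarge(
                "received more than %d bytes" % max_bytes
            )
        chunks.append(chunk)
    return b"".join(chunks)


# Usage with an in-memory stream standing in for a network response.
payload = read_capped(io.BytesIO(b"abc"), max_bytes=3)
```

The check happens per chunk, so an oversized response is rejected after at most one extra chunk rather than after the whole body has been buffered.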
From donald at stufft.io Mon Mar 11 15:53:38 2013
From: donald at stufft.io (Donald Stufft)
Date: Mon, 11 Mar 2013 10:53:38 -0400
Subject: [Catalog-sig] PyPI/pip security: waiting for input
In-Reply-To:
References: <5BB62E84-97E1-4C35-97D5-8F52095A348B@develer.com> <80DFF1BA-A6E1-48D6-AB51-FEAE07E20B6A@develer.com>
Message-ID: <933A6E76-7631-44CB-BD28-C4123C24E2CD@stufft.io>

On Mar 11, 2013, at 10:52 AM, Daniel Holth wrote:

> Super impressed after reading all the TUF papers and comparing it to
> my own feeble proposal, they had addressed a whole bevy of problems
> that I hadn't even thought of - infinite-length download attacks,
> server-asserted timestamps, quorum signatures, sophisticated trust
> delegation, consistency of all the metadata all the time ...

Agreed, and they've been very helpful with questions when asked.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From pje at telecommunity.com Mon Mar 11 17:12:12 2013
From: pje at telecommunity.com (PJ Eby)
Date: Mon, 11 Mar 2013 12:12:12 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
In-Reply-To: <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
Message-ID:

On Mon, Mar 11, 2013 at 7:14 AM, Donald Stufft wrote:
> 1) Proof of what? That it's insecure? That it harms uptime?
That it violates people's privacy?

That any of those things apply to anybody who *isn't using those
packages*. Without this, you are only providing a reason to encourage
people to change, not to force them to do so.

> 2) Even a single project remaining causes the entire thing to cascade

Cascade *how*? Please explain.

From tseaver at palladion.com Mon Mar 11 17:21:02 2013
From: tseaver at palladion.com (Tres Seaver)
Date: Mon, 11 Mar 2013 12:21:02 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
In-Reply-To:
References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io>
Message-ID:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 03/11/2013 02:23 AM, Lennart Regebro wrote:
> The uptime problem is *only* solvable by minimizing the number of
> hosts involved. The minimum number of hosts is one. That means we
> should get all releases onto PyPI.

Uptime for *production* use is a red herring here. Anybody who needs
uptime should be maintaining their own deployment-specific index, or
paying somebody to do that for them. Period.

Anybody who needs that kind of uptime *also* needs insulation from other
factors which PyPI project authors can inject into the equation,
regardless of PyPI's uptime (or that of any external host):

- Uploading undocumented backward-incompatible changes in third-dot
  releases.

- Uploading a new feature release which injects new security
  vulnerabilities (think of the Ruby-YAML stuff).

- Deleting distributions or releases.

- Re-uploading a *different* tarball over the top of an existing one
  (without bumping the version).

Not to mention the possibility of uploaded trojans / malware when a
developer loses control of his laptop / keys, etc. to a hostile actor.
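One of the risks listed above, a *different* tarball re-uploaded under an unchanged version, can be detected on the consumer side by recording a digest when a release is first vetted and checking it again on every later install. The following is only a minimal sketch of that idea, assuming SHA-256 and in-memory bytes. The helper names are hypothetical, and this is not a feature of PyPI, easy_install or pip as discussed in this thread.

```python
import hashlib


def sha256_of(data):
    """Return the hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def verify_pinned(data, expected_sha256):
    """Return data unchanged if its digest matches the recorded pin.

    Raises ValueError when the content differs from what was vetted,
    which is what a silent re-upload under the same version looks like.
    """
    actual = sha256_of(data)
    if actual != expected_sha256:
        raise ValueError(
            "digest mismatch: expected %s, got %s" % (expected_sha256, actual)
        )
    return data


# Usage: record the pin once, then check every later download against it.
vetted = b"contents of example-1.0.tar.gz"  # stand-in for the real file
pin = sha256_of(vetted)
verify_pinned(vetted, pin)  # passes silently
```

A curated, deployment-specific index gets the same protection for free because the vetted files themselves are stored locally. The pin only helps when the files are fetched again from somewhere else.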
PyPI's uptime is primarily important for *development* use cases, not
for deployment / operations, and in those cases convenience, safety, and
community building are as important as uptime (consumers of FLOSS don't
have any SLA with the producers). At a sprint, for instance, it is
obnoxious to have a dependency with external files on a slow or hanging
host: it breaks the repeatability of builds, as well as damaging the
velocity of the sprint. But the sprinters do *not* have recourse (other
than complaining loudly) for such cases, where they have chosen to rely
on PyPI or the external sites for quick and convenient discovery of
those dependencies, instead of going to the trouble to create a curated
index for their own use.

Tres.
--
===================================================================
Tres Seaver +1 540-429-0999 tseaver at palladion.com
Palladion Software "Excellence by Design" http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with undefined - http://www.enigmail.net/

iEYEARECAAYFAlE+BG4ACgkQ+gerLs4ltQ5bpgCgzT12UDoqjsaXTBWS5CYuglkI
n0wAnjl0+b/9RZpaUetSBDPovg9fGY+I
=G56Q
-----END PGP SIGNATURE-----

From regebro at gmail.com Mon Mar 11 17:45:15 2013
From: regebro at gmail.com (Lennart Regebro)
Date: Mon, 11 Mar 2013 17:45:15 +0100
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
In-Reply-To:
References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
Message-ID:

On Mon, Mar 11, 2013 at 5:12 PM, PJ Eby wrote:
> On Mon, Mar 11, 2013 at 7:14 AM, Donald Stufft wrote:
>> 1) Proof of what? That it's insecure? That it harms uptime? That it violates people's privacy?
>
> That any of those things apply to anybody who *isn't using those packages*.
If nobody is using the packages, it does indeed harm no-one. //Lennart From tk47 at students.poly.edu Mon Mar 11 18:09:41 2013 From: tk47 at students.poly.edu (Trishank Karthik Kuppusamy) Date: Mon, 11 Mar 2013 13:09:41 -0400 Subject: [Catalog-sig] PyPI/pip security: waiting for input In-Reply-To: References: <5BB62E84-97E1-4C35-97D5-8F52095A348B@develer.com> Message-ID: <513E0FD5.9020000@students.poly.edu> Hello everyone, On 3/11/13 10:17 AM, Justin Cappos wrote: > Yes, we're finishing this up now. We have a working demo with TUF > signing PyPI metadata and pip (integrated with TUF) correctly checking > signatures, etc. Yes, and we are excited to be sharing this very soon! > Trishank: when do you plan to share this? Does Kon still have some > integration tests to write to show we meet the use cases from Giovanni's > document? I have the demo up and running, and I just need to get the documentation together. Complicating this is that I have a midterm tomorrow, but I should have the basic documentation together by today. Let me get back to you then! Thanks, Trishank From pje at telecommunity.com Mon Mar 11 18:42:29 2013 From: pje at telecommunity.com (PJ Eby) Date: Mon, 11 Mar 2013 13:42:29 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> Message-ID: On Mon, Mar 11, 2013 at 12:45 PM, Lennart Regebro wrote: > On Mon, Mar 11, 2013 at 5:12 PM, PJ Eby wrote: >> On Mon, Mar 11, 2013 at 7:14 AM, Donald Stufft wrote: >>> 1) Proof of what? That it's insecure? That it harms uptime? That it violates people's privacy? >> >> That any of those things apply to anybody who *isn't using those packages*. > > If nobody is using the packages, it does indeed harm no-one. 
Then there is no reason to ban them. From regebro at gmail.com Mon Mar 11 18:45:37 2013 From: regebro at gmail.com (Lennart Regebro) Date: Mon, 11 Mar 2013 18:45:37 +0100 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> Message-ID: On Mon, Mar 11, 2013 at 6:42 PM, PJ Eby wrote: > On Mon, Mar 11, 2013 at 12:45 PM, Lennart Regebro wrote: >> On Mon, Mar 11, 2013 at 5:12 PM, PJ Eby wrote: >>> On Mon, Mar 11, 2013 at 7:14 AM, Donald Stufft wrote: >>>> 1) Proof of what? That it's insecure? That it harms uptime? That it violates people's privacy? >>> >>> That any of those things apply to anybody who *isn't using those packages*. >> >> If nobody is using the packages, it does indeed harm no-one. > > Then there is no reason to ban them. So, we should not remove the links for external packages until somebody traverses those links? But as soon as somebody asks for those links, we should remove them? In fact before we give them the link? That to me, is indistinguishable from removing the links. //Lennart From pje at telecommunity.com Mon Mar 11 20:57:30 2013 From: pje at telecommunity.com (PJ Eby) Date: Mon, 11 Mar 2013 15:57:30 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> Message-ID: On Mon, Mar 11, 2013 at 1:45 PM, Lennart Regebro wrote: > So, we should not remove the links for external packages until > somebody traverses those links? 
But as soon as somebody asks for those > links, we should remove them? In fact before we give them the link? I'm saying that if someone objects to the presence of links they don't actually use, they are speaking nonsense. Might as well ask to ban all packages from PyPI that they don't personally like -- it's the same request. Nobody is forcing you to depend on packages that don't host on PyPI, so there is no point to the censorship. If you don't use the links, you can't argue that their presence is causing you harm. From carl at oddbird.net Mon Mar 11 21:07:50 2013 From: carl at oddbird.net (Carl Meyer) Date: Mon, 11 Mar 2013 14:07:50 -0600 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> Message-ID: <513E3996.10203@oddbird.net> On 03/11/2013 01:57 PM, PJ Eby wrote: > I'm saying that if someone objects to the presence of links they > don't actually use, they are speaking nonsense. Might as well ask to > ban all packages from PyPI that they don't personally like -- it's the > same request. Nobody is forcing you to depend on packages that don't > host on PyPI, so there is no point to the censorship. > > If you don't use the links, you can't argue that their presence is > causing you harm. You can, of course, argue that the mere presence of those links (combined with the current behavior of easy_install/pip) is an "attractive nuisance" that indirectly causes harm to unsuspecting new users of Python who never even consider the possibility that tools like easy_install and pip might spider off PyPI to arbitrary websites (a reasonable assumption based on experience with automatic installation toolchains and software repositories in other communities). 
I've talked to many such users, so there is no question that they exist, and I think probably in significant numbers. Carl From pje at telecommunity.com Mon Mar 11 22:15:08 2013 From: pje at telecommunity.com (PJ Eby) Date: Mon, 11 Mar 2013 17:15:08 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: <513E3996.10203@oddbird.net> References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> <513E3996.10203@oddbird.net> Message-ID: On Mon, Mar 11, 2013 at 4:07 PM, Carl Meyer wrote: > On 03/11/2013 01:57 PM, PJ Eby wrote: >> I'm saying that if someone objects to the presence of links they >> don't actually use, they are speaking nonsense. Might as well ask to >> ban all packages from PyPI that they don't personally like -- it's the >> same request. Nobody is forcing you to depend on packages that don't >> host on PyPI, so there is no point to the censorship. >> >> If you don't use the links, you can't argue that their presence is >> causing you harm. > > You can, of course, argue that the mere presence of those links > (combined with the current behavior of easy_install/pip) is an > "attractive nuisance" that indirectly causes harm to unsuspecting new > users of Python who never even consider the possibility that tools like > easy_install and pip might spider off PyPI to arbitrary websites Which is why I think removing rel="" spidering is a good idea. In fact, I'm the one who suggested that. I also suggested moving to turning it off by default in future versions of easy_install, adding warnings, etc. But that's not the same thing as agreeing that it should be *banned* for people to publish machine-readable download information on PyPI for a file that's hosted off-PyPI. 
ISTM that Python's "consenting adults" standard sets a higher bar for banning a feature than it does for marking it, "here there be dragons" and offering a better alternative. Heck, even in Python the language, the mere removal of a feature in a new version of Python, doesn't stop people from continuing to use the old one. Here we're talking about infrastructure that everybody uses; it's not like there's a PyPI X.1 that people can keep using if X.2 comes out. From donald at stufft.io Mon Mar 11 22:26:39 2013 From: donald at stufft.io (Donald Stufft) Date: Mon, 11 Mar 2013 17:26:39 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: <513E3996.10203@oddbird.net> References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> <513E3996.10203@oddbird.net> Message-ID: <51132ED7-CBD5-441A-838D-D83B50A3C983@stufft.io> On Mar 11, 2013, at 4:07 PM, Carl Meyer wrote: > On 03/11/2013 01:57 PM, PJ Eby wrote: >> I'm saying that if someone objects to the presence of links they >> don't actually use, they are speaking nonsense. Might as well ask to >> ban all packages from PyPI that they don't personally like -- it's the >> same request. Nobody is forcing you to depend on packages that don't >> host on PyPI, so there is no point to the censorship. >> >> If you don't use the links, you can't argue that their presence is >> causing you harm. 
>
> You can, of course, argue that the mere presence of those links
> (combined with the current behavior of easy_install/pip) is an
> "attractive nuisance" that indirectly causes harm to unsuspecting new
> users of Python who never even consider the possibility that tools like
> easy_install and pip might spider off PyPI to arbitrary websites (a
> reasonable assumption based on experience with automatic installation
> toolchains and software repositories in other communities). I've talked
> to many such users, so there is no question that they exist, and I think
> probably in significant numbers.
>
> Carl

Since it was asked, I ran a script over the projects/versions that my
earlier script had identified as not being hosted on PyPI, to determine
_where_ people are hosting these files. These statistics include dev
releases.

There are 10538 total external file links that pip locates that do not
exist on PyPI. Of these, here is the top 20:

(u'downloads.tryton.org', 1201),
(u'github.com', 811),
(u'bitbucket.org', 428),
(u'launchpad.net', 279),
(u'www.doughellmann.com', 255),
(u'walco.n--tree.net', 161),
(u'prdownloads.sourceforge.net', 156),
(u'infrae.com', 150),
(u'downloads.sourceforge.net', 139),
(u'keepnote.org', 138),
(u'downloads.reviewboard.org', 124),
(u'tilestache.org', 121),
(u'mercurial.selenic.com', 120),
(u'www.defuze.org', 85),
(u'www.vicbioinformatics.com', 74),
(u'downloads.review-board.org', 70),
(u'samba.org', 70),
(u'python-graph.googlecode.com', 67),
(u'cyberelk.net', 65),
(u'tuohela.net', 61),

I suspect that a lot of the github, bitbucket etc links are dev links
(of which there are roughly 420 total).
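A tally like the one above can be produced with very little code. The sketch below is an assumption about the general approach, not the script that was actually run: it groups a list of link URLs by hostname and then counts how many fall outside a few large hosting sites. The sample URLs and the names `tally_hosts`, `count_outside_big_hosts` and `BIG_HOSTS` are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlparse

# Made-up sample input; the real input was the list of external file
# links that pip finds for releases with no files on PyPI.
external_links = [
    "http://downloads.example.org/foo-1.0.tar.gz",
    "http://downloads.example.org/foo-1.1.tar.gz",
    "https://github.com/example/bar/tarball/master",
    "https://bitbucket.org/example/baz/get/tip.zip",
    "http://prdownloads.sourceforge.net/example/qux-0.3.tar.gz",
]

# Substrings that mark a "big name" hosting site (an assumed list).
BIG_HOSTS = ("github.com", "bitbucket.org", "google", "sourceforge")


def tally_hosts(links):
    """Return (hostname, count) pairs, most common hostname first."""
    counts = Counter(urlparse(link).netloc for link in links)
    return counts.most_common()


def count_outside_big_hosts(tallies):
    """Sum the counts for hostnames matching none of BIG_HOSTS."""
    return sum(
        count
        for host, count in tallies
        if not any(big in host for big in BIG_HOSTS)
    )


tallies = tally_hosts(external_links)
outside = count_outside_big_hosts(tallies)
```

Matching on substrings rather than exact hostnames is deliberate here, so that subdomains such as `prdownloads.sourceforge.net` are grouped with their parent site.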
Here is the complete listing: https://gist.github.com/dstufft/5137885

I ran a minor bit of heuristics to see how many were not hosted in one
of the big name hosting sites,

    >>> sum([x[1] for x in b if not "github.com" in x[0] and "bitbucket.org" not in x[0] and "google" not in x[0] and "sourceforge" not in x[0]])
    7097

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From tk47 at students.poly.edu Mon Mar 11 23:18:37 2013
From: tk47 at students.poly.edu (Trishank Karthik Kuppusamy)
Date: Mon, 11 Mar 2013 18:18:37 -0400
Subject: [Catalog-sig] PyPI/pip security: waiting for input
In-Reply-To: <513E0FD5.9020000@students.poly.edu>
References: <5BB62E84-97E1-4C35-97D5-8F52095A348B@develer.com> <513E0FD5.9020000@students.poly.edu>
Message-ID: <513E583D.4080306@students.poly.edu>

On 03/11/2013 01:09 PM, Trishank Karthik Kuppusamy wrote:
>
> I have the demo up and running, and I just need to get the documentation
> together. Complicating this is that I have a midterm tomorrow, but I
> should have the basic documentation together by today. Let me get back
> to you then!

We are working on the documentation and integration tests, and barring
unexpected circumstances, we hope to show you a well-documented demo of
PyPI + TUF + pip tomorrow.

Actually, many pieces of the documentation and tests are already online,
but we want to glue them all together and complete the missing pieces
before showing them to you.

We thank you for your patience and continued interest.

-Trishank

From pje at telecommunity.com Tue Mar 12 00:04:27 2013
From: pje at telecommunity.com (PJ Eby)
Date: Mon, 11 Mar 2013 19:04:27 -0400
Subject: [Catalog-sig] A 90% Solution
Message-ID:

Just a thought, but...
If 90% of PyPI projects do not have any external files to download, then, wouldn't it make sense to: 1. Add a project-level option to enable or disable the adding of the rel="" attribute to /simple links (but not affecting the links in any other way) 2. Default it to disabled for new projects, and 3. Set it to disabled *now* for the 90% of projects that *don't have external files*? If the arguments about banning external links are as valid and important as some people claim, wouldn't it make sense to do this part *now*, without first requiring a commitment to force the switch to a disabled state in the future? Immediately, 90% of the problem goes away - no random spidering of stuff that doesn't contain a link now, but which could be taken over by a malicious party in the future, and 90% fewer sites having to be up in order for you to build something from PyPI. Seems like a serious win to me -- and one that might not even need a PEP. Next steps after this would be providing tools to help people move their files and links, promoting that people switch it off if they no longer support the offsite links, educating about security concerns, etc. I really don't understand why the 90% solution isn't *already* the consensus position, since it doesn't preclude follow-on efforts towards reducing the 10% towards 0%. And if the problem is so important, why must we keep 90% of the problems in place, just so we can keep arguing about censoring the 10%? That doesn't make sense to me. To me, if somebody's injured, the first thing you do is clean and close the wound, not argue about whether it's a complete solution and what might happen days or weeks later. Just a thought. From donald at stufft.io Tue Mar 12 00:39:50 2013 From: donald at stufft.io (Donald Stufft) Date: Mon, 11 Mar 2013 19:39:50 -0400 Subject: [Catalog-sig] A 90% Solution In-Reply-To: References: Message-ID: On Mar 11, 2013, at 7:04 PM, PJ Eby wrote: > Just a thought, but... 
>
> If 90% of PyPI projects do not have any external files to download,
> then, wouldn't it make sense to:

To be accurate, it's 90% that don't have any files/releases available
*only* externally. Most have external files to download because it's
very rare that a project doesn't include a home_page or a download_url,
especially since distutils complains if you don't.

>
> 1. Add a project-level option to enable or disable the adding of the
> rel="" attribute to /simple links (but not affecting the links in any
> other way)
> 2. Default it to disabled for new projects, and
> 3. Set it to disabled *now* for the 90% of projects that *don't have
> external files*?

+1, except that item 1 should be to remove the links entirely from the
/simple/ index, not to just remove the rel attribute.

>
> If the arguments about banning external links are as valid and
> important as some people claim, wouldn't it make sense to do this part
> *now*, without first requiring a commitment to force the switch to a
> disabled state in the future?
>
> Immediately, 90% of the problem goes away - no random spidering of
> stuff that doesn't contain a link now, but which could be taken over
> by a malicious party in the future, and 90% fewer sites having to be
> up in order for you to build something from PyPI.
>
> Seems like a serious win to me -- and one that might not even need a PEP.

Absolutely, and similar to something I asked Richard at the start of
this. I'm waiting on an OK from someone with authority that they'd merge
such a change, and I'll have a PR out for it asap after that.

>
> Next steps after this would be providing tools to help people move
> their files and links, promoting that people switch it off if they no
> longer support the offsite links, educating about security concerns,
> etc.
>
> I really don't understand why the 90% solution isn't *already* the
> consensus position, since it doesn't preclude follow-on efforts
> towards reducing the 10% towards 0%.
> > And if the problem is so important, why must we keep 90% of the > problems in place, just so we can keep arguing about censoring the > 10%? That doesn't make sense to me. > > To me, if somebody's injured, the first thing you do is clean and > close the wound, not argue about whether it's a complete solution and > what might happen days or weeks later. Like I said above, I'm just waiting on an ok that this has a chance of landing before bothering to implement it. > > Just a thought. > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From ncoghlan at gmail.com Tue Mar 12 00:50:52 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 12 Mar 2013 09:50:52 +1000 Subject: [Catalog-sig] A 90% Solution In-Reply-To: References: Message-ID: Richard's in transit at the moment and I'm about to be, but this sounds worth doing to me. I say send the pull request :) Cheers, Nick. On 12 Mar 2013 09:42, "Donald Stufft" wrote: > > On Mar 11, 2013, at 7:04 PM, PJ Eby wrote: > > > Just a thought, but... > > > > If 90% of PyPI projects do not have any external files to download, > > then, wouldn't it make sense to: > > To be accurate it's 90% don't have any files/release available *only* > externally. Most have external files to download because it's very rare > that a project doesn't include an home_page or a download_url, especially > since distutils complains if you don't. > > > > > 1. Add a project-level option to enable or disable the adding of the > > rel="" attribute to /simple links (but not affecting the links in any > > other way) > > 2. 
Default it to disabled for new projects, and > > 3. Set it to disabled *now* for the 90% of projects that *don't have > > external files*? > > +1 except 1. should be to remove the links entirely from the /simple/ > index, not to just remove the rel attribute. > > > > > If the arguments about banning external links are as valid and > > important as some people claim, wouldn't it make sense to do this part > > *now*, without first requiring a commitment to force the switch to a > > disabled state in the future? > > > > Immediately, 90% of the problem goes away - no random spidering of > > stuff that doesn't contain a link now, but which could be taken over > > by a malicious party in the future, and 90% fewer sites having to be > > up in order for you to build something from PyPI. > > > > Seems like a serious win to me -- and one that might not even need a PEP. > > Absolutely, and similar to something I asked Richard at the start of this, > I'm waiting on an OK from someone with authority that they'd merge such a > change and I'll have a PR out for it asap after that. > > > > > Next steps after this would be providing tools to help people move > > their files and links, promoting that people switch it off if they no > > longer support the offsite links, educating about security concerns, > > etc. > > > > I really don't understand why the 90% solution isn't *already* the > > consensus position, since it doesn't preclude follow-on efforts > > towards reducing the 10% towards 0%. > > > > And if the problem is so important, why must we keep 90% of the > > problems in place, just so we can keep arguing about censoring the > > 10%? That doesn't make sense to me. > > > > To me, if somebody's injured, the first thing you do is clean and > > close the wound, not argue about whether it's a complete solution and > > what might happen days or weeks later. > > Like I said above, I'm just waiting on an ok that this has a chance of > landing before bothering to implement it. 
> > > > > Just a thought. > > _______________________________________________ > > Catalog-SIG mailing list > > Catalog-SIG at python.org > > http://mail.python.org/mailman/listinfo/catalog-sig > > > ----------------- > Donald Stufft > PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 > DCFA > > > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pje at telecommunity.com Tue Mar 12 01:12:09 2013 From: pje at telecommunity.com (PJ Eby) Date: Mon, 11 Mar 2013 20:12:09 -0400 Subject: [Catalog-sig] A 90% Solution In-Reply-To: References: Message-ID: On Mon, Mar 11, 2013 at 7:39 PM, Donald Stufft wrote: > > On Mar 11, 2013, at 7:04 PM, PJ Eby wrote: > >> Just a thought, but... >> >> If 90% of PyPI projects do not have any external files to download, >> then, wouldn't it make sense to: > > To be accurate it's 90% don't have any files/release available *only* externally. Most have external files to download because it's very rare that a project doesn't include an home_page or a download_url, especially since distutils complains if you don't. So what is the % of projects for whom the option can be disabled automatically, *without* disabling automated downloadability of a project's externally hosted files? Your statement is confusing to me, because the having of a home page or download URL doesn't have anything to do with whether that page has any files to download from it. I am saying that if a project has no *downloadable* files (not web pages) whose links can only be found by spidering, then we can turn off the rel attribute. How many projects do not have any download links listed on their rel=""-linked pages? >> 1. 
Add a project-level option to enable or disable the adding of the
>> rel="" attribute to /simple links (but not affecting the links in any
>> other way)
>> 2. Default it to disabled for new projects, and
>> 3. Set it to disabled *now* for the 90% of projects that *don't have
>> external files*?
>
> +1 except 1. should be to remove the links entirely from the /simple/
> index, not to just remove the rel attribute.

-1, since sometimes download links are in fact *download links*. So
this design choice would unnecessarily limit the number of projects for
whom the option could be applied automatically and immediately.

That is, a project with a download link of "foobar.com/foobar-1.2.tgz"
would no longer be usable if you removed the download link from the
/simple index, but would remain usable if the rel attribute were
removed.

From donald at stufft.io Tue Mar 12 01:23:12 2013
From: donald at stufft.io (Donald Stufft)
Date: Mon, 11 Mar 2013 20:23:12 -0400
Subject: [Catalog-sig] A 90% Solution
In-Reply-To:
References:
Message-ID: <7981DD31-7868-4E24-94E2-1F80ACCB46C8@stufft.io>

On Mar 11, 2013, at 8:12 PM, PJ Eby wrote:

> On Mon, Mar 11, 2013 at 7:39 PM, Donald Stufft wrote:
>>
>> On Mar 11, 2013, at 7:04 PM, PJ Eby wrote:
>>
>>> Just a thought, but...
>>>
>>> If 90% of PyPI projects do not have any external files to download,
>>> then, wouldn't it make sense to:
>>
>> To be accurate it's 90% don't have any files/release available *only* externally. Most have external files to download because it's very rare that a project doesn't include a home_page or a download_url, especially since distutils complains if you don't.
>
> So what is the % of projects for whom the option can be disabled
> automatically, *without* disabling automated downloadability of a
> project's externally hosted files?
>
> Your statement is confusing to me, because the having of a home page
> or download URL doesn't have anything to do with whether that page has
> any files to download from it.
I didn't differentiate between spidering or direct links to external files. I simply iterated over all files that the pip PackageFinder was able to find, figured out the version for each URL, and recorded whether that version came from a link to a pypi.python.org resource or from a different domain. I then diffed the two lists to get a list of versions that are _only_ installable externally. That 90% is the 90% of projects that can have *all* links whatsoever, besides the ones hosted on PyPI itself, removed without any version becoming uninstallable.

>
> I am saying that if a project has no *downloadable* files (not web
> pages) whose links can only be found by spidering, then we can turn
> off the rel attribute.
>
> How many projects do not have any download links listed on their
> rel=""-linked pages?
>
>
>>> 1. Add a project-level option to enable or disable the adding of the
>>> rel="" attribute to /simple links (but not affecting the links in any
>>> other way)
>>> 2. Default it to disabled for new projects, and
>>> 3. Set it to disabled *now* for the 90% of projects that *don't have
>>> external files*?
>>
>> +1 except 1. should be to remove the links entirely from the /simple/
>> index, not to just remove the rel attribute.
>
> -1, since sometimes download links are in fact *download links*. So
> this design choice would unnecessarily limit the number of projects for
> whom the option could be applied automatically and immediately.
>
> That is, a project with a download link of "foobar.com/foobar-1.2.tgz"
> would no longer be usable if you removed the download link from the
> /simple index, but would remain usable if the rel attribute were
> removed.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From mal at egenix.com Tue Mar 12 01:28:30 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 12 Mar 2013 01:28:30 +0100
Subject: [Catalog-sig] A 90% Solution
In-Reply-To:
References:
Message-ID: <513E76AE.10601@egenix.com>

On 12.03.2013 00:39, Donald Stufft wrote:
>
> On Mar 11, 2013, at 7:04 PM, PJ Eby wrote:
>
>> Just a thought, but...
>>
>> If 90% of PyPI projects do not have any external files to download,
>> then, wouldn't it make sense to:
>
> To be accurate it's 90% don't have any files/release available *only* externally. Most have external files to download because it's very rare that a project doesn't include a home_page or a download_url, especially since distutils complains if you don't.

How are you going to verify that disabling the links
on those projects won't make certain release versions of
those packages unavailable for pip/easy_install?

How are you planning to inform the package authors of that
change, so that they can take corrective action?

Which options would be available for authors?

PyPI is far too important a Python resource to play
around with. We need a good understanding of the effects a change
may have, and ways to deal with them, before putting
a change which potentially breaks hundreds of packages
into production.

So yeah, just a thought ;-)

>> 1. Add a project-level option to enable or disable the adding of the
>> rel="" attribute to /simple links (but not affecting the links in any
>> other way)
>> 2. Default it to disabled for new projects, and
>> 3. Set it to disabled *now* for the 90% of projects that *don't have
>> external files*?
>
> +1 except 1. should be to remove the links entirely from the /simple/
> index, not to just remove the rel attribute.

Removing those links takes away the ability of tools to
download or display information based on those links, e.g.
to build a semantic web of Python resources. Please remember that the /simple/ index is part of the PyPI API, so it needs to be handled with the same care as the rest of the PyPI APIs. If you want to experiment with new ways of building the index, I'd suggest to first experiment with a new index, say /simple-v2/, before touching the main /simple/ index. Regarding the links, it's probably better to not remove the rel="" attributes but instead change them from rel="download" to e.g. rel="external-download"; or to keep the old index semantics around as /simple-v1/. This keeps the valuable semantic relation available for tools that want to use it. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 12 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Tue Mar 12 01:32:47 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 12 Mar 2013 01:32:47 +0100 Subject: [Catalog-sig] A 90% Solution In-Reply-To: <7981DD31-7868-4E24-94E2-1F80ACCB46C8@stufft.io> References: <7981DD31-7868-4E24-94E2-1F80ACCB46C8@stufft.io> Message-ID: <513E77AF.6050408@egenix.com> On 12.03.2013 01:23, Donald Stufft wrote: > > On Mar 11, 2013, at 8:12 PM, PJ Eby wrote: > >> On Mon, Mar 11, 2013 at 7:39 PM, Donald Stufft wrote: >>> >>> On Mar 11, 2013, at 7:04 PM, PJ Eby wrote: >>> >>>> Just a thought, but... 
>>>> >>>> If 90% of PyPI projects do not have any external files to download, >>>> then, wouldn't it make sense to: >>> >>> To be accurate it's 90% don't have any files/release available *only* externally. Most have external files to download because it's very rare that a project doesn't include an home_page or a download_url, especially since distutils complains if you don't. >> >> So what is the % of projects for whom the option can be disabled >> automatically, *without* disabling automated downloadability of a >> project's externally hosted files? >> >> Your statement is confusing to me, because the having of a home page >> or download URL doesn't have anything to do with whether that page has >> any files to download from it. > > I didn't differentiate between spidering or direct links to external files. I simply iterated over all files that the pip PackageFinder was able to find, figured out the version for each url, and stored if that version came a link to a pypi.python.org resource or a different domain. I then diffed the two lists to get a list of versions that are _only_ installable externally. That 90% is 90% who can have *all* links what so ever besides ones hosted on PyPI itself removed and not have any versions be no longer installable. Which kinds of distribution files can pip's PackageFinder find ? Does it find MSIs, EXEs, egg files ? AFAIK, it only supports .tar.gz and .zip files, but no binary files (except for the new .whl binary format). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 12 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! 
:::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From pje at telecommunity.com Tue Mar 12 03:46:15 2013 From: pje at telecommunity.com (PJ Eby) Date: Mon, 11 Mar 2013 22:46:15 -0400 Subject: [Catalog-sig] A 90% Solution In-Reply-To: <513E76AE.10601@egenix.com> References: <513E76AE.10601@egenix.com> Message-ID: On Mon, Mar 11, 2013 at 8:28 PM, M.-A. Lemburg wrote: > On 12.03.2013 00:39, Donald Stufft wrote: >> >> On Mar 11, 2013, at 7:04 PM, PJ Eby wrote: >> >>> Just a thought, but... >>> >>> If 90% of PyPI projects do not have any external files to download, >>> then, wouldn't it make sense to: >> >> To be accurate it's 90% don't have any files/release available *only* externally. Most have external files to download because it's very rare that a project doesn't include an home_page or a download_url, especially since distutils complains if you don't. > > How are you going to verify that disabling the links > on those projects won't make certain release versions of > those packages unavailable for pip/easy_install ? I'm not sure if you're asking Donald or me here. My proposal was to only automatically disable the rel attributes for links to pages that do *not* contain any easy_install or pip-able download links. So, by definition, this would not make any releases unavailable. As for what Donald is proposing, I honestly have no idea what he's talking about, or whether the 90% statistic actually applies for what I'm proposing. So it's possible that it might be a lot less than 90% that my proposal would be able to affect *instantly*, without contacting any authors. > How are you planing to inform the package authors of that > change, so that they can take corrective action ? > > Which options would be available for authors ? 
Do see my proposal again, which was simply that there be a switch to enable or disable the rel attributes, that it default off for new packages, and be switched to off for exactly that set of packages which would not result in the loss of access to any download files. There is, at this point, the question of how to handle projects that have some of their releases hosted externally, or with some of the files external and some not. I would prefer that any automated changeover apply only to packages where the set of discoverable links is exactly equal to the links found on the project's /simple page. > Regarding the links, it's probably better to not > remove the rel="" attributes but instead change them > from rel="download" to e.g. rel="external-download"; > or to keep the old index semantics around as /simple-v1/. > This keeps the valuable semantic relation available for > tools that want to use it. For what? If you must keep them, rel="disabled-homepage" etc. would get the message across. But I really don't see the point, and I *invented* the bloody things. Frankly, I'm more than prepared to toss the rel attributes altogether, after adequate notice is given for people to move their files or links to the files. I just don't want any changes in the *rest* of the /simple generation algorithm. From regebro at gmail.com Tue Mar 12 06:20:20 2013 From: regebro at gmail.com (Lennart Regebro) Date: Tue, 12 Mar 2013 06:20:20 +0100 Subject: [Catalog-sig] A 90% Solution In-Reply-To: References: Message-ID: On Tue, Mar 12, 2013 at 12:04 AM, PJ Eby wrote: > Just a thought, but... > > If 90% of PyPI projects do not have any external files to download, > then, wouldn't it make sense to: > > 1. Add a project-level option to enable or disable the adding of the > rel="" attribute to /simple links (but not affecting the links in any > other way) > 2. Default it to disabled for new projects, and > 3. Set it to disabled *now* for the 90% of projects that *don't have > external files*? 
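
As a rough illustration of the three quoted steps, here is a minimal sketch in Python. It is not PyPI's actual code: the function names, the (url, text, rel) tuple layout and the rel_enabled flag are all assumptions made up for this example. The point it shows is that only the rel attribute is toggled, while every link stays on the /simple page.

```python
from html import escape


def render_simple_links(links, rel_enabled):
    """Render anchor tags for a hypothetical /simple/<project>/ page.

    links: iterable of (url, text, rel) tuples, where rel is "homepage",
    "download", or None for a plain file link.
    rel_enabled: the hypothetical per-project option from step 1.
    """
    rows = []
    for url, text, rel in links:
        # Only the rel attribute is toggled; the link itself always stays.
        rel_attr = ' rel="%s"' % escape(rel) if rel and rel_enabled else ""
        rows.append('<a href="%s"%s>%s</a><br/>'
                    % (escape(url), rel_attr, escape(text)))
    return "\n".join(rows)


def initial_rel_setting(is_new_project, spidering_finds_extra_files):
    """Steps 2 and 3: choose the starting value of the option.

    New projects start disabled. Existing projects stay enabled only if
    spidering their rel-linked pages yields files not listed on PyPI.
    """
    if is_new_project:
        return False
    return spidering_finds_extra_files
```

How spidering_finds_extra_files would be computed is exactly the open question in this thread (which installers and file types to count), so the sketch takes it as an input rather than guessing.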
That doesn't solve the problem, but it would make easy_install faster, so +1

> Immediately, 90% of the problem goes away

That's not 90% of the problem. The problem with externally hosted
files is not primarily that easy_install gets slower.

> stuff that doesn't contain a link now, but which could be taken over
> by a malicious party in the future, and 90% fewer sites having to be
> up in order for you to build something from PyPI.

Well, if the sites that do not contain the packages are down, that
only makes the build *really* slow; it doesn't fail. It's when
the sites which *are* hosting packages are down that the build fails.

//Lennart

From regebro at gmail.com Tue Mar 12 06:25:08 2013
From: regebro at gmail.com (Lennart Regebro)
Date: Tue, 12 Mar 2013 06:25:08 +0100
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
In-Reply-To:
References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
Message-ID:

On Mon, Mar 11, 2013 at 8:57 PM, PJ Eby wrote:
> On Mon, Mar 11, 2013 at 1:45 PM, Lennart Regebro wrote:
>> So, we should not remove the links for external packages until
>> somebody traverses those links? But as soon as somebody asks for those
>> links, we should remove them? In fact before we give them the link?
>
> I'm saying that if someone objects to the presence of links they
> don't actually use, they are speaking nonsense. Might as well ask to
> ban all packages from PyPI that they don't personally like -- it's the
> same request. Nobody is forcing you to depend on packages that don't
> host on PyPI, so there is no point to the censorship.
>
> If you don't use the links, you can't argue that their presence is
> causing you harm.

Externally hosted files are a real-world, actual problem.
We can only solve it by not having externally hosted files. This discussion has since a long time gone past reason into pure stop energy. I'm not wasting more energy on it. //Lennart From mal at egenix.com Tue Mar 12 08:57:22 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 12 Mar 2013 08:57:22 +0100 Subject: [Catalog-sig] A 90% Solution In-Reply-To: References: <513E76AE.10601@egenix.com> Message-ID: <513EDFE2.2000907@egenix.com> On 12.03.2013 03:46, PJ Eby wrote: > On Mon, Mar 11, 2013 at 8:28 PM, M.-A. Lemburg wrote: >> On 12.03.2013 00:39, Donald Stufft wrote: >>> >>> On Mar 11, 2013, at 7:04 PM, PJ Eby wrote: >>> >>>> Just a thought, but... >>>> >>>> If 90% of PyPI projects do not have any external files to download, >>>> then, wouldn't it make sense to: >>> >>> To be accurate it's 90% don't have any files/release available *only* externally. Most have external files to download because it's very rare that a project doesn't include an home_page or a download_url, especially since distutils complains if you don't. >> >> How are you going to verify that disabling the links >> on those projects won't make certain release versions of >> those packages unavailable for pip/easy_install ? > > I'm not sure if you're asking Donald or me here. I was asking Donald, since he came up with the list. Given that he was using the pip PackageFinder, it is not clear whether this actually covers all easy_install'able packages as well (most likely not, since pip doesn't support e.g. egg files). > My proposal was to > only automatically disable the rel attributes for links to pages that > do *not* contain any easy_install or pip-able download links. So, by > definition, this would not make any releases unavailable. Ok. > As for what Donald is proposing, I honestly have no idea what he's > talking about, or whether the 90% statistic actually applies for what > I'm proposing. 
> > So it's possible that it might be a lot less than 90% that my proposal > would be able to affect *instantly*, without contacting any authors. We'd still need to inform authors that we changed a setting in their package, since they may want to use the feature to host packages or releases off-PyPI again in the future. >> How are you planing to inform the package authors of that >> change, so that they can take corrective action ? >> >> Which options would be available for authors ? > > Do see my proposal again, which was simply that there be a switch to > enable or disable the rel attributes, that it default off for new > packages, and be switched to off for exactly that set of packages > which would not result in the loss of access to any download files. Yes, I saw that, but was putting up the questions in the context of Donald's idea to remove the links altogether. > There is, at this point, the question of how to handle projects that > have some of their releases hosted externally, or with some of the > files external and some not. I would prefer that any automated > changeover apply only to packages where the set of discoverable links > is exactly equal to the links found on the project's /simple page. That would be safer, yes. >> Regarding the links, it's probably better to not >> remove the rel="" attributes but instead change them >> from rel="download" to e.g. rel="external-download"; >> or to keep the old index semantics around as /simple-v1/. >> This keeps the valuable semantic relation available for >> tools that want to use it. > > For what? If you must keep them, rel="disabled-homepage" etc. would > get the message across. But I really don't see the point, and I > *invented* the bloody things. True, but they are now part of the PyPI API and thus cannot be changed or removed easily. The rel="" attributes provide extra information to tools using the /simple/ index as (static) API and losing such information would break the API. 
You're only thinking about installers using the /simple/ API, but there may very well also be e.g. researchers interested in scanning the index for homepages to find out where Python software lives, how the community is connected, which preferences for hosting and developing Python software there are, etc. That's a different context and in that context, the rel="" attributes play a different role. Removing them would make such research impossible to implement using the /simple/ index and researchers would have to either go with the XML-RPC API (which is slow compared to /simple/, puts a lot of load on the PyPI server and cannot be placed on a CDN) or revert to the old-style scanning of the PyPI package pages. > Frankly, I'm more than prepared to toss the rel attributes altogether, > after adequate notice is given for people to move their files or links > to the files. I just don't want any changes in the *rest* of the > /simple generation algorithm. See above. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 12 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From holger at merlinux.eu Tue Mar 12 09:21:18 2013 From: holger at merlinux.eu (holger krekel) Date: Tue, 12 Mar 2013 08:21:18 +0000 Subject: [Catalog-sig] A 90% Solution In-Reply-To: References: Message-ID: <20130312082118.GY9677@merlinux.eu> On Mon, Mar 11, 2013 at 19:04 -0400, PJ Eby wrote: > Just a thought, but... 
> > If 90% of PyPI projects do not have any external files to download, > then, wouldn't it make sense to: sidenote: we need to verify and clarify the 90/10 ratio. It would be the basis for action/changing pypi-state so we need to have this accurate and double-checked. > 1. Add a project-level option to enable or disable the adding of the > rel="" attribute to /simple links (but not affecting the links in any > other way) > 2. Default it to disabled for new projects, and > 3. Set it to disabled *now* for the 90% of projects that *don't have > external files*? > > If the arguments about banning external links are as valid and > important as some people claim, wouldn't it make sense to do this part > *now*, without first requiring a commitment to force the switch to a > disabled state in the future? Pre-announcing the step to maintainers is good communication style. There is always the issue of bugs in your determination of "external hosting" or tools that rely on "rel" attributes without us knowing etc. > Immediately, 90% of the problem goes away - no random spidering of > stuff that doesn't contain a link now, but which could be taken over > by a malicious party in the future, and 90% fewer sites having to be > up in order for you to build something from PyPI. > > Seems like a serious win to me -- and one that might not even need a PEP. Yes and no: a PEP-like document is a good place to point people to. > Next steps after this would be providing tools to help people move > their files and links, promoting that people switch it off if they no > longer support the offsite links, educating about security concerns, > etc. > > I really don't understand why the 90% solution isn't *already* the > consensus position, since it doesn't preclude follow-on efforts > towards reducing the 10% towards 0%. > > And if the problem is so important, why must we keep 90% of the > problems in place, just so we can keep arguing about censoring the > 10%? That doesn't make sense to me. 
The idea for only changing the pypi-server side only evolved last week - so we are not that slow in moving on here :) cheers, holger > > To me, if somebody's injured, the first thing you do is clean and > close the wound, not argue about whether it's a complete solution and > what might happen days or weeks later. > > Just a thought. > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig > From jnoller at gmail.com Tue Mar 12 10:14:07 2013 From: jnoller at gmail.com (Jesse Noller) Date: Tue, 12 Mar 2013 05:14:07 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> Message-ID: On Mar 12, 2013, at 1:25 AM, Lennart Regebro wrote: > On Mon, Mar 11, 2013 at 8:57 PM, PJ Eby wrote: >> On Mon, Mar 11, 2013 at 1:45 PM, Lennart Regebro wrote: >>> So, we should not remove the links for external packages until >>> somebody traverses those links? But as soon as somebody asks for those >>> links, we should remove them? In fact before we give them the link? >> >> I'm saying that if someone objects to the presence of links they >> don't actually use, they are speaking nonsense. Might as well ask to >> ban all packages from PyPI that they don't personally like -- it's the >> same request. Nobody is forcing you to depend on packages that don't >> host on PyPI, so there is no point to the censorship. >> >> If you don't use the links, you can't argue that their presence is >> causing you harm. > > Externally hosted files are a real world actual problem. We can only > solve it by not having externally hosted files. 
This discussion has
> since a long time gone past reason into pure stop energy. I'm not
> wasting more energy on it.
>
> //Lennart
>

Likewise. I'd like to see a pull request cleaning things up in a reasonable way that we can discuss with Richard at PyCon.

From jnoller at gmail.com Tue Mar 12 10:20:11 2013
From: jnoller at gmail.com (Jesse Noller)
Date: Tue, 12 Mar 2013 05:20:11 -0400
Subject: [Catalog-sig] A 90% Solution
In-Reply-To: <513EDFE2.2000907@egenix.com>
References: <513E76AE.10601@egenix.com> <513EDFE2.2000907@egenix.com>
Message-ID: <2A3B41AD-BDF2-481A-8830-F6E26E1D17BC@gmail.com>

On Mar 12, 2013, at 3:57 AM, "M.-A. Lemburg" wrote:

> On 12.03.2013 03:46, PJ Eby wrote:
>> On Mon, Mar 11, 2013 at 8:28 PM, M.-A. Lemburg wrote:
>>> On 12.03.2013 00:39, Donald Stufft wrote:
>>>>
>>>> On Mar 11, 2013, at 7:04 PM, PJ Eby wrote:
>>>>
>>>>> Just a thought, but...
>>>>>
>>>>> If 90% of PyPI projects do not have any external files to download,
>>>>> then, wouldn't it make sense to:
>>>>
>>>> To be accurate it's 90% don't have any files/release available *only* externally. Most have external files to download because it's very rare that a project doesn't include a home_page or a download_url, especially since distutils complains if you don't.
>>>
>>> How are you going to verify that disabling the links
>>> on those projects won't make certain release versions of
>>> those packages unavailable for pip/easy_install ?
>>
>> I'm not sure if you're asking Donald or me here.
>
> I was asking Donald, since he came up with the list. Given that
> he was using the pip PackageFinder, it is not clear whether this
> actually covers all easy_install'able packages as well (most likely
> not, since pip doesn't support e.g. egg files).
>
>> My proposal was to
>> only automatically disable the rel attributes for links to pages that
>> do *not* contain any easy_install or pip-able download links. So, by
>> definition, this would not make any releases unavailable.
>
> Ok.
> >> As for what Donald is proposing, I honestly have no idea what he's >> talking about, or whether the 90% statistic actually applies for what >> I'm proposing. >> >> So it's possible that it might be a lot less than 90% that my proposal >> would be able to affect *instantly*, without contacting any authors. > > We'd still need to inform authors that we changed a setting > in their package, since they may want to use the feature > to host packages or releases off-PyPI again in the future. > >>> How are you planing to inform the package authors of that >>> change, so that they can take corrective action ? >>> >>> Which options would be available for authors ? >> >> Do see my proposal again, which was simply that there be a switch to >> enable or disable the rel attributes, that it default off for new >> packages, and be switched to off for exactly that set of packages >> which would not result in the loss of access to any download files. > > Yes, I saw that, but was putting up the questions in the context > of Donald's idea to remove the links altogether. > >> There is, at this point, the question of how to handle projects that >> have some of their releases hosted externally, or with some of the >> files external and some not. I would prefer that any automated >> changeover apply only to packages where the set of discoverable links >> is exactly equal to the links found on the project's /simple page. > > That would be safer, yes. > >>> Regarding the links, it's probably better to not >>> remove the rel="" attributes but instead change them >>> from rel="download" to e.g. rel="external-download"; >>> or to keep the old index semantics around as /simple-v1/. >>> This keeps the valuable semantic relation available for >>> tools that want to use it. >> >> For what? If you must keep them, rel="disabled-homepage" etc. would >> get the message across. But I really don't see the point, and I >> *invented* the bloody things. 
> > True, but they are now part of the PyPI API and thus cannot be > changed or removed easily. > > The rel="" attributes provide extra information to tools > using the /simple/ index as (static) API and losing such > information would break the API. > > You're only thinking about installers using the /simple/ > API, but there may very well also be e.g. researchers interested > in scanning the index for homepages to find out where Python > software lives, how the community is connected, which > preferences for hosting and developing Python software > there are, etc. > > That's a different context and in that context, the rel="" > attributes play a different role. > > Removing them would make such research impossible to implement > using the /simple/ index and researchers would have to either go > with the XML-RPC API (which is slow compared to /simple/, puts a > lot of load on the PyPI server and cannot be placed on a CDN) > or revert to the old-style scanning of the PyPI package pages. > So because of hypothetical researchers we can't make the system better. >> Frankly, I'm more than prepared to toss the rel attributes altogether, >> after adequate notice is given for people to move their files or links >> to the files. I just don't want any changes in the *rest* of the >> /simple generation algorithm. > > See above. > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, Mar 12 2013) >>>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > > ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig From mal at egenix.com Tue Mar 12 10:50:23 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 12 Mar 2013 10:50:23 +0100 Subject: [Catalog-sig] A 90% Solution In-Reply-To: <2A3B41AD-BDF2-481A-8830-F6E26E1D17BC@gmail.com> References: <513E76AE.10601@egenix.com> <513EDFE2.2000907@egenix.com> <2A3B41AD-BDF2-481A-8830-F6E26E1D17BC@gmail.com> Message-ID: <513EFA5F.5000302@egenix.com> On 12.03.2013 10:20, Jesse Noller wrote: > > > On Mar 12, 2013, at 3:57 AM, "M.-A. Lemburg" wrote: > >> On 12.03.2013 03:46, PJ Eby wrote: >>> On Mon, Mar 11, 2013 at 8:28 PM, M.-A. Lemburg wrote: >>>> On 12.03.2013 00:39, Donald Stufft wrote: >>>>> >>>>> On Mar 11, 2013, at 7:04 PM, PJ Eby wrote: >>>>> >>>>>> Just a thought, but... >>>>>> >>>>>> If 90% of PyPI projects do not have any external files to download, >>>>>> then, wouldn't it make sense to: >>>>> >>>>> To be accurate it's 90% don't have any files/release available *only* externally. Most have external files to download because it's very rare that a project doesn't include an home_page or a download_url, especially since distutils complains if you don't. >>>> >>>> How are you going to verify that disabling the links >>>> on those projects won't make certain release versions of >>>> those packages unavailable for pip/easy_install ? >>> >>> I'm not sure if you're asking Donald or me here. >> >> I was asking Donald, since he came up with the list. Given that >> he was using the pip PackageFinder, it is not clear whether this >> actually covers all easy_install'able packages as well (most likely >> not, since pip doesn't support e.g. egg files). 
>> >>> My proposal was to >>> only automatically disable the rel attributes for links to pages that >>> do *not* contain any easy_install or pip-able download links. So, by >>> definition, this would not make any releases unavailable. >> >> Ok. >> >>> As for what Donald is proposing, I honestly have no idea what he's >>> talking about, or whether the 90% statistic actually applies for what >>> I'm proposing. >>> >>> So it's possible that it might be a lot less than 90% that my proposal >>> would be able to affect *instantly*, without contacting any authors. >> >> We'd still need to inform authors that we changed a setting >> in their package, since they may want to use the feature >> to host packages or releases off-PyPI again in the future. >> >>>> How are you planing to inform the package authors of that >>>> change, so that they can take corrective action ? >>>> >>>> Which options would be available for authors ? >>> >>> Do see my proposal again, which was simply that there be a switch to >>> enable or disable the rel attributes, that it default off for new >>> packages, and be switched to off for exactly that set of packages >>> which would not result in the loss of access to any download files. >> >> Yes, I saw that, but was putting up the questions in the context >> of Donald's idea to remove the links altogether. >> >>> There is, at this point, the question of how to handle projects that >>> have some of their releases hosted externally, or with some of the >>> files external and some not. I would prefer that any automated >>> changeover apply only to packages where the set of discoverable links >>> is exactly equal to the links found on the project's /simple page. >> >> That would be safer, yes. >> >>>> Regarding the links, it's probably better to not >>>> remove the rel="" attributes but instead change them >>>> from rel="download" to e.g. rel="external-download"; >>>> or to keep the old index semantics around as /simple-v1/. 
>>>> This keeps the valuable semantic relation available for >>>> tools that want to use it. >>> >>> For what? If you must keep them, rel="disabled-homepage" etc. would >>> get the message across. But I really don't see the point, and I >>> *invented* the bloody things. >> >> True, but they are now part of the PyPI API and thus cannot be >> changed or removed easily. >> >> The rel="" attributes provide extra information to tools >> using the /simple/ index as (static) API and losing such >> information would break the API. >> >> You're only thinking about installers using the /simple/ >> API, but there may very well also be e.g. researchers interested >> in scanning the index for homepages to find out where Python >> software lives, how the community is connected, which >> preferences for hosting and developing Python software >> there are, etc. >> >> That's a different context and in that context, the rel="" >> attributes play a different role. >> >> Removing them would make such research impossible to implement >> using the /simple/ index and researchers would have to either go >> with the XML-RPC API (which is slow compared to /simple/, puts a >> lot of load on the PyPI server and cannot be placed on a CDN) >> or revert to the old-style scanning of the PyPI package pages. >> > > So because of hypothetical researchers we can't make the system better. Of course we can, but just like with Python itself, we have to pay attention to backwards compatibility. Not hard to do: we'd just need to keep the old index in place using a different URL, e.g. /simple-v1/. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 12 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From holger at merlinux.eu Tue Mar 12 12:38:17 2013
From: holger at merlinux.eu (holger krekel)
Date: Tue, 12 Mar 2013 11:38:17 +0000
Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI
Message-ID: <20130312113817.GA9677@merlinux.eu>

Hi all,

below is the new PEP pre-submit version (V2) which incorporates the
latest suggestions and aims at a rapidly deployable solution. Thanks in
particular to Philip, Donald and Marc-Andre. I also added a few notes
on how installers should behave with respect to non-PYPI crawling.

I think a PEP-like doc is warranted and that we should not silently
change things without proper communication to maintainers and pre-planning
the implementation/change process. Arguably, the changes are more
invasive than "oh, let's just do a http->https redirect" which didn't
work too well either.

Now, if there is some agreement, I can submit this PEP officially tomorrow,
and given agreement/refinements from the Pycon folks and the likes of
Richard, we may be able to get going very shortly after Pycon.

cheers,
holger


PEP-draft: transitioning to release-file hosting on PYPI
====================================================================

Status
-----------

PRE-SUBMIT-v2

Abstract
------------

This PEP proposes a backward-compatible transition process to speed up,
simplify and robustify installing from the pypi.python.org (PYPI)
package index. The initial transition will put most packages on PYPI
automatically in a configuration mode which will prevent client-side
crawling from installers.
To ease automatic transition and minimize
client-side friction, **no changes to distutils or installation tools** are
required. Instead, the transition is implemented by modifying PYPI to
serve links from ``simple/`` pages in a configurable way, preventing or
allowing crawling of non-PYPI sites for detecting release files.
Maintainers of all PYPI packages will be notified ahead of those
changes.

Maintainers of packages which currently are hosted on non-PYPI sites
shall receive instructions and tools to ease "re-hosting" of their
historic and future package release files. The implementation of such
tools is NOT required for implementing the initial automatic transition.

Installation tools like pip and easy_install shall warn about crawling
non-PYPI sites and later default to disallow it and only allow it with
an explicit option.


History and motivations for external hosting
------------------------------------------------

When PYPI went online, it offered release registration but had no
facility to host release files itself. When hosting was added, no
automated downloading tool existed yet. When Philip Eby implemented
automated downloading (through setuptools), he made the choice
to allow people to use download hosts of their choice. This was
implemented by the PYPI ``simple/`` index containing links of type
``rel=homepage`` or ``rel=download`` which are crawled by installation
tools to discover package links. As of March 2013, a substantial part
of packages (estimated to about 10%) make use of this mechanism to host
files on github, bitbucket, sourceforge or own hosting sites like
``mercurial.selenic.com``, to just name a few.

There are many reasons [2]_ why people choose to use external hosting,
to cite just a few:

- release processes and scripts have been developed already and
  upload to external sites

- it takes too long to upload large files from some places in the world

- export restrictions e.g. for crypto-related software

- company policies which prescribe offering open source packages through
  own sites

- problems with integrating uploading to PYPI into one's release process
  (because of release policies)

- perceived bad reliability of PYPI

- missing knowledge that you can upload files

Irrespective of the present-day validity of these reasons, there clearly
is a history why people choose to host files externally and it even was
for some time the only way you could do things.


Problem
---------------

**Today, python package installers (pip and easy_install) often need to
query non-PYPI sites even if there are no externally hosted files**.
Apart from querying pypi.python.org's simple index pages, also all
homepages and download pages ever specified with any release of a
package are crawled by an installer. The need for installers to
crawl 3rd party sites slows down installation and makes for a brittle
unreliable installation process. Those sites and packages also don't
take part in the :pep:`381` mirroring infrastructure, further decreasing
reliability and speed of automated installation processes around the world.

Roughly 90% of packages are hosted directly on pypi.python.org [1]_.
Even for them installers still need to crawl the homepage(s) of a
package. Many package uploaders are particularly not aware that
specifying the "homepage" in their release process will slow down
the installation process for all its users.

Relying on third party sites also opens up more attack vectors
for injecting malicious packages into sites using automated installs.
A simple attack might just involve getting hold of an old now-unused
homepage domain and placing malicious packages there. Moreover,
performing a Man-in-The-Middle (MITM) attack between an installation
site and any of the download sites can inject malicious packages on the
installation site. As many homepages and download locations are using
HTTP and not proper HTTPS, such attacks are not very hard to launch.
Such MITM attacks can happen even for packages which never intended to
host files externally as their homepages are contacted by installers
anyway.

There is currently no way for package maintainers to avoid 3rd party
crawling, other than removing all homepage/download url metadata
for all historic releases. While a script [3]_ has been written to
perform this action, it is not a good general solution because it removes
semantic information like the "homepage" specification from PYPI packages.


Solution
-----------

The proposed solution consists of the following implementation and
communication steps:

- determine which packages have release files only on PYPI (group A)
  and which have externally hosted release files (group B).

- Prepare PYPI implementation to allow a per-project "hosting mode",
  effectively enabling or disabling external crawling. When enabled
  nothing changes from the current situation of producing ``rel=download``
  and ``rel=homepage`` attributed links on ``simple/`` pages,
  causing installers to crawl those sites.
  When disabled, the attributions of links will change
  to ``rel=newdownload`` and ``rel=newhomepage`` causing installers to
  avoid crawling 3rd party sites. Retaining the meta-information allows
  tools to still make use of the semantic information.

- send mail to maintainers of A that their project is going to be
  automatically configured to "disable crawling" in one week
  and encourage them to set this mode earlier to help all of
  their users.

- send mail to maintainers of B that their package hosting mode
  is "crawling enabled", and list the sites which currently are crawled,
  and suggest that they re-host their packages directly on PYPI and
  then switch the hosting-mode to "disable crawling". Provide instructions
  and at best tools to help with this "re-uploading" process.

In addition, maintainers of installation tools are asked to release
two updates.
The first one shall provide clear warnings if external
crawling needs to happen, for which projects and URLs exactly
this happens, and that in the future crawling will be disabled by default.
The next update shall change the default to disallow crawling and allow
crawling only with an explicit option like ``--crawl-externals`` and
another option allowing to limit which hosts are allowed to be crawled at all.


Hosting-Mode state transitions
----------------------------------

1. At the outset, we set hosting-mode to "notset" for all packages.
   This will not change any link served via the simple index and thus
   no bad effects are expected. Early adopters and testers may now
   change the mode to either "crawl" or "nocrawl" to help with
   streamlining issues in the PYPI implementation.

2. When maintainers of B packages are mailed their mode is directly
   set to "crawl".

3. When maintainers of A are mailed we leave the mode at "notset" to allow
   people to change it to "nocrawl" themselves or to set it to "crawl"
   if they think they are wrongly in the "A" group. After a week
   all "notset" modes are set to "nocrawl".

A week after the mailings all packages will be in "crawl" or "nocrawl"
hosting mode. It is then a matter of good tools and reaching out to
maintainers of B packages to increase the A/B ratio.


Open questions
----------------------

- Should the support tools for "rehosting" packages be implemented on the
  server side or on the client side? Implementing it on the client
  side probably is quicker to get right and less fatal in terms of failures.

- double-check if ``rel=newhomepage`` and ``rel=newdownload`` cause the
  desired behaviour of pip and easy_install (both the distribute and
  setuptools based one) to not crawl those pages.

- are the "support tools" for re-hosting outside the scope of this PEP?

- Think some more about pip/easy_install "allow-hosts" mode etc.


References
------------

.. [1] Donald Stufft, ratio of externally hosted versus pypi-hosted,
   http://mail.python.org/pipermail/catalog-sig/2013-March/005549.html

.. [2] Marc-Andre Lemburg, reasons for external hosting,
   http://mail.python.org/pipermail/catalog-sig/2013-March/005626.html

.. [3] Holger Krekel, Script to remove homepage/download metadata for
   all releases
   http://mail.python.org/pipermail/catalog-sig/2013-February/005423.html


Acknowledgments
----------------------

Philip Eby for precise information and the basic ideas to implement
the transition via server-side changes only.

Donald Stufft for pushing away from external hosting and doing
the 90/10 % statistics script and offering to implement a PR.

Marc-Andre Lemburg, Nick Coghlan and catalog-sig for thinking
through issues regarding getting rid of "external hosting".


Copyright
-----------------

This document has been placed in the public domain.

From ncoghlan at gmail.com Tue Mar 12 16:19:32 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 13 Mar 2013 01:19:32 +1000
Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI
In-Reply-To: <20130312113817.GA9677@merlinux.eu>
References: <20130312113817.GA9677@merlinux.eu>
Message-ID:

That looks pretty good to me. My only comment is that qualifiers like "new"
don't age well in an API. The explicit "nocrawlhomepage" and
"nocrawldownload" might be a better choice.

Cheers,
Nick.

From pje at telecommunity.com Tue Mar 12 16:28:54 2013
From: pje at telecommunity.com (PJ Eby)
Date: Tue, 12 Mar 2013 11:28:54 -0400
Subject: [Catalog-sig] A 90% Solution
In-Reply-To: <513EFA5F.5000302@egenix.com>
References: <513E76AE.10601@egenix.com> <513EDFE2.2000907@egenix.com> <2A3B41AD-BDF2-481A-8830-F6E26E1D17BC@gmail.com> <513EFA5F.5000302@egenix.com>
Message-ID:

On Tue, Mar 12, 2013 at 5:50 AM, M.-A. Lemburg wrote:
> Not hard to do: we'd just need to keep the old index in place
> using a different URL, e.g. /simple-v1/.

That's not necessary: the XML-RPC API lets you query those URLs
directly. They're part of the metadata standard, after all... which
means you can *also* access them by downloading the DOAP records,
browsing the PyPI pages directly, etc. There are plenty of ways to
get that data, no point adding another one.

From pje at telecommunity.com Tue Mar 12 16:38:22 2013
From: pje at telecommunity.com (PJ Eby)
Date: Tue, 12 Mar 2013 11:38:22 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
In-Reply-To:
References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
Message-ID:

On Tue, Mar 12, 2013 at 1:25 AM, Lennart Regebro wrote:
> Externally hosted files are a real world actual problem.

You're leaving out some important words from that sentence. Words
like, "for some people" and "who choose to depend on projects using
them". PyPI isn't your private personal playground. Other people
have rights, too.

> This discussion has since a long time gone past reason into pure stop energy.

I agree - hardly anyone is giving any reasoning that justifies why one
group of people should have their projects censored to benefit a few
blowhards on Catalog-SIG. Carl's the only person who's even *tried*
giving a justification. Everyone else just shuts up or changes the
subject when I ask that question.

I'll ask it again: why should *thousands* of projects be censored or
made to change their release processes, because *you* can't be
bothered to cache the distributions of the projects you depend on?

Not, why would it be a good idea for them to change anyway. Why
should they be *forced* to do it?

Bonus points: answer why, *every time* somebody proposes a way of
improving things that doesn't *ban* external hosting, you guys go all
stop energy on that and derail the discussion with why it has to be
total.

AFAICT, you're the ones stopping things moving forward here,
filibustering against every possible compromise.

From jacob at jacobian.org Tue Mar 12 16:42:28 2013
From: jacob at jacobian.org (Jacob Kaplan-Moss)
Date: Tue, 12 Mar 2013 10:42:28 -0500
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
In-Reply-To:
References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
Message-ID:

On Tue, Mar 12, 2013 at 10:38 AM, PJ Eby wrote:
> I'll ask it again: why should *thousands* of projects be censored or
> made to change their release processes, because *you* can't be
> bothered to cache the distributions of the projects you depend on?

Because externally-hosted files are a security risk, one that most
users don't realize exists. We can either fix this problem now, or we
can wait until someone is compromised using PyPI as a vector.

Jacob

From jacob at jacobian.org Tue Mar 12 16:44:17 2013
From: jacob at jacobian.org (Jacob Kaplan-Moss)
Date: Tue, 12 Mar 2013 10:44:17 -0500
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
In-Reply-To:
References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
Message-ID:

On Tue, Mar 12, 2013 at 10:38 AM, PJ Eby wrote:
> AFAICT, you're the ones stopping things moving forward here,
> filibustering against every possible compromise.

Sorry, one more thing: I'm interested in what your compromise would be.
Can you write up a counter-proposal to Holger's?

Jacob

From pje at telecommunity.com Tue Mar 12 16:53:10 2013
From: pje at telecommunity.com (PJ Eby)
Date: Tue, 12 Mar 2013 11:53:10 -0400
Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI
In-Reply-To: <20130312113817.GA9677@merlinux.eu>
References: <20130312113817.GA9677@merlinux.eu>
Message-ID:

On Tue, Mar 12, 2013 at 7:38 AM, holger krekel wrote:
> In addition, maintainers of installation tools are asked to release
> two updates. The first one shall provide clear warnings if external
> crawling needs to happen,

A clarification here: "needs to happen" is not well-specified. An
installer tasked with finding the latest or best-matching version of a
package must currently *always* crawl. So the warning would be
always.

The strategy I originally chose for making this change in easy_install
is to warn once at the beginning that --allow-hosts has not been set,
and thus packages might be downloaded from anywhere on the internet.

I've since become uncertain that this change is actually workable in
the short term, since until most of the packages are actually moved
onto PyPI, a lot of installs will fail if somebody changes their
configuration to be more secure. So I'm thinking the warning needs to
be deferred until at least the more popular packages have moved to
PyPI.

> Now, if there is some agreement, i can submit this PEP officially tomorrow,
> and given agreement/refinments from the Pycon folks and the likes of
> Richard, we may be able to get going very shortly after Pycon.

I'd like to suggest that the PEP should be explicit that no other
changes to the /simple generation algorithm are being made, just the
removal or alteration of rel="" attributes. i.e., it will still be
possible -- at least in the near term -- for projects to include
explicit download links to files made available elsewhere.
Changing that situation is more controversial and will require wider
community participation than has occurred to date.

It might also be good to suggest that authors of PyPI clones plan
their own phase-out of rel="" attributes.

From m.van.rees at zestsoftware.nl Tue Mar 12 17:04:52 2013
From: m.van.rees at zestsoftware.nl (Maurits van Rees)
Date: Tue, 12 Mar 2013 17:04:52 +0100
Subject: [Catalog-sig] Inconsistency on f.pypi.python.org with Products.PluggableAuthService
In-Reply-To:
References:
Message-ID:

On 05-03-13 16:34, Christian Theune wrote:
> Hi,
>
> it seems my fight to keep f.pypi.python.org is at least keeping the
> pypi-mirrors.org page happy.
>
> Unfortunately one of our users detected another inconsistency that the
> mirror script doesn't find or clean up by itself. I also don't know how
> to get this back in line.
>
> If you compare those pages:
>
> http://f.pypi.python.org/packages/source/P/Products.PluggableAuthService/
> http://f.pypi.python.org/simple/Products.PluggableAuthService
> http://pypi.python.org/simple/Products.PluggableAuthService
>
> There's definitely something wrong.
>
> Suggestions?

I meant to look at this earlier, as I noticed it too. Apparently it has
not solved itself.

The latest release is 1.10.0, which was uploaded on 19 February, which
is the day that PyPI switched to https. My guess is that some mirrors
did an update at a point in time when PyPI had problems because of this
switch and that those mirrors somehow got affected by this.

Let's look at the state of the various mirrors.

http://a.pypi.python.org is perfect.

http://b.pypi.python.org says "Package Products.PluggableAuthService
does not exist", which should not be true, as this package has existed
for years. Also, http://b.pypi.python.org/packages/source/P/ does list
Products.PluggableAuthService, but that page has an empty html body.

http://c.pypi.python.org/simple/Products.PluggableAuthService says: "The
requested URL /simple/Products.PluggableAuthService/ was not found on
this server."
http://c.pypi.python.org/packages/source/P/Products.PluggableAuthService/
does exist and lists all except the last release.

d.pypi.python.org is unavailable.

e.pypi.python.org is perfect.

f.pypi.python.org: same as c.

g.pypi.python.org works, but has not been updated in over a month so it
misses the latest release. I guess this one would work if it got
updated again.

pypi.crate.io is fine.

So b, c and f have a problem. http://www.pypi-mirrors.org lists these
respectively as old, aging and fresh.

If anyone knows what could be done to solve this, that would be good.

--
Maurits van Rees: http://maurits.vanrees.org/
Zest Software: http://zestsoftware.nl

From mal at egenix.com Tue Mar 12 17:06:26 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 12 Mar 2013 17:06:26 +0100
Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI
In-Reply-To: <20130312113817.GA9677@merlinux.eu>
References: <20130312113817.GA9677@merlinux.eu>
Message-ID: <513F5282.3010206@egenix.com>

On 12.03.2013 12:38, holger krekel wrote:
> Hi all,
>
> below is the new PEP pre-submit version (V2) which incorporates the
> latest suggestions and aims at a rapidly deployable solution. Thanks in
> particular to Philip, Donald and Marc-Andre. I also added a few notes
> on how installers should behave with respect to non-PYPI crawling.
>
> I think a PEP-like doc is warranted and that we should not silently
> change things without proper communication to maintainers and pre-planning
> the implementation/change process. Arguably, the changes are more
> invasive than "oh, let's just do a http->https redirect" which didn't
> work too well either.
>
> Now, if there is some agreement, I can submit this PEP officially tomorrow,
> and given agreement/refinements from the Pycon folks and the likes of
> Richard, we may be able to get going very shortly after Pycon.
>
> cheers,
> holger
>
>
> PEP-draft: transitioning to release-file hosting on PYPI
> ====================================================================
>
> Status
> -----------
>
> PRE-SUBMIT-v2
>
> Abstract
> ------------
>
> This PEP proposes a backward-compatible transition process to speed up,
> simplify and robustify installing from the pypi.python.org (PYPI)
> package index. The initial transition will put most packages on PYPI
> automatically in a configuration mode which will prevent client-side
> crawling from installers. To ease automatic transition and minimize
> client-side friction, **no changes to distutils or installation tools** are
> required. Instead, the transition is implemented by modifying PYPI to
> serve links from ``simple/`` pages in a configurable way, preventing or
> allowing crawling of non-PYPI sites for detecting release files.
> Maintainers of all PYPI packages will be notified ahead of those
> changes.
>
> Maintainers of packages which currently are hosted on non-PYPI sites
> shall receive instructions and tools to ease "re-hosting" of their
> historic and future package release files. The implementation of such
> tools is NOT required for implementing the initial automatic transition.
>
> Installation tools like pip and easy_install shall warn about crawling
> non-PYPI sites and later default to disallow it and only allow it with
> an explicit option.
>
>
> History and motivations for external hosting
> ------------------------------------------------
>
> When PYPI went online, it offered release registration but had no
> facility to host release files itself. When hosting was added, no
> automated downloading tool existed yet.
> When Philip Eby implemented
> automated downloading (through setuptools), he made the choice
> to allow people to use download hosts of their choice. This was
> implemented by the PYPI ``simple/`` index containing links of type
> ``rel=homepage`` or ``rel=download`` which are crawled by installation
> tools to discover package links. As of March 2013, a substantial part
> of packages (estimated to about 10%) make use of this mechanism to host
> files on github, bitbucket, sourceforge or own hosting sites like
> ``mercurial.selenic.com``, to just name a few.
>
> There are many reasons [2]_ why people choose to use external hosting,
> to cite just a few:
>
> - release processes and scripts have been developed already and
>   upload to external sites
>
> - it takes too long to upload large files from some places in the world
>
> - export restrictions e.g. for crypto-related software
>
> - company policies which prescribe offering open source packages through
>   own sites
>
> - problems with integrating uploading to PYPI into one's release process
>   (because of release policies)
>
> - perceived bad reliability of PYPI
>
> - missing knowledge that you can upload files
>
> Irrespective of the present-day validity of these reasons, there clearly
> is a history why people choose to host files externally and it even was
> for some time the only way you could do things.
>
>
> Problem
> ---------------
>
> **Today, python package installers (pip and easy_install) often need to
> query non-PYPI sites even if there are no externally hosted files**.
> Apart from querying pypi.python.org's simple index pages, also all
> homepages and download pages ever specified with any release of a
> package are crawled by an installer. The need for installers to
> crawl 3rd party sites slows down installation and makes for a brittle
> unreliable installation process.
Those sites and packages also don't
> take part in the :pep:`381` mirroring infrastructure, further decreasing
> reliability and speed of automated installation processes around the world.
>
> Roughly 90% of packages are hosted directly on pypi.python.org [1]_.
> Even for them installers still need to crawl the homepage(s) of a
> package. Many package uploaders are in particular not aware that
> specifying the "homepage" in their release process will slow down
> the installation process for all of their users.
>
> Relying on third party sites also opens up more attack vectors
> for injecting malicious packages into sites using automated installs.
> A simple attack might just involve getting hold of an old now-unused
> homepage domain and placing malicious packages there. Moreover,
> performing a Man-in-The-Middle (MITM) attack between an installation
> site and any of the download sites can inject malicious packages on the
> installation site. As many homepages and download locations are using
> HTTP and not proper HTTPS, such attacks are not very hard to launch.
> Such MITM attacks can happen even for packages which never intended to
> host files externally as their homepages are contacted by installers
> anyway.
>
> There is currently no way for package maintainers to avoid 3rd party
> crawling, other than removing all homepage/download url metadata
> for all historic releases. While a script [3]_ has been written to
> perform this action, it is not a good general solution because it removes
> semantic information like the "homepage" specification from PYPI packages.
>
>
> Solution
> -----------
>
> The proposed solution consists of the following implementation and
> communication steps:
>
> - determine which packages have release files only on PYPI (group A)
>   and which have externally hosted release files (group B).
>
> - Prepare PYPI implementation to allow a per-project "hosting mode",
>   effectively enabling or disabling external crawling.
When enabled
>   nothing changes from the current situation of producing ``rel=download``
>   and ``rel=homepage`` attributed links on ``simple/`` pages,
>   causing installers to crawl those sites.
>   When disabled, the attributions of links will change
>   to ``rel=newdownload`` and ``rel=newhomepage`` causing installers to
>   avoid crawling 3rd party sites. Retaining the meta-information allows
>   tools to still make use of the semantic information.

Please start using versioned APIs for these things. The
old style index should still be available under some
URL, e.g. /simple-v1/ or /v1/simple/ or /1/simple/

> - send mail to maintainers of A that their project is going to be
>   automatically configured to "disable crawling" in one week
>   and encourage them to set this mode earlier to help all of
>   their users.

One week? That's a somewhat unrealistic timeframe.

I'm also missing some real-life tests to see what the effects
are on actual users, e.g. set up the new index using a
URL /simple-v2/ and let users play with it for a month
before making /simple/ == /simple-v2/.

> - send mail to maintainers of B that their package hosting mode
>   is "crawling enabled", and list the sites which currently are crawled,
>   and suggest that they re-host their packages directly on PYPI and
>   then switch the hosting-mode to "disable crawling". Provide instructions
>   and at best tools to help with this "re-uploading" process.

That email should clearly state the PyPI terms to not
cause surprises among the maintainers.

I'd wait with this step until we've sorted out the PyPI terms
issues on the python-legal list, to not cause an uproar
from people who get to read the terms for the first time ;-)

> In addition, maintainers of installation tools are asked to release
> two updates. The first one shall provide clear warnings if external
> crawling needs to happen, for which projects and URLs exactly
> this happens, and that in the future crawling will be disabled by default.
> The next update shall change the default to disallow crawling and allow
> crawling only with an explicit option like ``--crawl-externals`` and
> another option allowing to limit which hosts are allowed to be crawled
> at all.

AFAIK, both already exist in easy_install. Not sure about pip.
They are not enabled by default, though.

> Hosting-Mode state transitions
> ----------------------------------
>
> 1. At the outset, we set hosting-mode to "notset" for all packages.
>    This will not change any link served via the simple index and thus
>    no bad effects are expected. Early adopters and testers may now
>    change the mode to either "crawl" or "nocrawl" to help with
>    streamlining issues in the PYPI implementation.
>
> 2. When maintainers of B packages are mailed their mode is directly
>    set to "crawl".
>
> 3. When maintainers of A are mailed we leave the mode at "notset" to allow
>    people to change it to "nocrawl" themselves or to set it to "crawl"
>    if they think they are wrongly in the "A" group. After a week
>    all "notset" modes are set to "nocrawl".
>
> A week after the mailings all packages will be in "crawl" or "nocrawl"
> hosting mode. It is then a matter of good tools and reaching out to
> maintainers of B packages to increase the A/B ratio.
>
> Open questions
> ----------------------
>
> - Should the support tools for "rehosting" packages be implemented on the
>   server side or on the client side? Implementing it on the client
>   side probably is quicker to get right and less fatal in terms of failures.

Not sure what you mean here.

You are also completely leaving out the idea to only cache
distribution files on the PyPI CDN, without having to actually
upload them.

> - double-check if ``rel=newhomepage`` and ``rel=newdownload`` cause the
>   desired behaviour of pip and easy_install (both the distribute and
>   setuptools based one) to not crawl those pages.
Indeed :-) Note that it will still be possible to add links to the distribution files in the long description of the package. Those links also show up on the /simple/ index page and will then get used, regardless of whether they have a rel attribute set or not. > - are the "support tools" for re-hosting outside the scope of this PEP? As with any PEP proposing an API change or a new API, it has to provide a reference implementation. The current distutils upload command is geared towards uploading files at release time. While it is possible to trick it into uploading existing distribution files, it is not at all obvious how this is done. > - Think some more about pip/easy_install "allow-hosts" mode etc. Note that tools such as zc.buildout provide easy ways of adding extra indexes and external URLs to scan for distribution files. I'm not sure how the above would fit such use cases, i.e. if setuptools were to stop crawling external links per default, this could mean that user hosted PyPI-style indexes stop working with newer releases. Here's an example list of indexes used in Plone 4.2: # Add additional egg download sources here. dist.plone.org contains archives # of Plone packages. find-links = http://dist.plone.org http://download.zope.org/ppix/ http://download.zope.org/distribution/ http://effbot.org/downloads http://dist.plone.org/release/4.2 None of these seem to use the rel attribute feature, so those will likely continue to work fine. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 12 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! 
:::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Tue Mar 12 17:19:34 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 12 Mar 2013 17:19:34 +0100 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> Message-ID: <513F5596.5090302@egenix.com> On 12.03.2013 16:42, Jacob Kaplan-Moss wrote: > On Tue, Mar 12, 2013 at 10:38 AM, PJ Eby wrote: >> I'll ask it again: why should *thousands* of projects be censored or >> made to change their release processes, because *you* can't be >> bothered to cache the distributions of the projects you depend on? > > Because externally-hosted files are a security risk, one that most > users don't realize exists. > > We can either fix this problem now, or we can wait until someone is > compromised using PyPI as a vector. We can fix this problem, yes, but we need to do this right and try not to break things. I don't see the need to rush this, just to address some perceived high risk. Files hosted on PyPI are just as risky to use as files on any other server. The only way to minimize the risk is by downloading all the packages you need, do reviews of all of them and each time a new release is published. If you then point your installers only to the repository where you keep your reviewed files, then you can feel safer. In reality, this doesn't happen, though, so a lot of the stuff we're talking about here is security theater, no matter how much crypto/signing/hashing/hosting/CDN we throw at it :-) So let's do this carefully and find a good solution before jumping to conclusions. 
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 12 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From holger at merlinux.eu Tue Mar 12 17:20:28 2013 From: holger at merlinux.eu (holger krekel) Date: Tue, 12 Mar 2013 16:20:28 +0000 Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI In-Reply-To: References: <20130312113817.GA9677@merlinux.eu> Message-ID: <20130312162028.GE9677@merlinux.eu> On Wed, Mar 13, 2013 at 01:19 +1000, Nick Coghlan wrote: > That looks pretty good to me. My only comment is that qualifiers like "new" > don't age well in an API. The explicit "nocrawlhomepage" and > "nocrawldownload" might be a better choice. Right, we might also consider dropping rel-attributing given that you can indeed access release metadata via the xmlrpc or json API. best, holger > Cheers, > Nick. From jacob at jacobian.org Tue Mar 12 17:29:45 2013 From: jacob at jacobian.org (Jacob Kaplan-Moss) Date: Tue, 12 Mar 2013 11:29:45 -0500 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: <513F5596.5090302@egenix.com> References: <20130310150740.GE9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> <513F5596.5090302@egenix.com> Message-ID: On Tue, Mar 12, 2013 at 11:19 AM, M.-A. 
Lemburg wrote:
> So let's do this carefully and find a good solution before
> jumping to conclusions.

Completely agreed; rushing is a bad idea.

But so is not starting. What I'm seeing (as a total outsider, a user
of these tools, not someone who creates them) is that a bunch of
people (Holger, Donald, Richard, the pip maintainers, etc.) have the
beginnings of a solution ready to go *right now*, and I want to
capture that energy and enthusiasm before it evaporates.

This isn't an academic situation; I've seen companies decline to adopt
Python over this exact security issue. I can't share details in
writing but ask me at PyCon and I can tell you some stories.
Externally-hosted packages are a security risk, full stop.

There's likely an even better solution involving strong cryptography
and such, but there's also an incremental improvement on the table
right now. Nobody's suggesting that we do this hastily or all at once,
but there *is* a proposal to get the process started right now. Why
shouldn't we get going while there's momentum?

Jacob

From holger at merlinux.eu Tue Mar 12 17:33:39 2013
From: holger at merlinux.eu (holger krekel)
Date: Tue, 12 Mar 2013 16:33:39 +0000
Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI
In-Reply-To:
References: <20130312113817.GA9677@merlinux.eu>
Message-ID: <20130312163339.GF9677@merlinux.eu>

On Tue, Mar 12, 2013 at 11:53 -0400, PJ Eby wrote:
> On Tue, Mar 12, 2013 at 7:38 AM, holger krekel wrote:
> > In addition, maintainers of installation tools are asked to release
> > two updates. The first one shall provide clear warnings if external
> > crawling needs to happen,
>
> A clarification here: "needs to happen" is not well-specified. An
> installer tasked with finding the latest or best-matching version of a
> package must currently *always* crawl. So the warning would be
> always.

Not after the initial automatic PYPI transition. For the 90% of
packages hosted on PYPI you wouldn't see the warning then.
> The strategy I originally chose for making this change in easy_install
> is to warn once at the beginning that --allow-hosts has not been set,
> and thus packages might be downloaded from anywhere on the internet.

From a UI perspective i'd like to see a summary of actually consulted
but non-specified websites (including if it was http or https) at the
very end of an installer's output. With "non-specified" i mean sites
that weren't specified as an indexserver or allow-host.

> I've since become uncertain that this change is actually workable in
> the short term, since until most of the packages are actually moved
> onto PyPI, a lot of installs will fail if somebody changes their
> configuration to be more secure. So I'm thinking the warning needs to
> be deferred until at least the more popular packages have moved to
> PyPI.

I think it's fine to wait until after the initial "hosting-mode" transition.

> > Now, if there is some agreement, i can submit this PEP officially tomorrow,
> > and given agreement/refinements from the Pycon folks and the likes of
> > Richard, we may be able to get going very shortly after Pycon.
>
> I'd like to suggest that the PEP should be explicit that no other
> changes to the /simple generation algorithm are being made, just the
> removal or alteration of rel="" attributes. i.e., it will still be
> possible -- at least in the near term -- for projects to include
> explicit download links to files made available elsewhere. Changing
> that situation is more controversial and will require wider community
> participation than has occurred to date.

I kind of agree. To transition forward, we should leave out the
question of further modifying the "simple/" pages at the moment.
Mentioning that this means you can put "http://PKGNAME-VER.tar.gz" in
your PKGNAME long_description or download_url metadata makes sense.
For that, the installers will give warnings, however, and eventually
change defaults according to the PEP draft.
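
A minimal sketch of the end-of-run summary suggested above, assuming the installer keeps a list of every URL it contacted. The function names and the glob matching of allowed hosts are assumptions made for illustration; this is not the code of easy_install or pip.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse


def external_host_summary(consulted_urls, index_hosts, allow_hosts=()):
    """Return sorted (scheme, host) pairs that were contacted although they
    were named neither as an index server nor as an allowed host.

    allow_hosts entries are treated as glob patterns such as "*.example.com"
    (an assumption of this sketch).
    """
    specified = [h.lower() for h in index_hosts]
    specified += [p.lower() for p in allow_hosts]
    seen = set()
    for url in consulted_urls:
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
        if not host:
            continue  # relative or malformed URL, nothing to report
        if any(fnmatch(host, pattern) for pattern in specified):
            continue  # explicitly specified, so not part of the summary
        seen.add((parsed.scheme, host))
    return sorted(seen)


def format_summary(pairs):
    """Render the pairs as the text an installer could print last."""
    if not pairs:
        return "No non-specified sites were consulted."
    lines = ["Consulted sites not specified as index or allowed host:"]
    lines += ["  %s://%s" % pair for pair in pairs]
    return "\n".join(lines)
```

Keeping the scheme in each entry makes plain-http hosts stand out in the output, which is the http/https distinction asked for above.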
> It might also be good to suggest that authors of PyPI clones plan
> their own phase-out of rel="" attributes.

Most alternative servers i've seen don't use the "rel" attribution
but it's good to mention it.

best,
holger

From mal at egenix.com Tue Mar 12 17:41:31 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 12 Mar 2013 17:41:31 +0100
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
In-Reply-To:
References: <20130310150740.GE9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> <513F5596.5090302@egenix.com>
Message-ID: <513F5ABB.9030006@egenix.com>

On 12.03.2013 17:29, Jacob Kaplan-Moss wrote:
> On Tue, Mar 12, 2013 at 11:19 AM, M.-A. Lemburg wrote:
>> So let's do this carefully and find a good solution before
>> jumping to conclusions.
>
> Completely agreed; rushing is a bad idea.
>
> But so is not starting. What I'm seeing (as a total outsider, a user
> of these tools, not someone who creates them) is that a bunch of
> people (Holger, Donald, Richard, the pip maintainers, etc.) have the
> beginnings of a solution ready to go *right now*, and I want to
> capture that energy and enthusiasm before it evaporates.
>
> This isn't an academic situation; I've seen companies decline to adopt
> Python over this exact security issue. I can't share details in
> writing but ask me at PyCon and I can tell you some stories.
> Externally-hosted packages are a security risk, full stop.
>
> There's likely an even better solution involving strong cryptography
> and such, but there's also an incremental improvement on the table
> right now. Nobody's suggesting that we do this hastily or all at once,
> but there *is* a proposal to get the process started right now. Why
> shouldn't we get going while there's momentum?

Sure; I'm just saying that we need to test drive the proposal
before actually adopting it.
I'm also trying to get some of the more radical unneeded changes reconsidered. We don't need to break things just because we can - let's leave that to our kids ;-) Holger has already addressed much of this in his V2 proposal and apart from the time frame and some details, it looks good. Meanwhile, I've been playing around with the earlier proposal I put forward: http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal to secure external links and found several issues while implementing it. It's easy to draw up a design, but you only get down to the problems when actually trying to implement it. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 12 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From carl at oddbird.net Tue Mar 12 17:48:08 2013 From: carl at oddbird.net (Carl Meyer) Date: Tue, 12 Mar 2013 10:48:08 -0600 Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI In-Reply-To: <20130312113817.GA9677@merlinux.eu> References: <20130312113817.GA9677@merlinux.eu> Message-ID: <513F5C48.3070602@oddbird.net> Hi Holger, I am confused about the discrepancy between the title of this pre-PEP ("transition to release file hosting on PyPI") and the contents of the PEP, which describe a transition to not crawling _HTML pages_ on external sites looking for distribution download links. These are not the same thing at all. 
Current installer tools will only crawl external HTML pages if they are rel="download" or rel="homepage", but they will use any link they find in the simple index (regardless of rel attr) if the target of the link appears to be a distribution file (as determined by filename pattern-matching or #egg fragment). At the end of the process you describe, if all packages migrate to "nocrawl", the rel-link HTML spidering will no longer happen. This is a good first step: it will speed up installation somewhat, and reduce the frustration of some package owners when installers find files linked from their project homepage that they never intended for automated installation. But installers will still find and download release packages that are not hosted on PyPI, if those package files are linked directly in the simple index. This is still surprising behavior to many new Python users, and still carries the security and reliability concerns that this PEP claims to address. I'm honestly not sure whether the title or the content more accurately reflects the intent of this PEP; depending which it is, I suggest one of the following: 1) Add to the PEP a description of a further step in the migration process, which actually does transition away from automated installation of non-PyPI-hosted release files (as the default behavior of installation tools); or 2) Change the title of the PEP to something like "Transitioning away from non-PyPI HTML crawling" and add a paragraph to the PEP clarifying that this PEP does not address the issue of actual release files hosted off-PyPI. 
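
The distinction drawn above can be sketched as follows. This is a simplified illustration, not the actual logic of pip or easy_install: the suffix list, the class name and the three labels are assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed archive suffixes; real installers recognise more formats.
DIST_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".zip", ".egg")
# rel values that cause the linked HTML page to be spidered today.
CRAWLED_RELS = {"homepage", "download"}


class SimplePageLinks(HTMLParser):
    """Collect (href, rel) pairs from the anchors of a simple/ page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href:
            self.links.append((href, attrs.get("rel")))


def classify(href, rel):
    """Return 'direct', 'crawl' or 'ignore' for a single link."""
    parsed = urlparse(href)
    looks_like_dist = (
        parsed.path.lower().endswith(DIST_SUFFIXES)
        or parsed.fragment.startswith("egg=")
    )
    if looks_like_dist:
        return "direct"  # used as a release file whatever its rel says
    if rel in CRAWLED_RELS:
        return "crawl"   # HTML page that gets spidered for more links
    return "ignore"


def classify_page(html):
    """Map every href found on the page to its classification."""
    parser = SimplePageLinks()
    parser.feed(html)
    return {href: classify(href, rel) for href, rel in parser.links}
```

Under this sketch, renaming ``rel=homepage`` to something like ``rel=newhomepage`` moves a link from 'crawl' to 'ignore', while an externally hosted ``pkg-1.0.tar.gz`` link stays 'direct', which is the gap described above.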
Carl From holger at merlinux.eu Tue Mar 12 18:05:08 2013 From: holger at merlinux.eu (holger krekel) Date: Tue, 12 Mar 2013 17:05:08 +0000 Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI In-Reply-To: <513F5282.3010206@egenix.com> References: <20130312113817.GA9677@merlinux.eu> <513F5282.3010206@egenix.com> Message-ID: <20130312170508.GG9677@merlinux.eu> Hi Marc-Andre, all, On Tue, Mar 12, 2013 at 17:06 +0100, M.-A. Lemburg wrote: > On 12.03.2013 12:38, holger krekel wrote: > > Hi all, > > > > below is the new PEP pre-submit version (V2) which incorporates the > > latest suggestions and aims at a rapidly deployable solution. Thanks in > > particular to Philip, Donald and Marc-Andre. I also added a few notes > > on how installers should behave with respect to non-PYPI crawling. > > > > I think a PEP like doc is warranted and that we should not silently > > change things without proper communication to maintainers and pre-planning > > the implementation/change process. Arguably, the changes are more > > invasive than "oh, let's just do a http->https redirect" which didn't > > work too well either. > > > > Now, if there is some agreement, i can submit this PEP officially tomorrow, > > and given agreement/refinments from the Pycon folks and the likes of > > Richard, we may be able to get going very shortly after Pycon. > > > > cheers, > > holger > > > > > > PEP-draft: transitioning to release-file hosting on PYPI > > ==================================================================== > > > > Status > > ----------- > > > > PRE-SUBMIT-v2 > > > > Abstract > > ------------ > > > > This PEP proposes a backward-compatible transition process to speed up, > > simplify and robustify installing from the pypi.python.org (PYPI) > > package index. The initial transition will put most packages on PYPI > > automatically in a configuration mode which will prevent client-side > > crawling from installers. 
To ease automatic transition and minimize > > client-side friction, **no changes to distutils or installation tools** are > > required. Instead, the transition is implemented by modifying PYPI to > > serve links from ``simple/`` pages in a configurable way, preventing or > > allowing crawling of non-PYPI sites for detecting release files. > > Maintainers of all PYPI packages will be notified ahead of those > > changes. > > > > Maintainers of packages which currently are hosted on non-PYPI sites > > shall receive instructions and tools to ease "re-hosting" of their > > historic and future package release files. The implementation of such > > tools is NOT required for implementing the initial automatic transition. > > > > Installation tools like pip and easy_install shall warn about crawling > > non-PYPI sites and later default to disallow it and only allow it with > > an explicit option. > > > > > > History and motivations for external hosting > > ------------------------------------------------ > > > > When PYPI went online, it offered release registration but had no > > facility to host release files itself. When hosting was added, no > > automated downloading tool existed yet. When Philip Eby implemented > > automated downloading (through setuptools), he made the choice > > to allow people to use download hosts of their choice. This was > > implemented by the PYPI ``simple/`` index containing links of type > > ``rel=homepage`` or ``rel=download`` which are crawled by installation > > tools to discover package links. As of March 2013, a substantial part > > of packages (estimated to about 10%) make use of this mechanism to host > > files on github, bitbucket, sourceforge or own hosting sites like > > ``mercurial.selenic.com``, to just name a few. 
> > > > There are many reasons [2]_ why people choose to use external hosting, > > to cite just a few: > > > > - release processes and scripts have been developed already and > > upload to external sites > > > > - it takes too long to upload large files from some places in the world > > > > - export restrictions e.g. for crypto-related software > > > > - company policies which prescribe offering open source packages through > > own sites > > > > - problems with integrating uploading to PYPI into one's release process > > (because of release policies) > > > > - perceived bad reliability of PYPI > > > > - missing knowlege you can upload files > > > > Irrespective of the present-day validity of these reasons, there clearly > > is a history why people choose to host files externally and it even was > > for some time the only way you could do things. > > > > > > Problem > > --------------- > > > > **Today, python package installers (pip and easy_install) often need to > > query non-PYPI sites even if there are no externally hosted files**. > > Apart from querying pypi.python.org's simple index pages, also all > > homepages and download pages ever specified with any release of a > > package are crawled by an installer. The need for installers to > > crawl 3rd party sites slows down installation and makes for a brittle > > unreliable installation process. Those sites and packages also don't > > take part in the :pep:`381` mirroring infrastructure, further decreasing > > reliability and speed of automated installation processes around the world. > > > > Roughly 90% of packages are hosted directly on pypi.python.org [1]_. > > Even for them installers still need to crawl the homepage(s) of a > > package. Many package uploaders are particularly not aware that > > specifying the "homepage" in their release process will slow down > > the installation process for all its users. 
> > > > Relying on third party sites also opens up more attack vectors > > for injecting malicious packages into sites using automated installs. > > A simple attack might just involve getting hold of an old now-unused > > homepage domain and placing mailicious packages there. Moreover, > > performing a Man-in-The-Middle (MITM) attack between an installation > > site and any of the download sites can inject mailicious packages on the > > installation site. As many homepages and download locations are using > > HTTP and not proper HTTPS, such attacks are not very hard to launch. > > Such MITM attacks can happen even for packages which never intended to > > host files externally as their homepages are contacted by installers > > anyway. > > > > There is currently no way for package maintainers to avoid 3rd party > > crawling, other than removing all homepage/download url metadata > > for all historic releases. While a script [3]_ has been written to > > perform this action, it is not a good general solution because it removes > > semantic information like the "homepage" specification from PYPI packages. > > > > > > Solution > > ----------- > > > > The proposed solution consists of the following implementation and > > communication steps: > > > > - determine which packages have releases files only on PYPI (group A) > > and which have externally hosted release files (group B). > > > > - Prepare PYPI implementation to allow a per-project "hosting mode", > > effectively enabling or disabling external crawling. When enabled > > nothing changes from the current situation of producing ``rel=download`` > > and ``rel=homepage`` attributed links on ``simple/`` pages, > > causing installers to crawl those sites. > > When disabled, the attributions of links will change > > to ``rel=newdownload`` and ``rel=newhomepage`` causing installers to > > avoid crawling 3rd party sites. Retaining the meta-information allows > > tools to still make use of the semantic information. 
> >
> Please start using versioned APIs for these things. The
> old style index should still be available under some
> URL, e.g. /simple-v1/ or /v1/simple/ or /1/simple/

Not sure it is necessary in this case. I would think it makes the
implementation harder and it would probably break PEP381 (mirroring
infrastructure) as well.

> > - send mail to maintainers of A that their project is going to be
> >   automatically configured to "disable crawling" in one week
> >   and encourage them to set this mode earlier to help all of
> >   their users.
>
> One week? That's a somewhat unrealistic timeframe.

Assuming we get our initial analysis correct, it's not a super-critical
change. It is also very easy to switch it back on a per-project basis.

I suggest we refine Donald's script, run it from multiple places in
the world and merge the results to get a consolidated set of
"needs-no-crawling" packages. If in doubt, we put a project into the
"needs-crawl" category. Therefore, we can assume our set of
"needs-no-crawling" packages to be safe enough to perform the switching.
The one week is just there as an additional safety net, to give the
authors a chance to act if they think we got it wrong. I don't think
we end up with many problems and they will be localized to very very
few packages. Extending the time frame will not help to significantly
reduce this number. The main problem will be mails not reaching a
human, i suspect.

> I'm also missing some real-life tests to see what the effects
> are on actual users, e.g. set up the new index using a
> URL /simple-v2/ and let users play with it for a month
> before making /simple/ == /simple-v2/.

Preparation time is specified in the PEP by bringing the PYPI changes
online and asking _some_ people to set their hosting-mode. As of now,
the changes to PYPI are fairly trivial.
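
The consolidation step suggested above could look roughly like this. It is a sketch under stated assumptions: the input format (one dict per location, mapping package name to whether that run found all release files on PYPI without crawling) is invented for the example and is not the output format of Donald's script.

```python
def consolidate(runs):
    """Merge per-location results into (no_crawl, needs_crawl) sets.

    runs is a list of dicts mapping package name -> True when that run
    found every release file on PYPI without crawling. A package is only
    put into the needs-no-crawling set when every run saw it and agreed;
    anything missing or disputed falls back to needs-crawl ("if in doubt").
    """
    all_packages = set()
    for run in runs:
        all_packages.update(run)
    no_crawl = {
        name
        for name in all_packages
        if all(run.get(name) is True for run in runs)
    }
    return no_crawl, all_packages - no_crawl
```

Taking the intersection rather than a majority vote is the conservative choice: a project wrongly left in "needs-crawl" only keeps today's behaviour, while a project wrongly switched to "nocrawl" could break installs.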
> > - send mail to maintainers of B that their package hosting mode
> >   is "crawling enabled", and list the sites which currently are crawled,
> >   and suggest that they re-host their packages directly on PYPI and
> >   then switch the hosting-mode to "disable crawling". Provide instructions
> >   and at best tools to help with this "re-uploading" process.
>
> That email should clearly state the PyPI terms to not
> cause surprises among the maintainers.

Can't the PYPI TOS be referenced from that mail? And an address
they can write to in case of questions?

> I'd wait with this step until we've sorted out the PyPI terms
> issues on the python-legal list, to not cause an uproar
> from people who get to read the terms for the first time ;-)

We could postpone the mailing to maintainers of B packages if there is
a legal need. We can already migrate "A" packages in the meantime.

> > In addition, maintainers of installation tools are asked to release
> > two updates. The first one shall provide clear warnings if external
> > crawling needs to happen, for which projects and URLs exactly
> > this happens, and that in the future crawling will be disabled by default.
> > The next update shall change the default to disallow crawling and allow
> > crawling only with an explicit option like ``--crawl-externals`` and
> > another option allowing to limit which hosts are allowed to be crawled
> > at all.
>
> AFAIK, both already exist in easy_install. Not sure about pip.
> They are not enabled by default, though.

Right, i didn't investigate the current cmdline options in detail.
To keep things simple i'd like to just specify the meta-level of
(a) giving warnings and (b) changing the default.

> > Hosting-Mode state transitions
> > ----------------------------------
> >
> > 1. At the outset, we set hosting-mode to "notset" for all packages.
> >    This will not change any link served via the simple index and thus
> >    no bad effects are expected.
Early adopters and testers may now > > change the mode to either "crawl" or "nocrawl" to help with > > streamlining issues in the PYPI implementation. > > > > 2. When maintainers of B packages are mailed their mode is directly > > set to "crawl". > > > > 3. When maintainers of A are mailed we leave the mode at "notset" to allow > > people to change it to "nocrawl" themselves or to set it to "crawl" > > if they think they are wrongly in the "A" group. After a week > > all "notset" modes are set to "nocrawl". > > > > A week after the mailings all packages will be in "crawl" or "nocrawl" > > hosting mode. It is then a matter of good tools and reaching out to > > maintainers of B packages to increase the A/B ratio. > > > > Open questions > > ---------------------- > > > > - Should the support tools for "rehosting" packages be implemented on the > > server side or on the client side? Implementing it on the client > > side probably is quicker to get right and less fatal in terms of failures. > > Not sure what you mean here. "Rehosting" tools help to transfer release files to PYPI which are currently served on non-PYPI sites through the "crawling" algo. This could be done via a server-side interface or via client-side tools. I prefer the latter because i'd like to keep changes on the PYPI server minimal. I am sure Richard agrees :) > Your are also completely leaving out the idea to only cache > distribution files on the PyPI CDN, without having to actually > upload them. Not sure what you mean. FWIW, how PYPI hosts packages itself is completely left out of this PEP on purpose. PYPI might evolve to offer packages on a CDN or improve the existing PEP381 infrastructure or introduce simple "rsync-ability" (like CPAN). IOW, this "no crawling" PEP is orthogonal to this question. > > - double-check if ``rel=newhomepage`` and ``rel=newdownload`` cause the > > desired behaviour of pip and easy_install (both the distribute and > > setuptools based one) to not crawl those pages. 
> > Indeed :-) We might just avoid rel-attributions and point to the XMLRPC/JSON API - i am sure this works with easy_install and pip :) > Note that it will still be possible to add links to the > distribution files in the long description of the package. > Those links also show up on the /simple/ index page and > will then get used, regardless of whether they have a rel > attribute set or not. Yes, this should be noted. > > - are the "support tools" for re-hosting outside the scope of this PEP? > > As with any PEP proposing an API change or a new API, it > has to provide a reference implementation. The re-hosting tools are NOT required for the "transition" part of the PEP. The PYPI implementation changes are required, of course. Donald offered to help with a PYPI PR and the PEP tries to minimize the necessary changes. > The current distutils upload command is geared towards > uploading files at release time. While it is possible > to trick it into uploading existing distribution files, > it is not at all obvious how this is done. Right, but i've written the code for that in another project. Unless someone else (probably Donald) beats me to it, i can try to help with writing such a re-hosting tool. > > - Think some more about pip/easy_install "allow-hosts" mode etc. > > Note that tools such as zc.buildout provide easy ways > of adding extra indexes and external URLs to scan for > distribution files. > > I'm not sure how the above would fit such use cases, > i.e. if setuptools were to stop crawling external > links per default, this could mean that user hosted > PyPI-style indexes stop working with newer releases. > > Here's an example list of indexes used in Plone 4.2: > > # Add additional egg download sources here. dist.plone.org contains archives > # of Plone packages.
> find-links = > http://dist.plone.org > http://download.zope.org/ppix/ > http://download.zope.org/distribution/ > http://effbot.org/downloads > http://dist.plone.org/release/4.2 > > None of these seem to use the rel attribute feature, so those > will likely continue to work fine. I am not surprised. I don't know of alternative PYPI implementations that actually implement "rel" attribution. Most of them have the purpose of controlling which packages are installed in company environments and thus have no need to implement this crawling mechanism but rather always host files in their database. cheers, holger > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, Mar 12 2013) > >>> Python Projects, Consulting and Support ... http://www.egenix.com/ > >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ > >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > > ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > From holger at merlinux.eu Tue Mar 12 18:17:40 2013 From: holger at merlinux.eu (holger krekel) Date: Tue, 12 Mar 2013 17:17:40 +0000 Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI In-Reply-To: <513F5C48.3070602@oddbird.net> References: <20130312113817.GA9677@merlinux.eu> <513F5C48.3070602@oddbird.net> Message-ID: <20130312171740.GH9677@merlinux.eu> Hi Carl, On Tue, Mar 12, 2013 at 10:48 -0600, Carl Meyer wrote: > Hi Holger, > > I am confused about the discrepancy between the title of this pre-PEP > ("transition to release file hosting on PyPI") and the contents of the > PEP, which describe a transition to not crawling _HTML pages_ on > external sites looking for distribution download links. These are not > the same thing at all. I agree the title is not quite right at the moment. > Current installer tools will only crawl external HTML pages if they are > rel="download" or rel="homepage", but they will use any link they find > in the simple index (regardless of rel attr) if the target of the link > appears to be a distribution file (as determined by filename > pattern-matching or #egg fragment). Right. > At the end of the process you describe, if all packages migrate to > "nocrawl", the rel-link HTML spidering will no longer happen. This is a > good first step: it will speed up installation somewhat, and reduce the > frustration of some package owners when installers find files linked > from their project homepage that they never intended for automated > installation. But installers will still find and download release > packages that are not hosted on PyPI, if those package files are linked > directly in the simple index. This is still surprising behavior to many > new Python users, and still carries the security and reliability > concerns that this PEP claims to address. 
Yes, and here the installers should move to give clear warnings and change defaults. > I'm honestly not sure whether the title or the content more accurately > reflects the intent of this PEP; depending which it is, I suggest one of > the following: > > 1) Add to the PEP a description of a further step in the migration > process, which actually does transition away from automated installation > of non-PyPI-hosted release files (as the default behavior of > installation tools); or This makes sense to me. Do you feel like opening a pull request on https://bitbucket.org/hpk42/pep-pypi to help refine this aspect? I am also on IRC for co-ordination (also about the title) as i intend to create the PEP submission for python-ideas and maybe already the pep-editors (?!). In any case, it wouldn't mean the PEP's discussion is finalized, of course, and i'd continue to post here new versions and ask for feedback. cheers, holger > 2) Change the title of the PEP to something like "Transitioning away > from non-PyPI HTML crawling" and add a paragraph to the PEP clarifying > that this PEP does not address the issue of actual release files hosted > off-PyPI. > Carl > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig > From pje at telecommunity.com Tue Mar 12 18:18:05 2013 From: pje at telecommunity.com (PJ Eby) Date: Tue, 12 Mar 2013 13:18:05 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> <513F5596.5090302@egenix.com> Message-ID: On Tue, Mar 12, 2013 at 12:29 PM, Jacob Kaplan-Moss wrote: > On Tue, Mar 12, 2013 at 11:19 AM, M.-A. Lemburg wrote: >> So let's do this carefully and find a good solution before >> jumping to conclusions. 
> > Completely agreed; rushing is a bad idea. > > But so is not starting. What I'm seeing -- as a total outsider, a user > of these tools, not someone who creates them -- is that a bunch of > people (Holger, Donald, Richard, the pip maintainers, etc.) have the > beginnings of a solution ready to go *right now*, and I want to > capture that energy and enthusiasm before it evaporates. > > This isn't an academic situation; I've seen companies decline to adopt > Python over this exact security issue. Nobody told them about how to configure a restricted, site-wide default --allow-hosts setting? ( http://peak.telecommunity.com/DevCenter/EasyInstall#restricting-downloads-with-allow-hosts and http://docs.python.org/2/install/index.html#location-and-names-of-config-files ) (FWIW, --allow-hosts was added in setuptools 0.6a6 -- *years* before the distribute fork or the existence of pip, and pip offers the same option.) I've already agreed to change setuptools to default this option to only allow downloads from the same host as its index URL, in a future release (i.e. to default --allow-hosts to the host of the --index-url option), and I support the removal of rel="" spidering from PyPI (which will significantly mitigate the immediate speed and security issues). Heck, I've been the one who's repeatedly proposed various ways of cutting back or removing rel="" attributes from the /simple index. The result of these two changes will actually have the same net effect that people are asking for here: you'll only be able to download stuff hosted on PyPI, unless you explicitly override the --allow-hosts to get a wider range of packages. Already today, when a URL is blocked by --allow-hosts, it's announced as part of easy_install's output, so you can see exactly how much wider you need to extend your trust for the download to succeed. The *only* thing I object to is removing the ability for people to *choose* their own levels of trust.
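The default described above (only allow downloads from the host of the index URL unless the user widens --allow-hosts) can be illustrated with a small sketch. This is not setuptools' actual code; the function names ``default_allow_hosts`` and ``url_allowed`` are made up for this example, and glob-style host patterns are assumed.

```python
from fnmatch import fnmatch
from urllib.parse import urlsplit


def default_allow_hosts(index_url):
    """Return the allow-list implied by the index URL: only its own host."""
    return [urlsplit(index_url).hostname]


def url_allowed(url, allow_hosts):
    """Check a download URL's host against a list of glob patterns."""
    host = urlsplit(url).hostname or ""
    return any(fnmatch(host, pattern) for pattern in allow_hosts)


# Hypothetical usage: the secure default, then an explicit user override.
index_url = "https://pypi.python.org/simple/"
allowed = default_allow_hosts(index_url)
widened = allowed + ["*.plone.org"]
```

With this shape the choice stays with the user: the default list contains only the index host, and extending trust is an explicit, visible act of adding patterns.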
And I have not yet seen an argument that justifies removing people's ability to *choose* to be more inclusive in their downloads. And I've put multiple compromise proposals out there to begin mitigating the problem *now* (i.e. for non-updated versions of setuptools), and every time, the objection is, "no, we need to ban it all now, no discussion, no re-evaluation, no personal choice, everyone must do as we say, no argument". And I don't understand that, at all. From holger at merlinux.eu Tue Mar 12 18:22:26 2013 From: holger at merlinux.eu (holger krekel) Date: Tue, 12 Mar 2013 17:22:26 +0000 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <513F5596.5090302@egenix.com> Message-ID: <20130312172226.GI9677@merlinux.eu> On Tue, Mar 12, 2013 at 13:18 -0400, PJ Eby wrote: > On Tue, Mar 12, 2013 at 12:29 PM, Jacob Kaplan-Moss wrote: > > On Tue, Mar 12, 2013 at 11:19 AM, M.-A. Lemburg wrote: > >> So let's do this carefully and find a good solution before > >> jumping to conclusions. > > > > Completely agreed; rushing is a bad idea. > > > > But so is not starting. What I'm seeing -- as a total outsider, a user > > of these tools, not someone who creates them -- is that a bunch of > > people (Holger, Donald, Richard, the pip maintainers, etc.) have the > > beginnings of a solution ready to go *right now*, and I want to > > capture that energy and enthusiasm before it evaporates. > > > > This isn't an academic situation; I've seen companies decline to adopt > > Python over this exact security issue. > > Nobody told them about how to configure a restricted, site-wide > default --allow-hosts setting?
( > http://peak.telecommunity.com/DevCenter/EasyInstall#restricting-downloads-with-allow-hosts > and http://docs.python.org/2/install/index.html#location-and-names-of-config-files > ) > > (FWIW, --allow-hosts was added in setuptools 0.6a6 -- *years* before > the distribute fork or the existence of pip, and pip offers the same > option.) > > I've already agreed to change setuptools to default this option to > only allow downloads from the same host as its index URL, in a future > release. (i.e. to default --allow-hosts to the host of the > --index-url option), and I support the removing of rel="" spidering > from PyPI (which will significantly mitigate the immediate speed and > security issues). Heck, I've been the one who'se repeatedly proposed > various ways of cutting back or removing rel="" attributes from the > /simple index. > > The result of these two changes will actually have the same net effect > that people are being asking for here: you'll only be able to download > stuff hosted on PyPI, unless you explicitly override the --allow-hosts > to get a wider range of packages. > > Already today, when a URL is blocked by --allow-hosts, it's announced > as part of easy_install's output, so you can see exactly how much > wider you need to extend your trust for the download to succeed. > > The *only* thing I object to is removing the ability for people to > *choose* their own levels of trust. > > And I have not yet seen an argument that justifies removing people's > ability to *choose* to be more inclusive in their downloads. > > And I've put multiple compromise proposals out there to begin > mitigating the problem *now* (i.e. for non-updated versions of > setuptools), and every time, the objection is, "no, we need to ban it > all now, no discussion, no re-evaluation, no personal choice, everyone > must do as we say, no argument". FWIW, the PEP draft in V2 doesn't take this approach and i don't plan to introduce it in subsequent versions. 
IOW, i agree that we should keep things backward-compatible in the sense that users can choose to use non-default settings to get the current behaviour (which might make their installation process less reliable/secure, but that's their choice). cheers, holger > And I don't understand that, at all. > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig > From jnoller at gmail.com Tue Mar 12 18:33:55 2013 From: jnoller at gmail.com (Jesse Noller) Date: Tue, 12 Mar 2013 13:33:55 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> <513F5596.5090302@egenix.com> Message-ID: <2564CB0F5D96477E86F655096A06941E@gmail.com> > > And I've put multiple compromise proposals out there to begin > mitigating the problem *now* (i.e. for non-updated versions of > setuptools), and every time, the objection is, "no, we need to ban it > all now, no discussion, no re-evaluation, no personal choice, everyone > must do as we say, no argument". > > And I don't understand that, at all. There's not much to understand: external hosting of packages is *actively harmful*, period. End users of easy_install and pip *don't even realize* 99% of the time that these tools are following links off of PyPi and installing packages from random, probably insecure/non https locations all over the internet. Once they realize it they recoil in terror if they have any understanding of the implications. Let me put this in different terms: out of the packages using external hosting: can you prove to me that 100% of them aren't compromised machines serving malware, performing MITM attacks, etc? The fact that the end user tools support this is a bug, but one from history. 
The fact that PyPI continues to support external links on simple/ is inexcusable given that we know that they are an attack vector. A simple proof of concept on a popular package hosted off site deployed during PyCon would be terrible, it was bad enough that last year people were trying to MITM due to lack of SSL. jesse From pje at telecommunity.com Tue Mar 12 18:54:25 2013 From: pje at telecommunity.com (PJ Eby) Date: Tue, 12 Mar 2013 13:54:25 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: <2564CB0F5D96477E86F655096A06941E@gmail.com> References: <20130310150740.GE9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> <513F5596.5090302@egenix.com> <2564CB0F5D96477E86F655096A06941E@gmail.com> Message-ID: On Tue, Mar 12, 2013 at 1:33 PM, Jesse Noller wrote: > There's not much to understand: external hosting of packages is *actively harmful*, period. End users of easy_install and pip *don't even realize* 99% of the time that these tools are following links off of PyPi and installing packages from random, probably insecure/non https locations all over the internet. Once they realize it they recoil in terror if they have any understanding of the implications. This is a rationale for secure defaults for various options, like the ones I outlined in the portions of my post that you *didn't* quote. It's not a rationale for removing the options themselves. From mal at egenix.com Tue Mar 12 19:00:21 2013 From: mal at egenix.com (M.-A. 
Lemburg) Date: Tue, 12 Mar 2013 19:00:21 +0100 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: <2564CB0F5D96477E86F655096A06941E@gmail.com> References: <20130310150740.GE9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> <513F5596.5090302@egenix.com> <2564CB0F5D96477E86F655096A06941E@gmail.com> Message-ID: <513F6D35.2030707@egenix.com> On 12.03.2013 18:33, Jesse Noller wrote: > >> >> And I've put multiple compromise proposals out there to begin >> mitigating the problem *now* (i.e. for non-updated versions of >> setuptools), and every time, the objection is, "no, we need to ban it >> all now, no discussion, no re-evaluation, no personal choice, everyone >> must do as we say, no argument". >> >> And I don't understand that, at all. > > There's not much to understand: external hosting of packages is *actively harmful*, period. End users of easy_install and pip *don't even realize* 99% of the time that these tools are following links off of PyPi and installing packages from random, probably insecure/non https locations all over the internet. Once they realize it they recoil in terror if they have any understanding of the implications. > > Let me put this in different terms: out of the packages using external hosting: can you prove to me that 100% of them aren't compromised machines serving malware, performing MITM attacks, etc? The fact that the end user tools support this is a bug, but one from history. The fact that PyPI continues to support external links on simple/ is inexcusable given that we know that they are an attack vector. > > A simple proof of concept on a popular package hosted off site deployed during PyCon would be terrible, it was bad enough that last year people were trying to MITM due to lack of SSL. Let's please not exaggerate all this. 
It's not like PyPI is the only server out there implementing HTTPS, ye know ;-) A single package uploaded on PyPI with os.system('rm -rf') in its setup.py could easily ruin all this and no HTTPS in this world would stop it from showing its ugly face. The whole Python package eco-system works based on trust and injecting fear into this system is not helpful, IMO. People need to understand the possible issues, we need to make things safer from both the client and the server side and improve the tool chain. There's really nothing new here. -- Marc-Andre Lemburg eGenix.com From mal at egenix.com Tue Mar 12 19:07:28 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 12 Mar 2013 19:07:28 +0100 Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI In-Reply-To: <20130312170508.GG9677@merlinux.eu> References: <20130312113817.GA9677@merlinux.eu> <513F5282.3010206@egenix.com> <20130312170508.GG9677@merlinux.eu> Message-ID: <513F6EE0.6080503@egenix.com> Just a quick note (more later, if time permits)... On 12.03.2013 18:05, holger krekel wrote: > Hi Marc-Andre, all, > >>> - Prepare PYPI implementation to allow a per-project "hosting mode", >>> effectively enabling or disabling external crawling. When enabled
When enabled >>> nothing changes from the current situation of producing ``rel=download`` >>> and ``rel=homepage`` attributed links on ``simple/`` pages, >>> causing installers to crawl those sites. >>> When disabled, the attributions of links will change >>> to ``rel=newdownload`` and ``rel=newhomepage`` causing installers to >>> avoid crawling 3rd party sites. Retaining the meta-information allows >>> tools to still make use of the semantic information. >> >> Please start using versioned APIs for these things. The >> old style index should still be available under some >> URL, e.g. /simple-v1/ or /v1/simple/ or /1/simple/ > > Not sure it is neccessary in this case. I would think it makes > the implementation harder and it would probably break PEP381 (mirroring > infrastructure) as well. Here's what I meant: We publish the current implementation of the /simple/ index API under a new URL /simple-v1/, so that people that want to use the old API can continue to do so. Then we setup a new /simple-v2/ index API with your proposed change, perhaps even dropping the rel attribute altogether. During testing, we'd then have: /simple/ - same as /simple-v1/ /simple-v1/ - old API with rel attributes always set /simple-v2/ - new API with your changes (rel attributes only set in some cases) After a month or so of testing, we then switch this to: /simple/ - same as /simple-v2/ /simple-v1/ - old API with rel attributes always set /simple-v2/ - new API with your changes (rel attributes only set in some cases) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 12 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! 
:::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Tue Mar 12 19:15:17 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 12 Mar 2013 19:15:17 +0100 Subject: [Catalog-sig] setuptools/distribute/easy_install/pkg_resource sorting algorithm Message-ID: <513F70B5.5030501@egenix.com> I've run into a weird issue with easy_install, that I'm trying to solve: If I place two files named egenix_mxodbc_connect_client-2.0.2-py2.6.egg egenix-mxodbc-connect-client-2.0.2.win32-py2.6.prebuilt.zip into the same directory and let easy_install running on Linux scan this, it considers the second file for Windows as best match. Is the algorithm used for determining the best match documented somewhere ? I've had a look at the implementation, but this left me rather clueless. I thought that setuptools would prefer the .egg file over the prebuilt .zip file - binary files being easier to install than "source" files. Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 12 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From donald at stufft.io Tue Mar 12 19:17:55 2013 From: donald at stufft.io (Donald Stufft) Date: Tue, 12 Mar 2013 14:17:55 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: <513F5ABB.9030006@egenix.com> References: <20130310150740.GE9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> <513F5596.5090302@egenix.com> <513F5ABB.9030006@egenix.com> Message-ID: <98847D8D-02D3-4C00-AB63-9B100C283D29@stufft.io> On Mar 12, 2013, at 12:41 PM, "M.-A. Lemburg" wrote: > On 12.03.2013 17:29, Jacob Kaplan-Moss wrote: >> On Tue, Mar 12, 2013 at 11:19 AM, M.-A. Lemburg wrote: >>> So let's do this carefully and find a good solution before >>> jumping to conclusions. >> >> Completely agreed; rushing is a bad idea. >> >> But so is not starting. What I'm seeing -- as a total outsider, a user >> of these tools, not someone who creates them -- is that a bunch of >> people (Holger, Donald, Richard, the pip maintainers, etc.) have the >> beginnings of a solution ready to go *right now*, and I want to >> capture that energy and enthusiasm before it evaporates. >> >> This isn't an academic situation; I've seen companies decline to adopt >> Python over this exact security issue. I can't share details in >> writing but ask me at PyCon and I can tell you some stories. >> Externally-hosted packages are a security risk, full stop. >> >> There's likely an even better solution involving strong cryptography >> and such, but there's also an incremental improvement on the table >> right now. Nobody's suggesting that we do this hastily or all at once, >> but there *is* a proposal to get the process started right now. Why >> shouldn't we get going while there's momentum? > > Sure; I'm just saying that we need to test drive the proposal > before actually adopting it.
fwiw https://restricted.crate.io/ is the simple index minus any external url and has existed for over a year. I use it full time. and have others doing the same. > > I'm also trying to get some of the more radical unneeded changes > reconsidered. We don't need to break things just because we can - > let's leave that to our kids ;-) > > Holger has already addressed much of this in his V2 proposal > and apart from the time frame and some details, it looks good. > > Meanwhile, I've been playing around with the earlier proposal > I put forward: > > http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal > > to secure external links and found several issues while > implementing it. It's easy to draw up a design, but you > only get down to the problems when actually trying to > implement it. > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, Mar 12 2013) >>>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > > ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From carl at oddbird.net Tue Mar 12 19:18:53 2013 From: carl at oddbird.net (Carl Meyer) Date: Tue, 12 Mar 2013 12:18:53 -0600 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> <513F5596.5090302@egenix.com> Message-ID: <513F718D.4040307@oddbird.net> It seems to me that there's a remarkable level of consensus developing here (though it may not look like it), and a small set of remaining open questions. The consensus (as I see it): - Migrate away from scraping external HTML pages, with package owners in control of the migration but a deadline for a forced switch, as outlined in Holger's PEP (with all appropriate caution and testing). - In some way, migrate to a situation where the popular installer tools install only release files from PyPI by default, but are capable of installing from other locations if the user provides an option. The open question is basically how to implement the latter portion. I see two options proposed: A) Leave external links in the PyPI simple index, but migrate the major tools to not use external links by default (i.e. Philip's plan to make allow-hosts=pypi the default in a future setuptools), with an option to turn them back on. or B) Do a second PyPI migration, again with a per-package toggle and package owners in control, to a "no external links in simple index" setting. Consider for a moment how similar the end state here is with either A or B. In either case, by default users install only from PyPI, but by providing a special option they can install from some external source. (In B, that special option would be something like --find-links with a URL). 
In either case, we can continue to allow packages to register themselves on PyPI, be found in searches, etc, without uploading release files to PyPI if they prefer not to; they'll just have to provide special installation instructions to their users in that case. Here are some differences: 1) With B, we can provide a gentler migration for package owners, where they are in control of when the switch happens. With A, regardless of how it's done at some point some package owners are likely to start getting "hey, i can't install your stuff anymore" reports from users, and they can't control when that starts happening. 2) With B, all end users benefit from the new defaults, not only end users who update to the latest and greatest tools. 3) With B (and probably some forms of A as well), end users clearly state which external sources they would like to trust and install from, rather than having a global "trust everything!" flag, which is less secure and less sensible. It seems to me that option B (a controlled, per-package, PyPI migration to no-external-links in simple index) is a better migration path than A (leaving it up to external tools), and the end result either way is very similar. Carl From robertc at robertcollins.net Tue Mar 12 19:43:06 2013 From: robertc at robertcollins.net (Robert Collins) Date: Wed, 13 Mar 2013 07:43:06 +1300 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: <513F718D.4040307@oddbird.net> References: <20130310150740.GE9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> <513F5596.5090302@egenix.com> <513F718D.4040307@oddbird.net> Message-ID: On 13 March 2013 07:18, Carl Meyer wrote: > It seems to me that there's a remarkable level of consensus developing > here (though it may not look like it), and a small set of remaining open > questions. > > The consensus (as I see it): I think that is a fair summary. 
One thing I'd like to mention that I don't recall seeing so far is that PyPI is *really slow*. I don't mean 'the pypi web host is on a bad link' - far from it. pip, and I presume setuptools, spider to check dependencies and do the external HTML scraping and so forth. This takes an age when each new web host to talk to is a new DNS lookup (say 0.3 seconds) + HTTP request (0.6 seconds) with possible HTTPS setup in there too (up to 1.2 seconds). A project with dozens of dependencies in its transitive dependency graph may take minutes *just spidering*. Now, if you read those figures and go 'zomg that's slow' - well yes, light speed isn't that fast - and even then while much of round-the-globe traffic is at light speed, a considerable chunk of time isn't. Moving all releases to one HTTPS host (and ensuring persistent connections are used for repeated index queries) [and then drop to HTTP for release files so they can be squid cached] is the simplest short term solution to this, and I'm *really* excited to see it being tackled. Longer term I'd love to see PyPI offer an API to return transitive data, to avoid the spidering altogether. -Rob -- Robert Collins Distinguished Technologist HP Cloud Services From jacob at jacobian.org Tue Mar 12 19:52:25 2013 From: jacob at jacobian.org (Jacob Kaplan-Moss) Date: Tue, 12 Mar 2013 13:52:25 -0500 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> <513F5596.5090302@egenix.com> <2564CB0F5D96477E86F655096A06941E@gmail.com> Message-ID: On Tue, Mar 12, 2013 at 12:54 PM, PJ Eby wrote: > This is a rationale for secure defaults for various options, like the > ones I outlined in the portions of my post that you *didn't* quote. > > It's not a rationale for removing the options themselves.
Exactly; thanks for saying this better than I did. As we've seen from the recent Rails security vulnerabilities, secure has to be the default. Users having to explicitly choose the "secure" option is an anti-pattern, with teeth. As long as the default, out-of-the-box behavior is secure it's fine; users who want to run their tools with the "--hack-me-if-you-can" flag will find a way to do so. This isn't about taking away people's options, but about putting secure-by-default tools into the hands people who need them the most. Jacob From jacob at jacobian.org Tue Mar 12 19:56:00 2013 From: jacob at jacobian.org (Jacob Kaplan-Moss) Date: Tue, 12 Mar 2013 13:56:00 -0500 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: <513F6D35.2030707@egenix.com> References: <20130310150740.GE9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> <513F5596.5090302@egenix.com> <2564CB0F5D96477E86F655096A06941E@gmail.com> <513F6D35.2030707@egenix.com> Message-ID: On Tue, Mar 12, 2013 at 1:00 PM, M.-A. Lemburg wrote: > The whole Python package eco-system works based on trust and > injecting fear into this system is not helpful, IMO. I'm sorry if my words came across that way; I'm not trying to scare anyone. I'm trying to emphasize that this isn't an academic discussion; the insecurity of PyPI is something that actively prevents the adoption of Python. I think I'm probably right in saying that everyone here wants to push Python forward; I'm trying to articulate how security fits into that. Again, sorry for not being clearer; you're totally right that fear-mongering isn't helpful. 
Jacob From jnoller at gmail.com Tue Mar 12 19:58:01 2013 From: jnoller at gmail.com (Jesse Noller) Date: Tue, 12 Mar 2013 14:58:01 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> <513F5596.5090302@egenix.com> <2564CB0F5D96477E86F655096A06941E@gmail.com> <513F6D35.2030707@egenix.com> Message-ID: <1A6A72B4C3944FD1A1D02F21AAABF82F@gmail.com> On Tuesday, March 12, 2013 at 2:56 PM, Jacob Kaplan-Moss wrote: > On Tue, Mar 12, 2013 at 1:00 PM, M.-A. Lemburg wrote: > > The whole Python package eco-system works based on trust and > > injecting fear into this system is not helpful, IMO. > > > > I'm sorry if my words came across that way; I'm not trying to scare > anyone. I'm trying to emphasize that this isn't an academic > discussion; the insecurity of PyPI is something that actively prevents > the adoption of Python. I think I'm probably right in saying that > everyone here wants to push Python forward; I'm trying to articulate > how security fits into that. Again, sorry for not being clearer; > you're totally right that fear-mongering isn't helpful. > > Jacob Nah, that was me injecting fear. I call dibs on that one. From jacob at jacobian.org Tue Mar 12 19:59:24 2013 From: jacob at jacobian.org (Jacob Kaplan-Moss) Date: Tue, 12 Mar 2013 13:59:24 -0500 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: <1A6A72B4C3944FD1A1D02F21AAABF82F@gmail.com> References: <20130310150740.GE9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> <513F5596.5090302@egenix.com> <2564CB0F5D96477E86F655096A06941E@gmail.com> <513F6D35.2030707@egenix.com> <1A6A72B4C3944FD1A1D02F21AAABF82F@gmail.com> Message-ID: On Tue, Mar 12, 2013 at 1:58 PM, Jesse Noller wrote: > Nah, that was me injecting fear. 
I call dibs on that one. Aw, man! Can I have Uncertainty and Doubt then? Jacob From jnoller at gmail.com Tue Mar 12 20:01:12 2013 From: jnoller at gmail.com (Jesse Noller) Date: Tue, 12 Mar 2013 15:01:12 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> <513F5596.5090302@egenix.com> <2564CB0F5D96477E86F655096A06941E@gmail.com> <513F6D35.2030707@egenix.com> <1A6A72B4C3944FD1A1D02F21AAABF82F@gmail.com> Message-ID: On Tuesday, March 12, 2013 at 2:59 PM, Jacob Kaplan-Moss wrote: > On Tue, Mar 12, 2013 at 1:58 PM, Jesse Noller wrote: > > Nah, that was me injecting fear. I call dibs on that one. > > > > Aw, man! > > Can I have Uncertainty and Doubt then? > > Jacob Yes. Just as long as you call me Fear Injector. From mordred at inaugust.com Tue Mar 12 19:51:15 2013 From: mordred at inaugust.com (Monty Taylor) Date: Tue, 12 Mar 2013 11:51:15 -0700 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: <513F6D35.2030707@egenix.com> References: <20130310150740.GE9677@merlinux.eu> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> <513F5596.5090302@egenix.com> <2564CB0F5D96477E86F655096A06941E@gmail.com> <513F6D35.2030707@egenix.com> Message-ID: <513F7923.4050106@inaugust.com> On 03/12/2013 11:00 AM, M.-A. Lemburg wrote: > On 12.03.2013 18:33, Jesse Noller wrote: >> >>> >>> And I've put multiple compromise proposals out there to begin >>> mitigating the problem *now* (i.e. for non-updated versions of >>> setuptools), and every time, the objection is, "no, we need to ban it >>> all now, no discussion, no re-evaluation, no personal choice, everyone >>> must do as we say, no argument". >>> >>> And I don't understand that, at all. >> >> There's not much to understand: external hosting of packages is *actively harmful*, period. 
End users of easy_install and pip *don't even realize* 99% of the time that these tools are following links off of PyPi and installing packages from random, probably insecure/non https locations all over the internet. Once they realize it they recoil in terror if they have any understanding of the implications. >> >> Let me put this in different terms: out of the packages using external hosting: can you prove to me that 100% of them aren't compromised machines serving malware, performing MITM attacks, etc? The fact that the end user tools support this is a bug, but one from history. The fact that PyPI continues to support external links on simple/ is inexcusable given that we know that they are an attack vector. >> >> A simple proof of concept on a popular package hosted off site deployed during PyCon would be terrible, it was bad enough that last year people were trying to MITM due to lack of SSL. > > Let's please not exaggerate all this. It's not like PyPI is > the only server out there implementing HTTPS, ye know ;-) > > A single package uploaded on PyPI with os.system('rm -rf') > in its setup.py could easily ruin all this and no HTTPS in this > world would stop it from showing its ugly face. > > The whole Python package eco-system works based on trust and > injecting fear into this system is not helpful, IMO. > > People need to understand the possible issues, we need to make > things safer from both the client and the server side and > improve the tool chain. There's really nothing new here. externally hosted packages isn't just about security. It's about reliability of the service. PyPI as it is right now with externally hosted packages is 100% unusable in automated systems for reasons having nothing to do with security. For better or for worse, PyPI _IS_ the place where python packages are expected to exist and be uploaded. However, attempting to hang on to a feature which undermines the ability of the service to be used is absolutely mind-blowing to me. 
Why, you ask, is it broken?

a) It's massively unreliable, because reliability is now dependent on the availability of ALL of the external link hosting sites combined. It's not even just the packages - version information lookups, which should take 0.1 second and be the most reliable thing ever, have to spider a billion web pages.

b) It's massively slow. All that spidering of lycos and altavista and some random trac site? Slow. Guess what - that spidering is happening on my LAPTOP - so while sitting here on this plane, if I want to install a package that's on PyPI, it has to go web-spider other things.

c) It's aggressive about being both of the above. Even if packages are hosted on PyPI, my local client will STILL spider external sites that are listed.

The funny part is, if you remove the externally hosted packages, pypi is a wonderfully elegant system that is super easy to scale. A PyPI can be completely static, which is how we run the partial-mirror that OpenStack is forced to run due to the instability of homepages stored on Apple IIe's of various random people who decided that "python setup.py sdist upload" is too hard to run. It's great. We love it. It works for just about everything. Except for those darned external links.

Why are we persisting in trying to make this super complex? Can we revisit PEP20 here? Specifically:

Explicit is better than implicit.
Simple is better than complex.
Flat is better than nested.
...
There should be one-- and preferably only one --obvious way to do it.

If I run:

pip install foo

I am EXPLICITLY asking for a package from PyPI, not from launchpad. There is a URL option, which would allow me to request a package from somewhere that is not pypi should I want to do that. Having to spider out to external sites is more complex than not doing that. External sites are effectively needless nesting. Most importantly - PyPI is there - it's where we upload packages. What benefit do we gain from subverting that? Nothing.
Remove the external links. Please. Monty From holger at merlinux.eu Tue Mar 12 20:11:41 2013 From: holger at merlinux.eu (holger krekel) Date: Tue, 12 Mar 2013 19:11:41 +0000 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: <513F718D.4040307@oddbird.net> References: <513F5596.5090302@egenix.com> <513F718D.4040307@oddbird.net> Message-ID: <20130312191141.GJ9677@merlinux.eu> On Tue, Mar 12, 2013 at 12:18 -0600, Carl Meyer wrote: > It seems to me that there's a remarkable level of consensus developing > here (though it may not look like it), and a small set of remaining open > questions. > > The consensus (as I see it): > > - Migrate away from scraping external HTML pages, with package owners in > control of the migration but a deadline for a forced switch, as outlined > in Holger's PEP (with all appropriate caution and testing). > > - In some way, migrate to a situation where the popular installer tools > install only release files from PyPI by default, but are capable of > installing from other locations if the user provides an option. > > The open question is basically how to implement the latter portion. I > see two options proposed: > > A) Leave external links in the PyPI simple index, but migrate the major > tools to not use external links by default (i.e. Philip's plan to make > allow-hosts=pypi the default in a future setuptools), with an option to > turn them back on. > > or > > B) Do a second PyPI migration, again with a per-package toggle and > package owners in control, to a "no external links in simple index" setting. > > Consider for a moment how similar the end state here is with either A or > B. In either case, by default users install only from PyPI, but by > providing a special option they can install from some external source. > (In B, that special option would be something like --find-links with a > URL). 
In either case, we can continue to allow packages to register > themselves on PyPI, be found in searches, etc, without uploading release > files to PyPI if they prefer not to; they'll just have to provide > special installation instructions to their users in that case. > > Here are some differences: > > 1) With B, we can provide a gentler migration for package owners, where > they are in control of when the switch happens. With A, regardless of > how it's done at some point some package owners are likely to start > getting "hey, i can't install your stuff anymore" reports from users, > and they can't control when that starts happening. > > 2) With B, all end users benefit from the new defaults, not only end > users who update to the latest and greatest tools. > > 3) With B (and probably some forms of A as well), end users clearly > state which external sources they would like to trust and install from, > rather than having a global "trust everything!" flag, which is less > secure and less sensible. > > It seems to me that option B (a controlled, per-package, PyPI migration > to no-external-links in simple index) is a better migration path than A > (leaving it up to external tools), and the end result either way is very > similar.

Thanks for outlining this so well. I agree with the B approach and suggest introducing three per-package hosting states:

- pypi-only: only pypi-hosted files and all #egg links are served via simple/ (#egg links are necessary and a special case for installing development snapshots - we should not leave them out, I think)

- pypi-nocrawl: all links as of now but without rel-attribution (i.e. all description links are served and also the homepage/download ones, but without rel-attribution)

- pypi-crawl: all links as of now

The automatic transition of the hosting-mode for most packages (with pre-announcements) specified in the PEP will need to differentiate between switching to pypi-only and pypi-nocrawl.
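The three hosting states listed above could be modelled roughly as follows. This is only a sketch for discussion: the `Link` class, the `simple_links` function, the `pypi_hosted` flag, the example URLs and the `"#egg="` substring check are all hypothetical names and assumptions, not anything taken from the PyPI code base.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical model of one link on a simple/ page (not PyPI's real code).
@dataclass(frozen=True)
class Link:
    url: str
    rel: Optional[str] = None   # "download" / "homepage" make installers crawl
    pypi_hosted: bool = False   # True for release files uploaded to PyPI


def simple_links(mode: str, links: List[Link]) -> List[Link]:
    """Return the links a simple/ page would serve under a hosting state."""
    if mode == "pypi-crawl":
        # All links as of now, rel attribution included.
        return list(links)
    if mode == "pypi-nocrawl":
        # Same links, but with rel attribution dropped so nothing is crawled.
        return [Link(l.url, None, l.pypi_hosted) for l in links]
    if mode == "pypi-only":
        # Only PyPI-hosted files plus #egg links (development snapshots).
        return [l for l in links if l.pypi_hosted or "#egg=" in l.url]
    raise ValueError(f"unknown hosting mode: {mode}")


# Assumed example data for a made-up project "foo".
links = [
    Link("https://pypi.python.org/packages/source/f/foo/foo-1.0.tar.gz",
         pypi_hosted=True),
    Link("http://example.org/foo/", rel="homepage"),
    Link("http://example.org/foo/downloads.html", rel="download"),
    Link("http://example.org/hg/foo/archive/tip.zip#egg=foo-dev"),
]

for mode in ("pypi-crawl", "pypi-nocrawl", "pypi-only"):
    served = simple_links(mode, links)
    crawlable = [l.url for l in served if l.rel is not None]
    print(mode, len(served), "links,", len(crawlable), "crawlable")
```

In this model only pypi-crawl leaves anything for an installer to crawl, and pypi-only drops the homepage and download links entirely while keeping the #egg link.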
And as discussed elsewhere, the implementation of the underlying analysis script and the PYPI changes certainly need to be ready before the PEP can be finally accepted. I am open to a corresponding PR to the PEP-draft :) holger > > Carl From holger at merlinux.eu Tue Mar 12 20:17:21 2013 From: holger at merlinux.eu (holger krekel) Date: Tue, 12 Mar 2013 19:17:21 +0000 Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI In-Reply-To: <513F6EE0.6080503@egenix.com> References: <20130312113817.GA9677@merlinux.eu> <513F5282.3010206@egenix.com> <20130312170508.GG9677@merlinux.eu> <513F6EE0.6080503@egenix.com> Message-ID: <20130312191721.GK9677@merlinux.eu> On Tue, Mar 12, 2013 at 19:07 +0100, M.-A. Lemburg wrote: > Just a quick note (more later, if time permits)... > > On 12.03.2013 18:05, holger krekel wrote: > > Hi Marc-Andre, all, > > > >>> - Prepare PYPI implementation to allow a per-project "hosting mode", > >>> effectively enabling or disabling external crawling. When enabled > >>> nothing changes from the current situation of producing ``rel=download`` > >>> and ``rel=homepage`` attributed links on ``simple/`` pages, > >>> causing installers to crawl those sites. > >>> When disabled, the attributions of links will change > >>> to ``rel=newdownload`` and ``rel=newhomepage`` causing installers to > >>> avoid crawling 3rd party sites. Retaining the meta-information allows > >>> tools to still make use of the semantic information. > >> > >> Please start using versioned APIs for these things. The > >> old style index should still be available under some > >> URL, e.g. /simple-v1/ or /v1/simple/ or /1/simple/ > > > > Not sure it is necessary in this case. I would think it makes > > the implementation harder and it would probably break PEP381 (mirroring > > infrastructure) as well.
> > Here's what I meant: > > We publish the current implementation of the /simple/ index API > under a new URL /simple-v1/, so that people that want to use > the old API can continue to do so. > > Then we setup a new /simple-v2/ index API with your proposed > change, perhaps even dropping the rel attribute altogether. > > During testing, we'd then have: > > /simple/ - same as /simple-v1/ > /simple-v1/ - old API with rel attributes always set > /simple-v2/ - new API with your changes (rel attributes only > set in some cases) > > After a month or so of testing, we then switch this to: > > /simple/ - same as /simple-v2/ > /simple-v1/ - old API with rel attributes always set > /simple-v2/ - new API with your changes (rel attributes only > set in some cases) I understand but am not sure how easy this is to manage at the moment. I'd like to put this up in open questions and have (eventually) Richard comment on this before evolving it further. best, holger > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, Mar 12 2013) > >>> Python Projects, Consulting and Support ... http://www.egenix.com/ > >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ > >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > > ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > From pje at telecommunity.com Tue Mar 12 20:21:43 2013 From: pje at telecommunity.com (PJ Eby) Date: Tue, 12 Mar 2013 15:21:43 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: <513F718D.4040307@oddbird.net> References: <20130310150740.GE9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> <513F5596.5090302@egenix.com> <513F718D.4040307@oddbird.net> Message-ID: On Tue, Mar 12, 2013 at 2:18 PM, Carl Meyer wrote: > It seems to me that there's a remarkable level of consensus developing > here (though it may not look like it), and a small set of remaining open > questions. > > The consensus (as I see it): > > - Migrate away from scraping external HTML pages, with package owners in > control of the migration but a deadline for a forced switch, as outlined > in Holger's PEP (with all appropriate caution and testing). > > - In some way, migrate to a situation where the popular installer tools > install only release files from PyPI by default, but are capable of > installing from other locations if the user provides an option. Perhaps I'm confused, but ISTM that every time I've said this, Donald and Lennart argue that it should not be possible to provide such an option -- or to be more specific, that PyPI should not publish the information that makes that option possible. If that's *not* the position they're taking, it'd be good to know, because we could totally stop arguing about it in that case. > A) Leave external links in the PyPI simple index, but migrate the major > tools to not use external links by default (i.e. Philip's plan to make > allow-hosts=pypi the default in a future setuptools), with an option to > turn them back on. I don't know who has proposed this option, but it's not me. 
You seem to be confusing external links and HTML-scraped links (rel="" attributed links in /simple). I was the first person to propose disabling HTML-scraped links from PyPI *ASAP*. I still want them gone. That won't require tool changes, it just requires a rollout plan. Holger has one, let's work on that. The second thing I proposed is that new tools be developed to *assist* package authors in moving their files onto PyPI, so that future tool changes wouldn't result in widespread instances of people needing to set their tools to insecure settings just to get anything done. We need to get people's files moving onto PyPI *first*, in order to make changing the tool defaults practical. The *only* thing I object to is the part where some people want to ban external links from /simple, always and forever, regardless of the package authors' choice in the matter. > B) Do a second PyPI migration, again with a per-package toggle and > package owners in control, to a "no external links in simple index" setting. > > Consider for a moment how similar the end state here is with either A or > B. In either case, by default users install only from PyPI, but by > providing a special option they can install from some external source. > (In B, that special option would be something like --find-links with a > URL). In either case, we can continue to allow packages to register > themselves on PyPI, be found in searches, etc, without uploading release > files to PyPI if they prefer not to; they'll just have to provide > special installation instructions to their users in that case. Not true: approach B means that you won't know what values to pass to the option. It's also confused about an important point. All the links that appear in /simple are *already* completely under the package author's control. No new switches are required to remove external links - you can simply remove them from your releases' descriptions. 
This process could be made more transparent or easy, sure -- but it's a mistake to say that this is granting the package owners control that they don't already have. What they lack control over is the rel="" attributes, short of removing those links entirely. That's why I've proposed having a switch for that , as reflected in Holger's pre-PEP. > 1) With B, we can provide a gentler migration for package owners, where > they are in control of when the switch happens. > > 2) With B, all end users benefit from the new defaults, not only end > users who update to the latest and greatest tools. > > 3) With B (and probably some forms of A as well), end users clearly > state which external sources they would like to trust and install from, > rather than having a global "trust everything!" flag, which is less > secure and less sensible. These 3 statements all mischaracterize things substantially, because none of those benefits are exclusive to A, and nobody has proposed a "trust everything" flag. Removing rel="" attributes also benefits everyone right away, *without* new tools. From pje at telecommunity.com Tue Mar 12 20:24:52 2013 From: pje at telecommunity.com (PJ Eby) Date: Tue, 12 Mar 2013 15:24:52 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> <513F5596.5090302@egenix.com> <513F718D.4040307@oddbird.net> Message-ID: On Tue, Mar 12, 2013 at 2:43 PM, Robert Collins wrote: > This takes an age when each new web host to talk to is a new DNS > lookup (say 0.3 seconds) + HTTP request (0.6 seconds) with possible > HTTPS setup in there too (up to 1.2 seconds). A project with dozens of > dependencies in it's transitive dependency graph may take minutes > *just spidering*. 
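The figures quoted above can be worked through as a back-of-the-envelope calculation. This is a sketch only: the per-host costs are Robert's estimates, while the count of 40 distinct external hosts and the assumption of a single sequential request per host are illustrative guesses, not measurements.

```python
# Per-host cost estimates taken from the quoted message.
DNS_LOOKUP_S = 0.3    # new DNS lookup for a previously unseen host
HTTP_REQUEST_S = 0.6  # one HTTP request
HTTPS_SETUP_S = 1.2   # upper bound for the extra HTTPS setup


def spider_seconds(external_hosts, https=True):
    """Sequential cost of contacting each distinct external host once."""
    per_host = DNS_LOOKUP_S + HTTP_REQUEST_S
    if https:
        per_host += HTTPS_SETUP_S
    return external_hosts * per_host


# Assumption: 40 distinct external hosts across a transitive dependency graph.
plain = spider_seconds(40, https=False)
secure = spider_seconds(40)
print(f"roughly {plain:.0f}s (HTTP only) to {secure:.0f}s (HTTPS everywhere)")
```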
Which is why we should act on Holger's pre-PEP to drop the rel="" attributes from projects that don't actually use them -- builds involving those projects will immediately drop to one HTTP request to PyPI, plus one to whatever host has the actually needed file. And that's without any tooling changes whatsoever: builds all over the planet will just get faster and more secure, right away. From mal at egenix.com Tue Mar 12 20:28:21 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 12 Mar 2013 20:28:21 +0100 Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI In-Reply-To: <20130312191721.GK9677@merlinux.eu> References: <20130312113817.GA9677@merlinux.eu> <513F5282.3010206@egenix.com> <20130312170508.GG9677@merlinux.eu> <513F6EE0.6080503@egenix.com> <20130312191721.GK9677@merlinux.eu> Message-ID: <513F81D5.1040802@egenix.com> On 12.03.2013 20:17, holger krekel wrote: > On Tue, Mar 12, 2013 at 19:07 +0100, M.-A. Lemburg wrote: >> Just a quick note (more later, if time permits)... >> >> On 12.03.2013 18:05, holger krekel wrote: >>> Hi Marc-Andre, all, >>> >>>>> - Prepare PYPI implementation to allow a per-project "hosting mode", >>>>> effectively enabling or disabling external crawling. When enabled >>>>> nothing changes from the current situation of producing ``rel=download`` >>>>> and ``rel=homepage`` attributed links on ``simple/`` pages, >>>>> causing installers to crawl those sites. >>>>> When disabled, the attributions of links will change >>>>> to ``rel=newdownload`` and ``rel=newhomepage`` causing installers to >>>>> avoid crawling 3rd party sites. Retaining the meta-information allows >>>>> tools to still make use of the semantic information. >>>> >>>> Please start using versioned APIs for these things. The >>>> old style index should still be available under some >>>> URL, e.g. /simple-v1/ or /v1/simple/ or /1/simple/ >>> >>> Not sure it is neccessary in this case. 
I would think it makes >>> the implementation harder and it would probably break PEP381 (mirroring >>> infrastructure) as well. >> >> Here's what I meant: >> >> We publish the current implementation of the /simple/ index API >> under a new URL /simple-v1/, so that people that want to use >> the old API can continue to do so. >> >> Then we setup a new /simple-v2/ index API with your proposed >> change, perhaps even dropping the rel attribute altogether. >> >> During testing, we'd then have: >> >> /simple/ - same as /simple-v1/ >> /simple-v1/ - old API with rel attributes always set >> /simple-v2/ - new API with your changes (rel attributes only >> set in some cases) >> >> After a month or so of testing, we then switch this to: >> >> /simple/ - same as /simple-v2/ >> /simple-v1/ - old API with rel attributes always set >> /simple-v2/ - new API with your changes (rel attributes only >> set in some cases) > > I understand but am not sure how easy this is to manage at the moment. > I'd like to put this up in open questions and have (eventually) Richard > comment on this before evolving it further. Should be pretty easy to do... Just add a version parameter to .run_simple() at https://bitbucket.org/loewis/pypi/src/dc6c23cce746bb25e0b013a1a1e020bc27bb332b/webui.py?at=default#cl-706 and then hook it up to the two URLs at https://bitbucket.org/loewis/pypi/src/dc6c23cce746bb25e0b013a1a1e020bc27bb332b/webui.py?at=default#cl-486 and https://bitbucket.org/loewis/pypi/src/dc6c23cce746bb25e0b013a1a1e020bc27bb332b/pypi.wsgi?at=default#cl-71 -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 12 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! 
:::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From jacob at jacobian.org Tue Mar 12 20:36:20 2013 From: jacob at jacobian.org (Jacob Kaplan-Moss) Date: Tue, 12 Mar 2013 14:36:20 -0500 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> <513F5596.5090302@egenix.com> <513F718D.4040307@oddbird.net> Message-ID: On Tue, Mar 12, 2013 at 2:21 PM, PJ Eby wrote: > The *only* thing I object to is the part where some people want to ban > external links from /simple, always and forever, regardless of the > package authors' choice in the matter. Here's the thing though, there are already a bunch of other ways users can install packages from external repositories. I can think of at least two: * I can pip/easy_install a given URL (e.g. easy_install https://www.djangoproject.com/download/1.5/tarball/) * I can use a custom index server (pip install -i http://localserver/ django) The important part is that in each of those cases I can see clearly where I'm getting things from. OTOH, if I do "pip install Django" I, the person making the install, have no control over where that package comes from. It really violates people's expectations that this reaches out to somewhere that's not-pypi. More importantly it prevents me from making a security choice -- I literally don't know until the download starts where the file might be coming from. From where I stand the absolutely non-negotiable part is that `pip/easy_install/whatever package` should NEVER access an external host (after some suitable transition period). This needs to include older installer software, and it needs to make it hard for new tools to do the wrong thing.
How this is achieved really doesn't matter to me -- if there's a "pip install --insecure Django" that's fine too -- but to me it's non-negotiable that the out-of-the-box configuration not allow external hosts. Yes, this means taking some options away from the package creator. It means that when I'm wearing my author-of-Django hat I can't choose to list Django on PyPI but provide the download elsewhere. That's not perfect, but given a "creator choice" vs "out of the box security" choice the latter has to win. [And as a package creator I still have options: I can run my own package server, fairly easy to do these days.] Again, the *how* isn't a big deal to me, but the result is really important: the tooling has to be secure-by-default, and that means (among other things) `pip install package` can never hit something that's not PyPI without me explicitly asking for it. Jacob From pje at telecommunity.com Tue Mar 12 20:46:48 2013 From: pje at telecommunity.com (PJ Eby) Date: Tue, 12 Mar 2013 15:46:48 -0400 Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI In-Reply-To: <513F6EE0.6080503@egenix.com> References: <20130312113817.GA9677@merlinux.eu> <513F5282.3010206@egenix.com> <20130312170508.GG9677@merlinux.eu> <513F6EE0.6080503@egenix.com> Message-ID: On Tue, Mar 12, 2013 at 2:07 PM, M.-A. Lemburg wrote: > Just a quick note (more later, if time permits)... > > On 12.03.2013 18:05, holger krekel wrote: >> Hi Marc-Andre, all, >> >>>> - Prepare PYPI implementation to allow a per-project "hosting mode", >>>> effectively enabling or disabling external crawling. When enabled >>>> nothing changes from the current situation of producing ``rel=download`` >>>> and ``rel=homepage`` attributed links on ``simple/`` pages, >>>> causing installers to crawl those sites. >>>> When disabled, the attributions of links will change >>>> to ``rel=newdownload`` and ``rel=newhomepage`` causing installers to >>>> avoid crawling 3rd party sites. 
>>>> Retaining the meta-information allows
>>>> tools to still make use of the semantic information.
>>>
>>> Please start using versioned APIs for these things. The
>>> old style index should still be available under some
>>> URL, e.g. /simple-v1/ or /v1/simple/ or /1/simple/
>>
>> Not sure it is necessary in this case. I would think it makes
>> the implementation harder and it would probably break PEP381 (mirroring
>> infrastructure) as well.
>
> Here's what I meant:
>
> We publish the current implementation of the /simple/ index API
> under a new URL /simple-v1/, so that people that want to use
> the old API can continue to do so.

Do you know of anyone who's *actually* going to need/use this alternate
API? Why can't they just use the XML-RPC API, the DOAP API, or any other
means of obtaining this information? Heck, the proposal to just change
the value of the rel attribute isn't going to stop anybody from using
that data.

Please let's not complicate this by adding more API formats for PyPI to
support.

From holger at merlinux.eu Tue Mar 12 20:57:07 2013
From: holger at merlinux.eu (holger krekel)
Date: Tue, 12 Mar 2013 19:57:07 +0000
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
In-Reply-To:
References: <513F5596.5090302@egenix.com> <513F718D.4040307@oddbird.net>
Message-ID: <20130312195707.GL9677@merlinux.eu>

On Tue, Mar 12, 2013 at 14:36 -0500, Jacob Kaplan-Moss wrote:
> On Tue, Mar 12, 2013 at 2:21 PM, PJ Eby wrote:
> > The *only* thing I object to is the part where some people want to ban
> > external links from /simple, always and forever, regardless of the
> > package authors' choice in the matter.
>
> Here's the thing though, there are already a bunch of other ways users
> can install packages from external repositories. I can think of at
> least two:
>
> * I can pip/easy_install a given URL (e.g.
easy_install > https://www.djangoproject.com/download/1.5/tarball/) > * I can use a custom index server (pip install -i http://localserver/ django) > > The important part is that in each of those cases I can see clearly > where I'm getting things from. > > OTOH, if I do "pip install Django" I ? the person making the install ? > have no control over where that package comes from. It really violates > people's expectations that this reaches out to somewhere that's > not-pypi. More importantly it prevents me from making a security > choice -- I literally don't know until the download starts where the > file might be coming from. > > >From where I stand the absolutely non-negotiable part is that > `pip/easy_install/whatever package` should NEVER access an external > host (after some suitable transition period). This needs to include > older installer software, and it needs to make it hard for new tools > to do the wrong thing. How this is achieved really doesn't matter to > me -- if there's a "pip install --insecure Django" that's fine too -- > but to me it's non-negotiable that the out-of-the-box configuration > not allow external hosts. > > Yes, this means taking some options away from the package creator. It > means that when I'm wearing my author-of-Django hat I can't choose to > list Django on PyPI but provide the download elsewhere. That's not > perfect, but given a "creator choice" vs "out of the box security" > choice the latter has to win. [And as a package creator I still have > options: I can run my own package server, fairly easy to do these > days.] > > Again, the *how* isn't a big deal to me, but the result is really > important: the tooling has to be secure-by-default, and that means > (among other things) `pip install package` can never hit something > that's not PyPI without me explicitly asking for it. Let's be clear, however, that we are at most reducing attack vectors, there are substantial attack vectors left. 
Nobody should be led to think that PYPI is a trusted or reviewed source
of software even if we got rid of external hosting completely.

holger

> Jacob
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig
>

From holger at merlinux.eu Tue Mar 12 20:59:02 2013
From: holger at merlinux.eu (holger krekel)
Date: Tue, 12 Mar 2013 19:59:02 +0000
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
In-Reply-To:
References: <513F5596.5090302@egenix.com> <513F718D.4040307@oddbird.net>
Message-ID: <20130312195902.GM9677@merlinux.eu>

On Tue, Mar 12, 2013 at 15:21 -0400, PJ Eby wrote:
> On Tue, Mar 12, 2013 at 2:18 PM, Carl Meyer wrote:
> > It seems to me that there's a remarkable level of consensus developing
> > here (though it may not look like it), and a small set of remaining open
> > questions.
> >
> > The consensus (as I see it):
> >
> > - Migrate away from scraping external HTML pages, with package owners in
> > control of the migration but a deadline for a forced switch, as outlined
> > in Holger's PEP (with all appropriate caution and testing).
> >
> > - In some way, migrate to a situation where the popular installer tools
> > install only release files from PyPI by default, but are capable of
> > installing from other locations if the user provides an option.
>
> Perhaps I'm confused, but ISTM that every time I've said this, Donald
> and Lennart argue that it should not be possible to provide such an
> option -- or to be more specific, that PyPI should not publish the
> information that makes that option possible.
>
> If that's *not* the position they're taking, it'd be good to know,
> because we could totally stop arguing about it in that case.

I don't know. At least the pre-PEP doesn't take the position to
unconditionally ban external links. Maybe Lennart or Donald can
say whether they oppose the moves outlined in the PEP.
I'd hope that the perceived "perfect" doesn't become the enemy of the good here :) > > A) Leave external links in the PyPI simple index, but migrate the major > > tools to not use external links by default (i.e. Philip's plan to make > > allow-hosts=pypi the default in a future setuptools), with an option to > > turn them back on. > > I don't know who has proposed this option, but it's not me. You seem > to be confusing external links and HTML-scraped links (rel="" > attributed links in /simple). The suggested behaviour of installers is not fully formulated yet in the PEP. We should improve that. > I was the first person to propose disabling HTML-scraped links from > PyPI *ASAP*. Yes, and thanks for pushing us in this direction. > I still want them gone. That won't require tool > changes, it just requires a rollout plan. Holger has one, let's work > on that. > > The second thing I proposed is that new tools be developed to *assist* > package authors in moving their files onto PyPI, so that future tool > changes wouldn't result in widespread instances of people needing to > set their tools to insecure settings just to get anything done. We > need to get people's files moving onto PyPI *first*, in order to make > changing the tool defaults practical. Indeed, it's a good idea to require the "re-hosting" or "transfer" tool ready before installers change their defaults. > The *only* thing I object to is the part where some people want to ban > external links from /simple, always and forever, regardless of the > package authors' choice in the matter. I agree the package author should have a choice about the serving of links for their package. And installers should change defaults so that install-users have a choice as well, eventually, to control whether they are fine with crawling or using external links. > > B) Do a second PyPI migration, again with a per-package toggle and > > package owners in control, to a "no external links in simple index" setting. 
> > > > Consider for a moment how similar the end state here is with either A or > > B. In either case, by default users install only from PyPI, but by > > providing a special option they can install from some external source. > > (In B, that special option would be something like --find-links with a > > URL). In either case, we can continue to allow packages to register > > themselves on PyPI, be found in searches, etc, without uploading release > > files to PyPI if they prefer not to; they'll just have to provide > > special installation instructions to their users in that case. > > Not true: approach B means that you won't know what values to pass to > the option. Yes and no: in the one case you need to specify "--crawl" or "--use-external-links" and in the other "--find-links https://..." The latter requires reading the homepage for the correct URL or long_description of a package so is less obvious to install-users. > It's also confused about an important point. All the links that > appear in /simple are *already* completely under the package author's > control. No new switches are required to remove external links - you > can simply remove them from your releases' descriptions. This process > could be made more transparent or easy, sure -- but it's a mistake to > say that this is granting the package owners control that they don't > already have. Right. I think allowing a package maintainer to say "actually, please don't serve external links for my package" (hosting mode "pypi-only") is an easy expressive way to exert this control. > What they lack control over is the rel="" attributes, short of > removing those links entirely. That's why I've proposed having a > switch for that , as reflected in Holger's pre-PEP. > > > > 1) With B, we can provide a gentler migration for package owners, where > > they are in control of when the switch happens. 
> > > > 2) With B, all end users benefit from the new defaults, not only end > > users who update to the latest and greatest tools. > > > > 3) With B (and probably some forms of A as well), end users clearly > > state which external sources they would like to trust and install from, > > rather than having a global "trust everything!" flag, which is less > > secure and less sensible. > > These 3 statements all mischaracterize things substantially, because > none of those benefits are exclusive to A, and nobody has proposed a i guess you mean "B" here. > "trust everything" flag. Removing rel="" attributes also benefits > everyone right away, *without* new tools. Right. I don't see much overall disagreement however ... let's re-check once the next PEP draft is out :) holger > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig > From mal at egenix.com Tue Mar 12 20:59:30 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 12 Mar 2013 20:59:30 +0100 Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI In-Reply-To: References: <20130312113817.GA9677@merlinux.eu> <513F5282.3010206@egenix.com> <20130312170508.GG9677@merlinux.eu> <513F6EE0.6080503@egenix.com> Message-ID: <513F8922.90008@egenix.com> On 12.03.2013 20:46, PJ Eby wrote: > On Tue, Mar 12, 2013 at 2:07 PM, M.-A. Lemburg wrote: >> Just a quick note (more later, if time permits)... >> >> On 12.03.2013 18:05, holger krekel wrote: >>> Hi Marc-Andre, all, >>> >>>>> - Prepare PYPI implementation to allow a per-project "hosting mode", >>>>> effectively enabling or disabling external crawling. When enabled >>>>> nothing changes from the current situation of producing ``rel=download`` >>>>> and ``rel=homepage`` attributed links on ``simple/`` pages, >>>>> causing installers to crawl those sites. 
>>>>> When disabled, the attributions of links will change >>>>> to ``rel=newdownload`` and ``rel=newhomepage`` causing installers to >>>>> avoid crawling 3rd party sites. Retaining the meta-information allows >>>>> tools to still make use of the semantic information. >>>> >>>> Please start using versioned APIs for these things. The >>>> old style index should still be available under some >>>> URL, e.g. /simple-v1/ or /v1/simple/ or /1/simple/ >>> >>> Not sure it is neccessary in this case. I would think it makes >>> the implementation harder and it would probably break PEP381 (mirroring >>> infrastructure) as well. >> >> Here's what I meant: >> >> We publish the current implementation of the /simple/ index API >> under a new URL /simple-v1/, so that people that want to use >> the old API can continue to do so. > > Do you know of anyone who's *actually* going to need/use this > alternate API. I think we should establish a versioned API like that for PyPI to make progress easier. All major web APIs use versioning for this reason. > Why can't they just the XML-RPC API, the DOAP API, or > any other means of obtaining this information? Those cannot easily be put on the CDN and would cause an unnecessary strain on the PyPI server. We could/should probably also make the PKG-INFO meta data file, plus some other static information such as upload/release dates (as RSS/Atom file) available on the /simple/ page to make this easier to use over the CDN. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 12 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! 
:::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/

From mal at egenix.com Tue Mar 12 20:59:59 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 12 Mar 2013 20:59:59 +0100
Subject: [Catalog-sig] setuptools/distribute/easy_install/pkg_resource sorting algorithm
In-Reply-To: <513F70B5.5030501@egenix.com>
References: <513F70B5.5030501@egenix.com>
Message-ID: <513F893F.9010707@egenix.com>

On 12.03.2013 19:15, M.-A. Lemburg wrote:
> I've run into a weird issue with easy_install, that I'm trying to solve:
>
> If I place two files named
>
> egenix_mxodbc_connect_client-2.0.2-py2.6.egg
> egenix-mxodbc-connect-client-2.0.2.win32-py2.6.prebuilt.zip
>
> into the same directory and let easy_install running on Linux
> scan this, it considers the second file for Windows as best
> match.
>
> Is the algorithm used for determining the best match documented
> somewhere ?
>
> I've had a look at the implementation, but this left me rather
> clueless.
>
> I thought that setuptools would prefer the .egg file over
> the prebuilt .zip file - binary files being easier to install
> than "source" files.

After some experiments, I found that the following change in filename
(swapping platform and Python version, in addition to using '-' instead
of '.') works:

egenix-mxodbc-connect-client-2.0.2-py2.6-win32.prebuilt.zip

OTOH, this one doesn't (notice the difference ?):

egenix-mxodbc-connect-client-2.0.2.py2.6-win32.prebuilt.zip

The logic behind all this looks rather fragile to me.

Thanks,
--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Mar 12 2013)
>>> Python Projects, Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...
http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From donald at stufft.io Tue Mar 12 21:01:26 2013 From: donald at stufft.io (Donald Stufft) Date: Tue, 12 Mar 2013 16:01:26 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: <20130312195707.GL9677@merlinux.eu> References: <513F5596.5090302@egenix.com> <513F718D.4040307@oddbird.net> <20130312195707.GL9677@merlinux.eu> Message-ID: <2C9D488E-7F9E-4FE9-ABB8-7CBBC309C90F@stufft.io> On Mar 12, 2013, at 3:57 PM, holger krekel wrote: > On Tue, Mar 12, 2013 at 14:36 -0500, Jacob Kaplan-Moss wrote: >> On Tue, Mar 12, 2013 at 2:21 PM, PJ Eby wrote: >>> The *only* thing I object to is the part where some people want to ban >>> external links from /simple, always and forever, regardless of the >>> package authors' choice in the matter. >> >> Here's the thing though, there are already a bunch of other ways users >> can install packages from external repositories. I can think of at >> least two: >> >> * I can pip/easy_install a given URL (e.g. easy_install >> https://www.djangoproject.com/download/1.5/tarball/) >> * I can use a custom index server (pip install -i http://localserver/ django) >> >> The important part is that in each of those cases I can see clearly >> where I'm getting things from. >> >> OTOH, if I do "pip install Django" I ? the person making the install ? >> have no control over where that package comes from. It really violates >> people's expectations that this reaches out to somewhere that's >> not-pypi. 
>> More importantly it prevents me from making a security
>> choice -- I literally don't know until the download starts where the
>> file might be coming from.
>>
>> From where I stand the absolutely non-negotiable part is that
>> `pip/easy_install/whatever package` should NEVER access an external
>> host (after some suitable transition period). This needs to include
>> older installer software, and it needs to make it hard for new tools
>> to do the wrong thing. How this is achieved really doesn't matter to
>> me -- if there's a "pip install --insecure Django" that's fine too --
>> but to me it's non-negotiable that the out-of-the-box configuration
>> not allow external hosts.
>>
>> Yes, this means taking some options away from the package creator. It
>> means that when I'm wearing my author-of-Django hat I can't choose to
>> list Django on PyPI but provide the download elsewhere. That's not
>> perfect, but given a "creator choice" vs "out of the box security"
>> choice the latter has to win. [And as a package creator I still have
>> options: I can run my own package server, fairly easy to do these
>> days.]
>>
>> Again, the *how* isn't a big deal to me, but the result is really
>> important: the tooling has to be secure-by-default, and that means
>> (among other things) `pip install package` can never hit something
>> that's not PyPI without me explicitly asking for it.
>
> Let's be clear, however, that we are at most reducing attack vectors,
> there are substantial attack vectors left. Nobody should be led to
> think that PYPI is a trusted or reviewed source of software even
> if we got rid of external hosting completely.

"Trust" depends on your trust model. PyPI is not and will never be a
system where you can pip install random packages and expect nothing bad
to happen. You should however be able to trust that when you
`pip install foo==1.0` you will get exactly that, and that it will not
have been tampered with.
It's up to you to decide if foo 1.0 is something trustworthy. There's
handwaving here about what foo 1.0 is defined as. But in general when
you ask for X you should get exactly X, no more, no less.

>
> holger
>
>> Jacob

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From holger at merlinux.eu Tue Mar 12 21:02:15 2013
From: holger at merlinux.eu (holger krekel)
Date: Tue, 12 Mar 2013 20:02:15 +0000
Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI
In-Reply-To: <513F8922.90008@egenix.com>
References: <20130312113817.GA9677@merlinux.eu> <513F5282.3010206@egenix.com> <20130312170508.GG9677@merlinux.eu> <513F6EE0.6080503@egenix.com> <513F8922.90008@egenix.com>
Message-ID: <20130312200215.GN9677@merlinux.eu>

On Tue, Mar 12, 2013 at 20:59 +0100, M.-A. Lemburg wrote:
> On 12.03.2013 20:46, PJ Eby wrote:
> > On Tue, Mar 12, 2013 at 2:07 PM, M.-A. Lemburg wrote:
> >> Just a quick note (more later, if time permits)...
> >>
> >> On 12.03.2013 18:05, holger krekel wrote:
> >>> Hi Marc-Andre, all,
> >>>
> >>>>> - Prepare PYPI implementation to allow a per-project "hosting mode",
> >>>>> effectively enabling or disabling external crawling. When enabled
> >>>>> nothing changes from the current situation of producing ``rel=download``
> >>>>> and ``rel=homepage`` attributed links on ``simple/`` pages,
> >>>>> causing installers to crawl those sites.
> >>>>> When disabled, the attributions of links will change > >>>>> to ``rel=newdownload`` and ``rel=newhomepage`` causing installers to > >>>>> avoid crawling 3rd party sites. Retaining the meta-information allows > >>>>> tools to still make use of the semantic information. > >>>> > >>>> Please start using versioned APIs for these things. The > >>>> old style index should still be available under some > >>>> URL, e.g. /simple-v1/ or /v1/simple/ or /1/simple/ > >>> > >>> Not sure it is neccessary in this case. I would think it makes > >>> the implementation harder and it would probably break PEP381 (mirroring > >>> infrastructure) as well. > >> > >> Here's what I meant: > >> > >> We publish the current implementation of the /simple/ index API > >> under a new URL /simple-v1/, so that people that want to use > >> the old API can continue to do so. > > > > Do you know of anyone who's *actually* going to need/use this > > alternate API. > > I think we should establish a versioned API like that for PyPI > to make progress easier. All major web APIs use versioning > for this reason. > > Why can't they just the XML-RPC API, the DOAP API, or > > any other means of obtaining this information? > > Those cannot easily be put on the CDN and > would cause an unnecessary strain on the PyPI server. The JSON API could be put on the CDN however. > We could/should probably also make the PKG-INFO meta data file, > plus some other static information such as upload/release dates > (as RSS/Atom file) available on the /simple/ page to make this > easier to use over the CDN. That should go into a re-newed CDN PEP :) holger > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, Mar 12 2013) > >>> Python Projects, Consulting and Support ... http://www.egenix.com/ > >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ > >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ > ________________________________________________________________________ > > ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > From carl at oddbird.net Tue Mar 12 21:14:59 2013 From: carl at oddbird.net (Carl Meyer) Date: Tue, 12 Mar 2013 14:14:59 -0600 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> <513F5596.5090302@egenix.com> <513F718D.4040307@oddbird.net> Message-ID: <513F8CC3.2070002@oddbird.net> On 03/12/2013 01:21 PM, PJ Eby wrote: >> - In some way, migrate to a situation where the popular installer tools >> install only release files from PyPI by default, but are capable of >> installing from other locations if the user provides an option. > > Perhaps I'm confused, but ISTM that every time I've said this, Donald > and Lennart argue that it should not be possible to provide such an > option -- or to be more specific, that PyPI should not publish the > information that makes that option possible. > > If that's *not* the position they're taking, it'd be good to know, > because we could totally stop arguing about it in that case. I think there's been misunderstanding on this point. Donald and Lennart can confirm for themselves, but I don't believe _anyone_ thinks that tools should not be able to install from non-PyPI sources when explicitly requested to do so. And IIUC from your previous message, you've "already agreed to change setuptools to default this option to only allow downloads from the same host as its index URL, in a future release". So I think everyone is roughly on the same page about where we should be headed. 
There is disagreement about how to make that work. My point is that I don't think PyPI publishing scraped-from-metadata external links on the simple/ index specifically, in perpetuity, is necessary or even beneficial to that future state. >> A) Leave external links in the PyPI simple index, but migrate the major >> tools to not use external links by default (i.e. Philip's plan to make >> allow-hosts=pypi the default in a future setuptools), with an option to >> turn them back on. > > I don't know who has proposed this option, but it's not me. You seem > to be confusing external links and HTML-scraped links (rel="" > attributed links in /simple). No, I'm not confusing those. All I'm referring to here is where you said you've "already agreed to change setuptools to default [allow-hosts] to only allow downloads from the same host as its index URL, in a future release." Did I not characterize that accurately? > I was the first person to propose disabling HTML-scraped links from > PyPI *ASAP*. I still want them gone. That won't require tool > changes, it just requires a rollout plan. Holger has one, let's work > on that. Fully agreed. I understand from Holger that he would like his PEP to also discuss the rough plan beyond just disabling rel-link HTML scraping, for how to get to a point where the tools don't follow off-PyPI links at all by default. This second stage is what I'm talking about. > The second thing I proposed is that new tools be developed to *assist* > package authors in moving their files onto PyPI, so that future tool > changes wouldn't result in widespread instances of people needing to > set their tools to insecure settings just to get anything done. We > need to get people's files moving onto PyPI *first*, in order to make > changing the tool defaults practical. Totally agreed that such tools could be useful, I should have included that point explicitly in my summary. 
> The *only* thing I object to is the part where some people want to ban > external links from /simple, always and forever, regardless of the > package authors' choice in the matter. I think the question of external links in /simple is causing far more heat than it's worth (from all sides), because it's fundamentally an implementation detail, not an end in itself. Discussing the pros and cons of this implementation detail is more or less what rest is all about. >> B) Do a second PyPI migration, again with a per-package toggle and >> package owners in control, to a "no external links in simple index" setting. >> >> Consider for a moment how similar the end state here is with either A or >> B. In either case, by default users install only from PyPI, but by >> providing a special option they can install from some external source. >> (In B, that special option would be something like --find-links with a >> URL). In either case, we can continue to allow packages to register >> themselves on PyPI, be found in searches, etc, without uploading release >> files to PyPI if they prefer not to; they'll just have to provide >> special installation instructions to their users in that case. > > Not true: approach B means that you won't know what values to pass to > the option. You say below that "nobody has proposed a 'trust everything' flag." If there is no "trust everything" flag, then it seems to me that with either option A or option B the user needs to specify what they intend to trust. I.e. if you make the default value of allow-hosts the index url host, as you said you plan to do at some point, users would need to override it with the hosts they want to allow. It seems like maybe what you are wanting is automatically-discoverable installation from externally-hosted files? I.e. that I could say "easy_install Foo --allow-external", without needing to know any specific external url for Foo? 
This is what I was characterizing as a "trust everything" flag, but on reflection I don't think I have any problem with that. I do think that: 1) external release-file URLs should be explicitly nominated by the package owner, not automatically sucked out of text metadata. 2) (After a suitable package-owner-controlled migration) those external links should live at a new separate (machine-readable) endpoint, not the existing /simple index. This has two benefits: a) even tools that exist today eventually gain the benefit of safer-by-default installations, and b) it's simpler and more reliable for future tools to distinguish between internal and external release file links. > It's also confused about an important point. All the links that > appear in /simple are *already* completely under the package author's > control. No new switches are required to remove external links - you > can simply remove them from your releases' descriptions. This process > could be made more transparent or easy, sure -- but it's a mistake to > say that this is granting the package owners control that they don't > already have. This is partly true. An explicit flag grants package owners more control in that right now they don't have a choice about whether external links to tarballs in their long_description automatically get sucked into the simple index. This is not hypothetical; even if there were no rel-link scraping, I've had cases where package owners have complained to me about pip installing an RC tarball they had linked directly from their long-description, not intending it to be auto-installable. I think it would be preferable if in the future package owners wouldn't need to be careful what release-file links they might place in their long_description, and release files would be only explicitly nominated. 
I think the current "automatically suck in links to simple/" behavior is only useful as a backwards-compatibility hack, which is why I think an explicit switch to disable it (on by default for newly-registered projects, slowly, gently, carefully migrated to on for existing projects) is better than keeping this link-scraping behavior indefinitely for all projects and asking package owners to clean up their long-descriptions. > What they lack control over is the rel="" attributes, short of > removing those links entirely. That's why I've proposed having a > switch for that , as reflected in Holger's pre-PEP. I agree with this switch, but I think there is more benefit than cost in extending the concept to all automatically-sucked-in external links. >> 1) With B, we can provide a gentler migration for package owners, where >> they are in control of when the switch happens. >> >> 2) With B, all end users benefit from the new defaults, not only end >> users who update to the latest and greatest tools. >> >> 3) With B (and probably some forms of A as well), end users clearly >> state which external sources they would like to trust and install from, >> rather than having a global "trust everything!" flag, which is less >> secure and less sensible. > > These 3 statements all mischaracterize things substantially, because > none of those benefits are exclusive to A, and nobody has proposed a > "trust everything" flag. You're right that item 1 is not technically exclusive to B, although I think B makes it much easier and simpler for package owners. "Just flip a switch and done" rather than "Go clean up all your package metadata including all past releases, or trust this tool we built to go editing all your release metadata for you." I'm not even sure how that hypothetical tool would work - what exactly would it do to automatically clean up a link to an external tarball that it finds in the long_description of a release from three years ago? Just remove it? 
What if the package owner actually wants that link there for human use? > Removing rel="" attributes also benefits > everyone right away, *without* new tools. Sure, and I'm fully in support of that being the first stage. Carl From pje at telecommunity.com Tue Mar 12 21:16:25 2013 From: pje at telecommunity.com (PJ Eby) Date: Tue, 12 Mar 2013 16:16:25 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> <513F5596.5090302@egenix.com> <513F718D.4040307@oddbird.net> Message-ID: On Tue, Mar 12, 2013 at 3:36 PM, Jacob Kaplan-Moss wrote: > On Tue, Mar 12, 2013 at 2:21 PM, PJ Eby wrote: >> The *only* thing I object to is the part where some people want to ban >> external links from /simple, always and forever, regardless of the >> package authors' choice in the matter. > > Here's the thing though, there are already a bunch of other ways users > can install packages from external repositories. I can think of at > least two: > > * I can pip/easy_install a given URL (e.g. easy_install > https://www.djangoproject.com/download/1.5/tarball/) > * I can use a custom index server (pip install -i http://localserver/ django) > > The important part is that in each of those cases I can see clearly > where I'm getting things from. > > > From where I stand the absolutely non-negotiable part is that > `pip/easy_install/whatever package` should NEVER access an external > host (after some suitable transition period). This needs to include > older installer software, and it needs to make it hard for new tools > to do the wrong thing. How this is achieved really doesn't matter to > me -- if there's a "pip install --insecure Django" that's fine too -- > but to me it's non-negotiable that the out-of-the-box configuration > not allow external hosts. I'm confused by this statement. 
"never access an external host" is not consistent with "have the option to specify what hosts you trust", while still keeping PyPI as a universal index of Python software. > Yes, this means taking some options away from the package creator. It > means that when I'm wearing my author-of-Django hat I can't choose to > list Django on PyPI but provide the download elsewhere. That's not > perfect, but given a "creator choice" vs "out of the box security" > choice the latter has to win. [And as a package creator I still have > options: I can run my own package server, fairly easy to do these > days.] > > Again, the *how* isn't a big deal to me, but the result is really > important: the tooling has to be secure-by-default, and that means > (among other things) `pip install package` can never hit something > that's not PyPI without me explicitly asking for it. That part's fine. As I've said repeatedly, though, it's the removing other links from the /simple index entirely that's the problem. Under what I've proposed, as soon as the tools are updated to secure-default (and the situation *now* if you set your --allow-hosts to PyPI-only), is that easy_install will announce what URLs it is skipping because they're not on PyPI. (pip too, IIUC.) I can't tell you how to configure pip for this, but if you want to configure easy_install to be secure right *now*, add: [easy_install] allow_hosts=pypi.python.org to your user-level or site-wide distutils .cfg file. Better yet, encourage other people to add it now, find out what they can no longer install, and talk to their upstream providers about moving to PyPI. This is all good. I'm just saying, we don't need to change PyPI to do anything but drop the rel="" links, and change the tools to default allow-hosts to equal index-url. (pip has the same parameters, not sure what config files it uses, though. I don't think it inherits [easy_install] settings, though.) 
From donald at stufft.io Tue Mar 12 21:23:14 2013
From: donald at stufft.io (Donald Stufft)
Date: Tue, 12 Mar 2013 16:23:14 -0400
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
In-Reply-To: <513F8CC3.2070002@oddbird.net>
References: <20130310150740.GE9677@merlinux.eu> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> <513F5596.5090302@egenix.com> <513F718D.4040307@oddbird.net> <513F8CC3.2070002@oddbird.net>
Message-ID:

On Mar 12, 2013, at 4:14 PM, Carl Meyer wrote:

> On 03/12/2013 01:21 PM, PJ Eby wrote:
>>> - In some way, migrate to a situation where the popular installer tools
>>> install only release files from PyPI by default, but are capable of
>>> installing from other locations if the user provides an option.
>>
>> Perhaps I'm confused, but ISTM that every time I've said this, Donald
>> and Lennart argue that it should not be possible to provide such an
>> option -- or to be more specific, that PyPI should not publish the
>> information that makes that option possible.
>>
>> If that's *not* the position they're taking, it'd be good to know,
>> because we could totally stop arguing about it in that case.
>
> I think there's been misunderstanding on this point. Donald and Lennart
> can confirm for themselves, but I don't believe _anyone_ thinks that
> tools should not be able to install from non-PyPI sources when
> explicitly requested to do so. And IIUC from your previous message,
> you've "already agreed to change setuptools to default this option to
> only allow downloads from the same host as its index URL, in a future
> release". So I think everyone is roughly on the same page about where we
> should be headed.

I have never supported, and never will support, a proposal that removes the end user's ability to install from a non-PyPI source when requested to do so. Considering I operate a non-PyPI source, I'm not sure how this idea started.

>
> There is disagreement about how to make that work.
My point is that I > don't think PyPI publishing scraped-from-metadata external links on the > simple/ index specifically, in perpetuity, is necessary or even > beneficial to that future state. > >>> A) Leave external links in the PyPI simple index, but migrate the major >>> tools to not use external links by default (i.e. Philip's plan to make >>> allow-hosts=pypi the default in a future setuptools), with an option to >>> turn them back on. >> >> I don't know who has proposed this option, but it's not me. You seem >> to be confusing external links and HTML-scraped links (rel="" >> attributed links in /simple). > > No, I'm not confusing those. All I'm referring to here is where you said > you've "already agreed to change setuptools to default [allow-hosts] to > only allow downloads from the same host as its index URL, in a future > release." Did I not characterize that accurately? > >> I was the first person to propose disabling HTML-scraped links from >> PyPI *ASAP*. I still want them gone. That won't require tool >> changes, it just requires a rollout plan. Holger has one, let's work >> on that. > > Fully agreed. I understand from Holger that he would like his PEP to > also discuss the rough plan beyond just disabling rel-link HTML > scraping, for how to get to a point where the tools don't follow > off-PyPI links at all by default. This second stage is what I'm talking > about. > >> The second thing I proposed is that new tools be developed to *assist* >> package authors in moving their files onto PyPI, so that future tool >> changes wouldn't result in widespread instances of people needing to >> set their tools to insecure settings just to get anything done. We >> need to get people's files moving onto PyPI *first*, in order to make >> changing the tool defaults practical. > > Totally agreed that such tools could be useful, I should have included > that point explicitly in my summary. 
> >> The *only* thing I object to is the part where some people want to ban >> external links from /simple, always and forever, regardless of the >> package authors' choice in the matter. > > I think the question of external links in /simple is causing far more > heat than it's worth (from all sides), because it's fundamentally an > implementation detail, not an end in itself. Discussing the pros and > cons of this implementation detail is more or less what rest is all about. > >>> B) Do a second PyPI migration, again with a per-package toggle and >>> package owners in control, to a "no external links in simple index" setting. >>> >>> Consider for a moment how similar the end state here is with either A or >>> B. In either case, by default users install only from PyPI, but by >>> providing a special option they can install from some external source. >>> (In B, that special option would be something like --find-links with a >>> URL). In either case, we can continue to allow packages to register >>> themselves on PyPI, be found in searches, etc, without uploading release >>> files to PyPI if they prefer not to; they'll just have to provide >>> special installation instructions to their users in that case. >> >> Not true: approach B means that you won't know what values to pass to >> the option. > > You say below that "nobody has proposed a 'trust everything' flag." If > there is no "trust everything" flag, then it seems to me that with > either option A or option B the user needs to specify what they intend > to trust. I.e. if you make the default value of allow-hosts the index > url host, as you said you plan to do at some point, users would need to > override it with the hosts they want to allow. > > It seems like maybe what you are wanting is automatically-discoverable > installation from externally-hosted files? I.e. that I could say > "easy_install Foo --allow-external", without needing to know any > specific external url for Foo? 
> > This is what I was characterizing as a "trust everything" flag, but on > reflection I don't think I have any problem with that. I do think that: > > 1) external release-file URLs should be explicitly nominated by the > package owner, not automatically sucked out of text metadata. > > 2) (After a suitable package-owner-controlled migration) those external > links should live at a new separate (machine-readable) endpoint, not the > existing /simple index. This has two benefits: a) even tools that exist > today eventually gain the benefit of safer-by-default installations, and > b) it's simpler and more reliable for future tools to distinguish > between internal and external release file links. > >> It's also confused about an important point. All the links that >> appear in /simple are *already* completely under the package author's >> control. No new switches are required to remove external links - you >> can simply remove them from your releases' descriptions. This process >> could be made more transparent or easy, sure -- but it's a mistake to >> say that this is granting the package owners control that they don't >> already have. > > This is partly true. An explicit flag grants package owners more control > in that right now they don't have a choice about whether external links > to tarballs in their long_description automatically get sucked into the > simple index. This is not hypothetical; even if there were no rel-link > scraping, I've had cases where package owners have complained to me > about pip installing an RC tarball they had linked directly from their > long-description, not intending it to be auto-installable. > > I think it would be preferable if in the future package owners wouldn't > need to be careful what release-file links they might place in their > long_description, and release files would be only explicitly nominated. 
> I think the current "automatically suck in links to simple/" behavior is > only useful as a backwards-compatibility hack, which is why I think an > explicit switch to disable it (on by default for newly-registered > projects, slowly, gently, carefully migrated to on for existing > projects) is better than keeping this link-scraping behavior > indefinitely for all projects and asking package owners to clean up > their long-descriptions. > >> What they lack control over is the rel="" attributes, short of >> removing those links entirely. That's why I've proposed having a >> switch for that , as reflected in Holger's pre-PEP. > > I agree with this switch, but I think there is more benefit than cost in > extending the concept to all automatically-sucked-in external links. > >>> 1) With B, we can provide a gentler migration for package owners, where >>> they are in control of when the switch happens. >>> >>> 2) With B, all end users benefit from the new defaults, not only end >>> users who update to the latest and greatest tools. >>> >>> 3) With B (and probably some forms of A as well), end users clearly >>> state which external sources they would like to trust and install from, >>> rather than having a global "trust everything!" flag, which is less >>> secure and less sensible. >> >> These 3 statements all mischaracterize things substantially, because >> none of those benefits are exclusive to A, and nobody has proposed a >> "trust everything" flag. > > You're right that item 1 is not technically exclusive to B, although I > think B makes it much easier and simpler for package owners. "Just flip > a switch and done" rather than "Go clean up all your package metadata > including all past releases, or trust this tool we built to go editing > all your release metadata for you." 
I'm not even sure how that > hypothetical tool would work - what exactly would it do to automatically > clean up a link to an external tarball that it finds in the > long_description of a release from three years ago? Just remove it? What > if the package owner actually wants that link there for human use? > >> Removing rel="" attributes also benefits >> everyone right away, *without* new tools. > > Sure, and I'm fully in support of that being the first stage. > > Carl > > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From jacob at jacobian.org Tue Mar 12 21:30:22 2013 From: jacob at jacobian.org (Jacob Kaplan-Moss) Date: Tue, 12 Mar 2013 15:30:22 -0500 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> <513F5596.5090302@egenix.com> <513F718D.4040307@oddbird.net> Message-ID: On Tue, Mar 12, 2013 at 3:16 PM, PJ Eby wrote: > I'm confused by this statement. "never access an external host" is > not consistent with "have the option to specify what hosts you trust", > while still keeping PyPI as a universal index of Python software. Sorry to be confusing! I'm trying to make a distinction between the out-of-the-box defaults and optional... options. Here's what I mean: imagine I'm new to Python and getting started. 
I grab my machine, install Python (via apt-get, homebrew, from source, whatever), and grab whatever the programmer next to me at work tells me is latest and greatest in the packaging world. No configuration, no editing of a config file, no reading of documentation, just `apt-get install python python-pip` or the equivalent.

Now I type `pip install Django`. Again, with no configuration, no tweaking, no editing of anything, and no real understanding of what's going on.

The point I'm trying to make is that I consider it absolutely critical that this by-the-defaults approach gets me the *best* security the Python ecosystem has to offer. That means no external packages; it also means signing and verifying once that infrastructure is in place [1].

On the other hand, the "have the option" means that `pip install ` needs to continue to work, too.

Is that clear? Again I'm sorry if I'm being confusing; I think I'm having "translate from brain to keyboard" fail.

> I'm just saying, we don't need to change PyPI to do anything but drop
> the rel="" links, and change the tools to default allow-hosts to equal
> index-url. (pip has the same parameters, not sure what config files
> it uses, though. I don't think it inherits [easy_install] settings,
> though.)

As I've said, the implementation details aren't a concern to me; the result is.

Jacob

[1] Another important step a bit further down the line is eliminating or drastically reducing the use of an executable setup.py. But that's another show.

From jacob at jacobian.org Tue Mar 12 21:35:23 2013 From: jacob at jacobian.org (Jacob Kaplan-Moss) Date: Tue, 12 Mar 2013 15:35:23 -0500 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> <513F5596.5090302@egenix.com> <513F718D.4040307@oddbird.net> Message-ID: On Tue, Mar 12, 2013 at 3:30 PM, Jacob Kaplan-Moss wrote: > As I've said, the implementation details aren't of a concern to me; > the result is. You know what though, I kinda lied. While I don't care about the implementation, I *do* care about keeping this process moving forward. Holger has a PEP that's essentially done (if controversial), and Donald's offered to implement it. The PyCon sprints next week means we'll have a ton of focused attention, so there's a very good chance if we strike now we'll have this done in the next couple weeks. So yeah, I'm going to back the proposal that has a critical mass behind it, and it solves the problem. My experience with Python packaging is that there's a massive amount of inertia, so I think it's pretty vital to get work done while there are people who've got time to work on it. Not to put too fine a point on it, but unless there's actually something really wrong with Holger's proposal I can't see why we'd want to wait for some hypothetically better solution. 
Jacob From pje at telecommunity.com Tue Mar 12 22:22:02 2013 From: pje at telecommunity.com (PJ Eby) Date: Tue, 12 Mar 2013 17:22:02 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: <513F8CC3.2070002@oddbird.net> References: <20130310150740.GE9677@merlinux.eu> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io> <513F5596.5090302@egenix.com> <513F718D.4040307@oddbird.net> <513F8CC3.2070002@oddbird.net> Message-ID: On Tue, Mar 12, 2013 at 4:14 PM, Carl Meyer wrote: > You say below that "nobody has proposed a 'trust everything' flag." If > there is no "trust everything" flag, then it seems to me that with > either option A or option B the user needs to specify what they intend > to trust. I.e. if you make the default value of allow-hosts the index > url host, as you said you plan to do at some point, users would need to > override it with the hosts they want to allow. > > It seems like maybe what you are wanting is automatically-discoverable > installation from externally-hosted files? I.e. that I could say > "easy_install Foo --allow-external", without needing to know any > specific external url for Foo? > > This is what I was characterizing as a "trust everything" flag, but on > reflection I don't think I have any problem with that. Here's a story to illustrate what I mean: Joe wants to install foo. He runs "easy_install Foo". Foo is hosted externally to PyPI, so easy_install says: URL foo.com/downloads/foo-1.2.tgz BLOCKED by allow-hosts option -- install failed. (Or words to that effect; I'd have to check the source to get you the exact phrasing). The point is, Joe now *knows where to get foo from*, because PyPI still had the information. Joe can now decide whether he wants to download it manually and inspect it first, expand his allow-hosts option, or give Foo a pass. The proposals that call for banning all links from the /simple index, prevent Joe from being able to do this at all. > This is partly true. 
An explicit flag grants package owners more control > in that right now they don't have a choice about whether external links > to tarballs in their long_description automatically get sucked into the > simple index. This is not hypothetical; even if there were no rel-link > scraping, I've had cases where package owners have complained to me > about pip installing an RC tarball they had linked directly from their > long-description, not intending it to be auto-installable. Fair enough. Thank you for actually providing an illustration of a problem. There's been far too much handwaving of problems without any explicit description of what the problem *is*. I would support making references to external links explicit rather than implicit. > I think it would be preferable if in the future package owners wouldn't > need to be careful what release-file links they might place in their > long_description, and release files would be only explicitly nominated. Ok. > I think the current "automatically suck in links to simple/" behavior is > only useful as a backwards-compatibility hack, which is why I think an > explicit switch to disable it (on by default for newly-registered > projects, slowly, gently, carefully migrated to on for existing > projects) is better than keeping this link-scraping behavior > indefinitely for all projects and asking package owners to clean up > their long-descriptions. I would agree with dropping link parsing from the description field, provided that an alternative way is provided for projects to explicitly add external links to /simple, concurrent with the other changes. Thank you for taking the time to engage and re-engage on this issue, and to "Explain It Like I'm Five" for me, with an illustration of an actual problematic use case. 
;-) From tk47 at students.poly.edu Tue Mar 12 22:10:48 2013 From: tk47 at students.poly.edu (Trishank Karthik Kuppusamy) Date: Tue, 12 Mar 2013 17:10:48 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <513F5596.5090302@egenix.com> <513F718D.4040307@oddbird.net> Message-ID: <513F99D8.7080309@students.poly.edu> Hello Jacob, Good to hear from you! Thanks for stating your concerns so clearly, and we do understand them. We agree that inertia is important to maintain. In fact, we are excited to show this in person to the PyPI community on Friday. We expect to release a design document and a demo in a few hours. Let me finish my midterm, and then I will get back to you :) Thanks, Trishank On 03/12/2013 04:35 PM, Jacob Kaplan-Moss wrote: > On Tue, Mar 12, 2013 at 3:30 PM, Jacob Kaplan-Moss wrote: >> As I've said, the implementation details aren't of a concern to me; >> the result is. > > You know what though, I kinda lied. > > While I don't care about the implementation, I *do* care about keeping > this process moving forward. Holger has a PEP that's essentially done > (if controversial), and Donald's offered to implement it. The PyCon > sprints next week means we'll have a ton of focused attention, so > there's a very good chance if we strike now we'll have this done in > the next couple weeks. > > So yeah, I'm going to back the proposal that has a critical mass > behind it, and it solves the problem. My experience with Python > packaging is that there's a massive amount of inertia, so I think it's > pretty vital to get work done while there are people who've got time > to work on it. > > Not to put too fine a point on it, but unless there's actually > something really wrong with Holger's proposal I can't see why we'd > want to wait for some hypothetically better solution. 
> > Jacob > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig > From pje at telecommunity.com Tue Mar 12 22:26:22 2013 From: pje at telecommunity.com (PJ Eby) Date: Tue, 12 Mar 2013 17:26:22 -0400 Subject: [Catalog-sig] setuptools/distribute/easy_install/pkg_resource sorting algorithm In-Reply-To: <513F893F.9010707@egenix.com> References: <513F70B5.5030501@egenix.com> <513F893F.9010707@egenix.com> Message-ID: On Tue, Mar 12, 2013 at 3:59 PM, M.-A. Lemburg wrote: > On 12.03.2013 19:15, M.-A. Lemburg wrote: >> I've run into a weird issue with easy_install, that I'm trying to solve: >> >> If I place two files named >> >> egenix_mxodbc_connect_client-2.0.2-py2.6.egg >> egenix-mxodbc-connect-client-2.0.2.win32-py2.6.prebuilt.zip >> >> into the same directory and let easy_install running on Linux >> scan this, it considers the second file for Windows as best >> match. >> >> Is the algorithm used for determining the best match documented >> somewhere ? >> >> I've had a look at the implementation, but this left me rather >> clueless. >> >> I thought that setuptools would prefer the .egg file over >> the prebuilt .zip file - binary files being easier to install >> than "source" files. > > After some experiments, I found that the follow change > in filename (swapping platform and python version, in addition > to use '-' instead of '.) works: > > egenix-mxodbc-connect-client-2.0.2-py2.6-win32.prebuilt.zip > > OTOH, this one doesn't (notice the difference ?): > > egenix-mxodbc-connect-client-2.0.2.py2.6-win32.prebuilt.zip > > The logic behind all this looks rather fragile to me. easy_install only guarantees sane version parsing for distribution files built using setuptools' naming algorithms. If you use distutils, it can only make guesses, because the distutils does not have a completely unambiguous file naming scheme. 
And if you are naming the files by hand, God help you. ;-) From carl at oddbird.net Tue Mar 12 22:52:54 2013 From: carl at oddbird.net (Carl Meyer) Date: Tue, 12 Mar 2013 15:52:54 -0600 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <20130310150740.GE9677@merlinux.eu> <513F5596.5090302@egenix.com> <513F718D.4040307@oddbird.net> <513F8CC3.2070002@oddbird.net> Message-ID: <513FA3B6.8000307@oddbird.net> On 03/12/2013 03:22 PM, PJ Eby wrote: > Here's a story to illustrate what I mean: > > Joe wants to install foo. He runs "easy_install Foo". Foo is hosted > externally to PyPI, so easy_install says: > > URL foo.com/downloads/foo-1.2.tgz BLOCKED by allow-hosts option -- > install failed. > > (Or words to that effect; I'd have to check the source to get you the > exact phrasing). > > The point is, Joe now *knows where to get foo from*, because PyPI > still had the information. Joe can now decide whether he wants to > download it manually and inspect it first, expand his allow-hosts > option, or give Foo a pass. > > The proposals that call for banning all links from the /simple index, > prevent Joe from being able to do this at all. Ah, thank you! Yes, I was indeed missing that mode of getting the information to the user. Makes perfect sense now. > I would support making references to external links explicit rather > than implicit. Excellent. >> I think the current "automatically suck in links to simple/" behavior is >> only useful as a backwards-compatibility hack, which is why I think an >> explicit switch to disable it (on by default for newly-registered >> projects, slowly, gently, carefully migrated to on for existing >> projects) is better than keeping this link-scraping behavior >> indefinitely for all projects and asking package owners to clean up >> their long-descriptions. 
> > I would agree with dropping link parsing from the description field, > provided that an alternative way is provided for projects to > explicitly add external links to /simple, concurrent with the other > changes. So the other change I proposed is that these new explicitly-nominated external links would not be added to the main simple/ index page for a project, but to a with-external-links/ sub-page that includes all links, internal and external. (This being, of course, subject to the same package-owner-controlled migration process, nothing done abruptly). The long-term benefits I see to making this tweak: 1) Users still using today's easy_install on RHEL in five years will automatically get the benefit of safe-by-default (as each package owner makes their migration) without needing to upgrade their easy_install. 2) Implementors of future installers can make explicit choices about which set of links to ask for, without every single installer needing to reimplement possibly-error-prone and possibly-subject-to-attack host-comparison code. I realize that this requires updating easy_install/pip/buildout in order to take advantage of externally-hosted files in the new system, but since end-user tooling updates are part of the plan either way, I think in the spirit of safe-by-default it's preferable to require end-user tooling updates to get access to less-safe options, rather than require end-user tooling updates in order to become safer by default. What do you think? > Thank you for taking the time to engage and re-engage on this issue, > and to "Explain It Like I'm Five" for me, with an illustration of an > actual problematic use case. ;-) Of course, and likewise; I've learned a lot from this exchange and appreciate you sticking with it and explaining things the second and third time until I got it. 
:-)

Carl

From reinout at vanrees.org Tue Mar 12 23:13:59 2013
From: reinout at vanrees.org (Reinout van Rees)
Date: Tue, 12 Mar 2013 23:13:59 +0100
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
In-Reply-To:
References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <212CF2F1-C4B1-46E6-A8F5-EE819DDF8B09@mac.com> <826D31AF-BE1C-4FC3-8FF9-EAC3B7D6EA54@mac.com>
Message-ID:

On 11-03-13 11:44, Lennart Regebro wrote:
> That's now all the energy I'm willing to spend on discussing this
> topic. Third-party hosting needs to go. I believe there is a broad
> consensus on this. Let's instead discuss *how* to implement it.

Hear hear!

I'm so fed up with other people's non-PyPI hosts breaking down and breaking my releases... I should not be forced to deploy some caching proxy between others and my releases in order to get a marginally-working system.

Those that have good reasons to break everybody's build processes should take their packages elsewhere.

Reinout

--
Reinout van Rees http://reinout.vanrees.org/
reinout at vanrees.org http://www.nelen-schuurmans.nl/
"If you're not sure what to do, make something.
-- Paul Graham"

From reinout at vanrees.org Tue Mar 12 23:21:47 2013
From: reinout at vanrees.org (Reinout van Rees)
Date: Tue, 12 Mar 2013 23:21:47 +0100
Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
In-Reply-To:
References: <20130310150740.GE9677@merlinux.eu> <710D5A78-9784-4B00-9C55-8981AF8CA5F2@stufft.io> <20130310181828.GH9677@merlinux.eu> <20130310195405.GI9677@merlinux.eu> <1FA03AEE-4293-411F-ABA0-92AD6FCFA25E@stufft.io> <459B0AEB-6D61-4DB5-8BA3-D447A2D044C8@stufft.io>
Message-ID:

On 12-03-13 16:38, PJ Eby wrote:
> I'll ask it again: why should *thousands* of projects be censored or
> made to change their release processes, because *you* can't be
> bothered to cache the distributions of the projects you depend on?

So... everyone that uses PyPI should be *forced* to use their own private PyPI+externals cache? Otherwise they're not friendly enough to projects that don't want to use PyPI but that do it anyway?

Wow, that's user friendly... Thanks!

Why aren't there instructions on the front page of PyPI on how to set up a private mirror of all external packages, as that's obviously the professional requirement of every single person that types in "pip install"?

Reinout

--
Reinout van Rees http://reinout.vanrees.org/
reinout at vanrees.org http://www.nelen-schuurmans.nl/
"If you're not sure what to do, make something. -- Paul Graham"

From ncoghlan at gmail.com Wed Mar 13 07:28:47 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 12 Mar 2013 23:28:47 -0700
Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI
In-Reply-To: <513F8922.90008@egenix.com>
References: <20130312113817.GA9677@merlinux.eu> <513F5282.3010206@egenix.com> <20130312170508.GG9677@merlinux.eu> <513F6EE0.6080503@egenix.com> <513F8922.90008@egenix.com>
Message-ID:

On Tue, Mar 12, 2013 at 12:59 PM, M.-A. Lemburg wrote:
> I think we should establish a versioned API like that for PyPI
> to make progress easier.
All major web APIs use versioning > for this reason. Why set up versioning for something we want to phase out? There will never be a simple-v3, so this is really overengineering the proposed change. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tk47 at students.poly.edu Wed Mar 13 07:41:55 2013 From: tk47 at students.poly.edu (Trishank Karthik Kuppusamy) Date: Wed, 13 Mar 2013 02:41:55 -0400 Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF Message-ID: <51401FB3.7000408@students.poly.edu> Hello everyone, I am pleased to announce our demonstration of PyPI and pip with TUF. Firstly, we solicit your thoughts and comments on our design document for integrating PyPI with TUF: https://docs.google.com/document/d/1sHMhgrGXNCvBZdmjVJzuoN5uMaUAUDWBmn3jo7vxjjw/edit?usp=sharing Secondly, you may wish to test our demo of PyPI and pip with TUF: https://github.com/dachshund/pip/wiki/pip-over-TUF Thirdly, this is how little it takes to secure pip with TUF: https://github.com/dachshund/pip/compare/develop...tuf Finally, you may be interested to learn about how one might manually secure a PyPI package index with TUF: https://github.com/dachshund/pip/wiki/PyPI-over-TUF We are excited to be able to show this to you now, and in person at our lightning talk at PyCon this Friday. We think that there is great potential for the PyPI and TUF community to work together to secure Python package management. This is just the beginning, and there is some work left to do, but we are confident that we have demonstrated to you that PyPI could be secured with TUF in the very near future. We would be happy to discuss with you how we compare with other proposals. We look forward to your questions and feedback! Thanks, Trishank From mal at egenix.com Wed Mar 13 09:23:24 2013 From: mal at egenix.com (M.-A. 
Lemburg) Date: Wed, 13 Mar 2013 09:23:24 +0100 Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI In-Reply-To: References: <20130312113817.GA9677@merlinux.eu> <513F5282.3010206@egenix.com> <20130312170508.GG9677@merlinux.eu> <513F6EE0.6080503@egenix.com> <513F8922.90008@egenix.com> Message-ID: <5140377C.90909@egenix.com> On 13.03.2013 07:28, Nick Coghlan wrote: > On Tue, Mar 12, 2013 at 12:59 PM, M.-A. Lemburg wrote: >> I think we should establish a versioned API like that for PyPI >> to make progress easier. All major web APIs use versioning >> for this reason. > > Why set up versioning for something we want to phase out? There will > never be a simple-v3, so this is really overengineering the proposed > change. Who says that we want to phase out the /simple/ index ? FWIW, I don't think that two or three small changes to the PyPI (see my email to Holger) server warrants calling this over-engineering. This is about moving forward in a backwards compatible and future proof way. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 13 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ncoghlan at gmail.com Wed Mar 13 09:09:26 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 13 Mar 2013 01:09:26 -0700 Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF In-Reply-To: <51401FB3.7000408@students.poly.edu> References: <51401FB3.7000408@students.poly.edu> Message-ID: On Tue, Mar 12, 2013 at 11:41 PM, Trishank Karthik Kuppusamy wrote: > Hello everyone, > > I am pleased to announce our demonstration of PyPI and pip with TUF. > > Firstly, we solicit your thoughts and comments on our design document for > integrating PyPI with TUF: > > https://docs.google.com/document/d/1sHMhgrGXNCvBZdmjVJzuoN5uMaUAUDWBmn3jo7vxjjw/edit?usp=sharing Thanks for putting this together! Just a few notes regarding key management: - the PSF board generally stays out of the technical details of running the python.org infrastructure, so it's likely that any root keys would be handled by the PSF infrastructure committee. A (2, 4) or (3, 5) trust configuration would likely be manageable at this level. - at the target delegation level, PyPI supports the registration of new projects through the web service (see http://docs.python.org/2/distutils/packageindex.html). If my understanding of target delegation is correct, this means the "simple" and "packages/source/" delegations will need to be (1, 1) and online. - higher levels of the target delegation hierarchy could conceivably be kept offline, but there seems little value in doing so if they're trusting on online (1, 1) key - many PyPI packages are maintained by single developers, so (1, 1) or (1, n) is likely to be the only generally feasible level of signing at the project level. 
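The (t, n) notation used in these notes means that a role's metadata is trusted once at least t of its n authorized keys have signed it. The following is a minimal sketch of that counting rule only; it is not TUF's API, `meets_threshold` and the key identifiers are hypothetical names, and cryptographic verification of each signature is assumed to have happened before this check.

```python
def meets_threshold(valid_signer_ids, authorized_key_ids, threshold):
    """Return True if at least `threshold` distinct authorized keys signed.

    `valid_signer_ids` holds the key ids whose signatures already passed
    cryptographic verification (assumed to be done elsewhere).
    """
    authorized = set(authorized_key_ids)
    if not 1 <= threshold <= len(authorized):
        raise ValueError("threshold must be between 1 and the number of keys")
    # Duplicate signatures and unknown keys must not count toward t.
    distinct_signers = set(valid_signer_ids) & authorized
    return len(distinct_signers) >= threshold


# A (2, 4) configuration, as suggested above for the root role.
root_keys = ["key-a", "key-b", "key-c", "key-d"]
two_signers = meets_threshold(["key-a", "key-c"], root_keys, 2)
one_signer = meets_threshold(["key-a", "key-a", "key-x"], root_keys, 2)
```

With these inputs the first check passes and the second does not, because a repeated key and an unauthorized key add nothing to the count.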
With the current focus being on getting an improvement from the status quo that we can successfully deploy in a reasonable period of time, the target delegation side of things probably needs to be substantially simpler in the initial iteration. Yes, it leaves us open to certain vulnerabilities we would like to remove in the long run, but we need to be very cautious in the additional demands we place on the users uploading to PyPI. It may even mean the initial iteration allows projects to rely on a PyPI provided signing key for their TUF metadata, using the existing upload mechanisms to add the files to PyPI. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tk47 at students.poly.edu Wed Mar 13 10:13:16 2013 From: tk47 at students.poly.edu (Trishank Karthik Kuppusamy) Date: Wed, 13 Mar 2013 05:13:16 -0400 Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF In-Reply-To: References: <51401FB3.7000408@students.poly.edu> Message-ID: <5140432C.7000904@students.poly.edu> Hello Nick, On 3/13/13 4:09 AM, Nick Coghlan wrote: > > - the PSF board generally stays out of the technical details of > running the python.org infrastructure, so it's likely that any root > keys would be handled by the PSF infrastructure committee. A (2, 4) or > (3, 5) trust configuration would likely be manageable at this level. Understood. We think a higher (t, n) [where t out of n signatures are needed to trust the metadata for a role] is better for the root role simply because its crucial metadata (the authorized keys for top-level roles) should change very rarely. > - at the target delegation level, PyPI supports the registration of > new projects through the web service (see > http://docs.python.org/2/distutils/packageindex.html). If my > understanding of target delegation is correct, this means the "simple" > and "packages/source/" delegations will need to be (1, 1) and > online. 
> - higher levels of the target delegation hierarchy could conceivably > be kept offline, but there seems little value in doing so if they're > trusting on online (1, 1) key Fortunately, the "targets/simple" and "targets/packages/(version)/(letter)/" roles should not require (1, 1) online keys, as their metadata (simply target delegations and no actual target files) should also fluctuate fairly rarely. I should make this clearer in our design document. > - many PyPI packages are maintained by single developers, so (1, 1) or > (1, n) is likely to be the only generally feasible level of signing at > the project level. Yes, the package developers themselves could choose any (t, n) they like. In our design, we propose that PyPI could eventually delegate to "stable" packages which need little change (and use more security with more offline keys) and to "unstable" packages which need frequent change (and use less security with more online keys). > With the current focus being on getting an improvement from the status > quo that we can successfully deploy in a reasonable period of time, > the target delegation side of things probably needs to be > substantially simpler in the initial iteration. Yes, it leaves us open > to certain vulnerabilities we would like to remove in the long run, > but we need to be very cautious in the additional demands we place on > the users uploading to PyPI. It may even mean the initial iteration > allows projects to rely on a PyPI provided signing key for their TUF > metadata, using the existing upload mechanisms to add the files to > PyPI. I agree that there is a delicate problem of balancing security with usability here, especially in the beginning. You raised a very good issue there: on first migration, how would PyPI accommodate packages which have not had their target files delegated to their developers? 
We imagine that in this case, PyPI could assume initial responsibility for
these packages, and later PyPI would delegate those packages to their
respective developers.

Thanks for your input,
Trishank

From holger at merlinux.eu Wed Mar 13 12:21:59 2013
From: holger at merlinux.eu (holger krekel)
Date: Wed, 13 Mar 2013 11:21:59 +0000
Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of release files
Message-ID: <20130313112158.GO9677@merlinux.eu>

Hi all,

after some more discussions and hours spent by Carl Meyer (who is now
co-authoring the PEP) and me, here is a new V3 pre-submit draft.
It is now more ambitious than the previous draft as should be obvious
from the modified abstract (and Carl Meyer's and Philip's earlier
interactions on this list). There also are more details of how
the current link-scraping works among other improvements and incorporations
of feedback from discussions here.

We intend to submit this draft tonight to the PEP editors.

Feedback now and later remains welcome. I am sure there are issues to
be sorted and clarified, among them the versioning-API suggestion by
Marc-Andre.

Thanks for everybody's support and feedback so far,
holger

PEP: XXX
Title: Transitioning to release-file hosting on PyPI
Version: $Revision$
Last-Modified: $Date$
Author: Holger Krekel, Carl Meyer
Discussions-To: catalog-sig at python.org
Status: Draft (PRE-submit V3)
Type: Process
Content-Type: text/x-rst
Created: 10-Mar-2013
Post-History:

Abstract
========

This PEP proposes a backward-compatible two-phase transition process
to make installing from the pypi.python.org (PyPI) package index
faster, simpler and more robust. To ease the transition and minimize
client-side friction, **no changes to distutils or existing
installation tools are required in order to benefit from the
transition phases, which are intended to result in faster, more
reliable installs for most existing packages**.
The first transition phase implements easy and explicit means for a
package maintainer to control which release file links are served to
present-day installation tools. The first phase also includes the
implementation of analysis tools for present-day packages, to support
communication with package maintainers and the automated setting of
default modes for controlling release file links.

The second transition phase will result in the current PyPI index
serving only PyPI-hosted files by default. Externally hosted files
will still be automatically discoverable through a second index.
Present-day installation tools will be able to continue working by
specifying this second index. New versions of installation tools
shall default to only install packages from PyPI unless the user
explicitly wishes to include non-PyPI sites.

Rationale
=========

.. _history:

History and motivations for external hosting
--------------------------------------------

When PyPI went online, it offered release registration but had no
facility to host release files itself. When hosting was added, no
automated downloading tool existed yet. When Philip Eby implemented
automated downloading (through setuptools), he made the choice to
allow people to use download hosts of their choice. The finding of
externally-hosted packages was implemented as follows:

#. The PyPI ``simple/`` index for a package contains all links found
   anywhere in that package's metadata for any release. Links in the
   "Download-URL" and "Home-page" metadata fields are given
   ``rel=download`` and ``rel=homepage`` attributes, respectively.

#. Any of these links whose target is a file whose name appears to be
   in the form of an installable source or binary distribution, with
   basename in the form "packagename-version.ARCHIVEEXT", is
   considered a potential installation candidate.

#. Similarly, any links suffixed with an "#egg=packagename-version"
   fragment are considered an installation candidate.

#.
   Additionally, the ``rel=homepage`` and ``rel=download`` links are
   followed and, if HTML, are themselves scraped for release-file
   links in the above formats.

Today, most packages released on PyPI host their release files on
PyPI, but a small percentage (XXX need updated data) rely on external
hosting.

There are many reasons [2]_ why people have chosen external hosting.
To cite just a few:

- release processes and scripts have been developed already and
  upload to external sites

- it takes too long to upload large files from some places in the
  world

- export restrictions e.g. for crypto-related software

- company policies which require offering open source packages
  through own sites

- problems with integrating uploading to PyPI into one's release
  process (because of release policies)

- desiring download statistics different from those maintained by
  PyPI

- perceived bad reliability of PyPI

- not aware that PyPI offers file-hosting

Irrespective of the present-day validity of these reasons, there
clearly is a history behind why people choose to host files
externally, and for some time it was the only way to do things.

Problem
-------

**Today, Python package installers (pip, easy_install, buildout, and
others) often need to query many non-PyPI URLs even if there are no
externally hosted files**. Besides querying pypi.python.org's simple
index pages, an installer also crawls all homepages and download
pages ever specified with any release of a package. The need for
installers to crawl external sites slows down installation and makes
for a brittle and unreliable installation process. Those sites and
packages also don't take part in the :pep:`381` mirroring
infrastructure, further decreasing reliability and speed of automated
installation processes around the world.

Most packages are hosted directly on pypi.python.org [1]_. Even for
these packages, installers still crawl the homepage(s) of a package.
Many package uploaders are not aware that specifying the "homepage"
in their release process will slow down the installation process for
all users.

Relying on third party sites also opens up more attack vectors for
injecting malicious packages into sites using automated installs. A
simple attack might just involve getting hold of an old now-unused
homepage domain and placing malicious packages there. Moreover,
performing a man-in-the-middle (MITM) attack between an installation
site and any of the download sites can inject malicious packages on
the installation site. As many homepages and download locations are
using HTTP and not HTTPS, such attacks are not hard to launch. Such
MITM attacks can easily happen even for packages which never intended
to host files externally as their homepages are contacted by
installers anyway.

There is currently no way for package maintainers to avoid 3rd party
crawling, other than removing all homepage/download url metadata for
all historic releases. While a script [3]_ has been written to
perform this action, it is not a good general solution because it
removes semantic information like the "homepage" specification from
PyPI packages.

Even if the "Homepage" and "Download-URL" links were not scraped for
further links, there is still no way under the current system for a
package owner to link to an installable file from their package
metadata without installation tools automatically considering that
file a candidate for installation.

Solution / two transition phases
================================

The first transition phase starts off by introducing a "hosting-mode"
field for each project on PyPI, allowing explicit control of which
machine-readable release file links are served to present-day
installation tools. The first transition will, after successful
hosting-mode changes by individual early adopters, then set a default
hosting mode for existing packages, based on automated analysis.
**Maintainers will be notified one month ahead of any such automated
change**. At completion of the first transition phase, **all
present-day existing release and installation processes and tools are
expected to continue working**. Any remaining errors or problems are
expected to only relate to installation of individual packages and
can be easily corrected by package maintainers or PyPI admins if
maintainers are not reachable.

**The second transition phase will then get PyPI, after a three-month
warning period, to only serve links for PyPI-hosted packages under
the present-day ``simple/`` index**. At this point, present-day
installation tools will not see externally hosted links anymore,
unless they specify a new ``simple/-with-externals`` index which PyPI
MUST offer ahead of the start of the second transition phase. This
new index contains the external links as controlled by a package
maintainer. Moreover, PyPI MUST also provide means to register and
control download links, independently from the current metadata and
remote html-scraping methods.

At completion of the second transition phase, all present-day
installation tools will, and all future installation tool releases
SHALL, default to installing only PyPI-hosted packages unless a user
specifies option(s) to include external links or the external index.
If an installation tool chooses to use the new
``simple/-with-externals/`` as a default, it MUST warn a user with a
precise message of which external links were followed.

Maintainers of packages which currently host release files on
non-PyPI sites shall receive instructions and tools to ease
"re-hosting" of their historic and future package release files. The
implementation of such a re-hosting tool is expected but NOT REQUIRED
to be available at the beginning of phase 2.
Implementation
==============

The foundation of both transition phases is the introduction of three
"modes" of PyPI hosting for a package, affecting which links are
generated for the ``simple/`` index in transition phase 1. These
modes are implemented without requiring changes to installation tools
via changes to the algorithm for generating the machine-readable
"/simple" index.

The modes are:

- ``pypi-ext-crawl``: no change from the current situation of
  generating machine-readable links for installation tools, as
  outlined in the history_.

- ``pypi-ext``: for a package in this mode, the "Home-page" and
  "Download-url" links added to the simple index are given
  ``rel=ext-homepage`` and ``rel=ext-download`` attributes instead of
  ``rel=homepage`` and ``rel=download``. The effect of this (with no
  change in installation tools necessary) is that these links will
  not be followed and scraped for further candidate links. Only
  installable files linked directly from PyPI metadata (wherever they
  are hosted) will be considered for installation.

- ``pypi-only``: for a package in this mode, only links to URLs on
  PyPI itself will be added to the simple index.

At the end of the warning period of transition phase 2, the
``simple/`` index will be restricted to only show links to URLs on
PyPI itself while the ``simple/-with-externals`` index will during
both transition phases show links to PyPI and any externals as
controlled by the package maintainer and the hosting-mode.

For a package in ``pypi-only`` mode, external links will no longer be
automatically scraped from metadata and added to the two indexes.
However, PyPI will expose an interface for package maintainers to
explicitly specify any number of URLs to externally hosted
installable files for a given release, and these URLs will be added
to the ``simple/-with-ext`` index page for that project but NOT to
the basic ``simple/`` index page.
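The effect of the three modes on the generated ``simple/`` page can be sketched roughly as follows. This is only an illustration of the rules described above, not PyPI's actual code: the function name `simple_index_links`, the `(url, kind)` input tuples and the `PYPI_HOST` constant are assumptions made for the example.

```python
from urllib.parse import urlparse

PYPI_HOST = "pypi.python.org"  # assumed index host for this sketch

# rel attributes per mode for links taken from the metadata fields.
REL_BY_MODE = {
    "pypi-ext-crawl": {"homepage": "homepage", "download": "download"},
    "pypi-ext": {"homepage": "ext-homepage", "download": "ext-download"},
}


def simple_index_links(mode, links):
    """Return the (url, rel) pairs served on a project's simple/ page.

    `links` is a list of (url, kind) tuples, where kind is "homepage",
    "download" or "file" (a direct link to a release file).
    """
    if mode == "pypi-only":
        # Only links pointing at PyPI itself are served at all.
        return [(url, None) for url, _kind in links
                if urlparse(url).hostname == PYPI_HOST]
    if mode not in REL_BY_MODE:
        raise ValueError("unknown hosting mode: %r" % (mode,))
    # Both other modes serve every link; they differ only in whether
    # the rel value invites installers to crawl the target page.
    return [(url, REL_BY_MODE[mode].get(kind)) for url, kind in links]


example_links = [
    ("http://example.com/", "homepage"),
    ("http://example.com/dl/foo-1.0.tar.gz", "download"),
    ("https://pypi.python.org/packages/source/f/foo/foo-1.0.tar.gz", "file"),
]
crawl_page = simple_index_links("pypi-ext-crawl", example_links)
ext_page = simple_index_links("pypi-ext", example_links)
only_page = simple_index_links("pypi-only", example_links)
```

In this sketch the ``pypi-ext`` page still lists the external download link, but under a ``rel`` value that present-day installers do not follow, while the ``pypi-only`` page drops everything not hosted on the index itself.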
Thus the ``-with-ext`` alternative index provides package owners who
have good reason to host their packages elsewhere a means to do so
(even under the ``pypi-only`` package mode) and still have that
information reflected on PyPI in machine-readable form, allowing
installation tool users an explicit and easy choice of whether they
wish to read an index that includes externally-hosted packages or one
that does not.

The goal of this PEP is that eventually all projects on PyPI can be
migrated to the ``pypi-only`` mode, while preserving the ability to
install release files hosted by third parties in an automated manner.
Deprecation of hosting-modes to eventually only allow the
"pypi-only" mode is NOT REGULATED by this PEP but is expected to
become feasible some time after successful implementation of the two
transition phases described in this PEP.

Implementation and interaction timeline
--------------------------------------------------

The proposed solution consists of multiple implementation and
communication steps:

#. Implement in PyPI the three modes and the ``-with-ext`` index as
   described above, and an interface for package owners to select the
   mode for each package and register explicit external file URLs for
   the ``-with-ext`` index (for projects in the ``pypi-only`` mode).
   Default all newly-registered packages to ``pypi-only`` mode (but
   package owners can still switch to the other modes as desired).
   Implement in ``pep381client`` the mirroring of the ``-with-ext``
   index pages.

#. Determine which packages have installable versions available that
   are linked only from homepage/download pages (group B) and which
   packages have all installable files available on PyPI itself
   (group A).

#. Send mail to maintainers of projects in group A that their project
   is going to be automatically configured to ``pypi-ext`` mode in
   one month.
   Inform them that this change is not expected to affect
   installability of their project at all, but will result in faster
   and safer installs for their users. Encourage them to set this
   mode (or ``pypi-only``) themselves earlier to benefit their users.

#. Send mail to maintainers of packages in group B that their package
   hosting mode is ``pypi-ext-crawl``, list the sites which currently
   are crawled, and suggest that they re-host their packages directly
   on PyPI and then switch to ``pypi-only``. Provide instructions and
   tools to help with this "re-uploading" process.

In addition, maintainers of installation tools are asked to release
two updates. The first one shall provide clear warnings if
externally-hosted packages (that is, packages at a URL whose domain
name differs from the domain name of the index URL in use) are
selected for download, state exactly which projects and URLs are
affected, and announce that in future versions externally-hosted
downloads will be disabled by default.

The second update for installation tools should change the default
mode to allow only installation of package files hosted at the index
domain, and allow installation of externally-hosted packages only
when the user supplies an option (ideally an option specifying
exactly which external domains are to be trusted as download
sources). When download of an externally-hosted package is
disallowed, the user should be notified, with instructions for how to
make the install succeed and warnings about the potential
consequences. It is expected that tools in this release may choose
to change the default index url to
``https://pypi.python.org/simple/-with-ext`` in order to support
explicitly-registered external URLs for projects in ``pypi-only``
mode. Tools may choose to do this only when the user requests
installation of externally-hosted packages, or may choose to do this
in all cases so as to be able to notify users when an
externally-hosted file is available.
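The "externally-hosted" test requested for the first tool update (a URL whose domain name differs from that of the index URL in use) might look roughly like this. It is a sketch under assumptions rather than code from any installer: `is_external` and `external_warnings` are hypothetical helper names, and the warning wording is invented for illustration.

```python
from urllib.parse import urljoin, urlparse


def is_external(link_url, index_url):
    """True if link_url is served from a different host than the index."""
    # Index pages may use relative links, so resolve against the page
    # URL before comparing host names.
    absolute = urljoin(index_url, link_url)
    return urlparse(absolute).hostname != urlparse(index_url).hostname


def external_warnings(project, candidate_urls, index_url):
    """Build one warning line per externally-hosted candidate URL."""
    index_host = urlparse(index_url).hostname
    return [
        "WARNING: %s: %s is not hosted on %s; externally-hosted "
        "downloads will be disabled by default in a future version"
        % (project, url, index_host)
        for url in candidate_urls
        if is_external(url, index_url)
    ]


index_page = "https://pypi.python.org/simple/foo/"
candidates = [
    "../../packages/source/f/foo/foo-1.0.tar.gz",
    "http://downloads.example.com/foo-1.1.tar.gz",
]
warnings_for_foo = external_warnings("foo", candidates, index_page)
```

Only the second candidate produces a warning here; the relative link resolves to the index host and is treated as index-hosted.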
Specific timelines for deprecation of ``pypi-ext-crawl`` and
``pypi-ext`` modes are not mandated in this PEP; this will depend on
observed behavior of package owners and availability of tooling. It
is expected that ``pypi-ext-crawl`` mode will be an early candidate
for deprecation; it may be necessary to leave ``pypi-ext`` mode in
place for quite some time, at least for those packages already
depending on it (it may be removed as an option for new packages when
tool support for explicit external URLs and the ``-with-ext`` index
is sufficient).

Open questions
==============

- Should we introduce a third index which maintains the old behaviour
  of providing links irrespective of a maintainer's hosting-mode
  choice?

- Should we introduce some form of PyPI API versioning in this PEP?
  (It might complicate matters and delay the implementation but is
  often seen as good practice.)

References
==========

.. [1] Donald Stufft, ratio of externally hosted versus pypi-hosted,
   http://mail.python.org/pipermail/catalog-sig/2013-March/005549.html
   (XXX need to update this data for all easy_install-supported formats)

.. [2] Marc-Andre Lemburg, reasons for external hosting,
   http://mail.python.org/pipermail/catalog-sig/2013-March/005626.html

.. [3] Holger Krekel, Script to remove homepage/download metadata for
   all releases,
   http://mail.python.org/pipermail/catalog-sig/2013-February/005423.html

Acknowledgements
================

Philip Eby for precise information and the basic ideas to implement
the transition via server-side changes only.

Donald Stufft for pushing away from external hosting and offering to
implement both a Pull Request for the necessary PyPI changes and the
analysis tool to drive the transition phase 1.

Marc-Andre Lemburg, Nick Coghlan and catalog-sig in general for
thinking through issues regarding getting rid of "external hosting".

Copyright
=========

This document has been placed in the public domain.

..
Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: From pje at telecommunity.com Wed Mar 13 15:26:16 2013 From: pje at telecommunity.com (PJ Eby) Date: Wed, 13 Mar 2013 10:26:16 -0400 Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of release files In-Reply-To: <20130313112158.GO9677@merlinux.eu> References: <20130313112158.GO9677@merlinux.eu> Message-ID: On Wed, Mar 13, 2013 at 7:21 AM, holger krekel wrote: > Hi all, > > after some more discussions and hours spend by Carl Meyer (who is now > co-authoring the PEP) and me, here is a new V3 pre-submit draft. > It is now more ambitious than the previous draft as should be obvious > from the modified abstract (and Carl Meyers and Philip's earlier > interactions on this list). There also are more details of how > the current link-scraping works among other improvements and incorporations > of feedback from discussions here. > > We intend to submit this draft tonight to the PEP editors. > > Feedback now and later remains welcome. I am sure there are issues to > be sorted and clarified, among them the versioning-API suggestion by > Marc-Andre. > > Thanks for everybody's support and feedback so far, > holger Looks good to me! Setuptools' two releases will probably look like this: 1. Default to externals index, warn when fetching URLs that are not the same host as the index 2. Default to externals index, reject URLs that are not the same host as the index unless --allow-hosts is configured (IOW, default allow-hosts to equal index-url host) That way, external URLs can still be discovered by the user, but the default configuration is still secure. 
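The second release described above (allow-hosts defaulting to the index host) could behave roughly like the sketch below. It is not setuptools code: `filter_by_allowed_hosts` is a hypothetical helper, and treating the allow-hosts value as a list of glob-style host patterns is an assumption made for the example.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def filter_by_allowed_hosts(urls, index_url, allow_hosts=None):
    """Split urls into (accepted, rejected) by host pattern.

    When allow_hosts is not configured, only the index host itself is
    accepted, so externally-hosted links are rejected by default.
    """
    index_host = urlparse(index_url).hostname
    patterns = [p.lower() for p in (allow_hosts or [index_host])]
    accepted, rejected = [], []
    for url in urls:
        host = urlparse(url).hostname or ""
        if any(fnmatchcase(host, pattern) for pattern in patterns):
            accepted.append(url)
        else:
            rejected.append(url)
    return accepted, rejected


index = "https://pypi.python.org/simple/"
found = [
    "https://pypi.python.org/packages/source/f/foo/foo-1.0.tar.gz",
    "http://downloads.example.com/foo-1.1.tar.gz",
]
default_ok, default_rejected = filter_by_allowed_hosts(found, index)
opted_in_ok, _ = filter_by_allowed_hosts(
    found, index, allow_hosts=["pypi.python.org", "*.example.com"])
```

Returning the rejected list as well lets a tool tell the user which external URLs were discovered but skipped, which matches the goal that such links stay discoverable while the default remains restrictive.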
From tseaver at palladion.com Wed Mar 13 17:54:04 2013 From: tseaver at palladion.com (Tres Seaver) Date: Wed, 13 Mar 2013 12:54:04 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: <20130312195707.GL9677@merlinux.eu> References: <513F5596.5090302@egenix.com> <513F718D.4040307@oddbird.net> <20130312195707.GL9677@merlinux.eu> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 03/12/2013 03:57 PM, holger krekel wrote: > Nobody should be lead to think that PYPI is a trusted or reviewed > source of software even if we got rid of external hosting completely. Amen. I still boggle at the amount of "sky is falling" stuff here over MITM / external links / whatever, given the potential damaage from explicitly malicious uploads (trojans, viruses, whatever). Package signing might help here, but only for consumers who willing to think hard enough about the problem to manage a web of trust (frankly, a vanishingly small minority). And then there are these problems: - - Backward-imcompatible releases (even those which make appropriate signals in their version numbers). - - Removal of distributions / releases / projects. - - Re-upload of new distributions which sliently replace previous distributions *of the same release* ("Yes, Virginia, there are people out there who do this"). which are deal-killers for the folks who want always-on, reliable, repeatable, automatic installation from PyPI (instead of creating their own indexes). Adding HTTPS or removing external links does nothing to mitigate those issues. Tres. 
- -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with undefined - http://www.enigmail.net/ iEYEARECAAYFAlFArywACgkQ+gerLs4ltQ7zLACgluGTMdUYheeMGoFgAUH1VZja VJYAnjBPXbs8yeQ1FYa0mNZhAkTlcJQf =8KSF -----END PGP SIGNATURE----- From donald at stufft.io Wed Mar 13 18:06:08 2013 From: donald at stufft.io (Donald Stufft) Date: Wed, 13 Mar 2013 13:06:08 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <513F5596.5090302@egenix.com> <513F718D.4040307@oddbird.net> <20130312195707.GL9677@merlinux.eu> Message-ID: <48C1CAC9-C80A-470A-A0FF-500391101918@stufft.io> On Mar 13, 2013, at 12:54 PM, Tres Seaver wrote: > Signed PGP part > On 03/12/2013 03:57 PM, holger krekel wrote: > > Nobody should be lead to think that PYPI is a trusted or reviewed > > source of software even if we got rid of external hosting completely. > > Amen. I still boggle at the amount of "sky is falling" stuff here over > MITM / external links / whatever, given the potential damaage from > explicitly malicious uploads (trojans, viruses, whatever). Package > signing might help here, but only for consumers who willing to think hard > enough about the problem to manage a web of trust (frankly, a vanishingly > small minority). Really now? Let's see I can easily protect against malicous uploads by only installing from trusted authors. I cannot easily prevent a MITM or a compromised external host if the tools don't protect me against it. Without the tooling and infrastructure moving to close this gap the only way to do it is to not use that tooling or infrastructure at all. 
Namely even if the author of the package is myself I cannot be secure installing it using the current toolchain and infrastructure unless I bend over backwards to make sure that no installable link appears anywhere in my long description, and I don't have a homepage, and I don't have a download url. > > And then there are these problems: > > - - Backward-imcompatible releases (even those which make appropriate > signals in their version numbers). > > - - Removal of distributions / releases / projects. > > - - Re-upload of new distributions which sliently replace previous > distributions *of the same release* ("Yes, Virginia, there are > people out there who do this"). > > which are deal-killers for the folks who want always-on, reliable, > repeatable, automatic installation from PyPI (instead of creating their > own indexes). > > Adding HTTPS or removing external links does nothing to mitigate those > issues. Yes there are other problems, so let's just throw our hands in the air and say fuck it instead of iteratively working to secure the system. > > > Tres. > - -- > =================================================================== > Tres Seaver +1 540-429-0999 tseaver at palladion.com > Palladion Software "Excellence by Design" http://palladion.com > > > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From donald at stufft.io Wed Mar 13 18:12:59 2013 From: donald at stufft.io (Donald Stufft) Date: Wed, 13 Mar 2013 13:12:59 -0400 Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of release files In-Reply-To: References: <20130313112158.GO9677@merlinux.eu> Message-ID: <17BFC490-4CB2-4CE0-B946-2FDD30A34111@stufft.io> On Mar 13, 2013, at 10:26 AM, PJ Eby wrote: > On Wed, Mar 13, 2013 at 7:21 AM, holger krekel wrote: >> Hi all, >> >> after some more discussions and hours spend by Carl Meyer (who is now >> co-authoring the PEP) and me, here is a new V3 pre-submit draft. >> It is now more ambitious than the previous draft as should be obvious >> from the modified abstract (and Carl Meyers and Philip's earlier >> interactions on this list). There also are more details of how >> the current link-scraping works among other improvements and incorporations >> of feedback from discussions here. >> >> We intend to submit this draft tonight to the PEP editors. >> >> Feedback now and later remains welcome. I am sure there are issues to >> be sorted and clarified, among them the versioning-API suggestion by >> Marc-Andre. >> >> Thanks for everybody's support and feedback so far, >> holger > > Looks good to me! > > Setuptools' two releases will probably look like this: > > 1. Default to externals index, warn when fetching URLs that are not > the same host as the index > 2. Default to externals index, reject URLs that are not the same host > as the index unless --allow-hosts is configured (IOW, default > allow-hosts to equal index-url host) > > That way, external URLs can still be discovered by the user, but the > default configuration is still secure. 
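The two setuptools steps quoted above boil down to comparing the host of each candidate download link with the host of the index. A minimal sketch in Python, assuming hypothetical names (`classify_link`, `strict`) and a glob-style `allow_hosts` list modelled on the --allow-hosts option mentioned in the quote; it is an illustration only, not the setuptools implementation, and it assumes absolute URLs:

```python
from fnmatch import fnmatch
from urllib.parse import urlparse


def classify_link(index_url, link_url, allow_hosts=None, strict=False):
    """Return "ok", "warn" or "reject" for a candidate download URL.

    Hypothetical helper. allow_hosts is a list of glob patterns; when it
    is not given, it defaults to the host of the index itself, as in
    step 2 of the quoted plan.
    """
    index_host = urlparse(index_url).hostname or ""
    link_host = urlparse(link_url).hostname or ""
    patterns = allow_hosts if allow_hosts is not None else [index_host]
    if any(fnmatch(link_host, pattern) for pattern in patterns):
        return "ok"
    # Step 1 only warns about off-index hosts; step 2 (strict) refuses them.
    return "reject" if strict else "warn"
```

With `strict=False` this corresponds to the first release (warn only); with `strict=True` and no `allow_hosts` it corresponds to the second, where only the index host is accepted unless the user widens the list.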
For the record, I support the PEP and these 2 steps sound ok to me. My only suggestion is an additional rel attribute for indexes to indicate that a file is index-hosted, in case the index domain and the package host domain differ (as is the case with Crate). ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From tseaver at palladion.com Wed Mar 13 18:21:45 2013 From: tseaver at palladion.com (Tres Seaver) Date: Wed, 13 Mar 2013 13:21:45 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: <48C1CAC9-C80A-470A-A0FF-500391101918@stufft.io> References: <513F5596.5090302@egenix.com> <513F718D.4040307@oddbird.net> <20130312195707.GL9677@merlinux.eu> <48C1CAC9-C80A-470A-A0FF-500391101918@stufft.io> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 03/13/2013 01:06 PM, Donald Stufft wrote: > Really now? Let's see: I can easily protect against malicious uploads > by only installing from trusted authors How do you know who to trust? What if an author you trust adds a dependency on a package from an author you have no knowledge of, or one you actively distrust? What if an author you trust commits one of the other changes I outlined (removes a release / distribution, makes backward-incompatible changes, re-uploads a changed distribution over an existing one)? The only way to implement "only install from trusted authors" is to run your own index, and explicitly review / curate the package set maintained there.
In that scenario, you run a script from time to time which looks for new versions of your packages on PyPI and puts them into a queue for review. Bob, a casual reviewer, might install the new version from PyPI into a fresh virtualenv and test it there before pushing it into the curated index. Carol, more paranoid^Wsecurity-minded, downloads the package, verifies its signature, unpacks the tarball, diffs it against the curated version, compares that diff against the changelog, looks at new / changed dependencies, and installs it into a hardened sandbox for testing. Only after that kind of review does she push the newly-reviewed distribution into the curated index. Adding an entirely new package to the curated index is a similar process, but requires more effort from either Bob or Carol. Tres. - -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with undefined - http://www.enigmail.net/ iEYEARECAAYFAlFAtakACgkQ+gerLs4ltQ5O4wCcC92ew66wVGEPBM/Jr8z1bYU8 e9AAoNXmaiuBHQOIFQlT0SRemI43hoG7 =idDp -----END PGP SIGNATURE----- From donald at stufft.io Wed Mar 13 18:34:45 2013 From: donald at stufft.io (Donald Stufft) Date: Wed, 13 Mar 2013 13:34:45 -0400 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <513F5596.5090302@egenix.com> <513F718D.4040307@oddbird.net> <20130312195707.GL9677@merlinux.eu> <48C1CAC9-C80A-470A-A0FF-500391101918@stufft.io> Message-ID: <7141E066-8DD0-49FE-BA28-DBCF81F37465@stufft.io> On Mar 13, 2013, at 1:21 PM, Tres Seaver wrote: > Signed PGP part > On 03/13/2013 01:06 PM, Donald Stufft wrote: > > Really now? Let's see: I can easily protect against malicious uploads > > by only installing from trusted authors > > How do you know who to trust?
What if an author you trust adds a > dependency on a package from an author you have no knowledge of, or one > you actively distrust? What if an author you trust commits one of the > other changes I outlined (removes a release / distribution, makes > backward-incompatible changes, re-uploads a changed distribution over an > existing one)? > > The only way to implement "only install from trusted authors" is to run > your own index, and explicitly review / curate the package set maintained > there. In that scenario, you run a script from time to time which looks > for new versions of your packages on PyPI and puts them into a queue for > review. > > Bob, a casual reviewer, might install the new version from PyPI into a > fresh virtualenv and test it there before pushing it into the curated index. > > Carol, more paranoid^Wsecurity-minded, downloads the package, verifies its > signature, unpacks the tarball, diffs it against the curated version, > compares that diff against the changelog, looks at new / changed > dependencies, and installs it into a hardened sandbox for testing. Only > after that kind of review does she push the newly-reviewed distribution > into the curated index. > > Adding an entirely new package to the curated index is a similar process, > but requires more effort from either Bob or Carol. > > > Tres. Threat models are a thing. The way it *should* work in PyPI is: you ask for X, you get X, and it was not modified in transit (and ideally not on the repository either, but that is more difficult). PyPI is not and will never be a curated index.
However, if I trust Author A, then I implicitly trust his actions: I trust that he won't do any of the things you listed. Now, is a curated index *more secure*? Well, again, it depends on what your threat model is. PyPI isn't going to protect you from a malicious or incompetent author. For the threat model that PyPI is able to deliver on, your system is no more or less secure. In fact, without the sort of things you dismiss here, your proposal is just as insecure, unless you only ever access it on a protected network to which you can be sure no attacker has gained access. Even your 3 issues are far less concerning than the fact that a MITM on either PyPI (fixed now with pip 1.3) or an external URL allows a random guy at PyCon to execute arbitrary code on your machine if you install a package from PyPI at PyCon, or at a coffee shop, or on any wifi ever that could have someone else on it. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From robertc at robertcollins.net Wed Mar 13 18:41:33 2013 From: robertc at robertcollins.net (Robert Collins) Date: Thu, 14 Mar 2013 06:41:33 +1300 Subject: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site In-Reply-To: References: <513F5596.5090302@egenix.com> <513F718D.4040307@oddbird.net> <20130312195707.GL9677@merlinux.eu> Message-ID: On 14 March 2013 05:54, Tres Seaver wrote: > On 03/12/2013 03:57 PM, holger krekel wrote: >> Nobody should be led to think that PYPI is a trusted or reviewed >> source of software even if we got rid of external hosting completely. > > Amen.
I still boggle at the amount of "sky is falling" stuff here over > MITM / external links / whatever, given the potential damaage from > explicitly malicious uploads (trojans, viruses, whatever). Package > signing might help here, but only for consumers who willing to think hard > enough about the problem to manage a web of trust (frankly, a vanishingly > small minority). Well yes HTTPS and external links are problems which it is necessary to solve, and not sufficient to make 'pypi secure' - but that doesn't mean we should do a poor job solving them. -Rob -- Robert Collins Distinguished Technologist HP Cloud Services From dholth at gmail.com Wed Mar 13 19:15:16 2013 From: dholth at gmail.com (Daniel Holth) Date: Wed, 13 Mar 2013 14:15:16 -0400 Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF In-Reply-To: <5140432C.7000904@students.poly.edu> References: <51401FB3.7000408@students.poly.edu> <5140432C.7000904@students.poly.edu> Message-ID: On Wed, Mar 13, 2013 at 5:13 AM, Trishank Karthik Kuppusamy wrote: > Hello Nick, > > > On 3/13/13 4:09 AM, Nick Coghlan wrote: >> >> >> - the PSF board generally stays out of the technical details of >> running the python.org infrastructure, so it's likely that any root >> keys would be handled by the PSF infrastructure committee. A (2, 4) or >> (3, 5) trust configuration would likely be manageable at this level. > > > Understood. We think a higher (t, n) [where t out of n signatures are needed > to trust the metadata for a role] is better for the root role simply because > its crucial metadata (the authorized keys for top-level roles) should change > very rarely. > > >> - at the target delegation level, PyPI supports the registration of >> new projects through the web service (see >> http://docs.python.org/2/distutils/packageindex.html). If my >> understanding of target delegation is correct, this means the "simple" >> and "packages/source/" delegations will need to be (1, 1) and >> online. 
>> - higher levels of the target delegation hierarchy could conceivably >> be kept offline, but there seems little value in doing so if they're >> trusting on online (1, 1) key > > > Fortunately, the "targets/simple" and "targets/packages/(version)/(letter)/" > roles should not require (1, 1) online keys, as their metadata (simply > target delegations and no actual target files) should also fluctuate fairly > rarely. I should make this clearer in our design document. > > >> - many PyPI packages are maintained by single developers, so (1, 1) or >> (1, n) is likely to be the only generally feasible level of signing at >> the project level. > > > Yes, the package developers themselves could choose any (t, n) they like. In > our design, we propose that PyPI could eventually delegate to "stable" > packages which need little change (and use more security with more offline > keys) and to "unstable" packages which need frequent change (and use less > security with more online keys). > > >> With the current focus being on getting an improvement from the status >> quo that we can successfully deploy in a reasonable period of time, >> the target delegation side of things probably needs to be >> substantially simpler in the initial iteration. Yes, it leaves us open >> to certain vulnerabilities we would like to remove in the long run, >> but we need to be very cautious in the additional demands we place on >> the users uploading to PyPI. It may even mean the initial iteration >> allows projects to rely on a PyPI provided signing key for their TUF >> metadata, using the existing upload mechanisms to add the files to >> PyPI. > > > I agree that there is a delicate problem of balancing security with > usability here, especially in the beginning. > > You raised a very good issue there: on first migration, how would PyPI > accommodate packages which have not had their target files delegated to > their developers? 
We imagine that in this case, PyPI could assume initial > responsibility for these packages, and later PyPI would delegate those > packages to their respective developers. > > Thanks for your input, > Trishank With all the different kinds of metadata, It's interesting to note that currently TUF seems to only be concerned with the available file names and their integrity. (Some of us will think of PEP 426 "PKG-INFO" first when we hear the word metadata.) It looks like the D metadata lists all the filenames for Django, and then Django lists them again with hashes and signatures. Why all the lists? Does every Django release re-assert all the versions of Django that are available on the index? How might I deal with producing the official source distribution myself and having a friend produce the official Windows build of a package? As an aside PyPI has been doubling in size every 1.5 - 2 years. Thanks Daniel Holth From jcappos at poly.edu Wed Mar 13 19:29:49 2013 From: jcappos at poly.edu (Justin Cappos) Date: Wed, 13 Mar 2013 14:29:49 -0400 Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF In-Reply-To: References: <51401FB3.7000408@students.poly.edu> <5140432C.7000904@students.poly.edu> Message-ID: We may have something unclear in the doc. We definitely don't just worry about package names. (In between meetings, will send a longer response in a bit.) Thanks, Justin -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Wed Mar 13 19:57:58 2013 From: mal at egenix.com (M.-A.
Lemburg) Date: Wed, 13 Mar 2013 19:57:58 +0100 Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of release files In-Reply-To: <20130313112158.GO9677@merlinux.eu> References: <20130313112158.GO9677@merlinux.eu> Message-ID: <5140CC36.10807@egenix.com> On 13.03.2013 12:21, holger krekel wrote: > Hi all, > > after some more discussions and hours spend by Carl Meyer (who is now > co-authoring the PEP) and me, here is a new V3 pre-submit draft. > It is now more ambitious than the previous draft as should be obvious > from the modified abstract (and Carl Meyers and Philip's earlier > interactions on this list). There also are more details of how > the current link-scraping works among other improvements and incorporations > of feedback from discussions here. > > We intend to submit this draft tonight to the PEP editors. > > Feedback now and later remains welcome. I am sure there are issues to > be sorted and clarified, among them the versioning-API suggestion by > Marc-Andre. > > Thanks for everybody's support and feedback so far, > holger > > > PEP: XXX > Title: Transitioning to release-file hosting on PyPI > Version: $Revision$ > Last-Modified: $Date$ > Author: Holger Krekel , Carl Meyer > Discussions-To: catalog-sig at python.org > Status: Draft (PRE-submit V3) > Type: Process > Content-Type: text/x-rst > Created: 10-Mar-2013 > Post-History: > > > Abstract > ======== > > This PEP proposes a backward-compatible two-phase transition process to speed > up, simplify and robustify installing from the pypi.python.org (PyPI) > package index. To ease the transition and minimize client-side > friction, **no changes to distutils or existing installation tools are > required in order to benefit from the transition phases, which is to > result in faster, more reliable installs for most existing packages**. 
> > The first transition phase implements easy and explicit means for > a package maintainer to control which release file links are > served to present-day installation tools. The first phase also > includes the implementation of analysis tools for present-day packages, > to support communication with package maintainers and the automated > setting of default modes for controlling release file links. > > The second transition phase will result in the current PYPI index > to only serve PYPI-hosted files by default. Externally hosted files > will still be automatically discoverable through a second index. > Present-day installation tools will be able to continue working > by specifying this second index. New versions of installation > tools shall default to only install packages from PYPI unless > the user explicitly wishes to include non-PYPI sites. I must say, I don't like this change in motivation compared to V1 and V2. The original aim of the discussion was to make PyPI more secure and the installation process faster and more reliable by moving away from crawling arbitrary external web pages. Both can be had by: * limiting the crawling to specific URLs defined by the package author, with added hashes to make sure that the URLs and their target content are not modified (this is the securing external downloads part - see here for an example: https://pypi.python.org/pypi/egenix-pyopenssl/0.13.1.1.0.1.5), and * adding a way for the package authors to say "PyPI, please go ahead and cache/copy my distribution files" (this is the increase download reliability part - can be had by doing opt-in CDN caching/proxying of external links via PyPI) Now, with V3 of the proposal, you are moving towards a system that basically says "do it this way, or stay out of our ecosystem", which, in my book, is not what the Python ecosystem is all about. Your V2 was much more inviting in this respect.
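The first option above, external links pinned by a hash, can be sketched in a few lines of Python. This is only an illustration under assumptions: the hash is taken to travel in the URL fragment (as in the `#md5=...` links PyPI served at the time), and the names `SUPPORTED`, `split_hash_fragment` and `verify_download` are hypothetical, not part of any existing installer.

```python
import hashlib
from urllib.parse import urldefrag

# Hash names this sketch accepts; md5 was the common choice on PyPI links,
# a stronger hash such as sha256 would be preferable.
SUPPORTED = ("md5", "sha1", "sha256")


def split_hash_fragment(url):
    """Split "http://host/file.tar.gz#md5=abc" into (url, "md5", "abc").

    Returns (url, None, None) when the link carries no usable hash.
    """
    base, fragment = urldefrag(url)
    algorithm, sep, digest = fragment.partition("=")
    if not sep or algorithm not in SUPPORTED:
        return base, None, None
    return base, algorithm, digest.lower()


def verify_download(url, payload):
    """Return True only if payload matches the hash pinned in url."""
    _, algorithm, expected = split_hash_fragment(url)
    if algorithm is None:
        return False  # unpinned links are treated as unverified
    return hashlib.new(algorithm, payload).hexdigest() == expected
```

The point of the design is that the hash is published by the index next to the link, so a modified file on the external host no longer matches and is refused; it does nothing about a release that was malicious when its hash was recorded.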
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 13 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From jcappos at poly.edu Wed Mar 13 19:58:31 2013 From: jcappos at poly.edu (Justin Cappos) Date: Wed, 13 Mar 2013 14:58:31 -0400 Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF In-Reply-To: References: <51401FB3.7000408@students.poly.edu> <5140432C.7000904@students.poly.edu> Message-ID: We use the simple directory and filenames because that is what pip uses. You have a nice suggestion to include other metadata in the TUF metadata. We certainly could do this if desirable. This required a redesign of the PyPI API and we weren't sure if this was wanted. Our current doc / prototype is trying to minimize the changes needed all around. Thanks, Justin -------------- next part -------------- An HTML attachment was scrubbed...
URL: From donald at stufft.io Wed Mar 13 20:08:32 2013 From: donald at stufft.io (Donald Stufft) Date: Wed, 13 Mar 2013 15:08:32 -0400 Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of release files In-Reply-To: <5140CC36.10807@egenix.com> References: <20130313112158.GO9677@merlinux.eu> <5140CC36.10807@egenix.com> Message-ID: <8DA4F828-BF12-4F96-9664-A87FA0EFBF12@stufft.io> On Mar 13, 2013, at 2:57 PM, "M.-A. Lemburg" wrote: > On 13.03.2013 12:21, holger krekel wrote: >> Hi all, >> >> after some more discussions and hours spend by Carl Meyer (who is now >> co-authoring the PEP) and me, here is a new V3 pre-submit draft. >> It is now more ambitious than the previous draft as should be obvious >> from the modified abstract (and Carl Meyers and Philip's earlier >> interactions on this list). There also are more details of how >> the current link-scraping works among other improvements and incorporations >> of feedback from discussions here. >> >> We intend to submit this draft tonight to the PEP editors. >> >> Feedback now and later remains welcome. I am sure there are issues to >> be sorted and clarified, among them the versioning-API suggestion by >> Marc-Andre. >> >> Thanks for everybody's support and feedback so far, >> holger >> >> >> PEP: XXX >> Title: Transitioning to release-file hosting on PyPI >> Version: $Revision$ >> Last-Modified: $Date$ >> Author: Holger Krekel , Carl Meyer >> Discussions-To: catalog-sig at python.org >> Status: Draft (PRE-submit V3) >> Type: Process >> Content-Type: text/x-rst >> Created: 10-Mar-2013 >> Post-History: >> >> >> Abstract >> ======== >> >> This PEP proposes a backward-compatible two-phase transition process to speed >> up, simplify and robustify installing from the pypi.python.org (PyPI) >> package index. 
To ease the transition and minimize client-side >> friction, **no changes to distutils or existing installation tools are >> required in order to benefit from the transition phases, which is to >> result in faster, more reliable installs for most existing packages**. >> >> The first transition phase implements easy and explicit means for >> a package maintainter to control which release file links are >> served to present-day installation tools. The first phase also >> includes the implementation of analysis tools for present-day packages, >> to support communication with package maintainers and the automated >> setting of default modes for controling release file links. >> >> The second transition phase will result in the current PYPI index >> to only serve PYPI-hosted files by default. Externally hosted files >> will still be automatically discoverable through a second index. >> Present-day installation tools will be able to continue working >> by specifying this second index. New versions of installation >> tools shall default to only install packages from PYPI unless >> the user explicitely wishes to include non-PYPI sites. > > I must say, don't like this change in motivation compared > to V1 and V2. > > The original of the discussion was to make PyPI more secure > and the installation process faster and more reliable > by moving away from crawling arbitrary external web pages. 
> > Both can be had by: > > * limiting the crawling to package author defined specific > URLs, with added hashes to make sure that the URLs and > their target content is not modified (this is the securing > external downloads part - see here for an example: > https://pypi.python.org/pypi/egenix-pyopenssl/0.13.1.1.0.1.5), > and > > * adding a way for the package authors to say "PyPI, please go > ahead and cache/copy my distribution files" (this is the > increase download reliability part - can be had by doing > opt-in CDN caching/proxying of external links via PyPI) > > Now, with V3 of the proposal, you are moving towards a system > that basically says "do it this way, or stay out of our > ecosystem", which, in my book, is not what the Python ecosystem > is all about. > I don't see how. The -with-externals index will still contain all the existing links, and indeed PJ Eby has already stated that setuptools will move to support this index by default, but with proper warnings to people so they know they are installing a package from off-site. This allows existing tools to be moved to a secure-by-default position. It allows future tools to choose whether they want to enable the existing behavior through use of -with-externals (hopefully with a warning or opt-in sort of thing like the one laid out by PJE, but it's certainly not required). And it even allows users of existing tools to opt into the old behavior via the -i option. Maybe I'm missing it, but in what way does this force authors to "do it this way or stay out of our ecosystem", since all the same options are available as there are today? > Your V2 was much more inviting in this respect. > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, Mar 13 2013) >>>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>>> mxODBC, mxDateTime, mxTextTools ...
http://python.egenix.com/ > ________________________________________________________________________ > > ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From mal at egenix.com Wed Mar 13 20:33:36 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 13 Mar 2013 20:33:36 +0100 Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of release files In-Reply-To: <8DA4F828-BF12-4F96-9664-A87FA0EFBF12@stufft.io> References: <20130313112158.GO9677@merlinux.eu> <5140CC36.10807@egenix.com> <8DA4F828-BF12-4F96-9664-A87FA0EFBF12@stufft.io> Message-ID: <5140D490.3040401@egenix.com> On 13.03.2013 20:08, Donald Stufft wrote: > > On Mar 13, 2013, at 2:57 PM, "M.-A. Lemburg" wrote: > >> On 13.03.2013 12:21, holger krekel wrote: >>> [V3 proposal] >> >> I must say, don't like this change in motivation compared >> to V1 and V2. >> >> The original of the discussion was to make PyPI more secure >> and the installation process faster and more reliable >> by moving away from crawling arbitrary external web pages. 
>> >> Both can be had by: >> >> * limiting the crawling to package author defined specific >> URLs, with added hashes to make sure that the URLs and >> their target content is not modified (this is the securing >> external downloads part - see here for an example: >> https://pypi.python.org/pypi/egenix-pyopenssl/0.13.1.1.0.1.5), >> and >> >> * adding a way for the package authors to say "PyPI, please go >> ahead and cache/copy my distributions files" (this is the >> increase download reliability part - can be had by doing >> opt-in CDN caching/proxying of external links via PyPI) >> >> Now, with V3 of the proposal, you are moving towards a system >> that basically says "do it this way, or stay out of our eco >> system", which, in my book, is not what the Python eco system >> is all about. >> > > I don't see how? The -with-externals index will still contain all the existing links, and indeed PJ Elby has already stated that setuptools will move to support this index by default but with proper warnings to people so they know they are installing a package off site. > This allows existing tools to be moved to a secure by default position. Allows future tools to choose if they want to enable the existing behavior through use of -with-externals (hopefully with a warning or opt-in sort of thing like laid out by PJE, but it's certainly not required). And even allows users of existing tools to opt into the old behavior via the -i option. > > Maybe i'm missing it but in what way does this force authors to "do it this way or stay out of our eco system" since all the same options are available as there are today? The proposal marks all external links as evil, and instead of making external links more secure, the user is left with the option to either not enable external links at all, or to let the "devil" in :-) That's not nice. It's also security theater. The real problem is unreviewed code getting executed by users, or worse, automated build systems. 
Yet, we let users believe that everything is secured on PyPI. Taking an extreme position, it would probably be better just leave everything as it is and instead educate users about the risk they are taking with a "pip install AngryBirds", signed with keys issued by the PSF on the official PyPI server, delivered straight to your drive via the latest in crypto technology, only to wipe your notebook... But then, I don't like extreme positions, so would rather like to incrementally improve the situation both from the server and the client side, both addressing user and author concerns, and keeping the Python eco system a friendly place to be. >> Your V2 was much more inviting in this respect. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 13 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From dholth at gmail.com Wed Mar 13 20:40:08 2013 From: dholth at gmail.com (Daniel Holth) Date: Wed, 13 Mar 2013 15:40:08 -0400 Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of release files In-Reply-To: <5140D490.3040401@egenix.com> References: <20130313112158.GO9677@merlinux.eu> <5140CC36.10807@egenix.com> <8DA4F828-BF12-4F96-9664-A87FA0EFBF12@stufft.io> <5140D490.3040401@egenix.com> Message-ID: On Wed, Mar 13, 2013 at 3:33 PM, M.-A. Lemburg wrote: > On 13.03.2013 20:08, Donald Stufft wrote: >> >> On Mar 13, 2013, at 2:57 PM, "M.-A. 
Lemburg" wrote: >> >>> On 13.03.2013 12:21, holger krekel wrote: >>>> [V3 proposal] >>> >>> I must say, don't like this change in motivation compared >>> to V1 and V2. >>> >>> The original of the discussion was to make PyPI more secure >>> and the installation process faster and more reliable >>> by moving away from crawling arbitrary external web pages. >>> >>> Both can be had by: >>> >>> * limiting the crawling to package author defined specific >>> URLs, with added hashes to make sure that the URLs and >>> their target content is not modified (this is the securing >>> external downloads part - see here for an example: >>> https://pypi.python.org/pypi/egenix-pyopenssl/0.13.1.1.0.1.5), >>> and >>> >>> * adding a way for the package authors to say "PyPI, please go >>> ahead and cache/copy my distributions files" (this is the >>> increase download reliability part - can be had by doing >>> opt-in CDN caching/proxying of external links via PyPI) >>> >>> Now, with V3 of the proposal, you are moving towards a system >>> that basically says "do it this way, or stay out of our eco >>> system", which, in my book, is not what the Python eco system >>> is all about. >>> >> >> I don't see how? The -with-externals index will still contain all the existing links, and indeed PJ Elby has already stated that setuptools will move to support this index by default but with proper warnings to people so they know they are installing a package off site. > >> This allows existing tools to be moved to a secure by default position. Allows future tools to choose if they want to enable the existing behavior through use of -with-externals (hopefully with a warning or opt-in sort of thing like laid out by PJE, but it's certainly not required). And even allows users of existing tools to opt into the old behavior via the -i option. 
>> >> Maybe i'm missing it but in what way does this force authors to "do it this way or stay out of our eco system" since all the same options are available as there are today? > > The proposal marks all external links as evil, and instead of > making external links more secure, the user is left with the option > to either not enable external links at all, or to let the > "devil" in :-) > > That's not nice. It's also security theater. > > The real problem is unreviewed code getting executed by users, > or worse, automated build systems. Yet, we let users believe > that everything is secured on PyPI. > > Taking an extreme position, it would probably be better just > leave everything as it is and instead educate users about the > risk they are taking with a "pip install AngryBirds", signed > with keys issued by the PSF on the official PyPI server, > delivered straight to your drive via the latest in crypto > technology, only to wipe your notebook... > > But then, I don't like extreme positions, so would rather > like to incrementally improve the situation both from the > server and the client side, both addressing user and author > concerns, and keeping the Python eco system a friendly place > to be. > >>> Your V2 was much more inviting in this respect. Perhaps it would be better to decide whether it is "reliability theater" and concentrate on consistency rather than whether the code actually does what you want. It is nice to have a system that at least prevents targeted third party bad-package attacks. 
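As a rough illustration of the "URLs with added hashes" idea quoted above, the following minimal sketch checks downloaded content against a digest carried in the link itself. It assumes a `#<algorithm>=<hexdigest>` URL fragment; the function names are hypothetical and are not taken from PyPI, setuptools or pip.

```python
import hashlib
from urllib.parse import urldefrag


def split_pinned_link(url):
    """Split 'https://host/file.tar.gz#sha256=<hex>' into (base_url, algorithm, hexdigest).

    The '#<algorithm>=<hexdigest>' fragment layout is an assumption made for this sketch.
    """
    base, fragment = urldefrag(url)
    algorithm, sep, expected = fragment.partition("=")
    if not sep or algorithm not in hashlib.algorithms_available:
        raise ValueError("link carries no usable hash: %r" % url)
    return base, algorithm, expected.lower()


def content_matches(data, algorithm, expected):
    """Return True if the digest of the downloaded bytes equals the pinned digest."""
    return hashlib.new(algorithm, data).hexdigest() == expected
```

A client following this approach would fetch the bytes from the base URL and refuse to install them unless `content_matches` returns True. This only pins the content to whatever the index page lists; it says nothing about whether the code itself has been reviewed.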
From donald at stufft.io Wed Mar 13 20:46:37 2013 From: donald at stufft.io (Donald Stufft) Date: Wed, 13 Mar 2013 15:46:37 -0400 Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of release files In-Reply-To: <5140D490.3040401@egenix.com> References: <20130313112158.GO9677@merlinux.eu> <5140CC36.10807@egenix.com> <8DA4F828-BF12-4F96-9664-A87FA0EFBF12@stufft.io> <5140D490.3040401@egenix.com> Message-ID: <8507A01C-D6C6-49C2-82D7-C3B48EDF16FF@stufft.io> On Mar 13, 2013, at 3:33 PM, "M.-A. Lemburg" wrote: > On 13.03.2013 20:08, Donald Stufft wrote: >> >> On Mar 13, 2013, at 2:57 PM, "M.-A. Lemburg" wrote: >> >>> On 13.03.2013 12:21, holger krekel wrote: >>>> [V3 proposal] >>> >>> I must say, don't like this change in motivation compared >>> to V1 and V2. >>> >>> The original of the discussion was to make PyPI more secure >>> and the installation process faster and more reliable >>> by moving away from crawling arbitrary external web pages. >>> >>> Both can be had by: >>> >>> * limiting the crawling to package author defined specific >>> URLs, with added hashes to make sure that the URLs and >>> their target content is not modified (this is the securing >>> external downloads part - see here for an example: >>> https://pypi.python.org/pypi/egenix-pyopenssl/0.13.1.1.0.1.5), >>> and >>> >>> * adding a way for the package authors to say "PyPI, please go >>> ahead and cache/copy my distributions files" (this is the >>> increase download reliability part - can be had by doing >>> opt-in CDN caching/proxying of external links via PyPI) >>> >>> Now, with V3 of the proposal, you are moving towards a system >>> that basically says "do it this way, or stay out of our eco >>> system", which, in my book, is not what the Python eco system >>> is all about. >>> >> >> I don't see how? 
The -with-externals index will still contain all the existing links, and indeed PJ Elby has already stated that setuptools will move to support this index by default but with proper warnings to people so they know they are installing a package off site. > >> This allows existing tools to be moved to a secure by default position. Allows future tools to choose if they want to enable the existing behavior through use of -with-externals (hopefully with a warning or opt-in sort of thing like laid out by PJE, but it's certainly not required). And even allows users of existing tools to opt into the old behavior via the -i option. >> >> Maybe i'm missing it but in what way does this force authors to "do it this way or stay out of our eco system" since all the same options are available as there are today? > > The proposal marks all external links as evil, and instead of > making external links more secure, the user is left with the option > to either not enable external links at all, or to let the > "devil" in :-) It doesn't mark them as evil; it marks them as requiring users to opt into them. Authors are free to not publish their packages directly to PyPI, and users are free to opt in to installing from the external URLs that the authors have chosen to publish. Furthermore, it gives package authors complete control over what URLs appear on their simple index page. ISTM that this is even friendlier than before, because now both sides have explicitly decided to use those URLs, instead of it being completely implicit on one side and partially implicit on the other. > > That's not nice. It's also security theater. It's not security theater; it moves the defaults to a more secure position. Further work can (and will) be done to ensure that, for those users and authors who opt into the external URLs, it's still secure, while again requiring both sides to explicitly opt into it. > > The real problem is unreviewed code getting executed by users, > or worse, automated build systems. 
Yet, we let users believe > that everything is secured on PyPI. "We"? I don't think anyone's ever said that *everything is secured on PyPI*. The best the PyPI infrastructure and tooling can do (security-wise) is to try to make as sure as possible that when you ask for foo==X.Y, you get the foo==X.Y that its author published. PyPI currently can't make that claim for external links. On top of that, many users (and I'd wager most users) are not aware that when they install something, the installer reaches out to other hosts. This proposal makes it so they *are* aware: they opt into potentially lowering their uptime, and they opt into exposing details to external hosts (which may or may not be SSL-secured). > > Taking an extreme position, it would probably be better just > leave everything as it is and instead educate users about the > risk they are taking with a "pip install AngryBirds", signed > with keys issued by the PSF on the official PyPI server, > delivered straight to your drive via the latest in crypto > technology, only to wipe your notebook... > > But then, I don't like extreme positions, so would rather > like to incrementally improve the situation both from the > server and the client side, both addressing user and author > concerns, and keeping the Python eco system a friendly place > to be. > >>> Your V2 was much more inviting in this respect. This gives _all_ the abilities of the current system (besides spidering random URLs) with *more* control given to the authors as to what exists on their various index pages. This is a net win for everyone involved. The only "loss" is that people trying to install projects that choose to host externally to PyPI will be told to explicitly allow it (as mentioned by PJ Eby). > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, Mar 13 2013) >>>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>>> mxODBC.Zope/Plone.Database.Adapter ... 
http://zope.egenix.com/ >>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > > ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From tk47 at students.poly.edu Thu Mar 14 01:11:04 2013 From: tk47 at students.poly.edu (Trishank Karthik Kuppusamy) Date: Wed, 13 Mar 2013 20:11:04 -0400 Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF In-Reply-To: References: <51401FB3.7000408@students.poly.edu> <5140432C.7000904@students.poly.edu> Message-ID: <51411598.1010100@students.poly.edu> On 03/13/2013 02:15 PM, Daniel Holth wrote: > > With all the different kinds of metadata, It's interesting to note > that currently TUF seems to only be concerned with the available file > names and their integrity. (Some of us will think of PEP 426 > "PKG-INFO" first when we hear the word metadata.) Yes, you are right that the many different kinds of metadata in this discussion (TUF metadata, PyPI metadata) make things a little confusing sometimes! :)) My understanding of PEP 426 is that the distribution metadata is specified by the developer with the setup.py script. To take the running Django example, since the Django developers will sign everything under the Django role with their own keys, which the D role will refer to, setup.py, as well as the generated "PKG-INFO", will be signed by the Django developers. 
This means that pip + TUF will be able to verify this distribution metadata indirectly via the source distribution package. Does this answer your question? > It looks like the D metadata lists all the filenames for Django, and > then Django lists them again with hashes and signatures. Why all the > lists? Does every Django release re-assert all the versions of Django > that are available on the index? Good observation. For D, you are talking about the "paths" attribute here: https://updateframework.com/pypi/repository/metadata/targets/packages/source/D.txt For Django, you are talking about the "targets" attribute here: https://updateframework.com/pypi/repository/metadata/targets/packages/source/D/Django.txt Why is "paths" in D listing all the "targets" that Django already talks about? Presently, this is because our target delegation tool (signercli.py) is being paranoid and making sure that D is explicitly delegating only targets matching these "paths". However, the TUF specification allows for D to simply say, "I delegate any target whatsoever under Django", by setting "paths" to "packages/source/D/Django/**": https://www.updateframework.com/browser/specs/tuf-spec.txt#L525 > How might I deal with producing the official source distribution > myself and having a friend produce the official Windows build of a > package? There are a few solutions. You could have your friend produce the official Windows build for a package, and then you could sign it, implicitly trusting your friend but not publishing that trust. A more secure solution would have you delegate that target to your friend. > As an aside PyPI has been doubling in size every 1.5 - 2 years. Exponential growth strikes again! We have anticipated this, and we have a few solutions to curb the growth of TUF metadata. Since TUF metadata is simply text, GZIP compression would go a long way. Alternatively, we could implement delta updates of TUF metadata. 
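The difference between the explicit "paths" list and the wildcard form described above can be sketched as follows. This is only an illustration of the matching rule, not TUF's actual implementation: it assumes shell-style matching as provided by Python's fnmatch module (where `*` also matches `/`), and the Django file names are hypothetical.

```python
from fnmatch import fnmatchcase


def is_delegated(target_path, delegated_paths):
    """Return True if target_path matches any of the delegated path patterns."""
    return any(fnmatchcase(target_path, pattern) for pattern in delegated_paths)


# Explicit delegation: D's list has to grow with every new Django release file.
explicit_paths = ["packages/source/D/Django/Django-1.4.tar.gz"]

# Wildcard delegation: D states "anything under Django" once.
wildcard_paths = ["packages/source/D/Django/**"]
```

With the wildcard form, a new release file under packages/source/D/Django/ is covered without touching D's metadata, while targets belonging to other projects under D remain outside the delegation.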
The more difficult problem is how to ensure that the target delegation structure scales with PyPI growth. A good design will keep this in mind and plan accordingly. Speaking of which, it may be the case that our design document for integrating PyPI with TUF is not terribly easy to understand. (After all, you do need to understand TUF first, but TUF is fairly easy once you understand its main ideas.) I plan to publish a friendlier document which introduces TUF at a very high level and instead discusses more pragmatic issues (such as workflows). From jcappos at poly.edu Thu Mar 14 01:15:03 2013 From: jcappos at poly.edu (Justin Cappos) Date: Wed, 13 Mar 2013 20:15:03 -0400 Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF In-Reply-To: <51411598.1010100@students.poly.edu> References: <51401FB3.7000408@students.poly.edu> <5140432C.7000904@students.poly.edu> <51411598.1010100@students.poly.edu> Message-ID: > Speaking of which, it may be the case that our design document for > integrating PyPI with TUF may not be terribly easy to understand. (After > all, you do need to understand TUF first, but TUF is fairly easy once you > understand its main ideas.) I plan to publish a friendlier document which > introduce TUF at a very high-level and instead discuss more pragmatic > issues (such as workflows). > > Feel free to chime in if you'd rather see something else or want us to focus on clarifying a specific topic. Thanks, Justin 
From carl at oddbird.net Thu Mar 14 01:16:30 2013 From: carl at oddbird.net (Carl Meyer) Date: Wed, 13 Mar 2013 18:16:30 -0600 Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of release files In-Reply-To: <5140D490.3040401@egenix.com> References: <20130313112158.GO9677@merlinux.eu> <5140CC36.10807@egenix.com> <8DA4F828-BF12-4F96-9664-A87FA0EFBF12@stufft.io> <5140D490.3040401@egenix.com> Message-ID: <514116DE.50907@oddbird.net> On 03/13/2013 01:33 PM, M.-A. Lemburg wrote: > The proposal marks all external links as evil, I'm sorry the text of the PEP gave you that impression. I can see how you'd have gotten it from some of the comments here on catalog-sig, but we went to some lengths to avoid it in the PEP text, and plan to further revise the text to try harder to avoid that implication. In the proposed PEP, we are attempting to balance two things that I believe to be true: 1) There are good and valid reasons for some package owners to prefer external hosting, and it is good for automated installers to easily be able to install such packages (on user request). 2) Installing non-PyPI-hosted packages should not be the *default* behavior of installer tools, for many reasons, among them because that is unusual and surprising behavior to many newcomers to the Python ecosystem, and often leads to concerns on their part about the stability of the ecosystem. These are the axioms, if you will, of this proposal, and while I'd guess many people in this discussion are at least slightly uncomfortable with one or the other of them, I think accepting both is the most likely path to a compromise everyone can live with. I think we can find a solution that embraces both these axioms and maintains good backwards-compatibility and usability. Holger and I had a long talk this evening about that, and here are some of our thoughts: A) You mentioned opt-in PyPI caching of externally-hosted files as a means to improve reliability. 
We basically agree, but implementing this on the PyPI side adds complexity to the PyPI implementation that we are hesitant to propose. Rather, we propose that this is better handled by a client-side tool that you point at a PyPI release with externally-hosted files, and it simply copies those release files onto PyPI. This has essentially the same effect. We envision this being a simple enough tool that it could reasonably be run for every release of a project in an ongoing way, not just as a one-time project-wide migration. We plan to change the line in the PEP that says the existence of this tool is NOT REQUIRED to begin the phase 2 transition to instead say that the existence of this tool IS REQUIRED before the phase 2 transition begins. (Holger already has a partial implementation of this tool.) B) We also plan to change the PEP to say even more strongly that installer tools should provide an easy option for installing externally-hosted projects, and that our definition of "easy" includes the ability for an installer to automatically tell a user what options they can use to install a specific externally-hosted package that the tool is refusing to install by default. C) To make that latter part of (B) easier, we also propose that the basic simple index include a link with a distinct rel attribute that points to the -with-externals index page for that project, only for a package that has external links. This way even tools using the no-externals index by default can notify users of the existence of external links for a project when they try to install it. There's also another possible change, a bit more significant, that we discussed that I'd be curious to hear your thoughts on. 
The initial motivation for separating external links from the main simple/ index was twofold: 1) Allow future tools to distinguish between internal and external links without every tool needing to implement host-comparison algorithms (which may break indexes that host "internal" files on a CDN), and 2) Allow today's installers, without upgrade, to automatically migrate eventually to no-external-installs-by-default. Some things have caused us to re-evaluate these points: - PyPI can automatically tag internal/external links in the simple index with rel="internal" and rel="external", which gives future tools a more reliable marker than host-comparison. So this takes care of #1. - It may be that giving up #2 is acceptable in the interest of better backward-compatibility. Old tools will still gain most of the benefits of this PEP due to the eventual elimination of automatic link-scraping (both from metadata and external pages) and the move to explicit submission of external links, only for those projects that want them. And old tools will not be able to provide a useful error message to users trying to install an externally-hosted package that is no longer listed in the main simple/ index, which is a bad usability breakage. Given that, we are thinking of perhaps simplifying the PEP to eliminate the separate -with-externals index, and list external links in the main simple/ index, clearly marked with rel="external". The PEP would still recommend that future installer tools not follow rel="external" links without specific user authorization. Old tools still get many of the benefits, without the breakage. > and instead of > making external links more secure, the user is left with the option > to either not enable external links at all, or to let the > "devil" in :-) There is no "instead of." There are parallel proposals (see the TUF thread) to improve the security of the ecosystem, and those proposals are not mutually exclusive with this one. 
If you search the PEP text, note that you don't find the words "secure" or "security" anywhere within it, or any claims of security achieved by this proposal alone. There is a brief mention of MITM attacks, which is relevant to the PEP because avoiding external link-crawling does reduce that attack surface, even if other proposals will also help with that (even more). Thanks for taking the time to read all this! Looking forward to hearing your thoughts, Carl From dholth at gmail.com Thu Mar 14 02:19:12 2013 From: dholth at gmail.com (Daniel Holth) Date: Wed, 13 Mar 2013 21:19:12 -0400 Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF In-Reply-To: <51411598.1010100@students.poly.edu> References: <51401FB3.7000408@students.poly.edu> <5140432C.7000904@students.poly.edu> <51411598.1010100@students.poly.edu> Message-ID: On Wed, Mar 13, 2013 at 8:11 PM, Trishank Karthik Kuppusamy wrote: > On 03/13/2013 02:15 PM, Daniel Holth wrote: >> >> >> With all the different kinds of metadata, It's interesting to note >> that currently TUF seems to only be concerned with the available file >> names and their integrity. (Some of us will think of PEP 426 >> "PKG-INFO" first when we hear the word metadata.) > > > Yes, you are right that the many different kinds of metadata in this > discussion (TUF metadata, PyPI metadata) makes things a little confusing > sometimes! :)) > > My understanding of PEP 426 is that the distribution metadata is specified > by the developer with the setup.py script. > > To take the running Django example, since the Django developers will sign > everything under the Django role with their own keys that the D role will > talk about, setup.py, as well as the generated "PKG-INFO", will be signed by > the Django developers. This means that pip + TUF will be able to verify > these distribution metadata indirectly via the source distribution package. > > Does this answer your question? Thanks, yes. 
The individual .tar.gz distributions do contain PKG-INFO but we would eventually like to expose it in a more efficient way. Then to be suitably paranoid you would also have to check that it matched the package you downloaded! :( Also note that on http://crate.io the simple index works the same way as on pypi, except that the actual packages are on a different (CDN) host. Thanks, Daniel >> It looks like the D metadata lists all the filenames for Django, and >> then Django lists them again with hashes and signatures. Why all the >> lists? Does every Django release re-assert all the versions of Django >> that are available on the index? > > > Good observation. For D, you are talking about the "paths" attribute here: > > https://updateframework.com/pypi/repository/metadata/targets/packages/source/D.txt > > For Django, you are talking about the "targets" attribute here: > > https://updateframework.com/pypi/repository/metadata/targets/packages/source/D/Django.txt > > Why is "paths" in D listing all the "targets" that Django already talks > about? Presently, this is because our target delegation tool (signercli.py) > is being paranoid and making sure that D is explicitly delegating only > targets matching these "paths". > > However, the TUF specification allows for D to simply say, "I delegate any > target whatsoever under Django", by settings "paths" to > "packages/source/D/Django/**": > > https://www.updateframework.com/browser/specs/tuf-spec.txt#L525 > > >> How might I deal with producing the official source distribution >> myself and having a friend produce the official Windows build of a >> package? > > > There are a few solutions. You could have your friend produce the official > Windows build for a package, and then you could sign it, implicitly trusting > your friend but not publishing that trust. > > A more secure solution would have you delegate that target to your friend. > > >> As an aside PyPI has been doubling in size every 1.5 - 2 years. 
> > > Exponential growth strikes again! We have anticipated this, and we have a > few solutions to curb the growth of TUF metadata. Since TUF metadata is > simply text, GZIP compression would go a long way. Alternatively, we could > implement delta updates of TUF metadata. > > The more difficult problem is how to ensure that target delegation structure > scales with PyPI growth. A good design will keep this in mind and plan > accordingly. > > Speaking of which, it may be the case that our design document for > integrating PyPI with TUF may not be terribly easy to understand. (After > all, you do need to understand TUF first, but TUF is fairly easy once you > understand its main ideas.) I plan to publish a friendlier document which > introduce TUF at a very high-level and instead discuss more pragmatic issues > (such as workflows). > From fqj1994 at gmail.com Thu Mar 14 05:17:35 2013 From: fqj1994 at gmail.com (Qijiang Fan) Date: Thu, 14 Mar 2013 12:17:35 +0800 Subject: [Catalog-sig] ResponseNotReady error while trying to do fresh sync Message-ID: Hello, I'm maintaining e.pypi.python.org (with Aron Xu). We ran into some issues with our network-attached storage, so we decided to do a fresh sync of PyPI. While doing that we hit a problem: we got an httplib.ResponseNotReady exception, similar to the one described in this mail: "http://mail.python.org/pipermail/catalog-sig/2013-February/005224.html" For now, we have skipped all packages that trigger the exception and finished the sync, so some files are missing from the mirror. The three packages that cause the exception are listed below: https://pypi.python.org/simple/iterator/ https://pypi.python.org/simple/nester_test_ling/ https://pypi.python.org/simple/nesterswe/ Please notify us when this gets fixed, so that we can sync them and complete the mirror. 
Best Regards, Qijiang Fan From tk47 at students.poly.edu Thu Mar 14 06:47:17 2013 From: tk47 at students.poly.edu (Trishank Karthik Kuppusamy) Date: Thu, 14 Mar 2013 01:47:17 -0400 Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF In-Reply-To: References: <51401FB3.7000408@students.poly.edu> <5140432C.7000904@students.poly.edu> <51411598.1010100@students.poly Message-ID: <51416465.6080407@students.poly.edu> On 3/13/13 9:19 PM, Daniel Holth wrote: > > Thanks, yes. The individual .tar.gz distributions do contain PKG-INFO > but we would eventually like to expose it in a more efficient way. > Then to be suitably paranoid you would also have to check that it > matched the package you downloaded! :( Great, glad we could help. Well, at least the paranoid would just need an extra download :)) > Also note that on http://crate.io the simple index works the same way > as on pypi, except that the actual packages are on a different (CDN) > host. Got it. I'll take a look at crate.io to see how it works. Conceivably, the TUF metadata and the PyPI files could live in separate locations altogether and we would just have to check that the TUF metadata matches the PyPI files. From ncoghlan at gmail.com Thu Mar 14 07:19:15 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 13 Mar 2013 23:19:15 -0700 Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI In-Reply-To: <5140377C.90909@egenix.com> References: <20130312113817.GA9677@merlinux.eu> <513F5282.3010206@egenix.com> <20130312170508.GG9677@merlinux.eu> <513F6EE0.6080503@egenix.com> <513F8922.90008@egenix.com> <5140377C.90909@egenix.com> Message-ID: On Wed, Mar 13, 2013 at 1:23 AM, M.-A. Lemburg wrote: > On 13.03.2013 07:28, Nick Coghlan wrote: >> On Tue, Mar 12, 2013 at 12:59 PM, M.-A. Lemburg wrote: >>> I think we should establish a versioned API like that for PyPI >>> to make progress easier. All major web APIs use versioning >>> for this reason. 
>> >> Why set up versioning for something we want to phase out? There will >> never be a simple-v3, so this is really overengineering the proposed >> change. > > Who says that we want to phase out the /simple/ index ? I want to render it redundant, because it's a crazy way to distribute completely inadequate metadata. Cheers, Nick. > > FWIW, I don't think that two or three small changes to the PyPI > (see my email to Holger) server warrants calling this over-engineering. > This is about moving forward in a backwards compatible and future > proof way. > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, Mar 13 2013) >>>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > > ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Mar 14 07:25:27 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 13 Mar 2013 23:25:27 -0700 Subject: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI In-Reply-To: References: <20130312113817.GA9677@merlinux.eu> <513F5282.3010206@egenix.com> <20130312170508.GG9677@merlinux.eu> <513F6EE0.6080503@egenix.com> <513F8922.90008@egenix.com> <5140377C.90909@egenix.com> Message-ID: On Wed, Mar 13, 2013 at 11:19 PM, Nick Coghlan wrote: > On Wed, Mar 13, 2013 at 1:23 AM, M.-A. Lemburg wrote: >> On 13.03.2013 07:28, Nick Coghlan wrote: >>> On Tue, Mar 12, 2013 at 12:59 PM, M.-A. 
Lemburg wrote: >>>> I think we should establish a versioned API like that for PyPI >>>> to make progress easier. All major web APIs use versioning >>>> for this reason. >>> >>> Why set up versioning for something we want to phase out? There will >>> never be a simple-v3, so this is really overengineering the proposed >>> change. >> >> Who says that we want to phase out the /simple/ index ? > > I want to render it redundant, because it's a crazy way to distribute > completely inadequate metadata. Specifically, once we have the infrastructure in place to publish metadata v2.0 (or a suitable subset) to installation tools, the relatively impoverished contents of the simple index will be a legacy interface retained only to preserve the correct operation of existing tools. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Mar 14 07:43:20 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 13 Mar 2013 23:43:20 -0700 Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of release files In-Reply-To: <514116DE.50907@oddbird.net> References: <20130313112158.GO9677@merlinux.eu> <5140CC36.10807@egenix.com> <8DA4F828-BF12-4F96-9664-A87FA0EFBF12@stufft.io> <5140D490.3040401@egenix.com> <514116DE.50907@oddbird.net> Message-ID: On Wed, Mar 13, 2013 at 5:16 PM, Carl Meyer wrote: > There is no "instead of." There are parallel proposals (see the TUF > thread) to improve the security of the ecosystem, and those proposals > are not mutually exclusive with this one. If you search the PEP text, > note that you don't find the words "secure" or "security" anywhere > within it, or any claims of security achieved by this proposal alone. > There is a brief mention of MITM attacks, which is relevant to the PEP > because avoiding external link-crawling does reduce that attack surface, > even if other proposals will also help with that (even more). 
Right, the changes to provide end-to-end security require more extensive changes and need to be given appropriate consideration before we proceed to implementation and deployment. This PEP, especially with the additional changes you propose here is an excellent approach to *near term* improvement, as a parallel effort to the more complex proposals. The /simple/ index will also be around for a long time for backwards compatibility reasons, regardless of any other changes that happen in the overall distribution ecosystem. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Mar 14 08:03:00 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 14 Mar 2013 00:03:00 -0700 Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF In-Reply-To: References: <51401FB3.7000408@students.poly.edu> <5140432C.7000904@students.poly.edu> Message-ID: On Wed, Mar 13, 2013 at 11:58 AM, Justin Cappos wrote: > We use the simple directory and filenames because that is what pip uses. > > You have a nice suggestion to include other metadata in the TUF metadata. > We certainly could do this if desirable. This required a redesign of the > PyPI API and we weren't sure if this was wanted. Our current doc / > prototype is trying to minimize the changes needed all around. I think what you currently propose (signing the metadata pip already understands) is a good first step, especially if we can have PyPI signing *all* the target metadata in the initial deployment, and defer the delegation to package developers until the next phase of the rollout (we obviously want to do that eventually, but it's easier if we can get a preliminary version working without needing to change the upload tools). While such an approach doesn't immediately give us the end-to-end security we ultimately want to set up, it means a few things become possible: 1. 
Rather than requiring every developer to start signing end-to-end metadata immediately, we can ask a few major projects (e.g. Django, Zope, NumPy) if they're willing to serve as guinea pigs for the developer target signing delegations. Once we're happy the signing process is usable, we can make it generally available as an option to projects (while also allowing them to continue with PyPI's existing upload mechanisms and only offer PyPI-user integrity checks rather than developer-user) 2. Gives the PSF infrastructure team and the PyPI maintainers a chance to work with the installation tool developers to get the PyPI-user link sorted out, before needing to work on the developer-PyPI link 3. Considering alternate mirroring solutions based on replicating the TUF metadata rather than PEP 381 Eventually I would also like to tunnel a subset of the PEP 426 metadata through TUF's "custom" fields, but again, I think we're better off skipping that for the first iteration. Incremental enhancements are a good thing :) Regards, Nick. > > Thanks, > Justin > > > On Wed, Mar 13, 2013 at 2:15 PM, Daniel Holth wrote: >> >> On Wed, Mar 13, 2013 at 5:13 AM, Trishank Karthik Kuppusamy >> wrote: >> > Hello Nick, >> > >> > >> > On 3/13/13 4:09 AM, Nick Coghlan wrote: >> >> >> >> >> >> - the PSF board generally stays out of the technical details of >> >> running the python.org infrastructure, so it's likely that any root >> >> keys would be handled by the PSF infrastructure committee. A (2, 4) or >> >> (3, 5) trust configuration would likely be manageable at this level. >> > >> > >> > Understood. We think a higher (t, n) [where t out of n signatures are >> > needed >> > to trust the metadata for a role] is better for the root role simply >> > because >> > its crucial metadata (the authorized keys for top-level roles) should >> > change >> > very rarely. 
>> > >> > >> >> - at the target delegation level, PyPI supports the registration of >> >> new projects through the web service (see >> >> http://docs.python.org/2/distutils/packageindex.html). If my >> >> understanding of target delegation is correct, this means the "simple" >> >> and "packages/source/" delegations will need to be (1, 1) and >> >> online. >> >> - higher levels of the target delegation hierarchy could conceivably >> >> be kept offline, but there seems little value in doing so if they're >> >> trusting on online (1, 1) key >> > >> > >> > Fortunately, the "targets/simple" and >> > "targets/packages/(version)/(letter)/" >> > roles should not require (1, 1) online keys, as their metadata (simply >> > target delegations and no actual target files) should also fluctuate >> > fairly >> > rarely. I should make this clearer in our design document. >> > >> > >> >> - many PyPI packages are maintained by single developers, so (1, 1) or >> >> (1, n) is likely to be the only generally feasible level of signing at >> >> the project level. >> > >> > >> > Yes, the package developers themselves could choose any (t, n) they >> > like. In >> > our design, we propose that PyPI could eventually delegate to "stable" >> > packages which need little change (and use more security with more >> > offline >> > keys) and to "unstable" packages which need frequent change (and use >> > less >> > security with more online keys). >> > >> > >> >> With the current focus being on getting an improvement from the status >> >> quo that we can successfully deploy in a reasonable period of time, >> >> the target delegation side of things probably needs to be >> >> substantially simpler in the initial iteration. Yes, it leaves us open >> >> to certain vulnerabilities we would like to remove in the long run, >> >> but we need to be very cautious in the additional demands we place on >> >> the users uploading to PyPI. 
It may even mean the initial iteration >> >> allows projects to rely on a PyPI provided signing key for their TUF >> >> metadata, using the existing upload mechanisms to add the files to >> >> PyPI. >> > >> > >> > I agree that there is a delicate problem of balancing security with >> > usability here, especially in the beginning. >> > >> > You raised a very good issue there: on first migration, how would PyPI >> > accommodate packages which have not had their target files delegated to >> > their developers? We imagine that in this case, PyPI could assume >> > initial >> > responsibility for these packages, and later PyPI would delegate those >> > packages to their respective developers. >> > >> > Thanks for your input, >> > Trishank >> >> With all the different kinds of metadata, It's interesting to note >> that currently TUF seems to only be concerned with the available file >> names and their integrity. (Some of us will think of PEP 426 >> "PKG-INFO" first when we hear the word metadata.) >> >> It looks like the D metadata lists all the filenames for Django, and >> then Django lists them again with hashes and signatures. Why all the >> lists? Does every Django release re-assert all the versions of Django >> that are available on the index? >> >> How might I deal with producing the official source distribution >> myself and having a friend produce the official Windows build of a >> package? >> >> As an aside PyPI has been doubling in size every 1.5 - 2 years. >> >> Thanks >> >> Daniel Holth > > -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mal at egenix.com Thu Mar 14 08:54:05 2013 From: mal at egenix.com (M.-A. 
Lemburg) Date: Thu, 14 Mar 2013 08:54:05 +0100 Subject: [Catalog-sig] Publishing metadata (was: V2 pre-PEP: transitioning to release file hosting on PYPI) In-Reply-To: References: <20130312113817.GA9677@merlinux.eu> <513F5282.3010206@egenix.com> <20130312170508.GG9677@merlinux.eu> <513F6EE0.6080503@egenix.com> <513F8922.90008@egenix.com> <5140377C.90909@egenix.com> Message-ID: <5141821D.6060601@egenix.com> On 14.03.2013 07:25, Nick Coghlan wrote: > On Wed, Mar 13, 2013 at 11:19 PM, Nick Coghlan wrote: >> On Wed, Mar 13, 2013 at 1:23 AM, M.-A. Lemburg wrote: >>> On 13.03.2013 07:28, Nick Coghlan wrote: >>>> On Tue, Mar 12, 2013 at 12:59 PM, M.-A. Lemburg wrote: >>>>> I think we should establish a versioned API like that for PyPI >>>>> to make progress easier. All major web APIs use versioning >>>>> for this reason. >>>> >>>> Why set up versioning for something we want to phase out? There will >>>> never be a simple-v3, so this is really overengineering the proposed >>>> change. >>> >>> Who says that we want to phase out the /simple/ index ? >> >> I want to render it redundant, because it's a crazy way to distribute >> completely inadequate metadata. > > Specifically, once we have the infrastructure in place to publish > metadata v2.0 (or a suitable subset) to installation tools, the > relatively impoverished contents of the simple index will be a legacy > interface retained only to preserve the correct operation of existing > tools. Those two are orthogonal. The index itself is just a bag of things and, as such, one that's very well suited to publish data, since it can easily be exposed in form of static files, which can be put on a CDNs or mirrored using rsync. It's easy to add the metadata file to that index for tools to pick up - in addition to the other data exposed on the index pages and perfectly backwards compatible. 
As mentioned before, I think we should start publishing the existing metadata stored in the PyPI database on those index pages as PKG-INFO files, so that tools can easily access the data without having to go through XML-RPC. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 14 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From tk47 at students.poly.edu Thu Mar 14 08:21:47 2013 From: tk47 at students.poly.edu (Trishank Karthik Kuppusamy) Date: Thu, 14 Mar 2013 03:21:47 -0400 Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF In-Reply-To: References: <51401FB3.7000408@students.poly.edu> <5140432C.7000904@students.poly.edu> Message-ID: <51417A8B.8030909@students.poly.edu> On 3/14/13 3:03 AM, Nick Coghlan wrote: > > I think what you currently propose (signing the metadata pip already > understands) is a good first step, especially if we can have PyPI > signing *all* the target metadata in the initial deployment, and defer > the delegation to package developers until the next phase of the > rollout (we obviously want to do that eventually, but it's easier if > we can get a preliminary version working without needing to change the > upload tools). > > While such an approach doesn't immediately give us the end-to-end > security we ultimately want to set up, it means a few things become > possible: > 1. 
Rather than requiring every developer to start signing end-to-end > metadata immediately, we can ask a few major projects (e.g. Django, > Zope, NumPy) if they're willing to serve as guinea pigs for the > developer target signing delegations. Once we're happy the signing > process is usable, we can make it generally available as an option to > projects (while also allowing them to continue with PyPI's existing > upload mechanisms and only offer PyPI-user integrity checks rather > than developer-user) > 2. Gives the PSF infrastructure team and the PyPI maintainers a chance > to work with the installation tool developers to get the PyPI-user > link sorted out, before needing to work on the developer-PyPI link > 3. Considering alternate mirroring solutions based on replicating the > TUF metadata rather than PEP 381 > > Eventually I would also like to tunnel a subset of the PEP 426 > metadata through TUF's "custom" fields, but again, I think we're > better off skipping that for the first iteration. Incremental > enhancements are a good thing :) This sounds good to me --- I like the idea of incremental enhancements. Justin, what are your thoughts from a security perspective? From holger at merlinux.eu Thu Mar 14 09:58:01 2013 From: holger at merlinux.eu (holger krekel) Date: Thu, 14 Mar 2013 08:58:01 +0000 Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of release files In-Reply-To: References: <20130313112158.GO9677@merlinux.eu> <5140CC36.10807@egenix.com> <8DA4F828-BF12-4F96-9664-A87FA0EFBF12@stufft.io> <5140D490.3040401@egenix.com> <514116DE.50907@oddbird.net> Message-ID: <20130314085800.GT9677@merlinux.eu> On Wed, Mar 13, 2013 at 23:43 -0700, Nick Coghlan wrote: > On Wed, Mar 13, 2013 at 5:16 PM, Carl Meyer wrote: > > There is no "instead of." There are parallel proposals (see the TUF > > thread) to improve the security of the ecosystem, and those proposals > > are not mutually exclusive with this one. 
If you search the PEP text, > > note that you don't find the words "secure" or "security" anywhere > > within it, or any claims of security achieved by this proposal alone. > > There is a brief mention of MITM attacks, which is relevant to the PEP > > because avoiding external link-crawling does reduce that attack surface, > > even if other proposals will also help with that (even more). > > Right, the changes to provide end-to-end security require more > extensive changes and need to be given appropriate consideration > before we proceed to implementation and deployment. This PEP, > especially with the additional changes you propose here is an > excellent approach to *near term* improvement, as a parallel effort to > the more complex proposals. > > The /simple/ index will also be around for a long time for backwards > compatibility reasons, regardless of any other changes that happen in > the overall distribution ecosystem. I haven't followed the latest TUF discussions and related docs in depths yet but if those developments will regard "simple/" as a deprecated interface, i think this PEP here should maybe not introduce "simple/-with-externals" as it will just make the situation more complicated for everyone to understand in a few months from now. best, holger > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig > From mal at egenix.com Thu Mar 14 11:07:07 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 14 Mar 2013 11:07:07 +0100 Subject: [Catalog-sig] setuptools/distribute/easy_install/pkg_resource sorting algorithm In-Reply-To: References: <513F70B5.5030501@egenix.com> <513F893F.9010707@egenix.com> Message-ID: <5141A14B.9030301@egenix.com> On 12.03.2013 22:26, PJ Eby wrote: > On Tue, Mar 12, 2013 at 3:59 PM, M.-A. Lemburg wrote: >> On 12.03.2013 19:15, M.-A. 
Lemburg wrote: >>> I've run into a weird issue with easy_install, that I'm trying to solve: >>> >>> If I place two files named >>> >>> egenix_mxodbc_connect_client-2.0.2-py2.6.egg >>> egenix-mxodbc-connect-client-2.0.2.win32-py2.6.prebuilt.zip >>> >>> into the same directory and let easy_install running on Linux >>> scan this, it considers the second file for Windows as best >>> match. >>> >>> Is the algorithm used for determining the best match documented >>> somewhere ? >>> >>> I've had a look at the implementation, but this left me rather >>> clueless. >>> >>> I thought that setuptools would prefer the .egg file over >>> the prebuilt .zip file - binary files being easier to install >>> than "source" files. >> >> After some experiments, I found that the follow change >> in filename (swapping platform and python version, in addition >> to use '-' instead of '.) works: >> >> egenix-mxodbc-connect-client-2.0.2-py2.6-win32.prebuilt.zip >> >> OTOH, this one doesn't (notice the difference ?): >> >> egenix-mxodbc-connect-client-2.0.2.py2.6-win32.prebuilt.zip >> >> The logic behind all this looks rather fragile to me. > > easy_install only guarantees sane version parsing for distribution > files built using setuptools' naming algorithms. If you use > distutils, it can only make guesses, because the distutils does not > have a completely unambiguous file naming scheme. And if you are > naming the files by hand, God help you. ;-) The problem appears to be a bug in setuptools' package_index.py. The function interpret_distro_name() creates a set of possible separations of the found name into project name and version. It does find the right separation, but for some reason, the code using that function does not check the found project names against the project name the user is trying to install, but simply takes the last entry of the list returned by the above function. 
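The splitting described above, and the name check that appears to be missing, can be sketched roughly as follows. This is a simplified illustration, not the actual setuptools code: the helper names `candidate_splits`, `normalize` and `splits_for_project` are hypothetical, and the normalization rule (treating '_' and '-' as equivalent, ignoring case) is an assumption made for the example.

```python
def candidate_splits(basename):
    """Yield every (project, version) reading of a hyphen-separated basename."""
    parts = basename.split("-")
    for i in range(1, len(parts) + 1):
        yield "-".join(parts[:i]), "-".join(parts[i:])


def normalize(name):
    """Assumed convention: '_' and '-' are equivalent, case is ignored."""
    return name.replace("_", "-").lower()


def splits_for_project(basename, requested):
    """Keep only the readings whose project part matches the requested name."""
    wanted = normalize(requested)
    return [
        (project, version)
        for project, version in candidate_splits(basename)
        if normalize(project) == wanted
    ]


# Hypothetical input: one of the prebuilt archive names, extension removed.
basename = "egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs4-macosx-10.5-x86_64-prebuilt"
all_readings = list(candidate_splits(basename))
matching = splits_for_project(basename, "egenix-pyopenssl")
```

Without the filter, readings whose "project" part swallows the version and platform tags stay in the candidate list; with it, only the reading for the requested project survives.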
As a result, easy_install downloads and tries to install project files
that don't match the project name in some cases.

Here's another example where it fails (say you're on an x64 Linux box):

# easy_install egenix-pyopenssl

As an example, say it finds these distribution files:

'egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs2-linux-x86_64-prebuilt.zip',
'egenix_pyopenssl-0.13.1.1.0.1.5-py2.7-linux-x86_64.egg',
'egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs2-macosx-10.5-x86_64-prebuilt.zip',
'egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs4-macosx-10.5-x86_64-prebuilt.zip',

It then creates different interpretations of those names, puts
them in a list and sorts them. Here's the end of that list:

egenix-pyopenssl; 0.13.1.1.0.1.5   <<-- this would be the correct .egg file
egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs2-linux-x86-64-prebuilt
egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs2-macosx-10.5-x86-64-prebuilt
egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs4-macosx-10.5-x86-64-prebuilt
egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs2-macosx; 10.5-x86-64-prebuilt
egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs4-macosx; 10.5-x86-64-prebuilt

It picks the last entry, which would be for a project called
"egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs4-macosx" - not the one
the user searched for.

I'm trying to find a way to get it to use the correct .egg file.
The .egg file does have precedence over the other files, since
easy_install regards the latter as source files with lower precedence.
This is important, because the /simple/ index page will have links not
only to .egg files, but also to our prebuilt .zip files, which use a
source file compatible setup.py interface.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Mar 14 2013)
>>> Python Projects, Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...
http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From tk47 at students.poly.edu Thu Mar 14 13:14:50 2013 From: tk47 at students.poly.edu (Trishank Karthik Kuppusamy) Date: Thu, 14 Mar 2013 08:14:50 -0400 Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of release files In-Reply-To: <20130314085800.GT9677@merlinux.eu> References: <20130313112158.GO9677@merlinux.eu> <5140CC36.10807@egenix.com> <8DA4F828-BF12-4F96-9664-A87FA0EFBF12@stufft.io> <5140D490.3040401@egenix.com> <514116DE.50907@oddbird.net> <20130314085800.GT9677@merlinux.eu> Message-ID: <5141BF3A.6060606@students.poly.edu> On 3/14/13 4:58 AM, holger krekel wrote: > > I haven't followed the latest TUF discussions and related docs in > depths yet but if those developments will regard "simple/" as a deprecated > interface, i think this PEP here should maybe not introduce > "simple/-with-externals" as it will just make the situation more > complicated for everyone to understand in a few months from now. I haven't yet followed your PEP in as much depth as I would like, but I wish to assure you that we do not regard "/simple/" as a deprecated interface. In fact, we aim to preserve backwards-compatibility as much as possible! 
:) From jim at zope.com Thu Mar 14 13:26:07 2013 From: jim at zope.com (Jim Fulton) Date: Thu, 14 Mar 2013 08:26:07 -0400 Subject: [Catalog-sig] Packaging & Distribution Mini-Summit at PyCon US In-Reply-To: References: Message-ID: On Thu, Feb 7, 2013 at 10:19 AM, Jim Fulton wrote: > On Wed, Feb 6, 2013 at 3:15 AM, Nick Coghlan wrote: >> As folks may be aware, I am moderating a panel called "Directions in >> Packaging" on the Saturday afternoon at PyCon US. >> >> Before that though, I am also organising what I am calling a >> "Packaging & Distribution Mini-Summit" as an open space on the Friday >> night (we have one of the larger open space rooms reserved, so we >> should have a fair bit of space if a decent crowd turns up). > > I wasn't going to be at PyCon, but I changed my plans specifically to > participate in this. Thanks for setting this up. > >> An overview of what I'm hoping we can achieve at the session is at >> https://us.pycon.org/2013/community/openspaces/packaginganddistributionminisummit/ >> (that page should be editable by anyone that has registered for PyCon >> US). > > Cool. A major difficulty in these sorts of discussions is that people > have different problems they want to solve and argue about solutions > without clearly stating their problems. > > If you don't mind, I'll try to find some time in the next few days to > add a section > to that page to list goals/problems. OK, well, hopefully better late than never. 
I took a stab at adding this to the end of: https://us.pycon.org/2013/community/openspaces/packaginganddistributionminisummit/ Jim -- Jim Fulton http://www.linkedin.com/in/jimfulton From jcappos at poly.edu Thu Mar 14 15:13:03 2013 From: jcappos at poly.edu (Justin Cappos) Date: Thu, 14 Mar 2013 10:13:03 -0400 Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of release files In-Reply-To: <5141BF3A.6060606@students.poly.edu> References: <20130313112158.GO9677@merlinux.eu> <5140CC36.10807@egenix.com> <8DA4F828-BF12-4F96-9664-A87FA0EFBF12@stufft.io> <5140D490.3040401@egenix.com> <514116DE.50907@oddbird.net> <20130314085800.GT9677@merlinux.eu> <5141BF3A.6060606@students.poly.edu> Message-ID: Maybe a different way to say it is that the current TUF integration doc assumes that it is desirable to make minimal change to PyPI's layout and pip, easy_install, etc. while adding security. We made several choices based upon this assumption, including using and retaining the /simple dir. If the community wants a more 'clean-slate' design, we could put that together also. This requires a lot of information specific to your setup and use cases so we'd appreciate collaboration with you guys to write that up. Thanks, Justin On Thu, Mar 14, 2013 at 8:14 AM, Trishank Karthik Kuppusamy < tk47 at students.poly.edu> wrote: > On 3/14/13 4:58 AM, holger krekel wrote: > >> >> I haven't followed the latest TUF discussions and related docs in >> depths yet but if those developments will regard "simple/" as a deprecated >> interface, i think this PEP here should maybe not introduce >> "simple/-with-externals" as it will just make the situation more >> complicated for everyone to understand in a few months from now. >> > > I haven't yet followed your PEP in as much depth as I would like, but I > wish to assure you that we do not regard "/simple/" as a deprecated > interface. In fact, we aim to preserve backwards-compatibility as much as > possible! 
:)

From ncoghlan at gmail.com Thu Mar 14 15:39:46 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 14 Mar 2013 07:39:46 -0700
Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of release files
In-Reply-To:
References: <20130313112158.GO9677@merlinux.eu> <5140CC36.10807@egenix.com> <8DA4F828-BF12-4F96-9664-A87FA0EFBF12@stufft.io> <5140D490.3040401@egenix.com> <514116DE.50907@oddbird.net> <20130314085800.GT9677@merlinux.eu> <5141BF3A.6060606@students.poly.edu>
Message-ID:

On Thu, Mar 14, 2013 at 7:13 AM, Justin Cappos wrote:
> Maybe a different way to say it is that the current TUF integration doc
> assumes that it is desirable to make minimal change to PyPI's layout and
> pip, easy_install, etc. while adding security. We made several choices
> based upon this assumption, including using and retaining the /simple dir.

I think what you're proposing now is a pretty good place to start
(although I'm suggesting making it even simpler in the near term by
first focusing on the PyPI->end user link, and then moving to
delegating signing of the per-project metadata to the individual
projects as a later step).

> If the community wants a more 'clean-slate' design, we could put that
> together also. This requires a lot of information specific to your setup
> and use cases so we'd appreciate collaboration with you guys to write that
> up.

I'd like to do a "distribution 2.0" at some point where we make the
simple index redundant by including that info (and more) directly in
the TUF metadata, but I think that's a "later" project - securing what
we have now is a better place to start.

Cheers,
Nick.
> > Thanks, > Justin > > > On Thu, Mar 14, 2013 at 8:14 AM, Trishank Karthik Kuppusamy > wrote: >> >> On 3/14/13 4:58 AM, holger krekel wrote: >>> >>> >>> I haven't followed the latest TUF discussions and related docs in >>> depths yet but if those developments will regard "simple/" as a >>> deprecated >>> interface, i think this PEP here should maybe not introduce >>> "simple/-with-externals" as it will just make the situation more >>> complicated for everyone to understand in a few months from now. >> >> >> I haven't yet followed your PEP in as much depth as I would like, but I >> wish to assure you that we do not regard "/simple/" as a deprecated >> interface. In fact, we aim to preserve backwards-compatibility as much as >> possible! :) >> >> >> _______________________________________________ >> Catalog-SIG mailing list >> Catalog-SIG at python.org >> http://mail.python.org/mailman/listinfo/catalog-sig > > > > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig > -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Mar 14 15:45:23 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 14 Mar 2013 07:45:23 -0700 Subject: [Catalog-sig] Publishing metadata (was: V2 pre-PEP: transitioning to release file hosting on PYPI) In-Reply-To: <5141821D.6060601@egenix.com> References: <20130312113817.GA9677@merlinux.eu> <513F5282.3010206@egenix.com> <20130312170508.GG9677@merlinux.eu> <513F6EE0.6080503@egenix.com> <513F8922.90008@egenix.com> <5140377C.90909@egenix.com> <5141821D.6060601@egenix.com> Message-ID: On Thu, Mar 14, 2013 at 12:54 AM, M.-A. Lemburg wrote: > The index itself is just a bag of things and, as such, one that's very > well suited to publish data, since it can easily be exposed in form > of static files, which can be put on a CDNs or mirrored using > rsync. 
The TUF metadata is also just a collection of static files which can be put on CDNs and mirrored using rsync. That's one of the reasons TUF is an interesting approach :) > It's easy to add the metadata file to that index for tools to > pick up - in addition to the other data exposed on the index > pages and perfectly backwards compatible. > > As mentioned before, I think we should start publishing the > existing metadata stored in the PyPI database on those > index pages as PKG-INFO files, so that tools can easily > access the data without having to go through XML-RPC. Yes, I think that's a good near term approach. However, there's still a lot of duplication of functionality between the TUF metadata and the simple index, so if we get TUF-based security up and running, my long term aim will be to make it so that once you have downloaded the TUF metadata, you shouldn't *need* anything from the simple index, and would be able to go directly to downloading the release files. That's a longer term idea, though and we may even decide it isn't worth the hassle if PKG-INFO is made available through /simple. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From jcappos at poly.edu Thu Mar 14 15:58:14 2013 From: jcappos at poly.edu (Justin Cappos) Date: Thu, 14 Mar 2013 10:58:14 -0400 Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF In-Reply-To: <51417A8B.8030909@students.poly.edu> References: <51401FB3.7000408@students.poly.edu> <5140432C.7000904@students.poly.edu> <51417A8B.8030909@students.poly.edu> Message-ID: Yes, Nick's suggestions are good ones. I'd agree that getting an initial deployment together that doesn't include things like custom metadata is probably for the best. We can certainly add things incrementally. 
Thanks, Justin On Thu, Mar 14, 2013 at 3:21 AM, Trishank Karthik Kuppusamy < tk47 at students.poly.edu> wrote: > On 3/14/13 3:03 AM, Nick Coghlan wrote: > >> >> I think what you currently propose (signing the metadata pip already >> understands) is a good first step, especially if we can have PyPI >> signing *all* the target metadata in the initial deployment, and defer >> the delegation to package developers until the next phase of the >> rollout (we obviously want to do that eventually, but it's easier if >> we can get a preliminary version working without needing to change the >> upload tools). >> >> While such an approach doesn't immediately give us the end-to-end >> security we ultimately want to set up, it means a few things become >> possible: >> 1. Rather than requiring every developer to start signing end-to-end >> metadata immediately, we can ask a few major projects (e.g. Django, >> Zope, NumPy) if they're willing to serve as guinea pigs for the >> developer target signing delegations. Once we're happy the signing >> process is usable, we can make it generally available as an option to >> projects (while also allowing them to continue with PyPI's existing >> upload mechanisms and only offer PyPI-user integrity checks rather >> than developer-user) >> 2. Gives the PSF infrastructure team and the PyPI maintainers a chance >> to work with the installation tool developers to get the PyPI-user >> link sorted out, before needing to work on the developer-PyPI link >> 3. Considering alternate mirroring solutions based on replicating the >> TUF metadata rather than PEP 381 >> >> Eventually I would also like to tunnel a subset of the PEP 426 >> metadata through TUF's "custom" fields, but again, I think we're >> better off skipping that for the first iteration. Incremental >> enhancements are a good thing :) >> > > This sounds good to me --- I like the idea of incremental enhancements. > Justin, what are your thoughts from a security perspective? 
> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pje at telecommunity.com Thu Mar 14 17:39:44 2013 From: pje at telecommunity.com (PJ Eby) Date: Thu, 14 Mar 2013 12:39:44 -0400 Subject: [Catalog-sig] setuptools/distribute/easy_install/pkg_resource sorting algorithm In-Reply-To: <5141A14B.9030301@egenix.com> References: <513F70B5.5030501@egenix.com> <513F893F.9010707@egenix.com> <5141A14B.9030301@egenix.com> Message-ID: On Thu, Mar 14, 2013 at 6:07 AM, M.-A. Lemburg wrote: > On 12.03.2013 22:26, PJ Eby wrote: >> On Tue, Mar 12, 2013 at 3:59 PM, M.-A. Lemburg wrote: >>> On 12.03.2013 19:15, M.-A. Lemburg wrote: >>>> I've run into a weird issue with easy_install, that I'm trying to solve: >>>> >>>> If I place two files named >>>> >>>> egenix_mxodbc_connect_client-2.0.2-py2.6.egg >>>> egenix-mxodbc-connect-client-2.0.2.win32-py2.6.prebuilt.zip >>>> >>>> into the same directory and let easy_install running on Linux >>>> scan this, it considers the second file for Windows as best >>>> match. >>>> >>>> Is the algorithm used for determining the best match documented >>>> somewhere ? >>>> >>>> I've had a look at the implementation, but this left me rather >>>> clueless. >>>> >>>> I thought that setuptools would prefer the .egg file over >>>> the prebuilt .zip file - binary files being easier to install >>>> than "source" files. >>> >>> After some experiments, I found that the follow change >>> in filename (swapping platform and python version, in addition >>> to use '-' instead of '.) works: >>> >>> egenix-mxodbc-connect-client-2.0.2-py2.6-win32.prebuilt.zip >>> >>> OTOH, this one doesn't (notice the difference ?): >>> >>> egenix-mxodbc-connect-client-2.0.2.py2.6-win32.prebuilt.zip >>> >>> The logic behind all this looks rather fragile to me. >> >> easy_install only guarantees sane version parsing for distribution >> files built using setuptools' naming algorithms. 
If you use >> distutils, it can only make guesses, because the distutils does not >> have a completely unambiguous file naming scheme. And if you are >> naming the files by hand, God help you. ;-) > > The problem appears to be a bug in setuptools' package_index.py. > > The function interpret_distro_name() creates a set of possible > separations of the found name into project name and version. > > It does find the right separation, but for some reason, the > code using that function does not check the found project > names against the project name the user is trying to install, > but simply takes the last entry of the list returned by the > above function. > > As a result, easy_install downloads and tries to install > project files that don't match the project name in some > cases. > > Here's another example where it fails (say you're on a x64 Linux box): > > # easy_install egenix-pyopenssl > > As example, say it finds these distribution files: > > 'egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs2-linux-x86_64-prebuilt.zip', > 'egenix_pyopenssl-0.13.1.1.0.1.5-py2.7-linux-x86_64.egg', > 'egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs2-macosx-10.5-x86_64-prebuilt.zip', > 'egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs4-macosx-10.5-x86_64-prebuilt.zip', > > It then creates different interpretations of those names, puts > them in a list and sorts them. Here's the end of that list: > > egenix-pyopenssl; 0.13.1.1.0.1.5 <<-- this would be the correct .egg file > egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs2-linux-x86-64-prebuilt > egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs2-macosx-10.5-x86-64-prebuilt > egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs4-macosx-10.5-x86-64-prebuilt > egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs2-macosx; 10.5-x86-64-prebuilt > egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs4-macosx; 10.5-x86-64-prebuilt > > It picks the last entry, which would be for a project called > "egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs4-macosx" - not the one > the user searched. 
Actually, that's not quite true. It's picking: egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs4-macosx-10.5-x86-64-prebuilt Because it thinks that '0.13.1.1.0.1.5-py2.7-ucs4-macosx-10.5-x86-64-prebuilt' is a higher version than 0.13.1.1.0.1.5. It does also record the possibility you mentioned, but it doesn't pick that one. The project names actually *do* have to match. If you open a ticket on the setuptools tracker, 'll try to see if I can get it to recognize that strings like py2.7, macosx, ucs, and the like are terminators for a version number. I don't know how successful I'll be, though. Basically, those zip files are (I assume) bdist_dumb distributions being taken for source distributions, and easy_install doesn't actually support bdist_dumb files at the moment. From mal at egenix.com Thu Mar 14 19:11:59 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 14 Mar 2013 19:11:59 +0100 Subject: [Catalog-sig] setuptools/distribute/easy_install/pkg_resource sorting algorithm In-Reply-To: References: <513F70B5.5030501@egenix.com> <513F893F.9010707@egenix.com> <5141A14B.9030301@egenix.com> Message-ID: <514212EF.4030505@egenix.com> On 14.03.2013 17:39, PJ Eby wrote: > On Thu, Mar 14, 2013 at 6:07 AM, M.-A. Lemburg wrote: >> On 12.03.2013 22:26, PJ Eby wrote: >>> On Tue, Mar 12, 2013 at 3:59 PM, M.-A. Lemburg wrote: >>>> On 12.03.2013 19:15, M.-A. Lemburg wrote: >>>>> I've run into a weird issue with easy_install, that I'm trying to solve: >>>>> >>>>> If I place two files named >>>>> >>>>> egenix_mxodbc_connect_client-2.0.2-py2.6.egg >>>>> egenix-mxodbc-connect-client-2.0.2.win32-py2.6.prebuilt.zip >>>>> >>>>> into the same directory and let easy_install running on Linux >>>>> scan this, it considers the second file for Windows as best >>>>> match. >>>>> >>>>> Is the algorithm used for determining the best match documented >>>>> somewhere ? >>>>> >>>>> I've had a look at the implementation, but this left me rather >>>>> clueless. 
>>>>> >>>>> I thought that setuptools would prefer the .egg file over >>>>> the prebuilt .zip file - binary files being easier to install >>>>> than "source" files. >>>> >>>> After some experiments, I found that the follow change >>>> in filename (swapping platform and python version, in addition >>>> to use '-' instead of '.) works: >>>> >>>> egenix-mxodbc-connect-client-2.0.2-py2.6-win32.prebuilt.zip >>>> >>>> OTOH, this one doesn't (notice the difference ?): >>>> >>>> egenix-mxodbc-connect-client-2.0.2.py2.6-win32.prebuilt.zip >>>> >>>> The logic behind all this looks rather fragile to me. >>> >>> easy_install only guarantees sane version parsing for distribution >>> files built using setuptools' naming algorithms. If you use >>> distutils, it can only make guesses, because the distutils does not >>> have a completely unambiguous file naming scheme. And if you are >>> naming the files by hand, God help you. ;-) >> >> The problem appears to be a bug in setuptools' package_index.py. >> >> The function interpret_distro_name() creates a set of possible >> separations of the found name into project name and version. >> >> It does find the right separation, but for some reason, the >> code using that function does not check the found project >> names against the project name the user is trying to install, >> but simply takes the last entry of the list returned by the >> above function. >> >> As a result, easy_install downloads and tries to install >> project files that don't match the project name in some >> cases. 
>> >> Here's another example where it fails (say you're on a x64 Linux box): >> >> # easy_install egenix-pyopenssl >> >> As example, say it finds these distribution files: >> >> 'egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs2-linux-x86_64-prebuilt.zip', >> 'egenix_pyopenssl-0.13.1.1.0.1.5-py2.7-linux-x86_64.egg', >> 'egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs2-macosx-10.5-x86_64-prebuilt.zip', >> 'egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs4-macosx-10.5-x86_64-prebuilt.zip', >> >> It then creates different interpretations of those names, puts >> them in a list and sorts them. Here's the end of that list: >> >> egenix-pyopenssl; 0.13.1.1.0.1.5 <<-- this would be the correct .egg file >> egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs2-linux-x86-64-prebuilt >> egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs2-macosx-10.5-x86-64-prebuilt >> egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs4-macosx-10.5-x86-64-prebuilt >> egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs2-macosx; 10.5-x86-64-prebuilt >> egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs4-macosx; 10.5-x86-64-prebuilt >> >> It picks the last entry, which would be for a project called >> "egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs4-macosx" - not the one >> the user searched. > > Actually, that's not quite true. It's picking: > > egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs4-macosx-10.5-x86-64-prebuilt > > Because it thinks that > '0.13.1.1.0.1.5-py2.7-ucs4-macosx-10.5-x86-64-prebuilt' is a higher > version than 0.13.1.1.0.1.5. > > It does also record the possibility you mentioned, but it doesn't pick > that one. The project names actually *do* have to match. Ah, ok, that makes sense then. Is there any way to have "0.13.1.1.0.1.5-" sort before "0.13.1.1.0.1.5" ? (e.g. 
like is done for release candidates) Ideally, I'd like to get this to work without any changes to setuptools, even though it would of course be better not to take stuff after a Python version marker into account when looking for a package version (since the Python marker is actually a new component in the file name). > If you open a ticket on the setuptools tracker, 'll try to see if I > can get it to recognize that strings like py2.7, macosx, ucs, and the > like are terminators for a version number. I don't know how > successful I'll be, though. Basically, those zip files are (I assume) > bdist_dumb distributions being taken for source distributions, and > easy_install doesn't actually support bdist_dumb files at the moment. If you could point me to that tracker, I'll open a ticket :-) Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 14 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From pje at telecommunity.com Thu Mar 14 22:03:14 2013 From: pje at telecommunity.com (PJ Eby) Date: Thu, 14 Mar 2013 17:03:14 -0400 Subject: [Catalog-sig] setuptools/distribute/easy_install/pkg_resource sorting algorithm In-Reply-To: <514212EF.4030505@egenix.com> References: <513F70B5.5030501@egenix.com> <513F893F.9010707@egenix.com> <5141A14B.9030301@egenix.com> <514212EF.4030505@egenix.com> Message-ID: On Thu, Mar 14, 2013 at 2:11 PM, M.-A. 
Lemburg wrote:
> Is there any way to have "0.13.1.1.0.1.5-" sort before
> "0.13.1.1.0.1.5" ? (e.g. like is done for release candidates)

Make it "0.13.1.1.0.1.5-dev", and it'll have lower precedence than
both "0.13.1.1.0.1.5" and "0.13.1.1.0.1.5-".

> If you could point me to that tracker, I'll open a ticket :-)

http://bugs.python.org/setuptools/

From qwcode at gmail.com Fri Mar 15 08:32:02 2013
From: qwcode at gmail.com (Marcus Smith)
Date: Fri, 15 Mar 2013 00:32:02 -0700
Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of release files
In-Reply-To: <20130313112158.GO9677@merlinux.eu>
References: <20130313112158.GO9677@merlinux.eu>
Message-ID:

> In addition, maintainers of installation tools are asked to release
> two updates. The first one shall provide clear warnings [...]
> The second update for installation tools should change the default
> mode to allow only installation of package files hosted at the index
> domain,

sounds good to me.

> It is expected that tools in this release may choose to change the
> default index url to ``https://pypi.python.org/simple/-with-ext`` in
>

so, *eventually*, the /simple interface (that has been transitioned to
only serve pypi links) could be deprecated? (because new tools would
be smart enough to responsibly navigate /simple/-with-ext)

but slightly ironic that we'd be left with an interface called
"simple/-with-ext", given the goal of all this, but it makes sense.

Marcus

From holger at merlinux.eu Fri Mar 15 10:29:59 2013
From: holger at merlinux.eu (holger krekel)
Date: Fri, 15 Mar 2013 09:29:59 +0000
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
Message-ID: <20130315092959.GA9677@merlinux.eu>

Hi all, in particular Philip, Marc-Andre, Donald,

Carl and I decided to simplify the PEP and avoid the somewhat awkward
``simple/-with-externals`` index for various reasons, among them
Marc-Andre's criticisms.
This also means present-day installation tools (shipped with
Redhat/Debian/etc.) will continue to work as today for those packages
which remain in a hosting-mode that requires crawling and scraping.
They will still benefit from the fact that most packages will soon
have a hosting-mode that avoids it. Future releases of installation
tools will default to not crawling or using (scraped) external links,
and new PyPI projects will default to serving only uploaded files.

The V4 pre-PEP also renames the three PyPI hosting modes to be more
descriptive. Since all three modes allow external links, "pypi-ext"
vs "pypi-only" were misleading. The new naming distinguishes the mode
that both scrapes links from metadata and crawls external pages for
more links ("pypi-scrape-crawl") from the mode that only scrapes links
from metadata ("pypi-scrape"), and from the mode where all links are
explicit ("pypi-explicit").

Without the separate external index, it also turns out that the two
transition phases are separated into PyPI changes (phase one) and
installer-tool updates (phase two). There are no PyPI changes
necessary in phase two. As stated in a new open question, it should
be possible to do PEP-related installation tool updates during phase 1,
though that may still require a bit of clarification in the PEP's
language.

Carl and I are happy with this PEP version now and hope you all are
as well. Donald is already working on improving the analysis tool so
we hopefully have some updated numbers soon.

cheers,
Holger


PEP: XXX
Title: Transitioning to release-file hosting on PyPI
Version: $Revision$
Last-Modified: $Date$
Author: Holger Krekel, Carl Meyer
Discussions-To: catalog-sig at python.org
Status: Draft (PRE-submit V4)
Type: Process
Content-Type: text/x-rst
Created: 10-Mar-2013
Post-History:


Abstract
========

This PEP proposes a backward-compatible two-phase transition process
to speed up, simplify and robustify installing from the
pypi.python.org (PyPI) package index.
To ease the transition and minimize client-side friction, **no changes
to distutils or existing installation tools are required in order to
benefit from the first transition phase, which will result in faster,
more reliable installs for most existing packages**.

The first transition phase implements an easy and explicit means for a
package maintainer to control which release file links are served to
present-day installation tools. The first phase also includes the
implementation of analysis tools for present-day packages, to support
communication with package maintainers and the automated setting of
default modes for controlling release file links. The first phase
will also make new projects on PyPI default to serving only links to
release files which were uploaded to PyPI.

The second transition phase concerns end-user installation tools,
which shall default to only install release files that are hosted on
PyPI and tell the user if external release files exist, offering a
choice to automatically use those external files.


Rationale
=========

.. _history:

History and motivations for external hosting
--------------------------------------------

When PyPI went online, it offered release registration but had no
facility to host release files itself. When hosting was added, no
automated downloading tool existed yet. When Philip Eby implemented
automated downloading (through setuptools), he made the choice to
allow people to use download hosts of their choice. The finding of
externally-hosted packages was implemented as follows:

#. The PyPI ``simple/`` index for a package contains all links found
   by scraping them from that package's long_description metadata for
   any release. Links in the "Download-URL" and "Home-page" metadata
   fields are given ``rel=download`` and ``rel=homepage`` attributes,
   respectively.

#.
   Any of these links whose target is a file whose name appears to be
   in the form of an installable source or binary distribution, with
   name in the form "packagename-version.ARCHIVEEXT", is considered a
   potential installation candidate by installation tools.

#. Similarly, any links suffixed with an "#egg=packagename-version"
   fragment are considered an installation candidate.

#. Additionally, the ``rel=homepage`` and ``rel=download`` links are
   crawled by installation tools and, if HTML, are themselves scraped
   for release-file links in the above formats.

Today, most packages released on PyPI host their release files on
PyPI, but a small percentage (XXX need updated data) rely on external
hosting.

There are many reasons [2]_ why people have chosen external hosting.
To cite just a few:

- release processes and scripts have been developed already and upload
  to external sites

- it takes too long to upload large files from some places in the
  world

- export restrictions e.g. for crypto-related software

- company policies which require offering open source packages through
  own sites

- problems with integrating uploading to PyPI into one's release
  process (because of release policies)

- desiring download statistics different from those maintained by PyPI

- perceived bad reliability of PyPI

- not aware that PyPI offers file-hosting

Irrespective of the present-day validity of these reasons, there
clearly is a history behind why people choose to host files
externally, and for some time it was even the only way you could do
things. This PEP takes the position that there are at least some
valid reasons for external hosting.

Problem
-------

**Today, python package installers (pip, easy_install, buildout, and
others) often need to query many non-PyPI URLs even if there are no
externally hosted files**. Apart from querying pypi.python.org's
simple index pages, an installer also crawls all homepages and
download pages ever specified with any release of a package.
The need for installers to crawl external sites slows down
installation and makes for a brittle and unreliable installation
process. Those sites and packages also don't take part in the
:pep:`381` mirroring infrastructure, further decreasing reliability
and speed of automated installation processes around the world.

Most packages are hosted directly on pypi.python.org [1]_. Even for
these packages, installers still crawl their homepage and
download-url, if specified. Many package uploaders are not aware that
specifying the "homepage" or "download-url" in their package metadata
will needlessly slow down the installation process for all users.

Relying on third party sites also opens up more attack vectors for
injecting malicious packages into sites using automated installs. A
simple attack might just involve getting hold of an old now-unused
homepage domain and placing malicious packages there. Moreover,
performing a Man-in-The-Middle (MITM) attack between an installation
site and any of the download sites can inject malicious packages on
the installation site. As many homepages and download locations are
using HTTP and not HTTPS, such attacks are not hard to launch. Such
MITM attacks can easily happen even for packages which never intended
to host files externally as their homepages are contacted by
installers anyway.

There is currently no way for package maintainers to avoid
external-link crawling, other than removing all homepage/download url
metadata for all historic releases. While a script [3]_ has been
written to perform this action, it is not a good general solution
because it removes useful metadata from PyPI releases.
Even if the sites referenced by "Homepage" and "Download-URL" links
were not scraped for further links, there is no obvious way under the
current system for a package owner to link to an installable file from
a long_description metadata field (which is shown as package
documentation on ``/pypi/PKG``) without installation tools
automatically considering that file a candidate for installation.
Conversely, there is no way to explicitly register multiple external
release files without putting them in metadata fields.

Goals
-----

These are the goals to be achieved by implementation of this PEP:

* Package owners should be able to explicitly control which files are
  presented by PyPI to installer tools as installation candidates.
  Installation should not be slowed and made less reliable by
  extensive and unnecessary crawling of links that package owners did
  not explicitly nominate as installation files.

* It should remain possible for package owners to choose to host their
  release files on their own hosting, external to PyPI. It should be
  easy for a user to request the installation of such releases using
  automated installer tools.

* Automated installer tools should not install externally-hosted
  packages **by default**, but only when explicitly authorized to do
  so by the user. When tools refuse to install such a package by
  default, they should tell the user exactly which external link(s)
  they would need to follow, and what option(s) the user can provide
  to authorize the tool to follow those links. PyPI should provide
  all necessary metadata for installer tools to implement this easily
  and within a single request/reply interaction.

* Migration from the status quo to the above points should be gradual
  and minimize breakage. This includes tooling that makes it easy for
  package owners with an existing release process that uploads to
  non-PyPI hosting to also upload those release files to PyPI.
Solution / two transition phases
================================

The first transition phase introduces a "hosting-mode" field for each
project on PyPI, allowing package owners explicit control of which
release file links are served to present-day installation tools in the
machine-readable ``simple/`` index. The first transition will, after
successful hosting-mode manipulations by individual early-adopters,
set a default hosting mode for existing packages, based on automated
analysis. **Maintainers will be notified one month ahead of any such
automated change**. At completion of the first transition phase,
**all present-day existing release and installation processes and
tools are expected to continue working**. Any remaining errors or
problems are expected to only relate to installation of individual
packages and can be easily corrected by package maintainers or PyPI
admins if maintainers are not reachable.

Also in the first phase, each link served in the ``simple/`` index
will be explicitly marked as ``rel="internal"`` (hosted by the index
itself) or ``rel="external"`` (linking to an external site that is not
part of the index).

In the second transition phase, PyPI client installation tools shall
be updated to default to only install ``rel="internal"`` packages
unless a user specifies option(s) to permit installing from external
links.

Maintainers of packages which currently host release files on non-PyPI
sites shall receive instructions and tools to ease "re-hosting" of
their historic and future package release files. This re-hosting tool
MUST be available before automated hosting-mode changes are announced
to package maintainers.


Implementation
==============

Hosting modes
-------------

The foundation of the first transition phase is the introduction of
three "modes" of PyPI hosting for a package, affecting which links are
generated for the ``simple/`` index.
These modes are implemented without requiring changes to installation
tools via changes to the algorithm for generating the machine-readable
``simple/`` index.

The modes are:

- ``pypi-scrape-crawl``: no change from the current situation of
  generating machine-readable links for installation tools, as
  outlined in the history_.

- ``pypi-scrape``: for a package in this mode, links to be added to
  the ``simple/`` index are still scraped from package metadata.
  However, the "Home-page" and "Download-url" links are given
  ``rel=ext-homepage`` and ``rel=ext-download`` attributes instead of
  ``rel=homepage`` and ``rel=download``. The effect of this (with no
  change in installation tools necessary) is that these links will not
  be followed and scraped for further candidate links by present-day
  installation tools: only installable files directly hosted from PYPI
  or linked directly from PyPI metadata will be considered for
  installation. Installation tools MAY evolve to offer an option to
  use the new rel-attribution to crawl external pages but MUST NOT
  default to it.

- ``pypi-explicit``: for a package in this mode, only links to release
  files uploaded to PyPI, and external links to release files
  explicitly nominated by the package owner (via a new interface
  exposed by PyPI) will be added to the ``simple/`` index.

Thus the hope is that eventually all projects on PyPI can be migrated
to the ``pypi-explicit`` mode, while preserving the ability to install
release files hosted externally via installer tools. Deprecation of
hosting modes to eventually only allow the ``pypi-explicit`` mode is
NOT REGULATED by this PEP but is expected to become feasible some time
after successful implementation of the transition phases described in
this PEP. It is expected that deprecation requires **a new process to
deal with abandoned packages** because of unreachable maintainers for
still popular packages.
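The three hosting modes above can be summarized in a minimal sketch.
This is only an illustration of the rules as described, not PyPI
code: the function name, its arguments and the ``(url, rel)`` tuple
representation are hypothetical, and the separate
``rel="internal"``/``rel="external"`` labelling is left out.

```python
def simple_index_links(mode, uploaded, scraped, homepage, download_url,
                       explicit_external):
    """Return (url, rel) pairs for a project's simple/ page.

    Hypothetical helper: `uploaded` are files hosted on the index,
    `scraped` are links found in release metadata, `explicit_external`
    are external files the owner registered by hand.
    """
    # Files uploaded to the index are listed in every mode.
    links = [(url, None) for url in uploaded]

    if mode == "pypi-explicit":
        # Nothing is scraped; only owner-nominated external files are added.
        links += [(url, None) for url in explicit_external]
        return links

    if mode not in ("pypi-scrape", "pypi-scrape-crawl"):
        raise ValueError("unknown hosting mode: %r" % (mode,))

    # Both scrape modes still list links scraped from release metadata.
    links += [(url, None) for url in scraped]

    # Only the rel value differs: the ext- variants are meant to keep
    # present-day tools from crawling the homepage/download pages.
    prefix = "" if mode == "pypi-scrape-crawl" else "ext-"
    if homepage:
        links.append((homepage, prefix + "homepage"))
    if download_url:
        links.append((download_url, prefix + "download"))
    return links
```

Switching a project from ``pypi-scrape-crawl`` to ``pypi-scrape`` in
this sketch changes no URLs at all, only the ``rel`` values, which is
why no installer update is needed for the first phase.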
First transition phase (PyPI)
-----------------------------

The proposed solution consists of multiple implementation and
communication steps:

#. Implement in PyPI the three modes described above, with an
   interface for package owners to select the mode for each package
   and register explicit external file URLs.

#. For packages in all modes, label all links in the ``simple/`` index
   with ``rel="internal"`` or ``rel="external"``, to make it easier
   for client tools to distinguish the types of links in the second
   transition phase.

#. Default all newly-registered packages to ``pypi-explicit`` mode
   (package owners can still switch to the other modes as desired).

#. Determine (via an automated analysis tool) which packages have all
   installable files available on PyPI itself (group A), which have
   all installable files linked directly from PyPI metadata (group B),
   and which have installable versions available that are linked only
   from external homepage/download HTML pages (group C).

#. Send mail to maintainers of projects in group A that their project
   will be automatically configured to ``pypi-explicit`` mode in one
   month, and similarly to maintainers of projects in group B that
   their project will be automatically configured to ``pypi-scrape``
   mode. Inform them that this change is not expected to affect
   installability of their project at all, but will result in faster
   and safer installs for their users. Encourage them to set this
   mode themselves sooner to benefit their users.

#. Send mail to maintainers of packages in group C that their package
   hosting mode is ``pypi-scrape-crawl``, list the URLs which
   currently are crawled, and suggest that they either re-host their
   packages directly on PyPI and switch to ``pypi-explicit``, or at
   least provide direct links to release files in PyPI metadata and
   switch to ``pypi-scrape``. Provide instructions and tools to help
   with these transitions.
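The grouping in the analysis step can be sketched as follows. This is
a rough illustration under stated assumptions, not the actual analysis
tool: the function name and arguments are hypothetical, and release
files are assumed to be comparable by file name across hosting
locations.

```python
def default_hosting_mode(on_pypi, linked_in_metadata, found_by_crawling):
    """Return (group, mode) for one project.

    Each argument is a collection of installable release file names:
    those uploaded to PyPI, those linked directly from PyPI metadata,
    and those only discoverable by crawling homepage/download pages.
    """
    on_pypi = set(on_pypi)
    linked_in_metadata = set(linked_in_metadata)
    found_by_crawling = set(found_by_crawling)

    # Group C: at least one file is reachable only through crawling.
    if found_by_crawling - on_pypi - linked_in_metadata:
        return ("C", "pypi-scrape-crawl")

    # Group B: everything not uploaded is at least linked from metadata.
    if linked_in_metadata - on_pypi:
        return ("B", "pypi-scrape")

    # Group A: every installable file is hosted on PyPI itself.
    return ("A", "pypi-explicit")
```

A project with ``pkg-1.0.tar.gz`` uploaded and the same file also
linked from its homepage would land in group A here, since crawling
adds nothing that is not already on PyPI.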
Second transition phase (installer tools)
-----------------------------------------

For the second transition phase, maintainers of installation tools are
asked to release two updates.

The first update shall provide clear warnings if externally-hosted
release files (that is, files whose link is ``rel="external"``) are
selected for download, for which projects and URLs exactly this
happens, and warn that in future versions externally-hosted downloads
will be disabled by default.

The second update should change the default mode to allow only
installation of ``rel="internal"`` package files, and allow
installation of externally-hosted packages only when the user supplies
an option (ideally an option specifying exactly which external domains
are to be trusted as download sources). When download of an
externally-hosted package is disallowed, the user should be notified,
with instructions for how to make the install succeed and warnings
about the implication (that a file will be downloaded from a site that
is not part of the package index).


Open questions / Tasks
======================

- Should we introduce some form of PyPI API versioning in this PEP?
  (it might complicate matters and delay the implementation but is
  often seen as good practise).

- in pypi-scrape mode: does PyPI itself determine which links are
  installation candidates and avoid presenting other random links
  (which are currently served)?

- consider that installation tools may choose to release updates
  during transition phase 1 already, to warn about crawling and
  scraped links (which are easily identifiable today and after the new
  rel-attribution after transition phase 1).


References
==========

.. [1] Donald Stufft, ratio of externally hosted versus pypi-hosted,
   http://mail.python.org/pipermail/catalog-sig/2013-March/005549.html
   (XXX need to update this data for all easy_install-supported formats)

..
[2] Marc-Andre Lemburg, reasons for external hosting,
   http://mail.python.org/pipermail/catalog-sig/2013-March/005626.html

.. [3] Holger Krekel, Script to remove homepage/download metadata for
   all releases
   http://mail.python.org/pipermail/catalog-sig/2013-February/005423.html


Acknowledgments
================

Philip Eby for precise information and the basic ideas to implement
the transition via server-side changes only.

Donald Stufft for pushing away from external hosting and offering to
implement both a Pull Request for the necessary PyPI changes and the
analysis tool to drive the transition phase 1.

Marc-Andre Lemburg, Nick Coghlan and catalog-sig in general for
thinking through issues regarding getting rid of "external hosting".


Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

From pje at telecommunity.com Fri Mar 15 16:15:57 2013
From: pje at telecommunity.com (PJ Eby)
Date: Fri, 15 Mar 2013 11:15:57 -0400
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
In-Reply-To: <20130315092959.GA9677@merlinux.eu>
References: <20130315092959.GA9677@merlinux.eu>
Message-ID:

Do we even need the internal/external rel info? I was planning to
just use the URL hostname.

i.e., are there any use cases for designating an externally-hosted
file internal, or an internally-hosted file external? If not, it
seems the rel="" is redundant.

It's also more work to implement, vs. just defaulting --allow-hosts to
be the --index-url host; a strategy ISTM pip could also use, since it
has the same two options available.

Also, if we're not doing homepage/download crawling any more, I was
hoping we could just drop the code that 'parses' rel="" links in the
first place, as it's an awkward ugly hack.
;-)

From donald at stufft.io Fri Mar 15 16:22:05 2013
From: donald at stufft.io (Donald Stufft)
Date: Fri, 15 Mar 2013 11:22:05 -0400
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
In-Reply-To:
References: <20130315092959.GA9677@merlinux.eu>
Message-ID:

On Mar 15, 2013, at 11:15 AM, PJ Eby wrote:

> Do we even need the internal/external rel info? I was planning to
> just use the URL hostname.
>
> i.e., are there any use cases for designating an externally-hosted
> file internal, or an internally-hosted file external? If not, it
> seems the rel="" is redundant.
>
> It's also more work to implement, vs. just defaulting --allow-hosts to
> be the --index-url host; a strategy ISTM pip could also use, since it
> has the same two options available.
>
> Also, if we're not doing homepage/download crawling any more, I was
> hoping we could just drop the code that 'parses' rel="" links in the
> first place, as it's an awkward ugly hack. ;-)

It makes things uglier for end users if you have packages and the
simple index hosted on several sites. It also just adds extra
information, so if setuptools/easy_install wants to just use the host
case, that wouldn't be bad.

It's actually more defensible to keep the service (à la PyPI/simple
index) and the user-uploaded content (à la distribution files) hosted
on separate domains, as it makes things like GIFAR-style attacks harder
to execute. Making a move like that would break mirroring ATM on PyPI,
but it's good information to include on the simple index to make it
simpler for tools to determine which links are internal and which are
external.

FWIW Crate has the uploaded files on an external domain for just this
reason. (Also for CDN reasons, but that's because an SSL CDN is $$$$.)
-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From holger at merlinux.eu Fri Mar 15 16:30:41 2013
From: holger at merlinux.eu (holger krekel)
Date: Fri, 15 Mar 2013 15:30:41 +0000
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
In-Reply-To:
References: <20130315092959.GA9677@merlinux.eu>
Message-ID: <20130315153041.GF9677@merlinux.eu>

On Fri, Mar 15, 2013 at 11:15 -0400, PJ Eby wrote:
> Do we even need the internal/external rel info? I was planning to
> just use the URL hostname.
>
> i.e., are there any use cases for designating an externally-hosted
> file internal, or an internally-hosted file external? If not, it
> seems the rel="" is redundant.
>
> It's also more work to implement, vs. just defaulting --allow-hosts to
> be the --index-url host; a strategy ISTM pip could also use, since it
> has the same two options available.
>
> Also, if we're not doing homepage/download crawling any more, I was
> hoping we could just drop the code that 'parses' rel="" links in the
> first place, as it's an awkward ugly hack. ;-)

We wanted to avoid requiring hostname-checking, especially in light of
parallel developments putting PyPI release files on a CDN, i.e. on
non-pypi.python.org domains. The "rel=internal" communicates that this
link is under control of the index server, so the installer should not
be worried and users need not know about allow-hosts etc. For example,
Donald's https://crate.io is already operating in this manner and has
its files on crate-cdn.com.
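
The difference between the two checks can be shown with a small sketch.
This is not code from setuptools or pip: the page snippet, the file
paths and the helper names (LinkCollector, internal_by_host,
internal_by_rel) are made up for illustration, and it assumes the index
puts rel="internal" or rel="external" on each anchor as the V4 draft
proposes. It uses only the Python 3 standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, rel) pairs from the anchors of a simple-index page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        if "href" in attrs:
            self.links.append((attrs["href"], attrs.get("rel")))


def internal_by_host(links, index_url):
    """Treat a link as internal only if its host equals the index host."""
    index_host = urlparse(index_url).hostname
    return [href for href, _rel in links
            if urlparse(href).hostname == index_host]


def internal_by_rel(links):
    """Treat a link as internal if the index itself labelled it so."""
    return [href for href, rel in links if rel == "internal"]


# Hypothetical index page: the index lives on crate.io, files on a CDN host.
PAGE = """
<a rel="internal" href="https://crate-cdn.com/packages/foo-1.0.tar.gz">foo-1.0.tar.gz</a>
<a rel="external" href="http://example.com/downloads/foo-0.9.tar.gz">foo-0.9.tar.gz</a>
"""

collector = LinkCollector()
collector.feed(PAGE)
links = collector.links

print(internal_by_host(links, "https://crate.io/simple/foo/"))
print(internal_by_rel(links))
```

With the index on one host and its files on another, the plain host
comparison finds no internal links at all, while the rel check picks
out the CDN-hosted file without the user having to know the CDN's name.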
best, holger From carl at oddbird.net Fri Mar 15 17:07:59 2013 From: carl at oddbird.net (Carl Meyer) Date: Fri, 15 Mar 2013 10:07:59 -0600 Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI In-Reply-To: References: <20130315092959.GA9677@merlinux.eu> Message-ID: <5143475F.50708@oddbird.net> On 03/15/2013 09:15 AM, PJ Eby wrote: > Do we even need the internal/external rel info? I was planning to > just use the URL hostname. > > i.e., are there any use cases for designating an externally-hosted > file internal, or an internally-hosted file external? If not, it > seems the rel="" is redundant. Right; Donald and Holger already gave the rationale for this: there are good reasons for an index to not have "internal" links actually on the exact same hostname. Even just using a different subdomain would break simple host comparison. > It's also more work to implement, vs. just defaulting --allow-hosts to > be the --index-url host; a strategy ISTM pip could also use, since it > has the same two options available. Pip actually doesn't currently have --allow-hosts, although there's no good reason for that; it ought to. > Also, if we're not doing homepage/download crawling any more, I was > hoping we could just drop the code that 'parses' rel="" links in the > first place, as it's an awkward ugly hack. ;-) Well, parsing HTML links as an API is an ugly hack, but within that existing framework "rel" seems like the appropriate semantic attribute for this type of information, not really upping the hackiness quotient :-) Carl -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: OpenPGP digital signature URL: From carl at oddbird.net Fri Mar 15 17:10:41 2013 From: carl at oddbird.net (Carl Meyer) Date: Fri, 15 Mar 2013 10:10:41 -0600 Subject: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of release files In-Reply-To: References: <20130313112158.GO9677@merlinux.eu> Message-ID: <51434801.3090300@oddbird.net> Hi Marcus, On 03/15/2013 01:32 AM, Marcus Smith wrote: > > > In addition, maintainers of installation tools are asked to release > two updates. The first one shall provide clear warnings [...] > The second update for installation tools should change the default > mode to allow only installation of package files hosted at the index > domain, > > > sounds good to me. Excellent, having the installer-tool maintainers on-board is obviously important here :-) > It is expected that tools in this release may choose to change the > default index url to ``https://pypi.python.org/simple/-with-ext`` > in > > > so, *eventually*, the /simple interface (that has been transitioned to > only serve pypi links) could be deprecated? > (because new tools would be smart enough to responsibly navigate > /simple/-with-ext) > > but slightly ironic that we'd be left with an interface called > "simple/-with-ext", given the goal of all this, but it makes sense. Right, it was precisely this awkwardness (the likelihood that tools would want to default to -with-ext and use host-comparison to distinguish internal/external, so as to provide info about external links with a single request-response) that led us to eliminate the separate indexes in our latest V4 draft and use rel attributes to distinguish link types. Carl -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: OpenPGP digital signature URL: From pje at telecommunity.com Fri Mar 15 17:51:11 2013 From: pje at telecommunity.com (PJ Eby) Date: Fri, 15 Mar 2013 12:51:11 -0400 Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI In-Reply-To: <5143475F.50708@oddbird.net> References: <20130315092959.GA9677@merlinux.eu> <5143475F.50708@oddbird.net> Message-ID: On Fri, Mar 15, 2013 at 12:07 PM, Carl Meyer wrote: > On 03/15/2013 09:15 AM, PJ Eby wrote: >> Do we even need the internal/external rel info? I was planning to >> just use the URL hostname. >> >> i.e., are there any use cases for designating an externally-hosted >> file internal, or an internally-hosted file external? If not, it >> seems the rel="" is redundant. > > Right; Donald and Holger already gave the rationale for this: there are > good reasons for an index to not have "internal" links actually on the > exact same hostname. Even just using a different subdomain would break > simple host comparison. > >> It's also more work to implement, vs. just defaulting --allow-hosts to >> be the --index-url host; a strategy ISTM pip could also use, since it >> has the same two options available. > > Pip actually doesn't currently have --allow-hosts, although there's no > good reason for that; it ought to. > >> Also, if we're not doing homepage/download crawling any more, I was >> hoping we could just drop the code that 'parses' rel="" links in the >> first place, as it's an awkward ugly hack. ;-) > > Well, parsing HTML links as an API is an ugly hack, but within that > existing framework "rel" seems like the appropriate semantic attribute > for this type of information, not really upping the hackiness quotient :-) Well, to be clear, I liked previous versions of the proposal better than this one. But while I *really* don't want to do any new rel parsing, that's not the only or even the most important reason. 
The main reason is that I think internal vs. external is a bogus distinction: what's important (IMO) is what hosts you do and don't trust. Giving a blanket pass to all external links doesn't seem like such a good idea to me, nor does allowing the index to define what hosts the client should trust. As for the internal ones, I'm not sure why we can't at least make a subdomain requirement, or have users explicitly add a PyPI CDN to their configured --allow-hosts. To try to put it another way: there should be one, and preferably only one, obvious way to specify where you get downloads from. That way in easy_install is currently --allow-hosts. Adding new options that interact and overlap with that looks like bad UI design to me, increasing the possibility of user confusion. From donald at stufft.io Fri Mar 15 18:00:11 2013 From: donald at stufft.io (Donald Stufft) Date: Fri, 15 Mar 2013 13:00:11 -0400 Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI In-Reply-To: References: <20130315092959.GA9677@merlinux.eu> <5143475F.50708@oddbird.net> Message-ID: <82AFA590-D17E-443C-A57F-6B4AB466DEB0@stufft.io> On Mar 15, 2013, at 12:51 PM, PJ Eby wrote: > On Fri, Mar 15, 2013 at 12:07 PM, Carl Meyer wrote: >> On 03/15/2013 09:15 AM, PJ Eby wrote: >>> Do we even need the internal/external rel info? I was planning to >>> just use the URL hostname. >>> >>> i.e., are there any use cases for designating an externally-hosted >>> file internal, or an internally-hosted file external? If not, it >>> seems the rel="" is redundant. >> >> Right; Donald and Holger already gave the rationale for this: there are >> good reasons for an index to not have "internal" links actually on the >> exact same hostname. Even just using a different subdomain would break >> simple host comparison. >> >>> It's also more work to implement, vs. 
just defaulting --allow-hosts to >>> be the --index-url host; a strategy ISTM pip could also use, since it >>> has the same two options available. >> >> Pip actually doesn't currently have --allow-hosts, although there's no >> good reason for that; it ought to. >> >>> Also, if we're not doing homepage/download crawling any more, I was >>> hoping we could just drop the code that 'parses' rel="" links in the >>> first place, as it's an awkward ugly hack. ;-) >> >> Well, parsing HTML links as an API is an ugly hack, but within that >> existing framework "rel" seems like the appropriate semantic attribute >> for this type of information, not really upping the hackiness quotient :-) > > Well, to be clear, I liked previous versions of the proposal better > than this one. But while I *really* don't want to do any new rel > parsing, that's not the only or even the most important reason. > > The main reason is that I think internal vs. external is a bogus > distinction: what's important (IMO) is what hosts you do and don't > trust. Giving a blanket pass to all external links doesn't seem like > such a good idea to me, nor does allowing the index to define what > hosts the client should trust. As for the internal ones, I'm not > sure why we can't at least make a subdomain requirement, or have users > explicitly add a PyPI CDN to their configured --allow-hosts. > > To try to put it another way: there should be one, and preferably only > one, obvious way to specify where you get downloads from. That way in > easy_install is currently --allow-hosts. Adding new options that > interact and overlap with that looks like bad UI design to me, > increasing the possibility of user confusion. > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig You can do that fwiw. That's fine. 
You can optionally just use the internal links as an indicator of which
hosts should automatically be added to --allow-hosts for a particular
index.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From carl at oddbird.net Fri Mar 15 18:39:58 2013
From: carl at oddbird.net (Carl Meyer)
Date: Fri, 15 Mar 2013 11:39:58 -0600
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
In-Reply-To:
References: <20130315092959.GA9677@merlinux.eu> <5143475F.50708@oddbird.net>
Message-ID: <51435CEE.3020505@oddbird.net>

On 03/15/2013 10:51 AM, PJ Eby wrote:
> Giving a blanket pass to all external links doesn't seem like
> such a good idea to me,

This is a very good point, and it should be made clearer in the PEP
that we don't recommend a single blanket option to allow all external
links, but an option (like allow-hosts) that lets you specify with more
granularity which external links to use. I think perhaps rel="external"
confuses this point; the real purpose of the rel tags is just so that
rel="internal" can be considered "part of the index."

FWIW I think it would be just as reasonable UI for a hypothetical tool
to let you say "I want to trust external links for the Foo project"
rather than "I want to trust external links to djangoproject.com" and
avoid host-comparison altogether. IOW, I don't think "hostname" is
inherently a better or safer indicator of trust than "project name";
hosts can change ownership at least as easily and silently as PyPI
projects! So I don't think the PEP should require all installer tools
to choose trust-by-hostname (which would be implied by removing the rel
tags).

> nor does allowing the index to define what
> hosts the client should trust.

I'm not sure about this.
By using an index at all, you are trusting that index to provide
whatever level of reliability/stability/security/whatever you expect
from it. Allowing the index itself to specify that it keeps its files
on a different host in a way that is transparent to the user seems like
a natural extension of this trust that doesn't harm anything and aids
usability greatly. (Cases where the index is lying to you definitely
fall outside the scope of what this PEP is aiming to help with.)

> As for the internal ones, I'm not
> sure why we can't at least make a subdomain requirement, or have users
> explicitly add a PyPI CDN to their configured --allow-hosts.

Even a subdomain requirement can make a CDN more difficult/expensive to
implement. And once you go beyond simple host-equality comparisons and
into subdomain-equivalence I'm wary of the added implementation
complexity we're asking of every installer tool, and the potential for
subtle differences in implementation. This seems to me like a worse can
of worms than rel-parsing.

> To try to put it another way: there should be one, and preferably only
> one, obvious way to specify where you get downloads from. That way in
> easy_install is currently --allow-hosts. Adding new options that
> interact and overlap with that looks like bad UI design to me,
> increasing the possibility of user confusion.

Like Donald says, I don't see any problem with you choosing to keep
allow-hosts as the only user-facing option for easy_install. It would
be up to you whether you also want to use rel="internal" as a hint for
implicitly (perhaps with warning) adding to --allow-hosts, to allow
better compatibility with indexes that use a different host for
file-hosting (it's possible that even PyPI itself may move into this
category, I haven't been following the CDN discussions carefully). PyPI
wouldn't be enforcing a UI on you here, just providing metadata that
you can use as you wish.
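
A minimal sketch of that "hint" approach, assuming --allow-hosts keeps
taking glob-style host patterns and that the index labels its own file
links with rel="internal". The function names, the example hosts and
the wording of the refusal message are hypothetical; none of this is
existing easy_install or pip code.

```python
from fnmatch import fnmatch
from urllib.parse import urljoin, urlparse


def host_of(index_url, href):
    """Hostname of a link, resolving relative hrefs against the index page."""
    return urlparse(urljoin(index_url, href)).hostname or ""


def effective_allow_hosts(index_url, links, user_patterns=()):
    """Build the set of trusted host patterns for a single index page.

    links is an iterable of (href, rel) pairs taken from that page.
    """
    patterns = {host_of(index_url, index_url)}   # the index host itself
    patterns.update(user_patterns)               # what the user passed explicitly
    for href, rel in links:
        if rel == "internal":                    # the index vouches for this host
            patterns.add(host_of(index_url, href))
    return patterns


def split_links(index_url, links, patterns):
    """Separate links into (allowed, refused) according to the host patterns."""
    allowed, refused = [], []
    for href, _rel in links:
        host = host_of(index_url, href)
        target = allowed if any(fnmatch(host, p) for p in patterns) else refused
        target.append(href)
    return allowed, refused


INDEX = "https://pypi.python.org/simple/foo/"
LINKS = [
    ("https://cdn.example.net/foo-1.0.tar.gz", "internal"),
    ("http://downloads.example.org/foo-0.9.tar.gz", "external"),
]

patterns = effective_allow_hosts(INDEX, LINKS)
allowed, refused = split_links(INDEX, LINKS, patterns)
for url in refused:
    print("refusing %s (not covered by the allowed hosts)" % url)
```

The user-facing option stays a single host list; rel="internal" only
widens the default for the index being used, and anything else still
has to be allowed explicitly, for example with a pattern such as
"*.example.org".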
I do think the internal/external distinction is meaningful and unambiguous metadata that the index is able to provide, and there's no reason for the index to withhold it. (That distinction is not new in this version of the PEP, either, it's just made via rel tags now instead of via a separate index.) Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: OpenPGP digital signature URL: From mal at python.org Fri Mar 15 16:47:34 2013 From: mal at python.org (M.-A. Lemburg) Date: Fri, 15 Mar 2013 16:47:34 +0100 Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI In-Reply-To: <20130315092959.GA9677@merlinux.eu> References: <20130315092959.GA9677@merlinux.eu> Message-ID: <51434296.9030503@python.org> Thanks, Holger. This version looks a lot better :-) There are still some minor quirks which would need to be addressed more explicitly, but overall, this proposal provides a good way forward. Perhaps it would also be possible to add the secured download links and the caching/proxying ideas to the PEP at some point, or we turn those into a new PEP. I can't follow up in detail today, but will have a closer look next week. On 15.03.2013 10:29, holger krekel wrote: > Hi all, in particular Philip, Marc-Andre, Donald, > > Carl and me decided to simplify the PEP and avoid the somewhat > awkward ``simple/-with-externals`` index for various reasons, among them > Marc-Andre's criticisms. This also means present-day installation tools > (shipped with Redhat/Debian/etc.) will continue to work as today for > those packages which remain in a hosting-mode that requires crawling and > scraping. They will still benefit from the fact that most packages will > soon have a hosting-mode that avoids it. 
Future releases of installation > tools will default to not perform crawling or using (scraped) external > links, and new PYPI projects will default to only serve uploaded files. > > The V4 pre-PEP also renames the three PyPI hosting modes to be more > descriptive. Since all three modes allow external links, "pypi-ext" vs > "pypi-only" were misleading. The new naming distinguishes the mode that both > scrapes links from metadata and crawls external pages for more links > ("pypi-scrape-crawl") from the mode that only scrapes links from metadata > ("pypi-scrape") from the mode where all links are explicit ("pypi-explicit"). > > Without the separate external index, it also turns out that the two transition > phases are separated into PyPI changes (phase one) and installer-tool > updates (phase two). There are no PyPI changes necessary in phase two. > As stated in a new open question, it should be possible to do > PEP-related installation tool updates during phase 1, that may require > a bit of clarification in the PEP's language still. > > Carl and me are happy with this PEP version now and hope you all are as > well. Donald is already working on improving the analysis tool so > we hopefully have some updated numbers soon. > > cheers, > > Holger > > > PEP: XXX > Title: Transitioning to release-file hosting on PyPI > Version: $Revision$ > Last-Modified: $Date$ > Author: Holger Krekel , Carl Meyer > Discussions-To: catalog-sig at python.org > Status: Draft (PRE-submit V4) > Type: Process > Content-Type: text/x-rst > Created: 10-Mar-2013 > Post-History: > > > Abstract > ======== > > This PEP proposes a backward-compatible two-phase transition process > to speed up, simplify and robustify installing from the > pypi.python.org (PyPI) package index. 
To ease the transition and > minimize client-side friction, **no changes to distutils or existing > installation tools are required in order to benefit from the first > transition phase, which will result in faster, more reliable installs > for most existing packages**. > > The first transition phase implements an easy and explicit means for a > package maintainer to control which release file links are served to > present-day installation tools. The first phase also includes the > implementation of analysis tools for present-day packages, to support > communication with package maintainers and the automated setting of > default modes for controlling release file links. The first phase > also will make new projects on PYPI use a default to only serve > links to release files which were uploaded to PYPI. > > The second transition phase concerns end-user installation tools, > which shall default to only install release files that are hosted on > PyPI and tell the user if external release files exist, offering > a choice to automatically use those external files. > > > Rationale > ========= > > .. _history: > > History and motivations for external hosting > -------------------------------------------- > > When PyPI went online, it offered release registration but had no > facility to host release files itself. When hosting was added, no > automated downloading tool existed yet. When Philip Eby implemented > automated downloading (through setuptools), he made the choice to > allow people to use download hosts of their choice. The finding of > externally-hosted packages was implemented as follows: > > #. The PyPI ``simple/`` index for a package contains all links found > by scraping them from that package's long_description metadata for > any release. Links in the "Download-URL" and "Home-page" metadata > fields are given ``rel=download`` and ``rel=homepage`` attributes, > respectively. > > #. 
Any of these links whose target is a file whose name appears to be > in the form of an installable source or binary distribution, with > name in the form "packagename-version.ARCHIVEEXT", is considered a > potential installation candidate by installation tools. > > #. Similarly, any links suffixed with an "#egg=packagename-version" > fragment are considered an installation candidate. > > #. Additionally, the ``rel=homepage`` and ``rel=download`` links are > crawled by installation tools and, if HTML, are themselves scraped > for release-file links in the above formats. > > Today, most packages released on PyPI host their release files on > PyPI, but a small percentage (XXX need updated data) rely on external > hosting. > > There are many reasons [2]_ why people have chosen external > hosting. To cite just a few: > > - release processes and scripts have been developed already and upload > to external sites > > - it takes too long to upload large files from some places in the > world > > - export restrictions e.g. for crypto-related software > > - company policies which require offering open source packages > through own sites > > - problems with integrating uploading to PyPI into one's release > process (because of release policies) > > - desiring download statistics different from those maintained by PyPI > > - perceived bad reliability of PyPI > > - not aware that PyPI offers file-hosting > > Irrespective of the present-day validity of these reasons, there > clearly is a history why people choose to host files externally and it > even was for some time the only way you could do things. This PEP > takes the position that there are at least some valid reasons for > external hosting. > > Problem > ------- > > **Today, python package installers (pip, easy_install, buildout, and > others) often need to query many non-PyPI URLs even if there are no > externally hosted files**. 
Apart from querying pypi.python.org's > simple index pages, also all homepages and download pages ever > specified with any release of a package are crawled by an installer. > The need for installers to crawl external sites slows down > installation and makes for a brittle and unreliable installation > process. Those sites and packages also don't take part in the > :pep:`381` mirroring infrastructure, further decreasing reliability > and speed of automated installation processes around the world. > > Most packages are hosted directly on pypi.python.org [1]_. Even for > these packages, installers still crawl their homepage and > download-url, if specified. Many package uploaders are not aware that > specifying the "homepage" or "download-url" in their package metadata > will needlessly slow down the installation process for all users. > > Relying on third party sites also opens up more attack vectors for > injecting malicious packages into sites using automated installs. A > simple attack might just involve getting hold of an old now-unused > homepage domain and placing malicious packages there. Moreover, > performing a Man-in-The-Middle (MITM) attack between an installation > site and any of the download sites can inject malicious packages on > the installation site. As many homepages and download locations are > using HTTP and not HTTPS, such attacks are not hard to launch. Such > MITM attacks can easily happen even for packages which never intended > to host files externally as their homepages are contacted by > installers anyway. > > There is currently no way for package maintainers to avoid > external-link crawling, other than removing all homepage/download url > metadata for all historic releases. While a script [3]_ has been > written to perform this action, it is not a good general solution > because it removes useful metadata from PyPI releases. 
>
> Even if the sites referenced by "Homepage" and "Download-URL" links were
> not scraped for further links, there is no obvious way under the current
> system for a package owner to link to an installable file from a
> long_description metadata field (which is shown as package documentation
> on ``/pypi/PKG``) without installation tools automatically considering
> that file a candidate for installation. Conversely, there is no way
> to explicitly register multiple external release files without
> putting them in metadata fields.
>
>
> Goals
> -----
>
> These are the goals to be achieved by implementation of this PEP:
>
> * Package owners should be able to explicitly control which files are
>   presented by PyPI to installer tools as installation
>   candidates. Installation should not be slowed and made less reliable
>   by extensive and unnecessary crawling of links that package owners
>   did not explicitly nominate as installation files.
>
> * It should remain possible for package owners to choose to host their
>   release files on their own hosting, external to PyPI. It should be
>   easy for a user to request the installation of such releases using
>   automated installer tools.
>
> * Automated installer tools should not install externally-hosted
>   packages **by default**, but only when explicitly authorized to do
>   so by the user. When tools refuse to install such a package by
>   default, they should tell the user exactly which external link(s)
>   they would need to follow, and what option(s) the user can provide
>   to authorize the tool to follow those links. PyPI should provide all
>   necessary metadata for installer tools to implement this easily
>   and within a single request/reply interaction.
>
> * Migration from the status quo to the above points should be gradual
>   and minimize breakage. This includes tooling that makes it easy for
>   package owners with an existing release process that uploads to
>   non-PyPI hosting to also upload those release files to PyPI.
> > > Solution / two transition phases > ================================ > > The first transition phase introduces a "hosting-mode" field for each > project on PyPI, allowing package owners explicit control of which > release file links are served to present-day installation tools in the > machine-readable ``simple/`` index. The first transition will, after > successful hosting-mode manipulations by individual early-adopters, > set a default hosting mode for existing packages, based on > automated analysis. **Maintainers will be notified one month ahead of > any such automated change**. At completion of the first transition > phase, **all present-day existing release and installation processes > and tools are expected to continue working**. Any remaining errors or > problems are expected to only relate to installation of individual > packages and can be easily corrected by package maintainers or PyPI > admins if maintainers are not reachable. > > Also in the first phase, each link served in the ``simple/`` index > will be explicitly marked as ``rel="internal"`` (hosted by the index > itself) or ``rel="external"`` (linking to an external site that is not > part of the index). > > In the second transition phase, PyPI client installation tools shall > be updated to default to only install ``rel="internal"`` packages > unless a user specifies option(s) to permit installing from external > links. > > Maintainers of packages which currently host release files on non-PyPI > sites shall receive instructions and tools to ease "re-hosting" of > their historic and future package release files. This re-hosting tool > MUST be available before automated hosting-mode changes are announced > to package maintainers. > > > Implementation > ============== > > Hosting modes > ------------- > > The foundation of the first transition phase is the introduction of > three "modes" of PyPI hosting for a package, affecting which links are > generated for the ``simple/`` index. 
These modes are implemented > without requiring changes to installation tools via changes to the > algorithm for generating the machine-readable ``simple/`` index. > > The modes are: > > - ``pypi-scrape-crawl``: no change from the current situation of > generating machine-readable links for installation tools, as > outlined in the history_. > > - ``pypi-scrape``: for a package in this mode, links to be added to > the ``simple/`` index are still scraped from package > metadata. However, the "Home-page" and "Download-url" links are > given ``rel=ext-homepage`` and ``rel=ext-download`` attributes > instead of ``rel=homepage`` and ``rel=download``. The effect of this > (with no change in installation tools necessary) is that these links > will not be followed and scraped for further candidate links by present-day > installation tools: only installable files directly hosted from PYPI or > linked directly from PyPI metadata will be considered for installation. > Installation tools MAY evolve to offer an option to use the new > rel-attribution to crawl external pages but MUST NOT default to it. > > - ``pypi-explicit``: for a package in this mode, only links to release > files uploaded to PyPI, and external links to release files > explicitly nominated by the package owner (via a new interface > exposed by PyPI) will be added to the ``simple/`` index. > > Thus the hope is that eventually all projects on PyPI can be migrated > to the ``pypi-explicit`` mode, while preserving the ability to install > release files hosted externally via installer tools. Deprecation of > hosting modes to eventually only allow the ``pypi-explicit`` mode is > NOT REGULATED by this PEP but is expected to become feasible some time > after successful implementation of the transition phases described in > this PEP. It is expected that deprecation requires **a new process to deal > with abandoned packages** because of unreachable maintainers for still > popular packages. 
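
As a rough illustration of the three hosting modes described above, the
link generation for one ``simple/`` page might look like the sketch
below. The Project class and its field names are hypothetical, not
PyPI's data model. The draft also does not say how the
internal/external labels combine with the homepage/download rel values,
so this sketch simply gives every link a single rel and treats all
scraped links as external.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Project:
    """Hypothetical stand-in for what the index knows about one project."""
    mode: str
    uploaded: List[str] = field(default_factory=list)   # files hosted on the index
    scraped: List[str] = field(default_factory=list)    # links found in long_description
    explicit: List[str] = field(default_factory=list)   # external files nominated by the owner
    homepage: Optional[str] = None
    download_url: Optional[str] = None


def simple_index_links(project):
    """Return (url, rel) pairs to serve on the project's simple/ page."""
    links = [(url, "internal") for url in project.uploaded]

    if project.mode == "pypi-explicit":
        # Only uploaded files plus explicitly registered external files.
        return links + [(url, "external") for url in project.explicit]

    if project.mode not in ("pypi-scrape", "pypi-scrape-crawl"):
        raise ValueError("unknown hosting mode: %r" % project.mode)

    # Both scrape modes still serve links scraped from metadata.
    links += [(url, "external") for url in project.scraped]

    # pypi-scrape renames the rel values so present-day tools stop crawling.
    prefix = "" if project.mode == "pypi-scrape-crawl" else "ext-"
    if project.homepage:
        links.append((project.homepage, prefix + "homepage"))
    if project.download_url:
        links.append((project.download_url, prefix + "download"))
    return links


foo = Project(
    mode="pypi-scrape",
    uploaded=["https://pypi.python.org/packages/source/f/foo/foo-1.0.tar.gz"],
    scraped=["http://example.com/dist/foo-0.9.tar.gz"],
    explicit=["http://example.com/dist/foo-0.9.tar.gz"],
    homepage="http://example.com/",
)
for url, rel in simple_index_links(foo):
    print(rel, url)
```

Because only the rel values and the set of served links change, an
installer that crawls ``rel=homepage`` and ``rel=download`` today would
stop crawling for a ``pypi-scrape`` project without any client update.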
> > > First transition phase (PyPI) > ----------------------------- > > The proposed solution consists of multiple implementation and > communication steps: > > #. Implement in PyPI the three modes described above, with an > interface for package owners to select the mode for each package > and register explicit external file URLs. > > #. For packages in all modes, label all links in the ``simple/`` index > with ``rel="internal"`` or ``rel="external"``, to make it easier > for client tools to distinguish the types of links in the second > transition phase. > > #. Default all newly-registered packages to ``pypi-explicit`` mode > (package owners can still switch to the other modes as desired). > > #. Determine (via an automated analysis tool) which packages have all > installable files available on PyPI itself (group A), which have > all installable files linked directly from PyPI metadata (group B), > and which have installable versions available that are linked only > from external homepage/download HTML pages (group C). > > #. Send mail to maintainers of projects in group A that their project > will be automatically configured to ``pypi-explicit`` mode in one > month, and similarly to maintainers of projects in group B that > their project will be automatically configured to ``pypi-scrape`` > mode. Inform them that this change is not expected to affect > installability of their project at all, but will result in faster > and safer installs for their users. Encourage them to set this > mode themselves sooner to benefit their users. > > #. Send mail to maintainers of packages in group C that their package > hosting mode is ``pypi-scrape-crawl``, list the URLs which > currently are crawled, and suggest that they either re-host their > packages directly on PyPI and switch to ``pypi-explicit``, or at > least provide direct links to release files in PyPI metadata and > switch to ``pypi-scrape``. Provide instructions and tools to help > with these transitions. 
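The grouping step above could be sketched roughly as follows. This assumes the analysis tool has already recorded, for each installable file of a project, where it was found; the source labels ``"pypi"``, ``"metadata"`` and ``"crawled"`` and the function names are hypothetical, and reading group B as "nothing needs crawling, but not everything is on PyPI" is an interpretation of the draft, not something it spells out.

```python
# Hypothetical sketch of the group A/B/C analysis; labels are illustrative.
KNOWN_SOURCES = {"pypi", "metadata", "crawled"}

# Default hosting mode proposed for each group in the draft.
DEFAULT_MODE = {
    "A": "pypi-explicit",
    "B": "pypi-scrape",
    "C": "pypi-scrape-crawl",
}


def classify_project(file_sources):
    """Return "A", "B" or "C" for one project.

    file_sources holds one label per installable file:
      "pypi"     - file is hosted on PyPI itself
      "metadata" - file is linked directly from PyPI metadata
      "crawled"  - file is only reachable by crawling homepage/download pages
    """
    sources = set(file_sources)
    unknown = sources - KNOWN_SOURCES
    if unknown:
        raise ValueError("unknown file source(s): %s" % sorted(unknown))
    if "crawled" in sources:
        return "C"
    if "metadata" in sources:
        return "B"
    # Everything is on PyPI (assumption: a project with no files at all
    # also lands here, since there is nothing to scrape or crawl).
    return "A"


for files in (["pypi", "pypi"], ["pypi", "metadata"], ["pypi", "crawled"]):
    group = classify_project(files)
    print(files, group, DEFAULT_MODE[group])
```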
>
>
> Second transition phase (installer tools)
> -----------------------------------------
>
> For the second transition phase, maintainers of installation tools are
> asked to release two updates.
>
> The first update shall provide clear warnings if externally-hosted
> release files (that is, files whose link is ``rel="external"``) are
> selected for download, state exactly which projects and URLs are
> affected, and warn that in future versions externally-hosted downloads
> will be disabled by default.
>
> The second update should change the default mode to allow only
> installation of ``rel="internal"`` package files, and allow
> installation of externally-hosted packages only when the user supplies
> an option (ideally an option specifying exactly which external domains
> are to be trusted as download sources). When download of an
> externally-hosted package is disallowed, the user should be notified,
> with instructions for how to make the install succeed and warnings
> about the implication (that a file will be downloaded from a site that
> is not part of the package index).
>
>
> Open questions / Tasks
> ======================
>
> - Should we introduce some form of PyPI API versioning in this PEP?
> (It might complicate matters and delay the implementation but is
> often seen as good practise.)
>
> - In ``pypi-scrape`` mode: does PyPI itself determine which links are
> installation candidates and avoid presenting other random links (which
> are currently served)?
>
> - Consider that installation tools may choose to release updates
> already during transition phase 1, to warn about crawled and scraped
> links (which are easily identifiable both today and, via the new rel
> attributes, after transition phase 1).
>
>
> References
> ==========
>
> .. [1] Donald Stufft, ratio of externally hosted versus pypi-hosted, http://mail.python.org/pipermail/catalog-sig/2013-March/005549.html (XXX need to update this data for all easy_install-supported formats)
>
> ..
[2] Marc-Andre Lemburg, reasons for external hosting, http://mail.python.org/pipermail/catalog-sig/2013-March/005626.html > > .. [3] Holger Krekel, Script to remove homepage/download metadata for all releases http://mail.python.org/pipermail/catalog-sig/2013-February/005423.html > > Acknowledgments > ================ > > Philip Eby for precise information and the basic ideas to implement > the transition via server-side changes only. > > Donald Stufft for pushing away from external hosting and offering to > implement both a Pull Request for the necessary PyPI changes and the > analysis tool to drive the transition phase 1. > > Marc-Andre Lemburg, Nick Coghlan and catalog-sig in general for > thinking through issues regarding getting rid of "external hosting". > > Copyright > ========= > > This document has been placed in the public domain. > > > > .. > Local Variables: > mode: indented-text > indent-tabs-mode: nil > sentence-end-double-space: t > fill-column: 70 > coding: utf-8 > End: > > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig > -- Marc-Andre Lemburg PSF Vice Chairman From pje at telecommunity.com Fri Mar 15 19:59:56 2013 From: pje at telecommunity.com (PJ Eby) Date: Fri, 15 Mar 2013 14:59:56 -0400 Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI In-Reply-To: <51435CEE.3020505@oddbird.net> References: <20130315092959.GA9677@merlinux.eu> <5143475F.50708@oddbird.net> <51435CEE.3020505@oddbird.net> Message-ID: On Fri, Mar 15, 2013 at 1:39 PM, Carl Meyer wrote: > up to you whether you also want to use rel="internal" as a hint for > implicitly (perhaps with warning) adding to --allow-hosts, That's the bit I don't like. The security model is that if it's not allowed by allowed-hosts, it's *not allowed*. 
Introducing a way to sneak something past allow-hosts is a bad idea, because it means people either have to explicitly widen their allow-hosts to arbitrary hosts, or else that you can't actually enforce an allowed-hosts policy, or that you need to learn a whole bunch of options to implement it. ISTM that this is a bad design choice for users, and I'm not comfortable with this without some way to define the allowed "internal" hosts based in some way on the base index URL. Not just for ease of automated translation, but so that *users* can know who they're dealing with, and easily predict the effects of their chosen options. A frequent refrain has been, "users don't know they're downloading stuff from places other than PyPI", so if this new approach allows downloads from somewhere other than *.pypi.python.org when you've chosen pypi.python.org as your index, ISTM the proposal is failing to meet its original goals. As the PEP is written, PyPI could change out to a different CDN each week or use different ones for different files, and users would be back in the position of not being sure where stuff is coming from. I'm fine with extending the default host matching to "indexhost,*.indexhost" if we want to leave more of an option for PyPI and other indexes to use a CDN. But I'm not sure how much point to it there is, since a /simple index is static, and small in size compared to the downloads, so you might as well host a copy of the /simple index alongside the downloads, and make the index pypicdn.com/simple or whatever in the first place. (In other words, not a lot of benefit to splitting a static index from its associated files, so why support it?) > PyPI wouldn't be enforcing a UI on you here, just providing metadata > that you can use as you wish. That's not what the PEP says. It does in fact *mandate* the use of the rel attributes. 
So if somebody adds an "external link" that actually points back to PyPI, technically I'm not supposed to use it unless it's been explicitly authorized. ;-) I'd really prefer to see explicit language that says the rel information is advisory only and that installers aren't required to parse it, let alone use it. At the moment, the PEP is a substantial departure from the version I agreed with. (If there were to be any meaningful distinction in the links themselves, I would think it'd more be whether, e.g. hash information is available for the download. That's a potentially relevant distinction right now, in that PyPI automatically provides #md5 info. Even so, I'm not sure that's enough of a distinction for anyone to care about.) From mal at egenix.com Fri Mar 15 22:24:36 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 15 Mar 2013 22:24:36 +0100 Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI In-Reply-To: <51434296.9030503@python.org> References: <20130315092959.GA9677@merlinux.eu> <51434296.9030503@python.org> Message-ID: <51439194.2070207@egenix.com> A little off-topic, but I thought you might enjoy this in the context of all the crypto, hash and signing debate: http://xkcd.com/1181/ Cheers, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 15 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From carl at oddbird.net Sat Mar 16 00:16:19 2013 From: carl at oddbird.net (Carl Meyer) Date: Fri, 15 Mar 2013 17:16:19 -0600 Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI In-Reply-To: References: <20130315092959.GA9677@merlinux.eu> <5143475F.50708@oddbird.net> <51435CEE.3020505@oddbird.net> Message-ID: <5143ABC3.8030603@oddbird.net> tl;dr: I see your points, we'll change the PEP to allow clients to use hostnames instead of the rel attributes if they prefer. More comments below: On 03/15/2013 12:59 PM, PJ Eby wrote: > That's the bit I don't like. The security model is that if it's not > allowed by allowed-hosts, it's *not allowed*. Introducing a way to > sneak something past allow-hosts is a bad idea, because it means > people either have to explicitly widen their allow-hosts to arbitrary > hosts, or else that you can't actually enforce an allowed-hosts > policy, or that you need to learn a whole bunch of options to > implement it. > > ISTM that this is a bad design choice for users, and I'm not > comfortable with this without some way to define the allowed > "internal" hosts based in some way on the base index URL. Not just > for ease of automated translation, but so that *users* can know who > they're dealing with, and easily predict the effects of their chosen > options. > > A frequent refrain has been, "users don't know they're downloading > stuff from places other than PyPI", so if this new approach allows > downloads from somewhere other than *.pypi.python.org when you've > chosen pypi.python.org as your index, ISTM the proposal is failing to > meet its original goals. As the PEP is written, PyPI could change out > to a different CDN each week or use different ones for different > files, and users would be back in the position of not being sure where > stuff is coming from. 
I guess the key question is the definition of "places other than PyPI." I think a CDN that is part of the index's architecture is just as much "part of PyPI" whether it's on the same domain or not. But I understand the difficulty integrating this with the --allow-hosts option in a way that maintains a clear and simple UI. > I'm fine with extending the default host matching to > "indexhost,*.indexhost" if we want to leave more of an option for PyPI > and other indexes to use a CDN. But I'm not sure how much point to it > there is, since a /simple index is static, and small in size compared > to the downloads, so you might as well host a copy of the /simple > index alongside the downloads, and make the index pypicdn.com/simple > or whatever in the first place. (In other words, not a lot of benefit > to splitting a static index from its associated files, so why support > it?) Putting the /simple/ API on a CDN isn't quite that easy because it currently involves some server-side redirects to effectively make project names case-insensitive. I think in a hypothetical re-architecture of PyPI there may be good security reasons to put user-uploaded files on a different domain from dynamic portions of the API (Donald alluded to this, more discussion at http://security.stackexchange.com/questions/11756/is-it-safe-to-serve-any-user-uploaded-file-under-only-white-listed-mime-content). So I think this issue may come up again in the future. But I'm fine with deferring it in this PEP for now... >> PyPI wouldn't be enforcing a UI on you here, just providing metadata >> that you can use as you wish. > > That's not what the PEP says. It does in fact *mandate* the use of > the rel attributes. So if somebody adds an "external link" that > actually points back to PyPI, technically I'm not supposed to use it > unless it's been explicitly authorized. 
;-) > > I'd really prefer to see explicit language that says the rel > information is advisory only and that installers aren't required to > parse it, let alone use it. At the moment, the PEP is a substantial > departure from the version I agreed with. Ok, pending agreement from Holger I'll make a change in the PEP to explicitly allow clients to make decisions based on either the rel attributes or based on hostnames. Would that be sufficient to address your concerns? Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: OpenPGP digital signature URL: From pje at telecommunity.com Sat Mar 16 03:01:57 2013 From: pje at telecommunity.com (PJ Eby) Date: Fri, 15 Mar 2013 22:01:57 -0400 Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI In-Reply-To: <5143ABC3.8030603@oddbird.net> References: <20130315092959.GA9677@merlinux.eu> <5143475F.50708@oddbird.net> <51435CEE.3020505@oddbird.net> <5143ABC3.8030603@oddbird.net> Message-ID: On Fri, Mar 15, 2013 at 7:16 PM, Carl Meyer wrote: > Ok, pending agreement from Holger I'll make a change in the PEP to > explicitly allow clients to make decisions based on either the rel > attributes or based on hostnames. Would that be sufficient to address > your concerns? Yes. I just don't want to be in a situation down the road where there's another argument about this on Catalog-SIG when PyPI starts using a CDN that, "but it says this in the rel and you're supposed to use that", and I say, "but Carl and Holger said..." and they go, "doesn't matter, PEP says" ;-) This way, the PEP will be clear that supporting a split of PyPI's hostnames isn't in current scope. I am also okay with the PEP allowing *.indexhost instead of just indexhost as the filtering mechanism, as long as it specifies one *now*. (Again, so this doesn't have to be revisited later.) 
If somebody who knows something about CDNs, TUF, etc., needs to weigh in on it first, that's fine. I just want to know where things stand. > Putting the /simple/ API on a CDN isn't quite that easy because it > currently involves some server-side redirects to effectively make > project names case-insensitive. FWIW, easy_install works fine without this. If a matching index page isn't found, it checks the full package list. PyPI's redirection just reduces bandwidth usage and request overhead in the case where the case of the user's request doesn't match the actual package listing. But it could be completely static without affecting easy_install and tools that use its package-finding code. From holger at merlinux.eu Sat Mar 16 06:30:18 2013 From: holger at merlinux.eu (holger krekel) Date: Sat, 16 Mar 2013 05:30:18 +0000 Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI In-Reply-To: References: <20130315092959.GA9677@merlinux.eu> <5143475F.50708@oddbird.net> <51435CEE.3020505@oddbird.net> <5143ABC3.8030603@oddbird.net> Message-ID: <20130316053018.GL9677@merlinux.eu> On Fri, Mar 15, 2013 at 22:01 -0400, PJ Eby wrote: > On Fri, Mar 15, 2013 at 7:16 PM, Carl Meyer wrote: > > Ok, pending agreement from Holger I'll make a change in the PEP to > > explicitly allow clients to make decisions based on either the rel > > attributes or based on hostnames. Would that be sufficient to address > > your concerns? > > Yes. I just don't want to be in a situation down the road where > there's another argument about this on Catalog-SIG when PyPI starts > using a CDN that, "but it says this in the rel and you're supposed to > use that", and I say, "but Carl and Holger said..." and they go, > "doesn't matter, PEP says" ;-) > > This way, the PEP will be clear that supporting a split of PyPI's > hostnames isn't in current scope. 
> > I am also okay with the PEP allowing *.indexhost instead of just > indexhost as the filtering mechanism, as long as it specifies one > *now*. (Again, so this doesn't have to be revisited later.) If > somebody who knows something about CDNs, TUF, etc., needs to weigh in > on it first, that's fine. I just want to know where things stand. One related question. The "rel=internal" links will contain a (md5 currently) hash so if the referenced resource resolves to a file matching that hash, we can be sure about its integrity. What kind of security does host-checking add on top? holger > > Putting the /simple/ API on a CDN isn't quite that easy because it > > currently involves some server-side redirects to effectively make > > project names case-insensitive. > > FWIW, easy_install works fine without this. If a matching index page > isn't found, it checks the full package list. PyPI's redirection just > reduces bandwidth usage and request overhead in the case where the > case of the user's request doesn't match the actual package listing. > But it could be completely static without affecting easy_install and > tools that use its package-finding code. > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig > From ncoghlan at gmail.com Sat Mar 16 08:15:06 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 16 Mar 2013 00:15:06 -0700 Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI In-Reply-To: <5143ABC3.8030603@oddbird.net> References: <20130315092959.GA9677@merlinux.eu> <5143475F.50708@oddbird.net> <51435CEE.3020505@oddbird.net> <5143ABC3.8030603@oddbird.net> Message-ID: On 15 Mar 2013 16:16, "Carl Meyer" wrote: > > tl;dr: I see your points, we'll change the PEP to allow clients to use > hostnames instead of the rel attributes if they prefer. I will veto any such change. 
Clients MUST NOT assume that the architecture of the index service will be limited to a single host name, they must process the explicit metadata provided by the index that indicates which hosts the index controls. Adding a "--trust-indices" flag to make this optional in setuptools would be fine, but it seems perverse to trust every aspect of an index *except* its claims to control additional hosts. Regards, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From carl at oddbird.net Mon Mar 18 00:09:28 2013 From: carl at oddbird.net (Carl Meyer) Date: Sun, 17 Mar 2013 16:09:28 -0700 Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI In-Reply-To: References: <20130315092959.GA9677@merlinux.eu> <5143475F.50708@oddbird.net> <51435CEE.3020505@oddbird.net> <5143ABC3.8030603@oddbird.net> Message-ID: <51464D28.3080906@oddbird.net> On 03/16/2013 12:15 AM, Nick Coghlan wrote: > On 15 Mar 2013 16:16, "Carl Meyer" > wrote: >> >> tl;dr: I see your points, we'll change the PEP to allow clients to use >> hostnames instead of the rel attributes if they prefer. > > I will veto any such change. Clients MUST NOT assume that the > architecture of the index service will be limited to a single host name, > they must process the explicit metadata provided by the index that > indicates which hosts the index controls. > > Adding a "--trust-indices" flag to make this optional in setuptools > would be fine, but it seems perverse to trust every aspect of an index > *except* its claims to control additional hosts. Ok, based on this I retract my earlier comment. I've pushed a minor update to the PEP (at https://bitbucket.org/hpk42/pep-pypi, not yet at python.org) to clarify explicitly that indexes may choose to host internal files on a separate host/domain. Carl -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: OpenPGP digital signature URL: From tk47 at students.poly.edu Mon Mar 18 07:15:41 2013 From: tk47 at students.poly.edu (Trishank Karthik Kuppusamy) Date: Mon, 18 Mar 2013 02:15:41 -0400 Subject: [Catalog-sig] A modest proposal for securing PyPI with TUF In-Reply-To: <51411598.1010100@students.poly.edu> References: <51401FB3.7000408@students.poly.edu> <5140432C.7000904@students.poly.edu> <51411598.1010100@students.poly Message-ID: <5146B10D.1050606@students.poly.edu> On 3/13/13 8:11 PM, Trishank Karthik Kuppusamy wrote: > > Speaking of which, it may be the case that our design document for > integrating PyPI with TUF may not be terribly easy to understand. (After > all, you do need to understand TUF first, but TUF is fairly easy once > you understand its main ideas.) I plan to publish a friendlier document > which introduce TUF at a very high-level and instead discuss more > pragmatic issues (such as workflows). We presented a lightning talk on PyPI + TUF + pip at PyCon yesterday, and perhaps it would make things easier to understand: https://www.youtube.com/watch?v=2sx1lS6cT3g https://docs.google.com/presentation/d/1FMptD5sMH41BTgS3-PN0-7j5Zqvs_zZZ3ntsD_4u-7w/edit?usp=sharing From pje at telecommunity.com Mon Mar 18 18:22:20 2013 From: pje at telecommunity.com (PJ Eby) Date: Mon, 18 Mar 2013 13:22:20 -0400 Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI In-Reply-To: References: <20130315092959.GA9677@merlinux.eu> <5143475F.50708@oddbird.net> <51435CEE.3020505@oddbird.net> <5143ABC3.8030603@oddbird.net> Message-ID: On Sat, Mar 16, 2013 at 3:15 AM, Nick Coghlan wrote: > > On 15 Mar 2013 16:16, "Carl Meyer" wrote: >> >> tl;dr: I see your points, we'll change the PEP to allow clients to use >> hostnames instead of the rel attributes if they prefer. > > I will veto any such change. 
Clients MUST NOT assume that the architecture > of the index service will be limited to a single host name, they must > process the explicit metadata provided by the index that indicates which > hosts the index controls. > > Adding a "--trust-indices" flag to make this optional in setuptools would be > fine, but it seems perverse to trust every aspect of an index *except* its > claims to control additional hosts. Actually, setuptools trusts redirects, so that mechanism is available for splitting the hosted files to another domain. As it stands, though, I don't see a way to support this without introducing confusion. The advantage of using allow-hosts based on the index host is that it *also* specifies what to do with dependency links provided by individual packages; the PEP does not provide any real guidance on this point. So, I have to withdraw my support for the PEP with these recent changes, as it no longer reflects the approach I previously agreed to, and as yet there have been no alternatives proposed to address the user confusion issues (which IMO at least are a big part of the point of having the PEP). Of course, if redirection is required for non-extrapolatable hostnames, or if somebody comes up with a new and brilliant scheme to manage the menage of permissions needed across dependency_links, the index, and general host trusting issues (while remaining comprehensible and predictable to end users), I'll certainly have a look again. But I took the weekend off from this discussion to try to come up with one myself, and so far I've got nothing. 
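For reference, the "indexhost,*.indexhost" default discussed earlier in this thread could look roughly like the sketch below. It assumes glob-style host patterns and absolute URLs; ``default_allowed_hosts`` and ``host_allowed`` are hypothetical names, and this is not setuptools' actual code.

```python
# Hypothetical sketch of deriving an allow-hosts default from the index URL.
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def default_allowed_hosts(index_url):
    """Return the patterns "indexhost" and "*.indexhost" for an index URL."""
    host = urlparse(index_url).hostname
    if host is None:
        raise ValueError("index URL has no host: %r" % (index_url,))
    return [host, "*." + host]


def host_allowed(url, index_url):
    """True if url's host is the index host or one of its subdomains."""
    host = urlparse(url).hostname  # already lower-cased by urlparse
    if host is None:
        # Relative URLs are out of scope for this sketch.
        return False
    return any(fnmatchcase(host, pattern)
               for pattern in default_allowed_hosts(index_url))


INDEX = "https://pypi.python.org/simple/"
for candidate in (
    "https://pypi.python.org/packages/source/e/example/example-1.0.tar.gz",
    "https://a.pypi.python.org/packages/example-1.0.tar.gz",
    "https://evilpypi.python.org/example-1.0.tar.gz",
    "http://example.com/example-1.0.tar.gz",
):
    print(candidate, host_allowed(candidate, INDEX))
```

Under such a rule, a CDN on an unrelated domain would not pass the filter on its own, which is where the redirect and ``rel="internal"`` questions raised above come in.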
From pje at telecommunity.com Mon Mar 18 18:26:10 2013
From: pje at telecommunity.com (PJ Eby)
Date: Mon, 18 Mar 2013 13:26:10 -0400
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
In-Reply-To:
References: <20130315092959.GA9677@merlinux.eu> <5143475F.50708@oddbird.net> <51435CEE.3020505@oddbird.net> <5143ABC3.8030603@oddbird.net>
Message-ID:

On Mon, Mar 18, 2013 at 1:22 PM, PJ Eby wrote:
> Actually, setuptools trusts redirects, so that mechanism is available
> for splitting the hosted files to another domain.
>
> As it stands, though, I don't see a way to support this without
> introducing confusion.

Oops - that wasn't clear. By "this" I meant the current version of the PEP.

From richard at python.org Mon Mar 18 20:02:35 2013
From: richard at python.org (Richard Jones)
Date: Mon, 18 Mar 2013 12:02:35 -0700
Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
In-Reply-To: <20130315092959.GA9677@merlinux.eu>
References: <20130315092959.GA9677@merlinux.eu>
Message-ID:

Some suggested edits; I'm otherwise quite happy with the current draft.

On 15 March 2013 02:29, holger krekel wrote:
> History and motivations for external hosting

Could we please have a reference to the Package Index "API"* here?

> Today, most packages released on PyPI host their release files on
> PyPI, but a small percentage (XXX need updated data) rely on
> external hosting.

The above should probably be re-worded since "rely" is loaded and we
don't necessarily know the motivation for projects using external
links. The important numbers though are:

projects with any external only links: 2581
projects with only external only links: 1332
total projects: 29117

Whether the projects with links that also have hosted files (ie. the
1249 project difference between those numbers) *rely* on us retaining
the external links facility is unknown.
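The figures quoted above can be cross-checked with a few lines of Python. The variable names are illustrative, and the percentages are derived here from the quoted counts rather than stated anywhere in the thread.

```python
# Counts exactly as quoted in the message above.
projects_any_external_only = 2581    # at least one external-only link
projects_only_external_only = 1332   # nothing but external-only links
total_projects = 29117

# Projects that have external-only links but also host files on PyPI.
mixed = projects_any_external_only - projects_only_external_only


def share(count):
    """Percentage of all projects, rounded to two decimal places."""
    return round(100.0 * count / total_projects, 2)


print("mixed hosting:", mixed)  # 1249, the difference cited in the message
print("any external-only links: %s%%" % share(projects_any_external_only))
print("only external-only links: %s%%" % share(projects_only_external_only))
```

Either way the counts are read, fewer than one project in ten is involved, which is consistent with the draft's "small percentage" wording.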
> Hosting modes > ------------- > > The foundation of the first transition phase is the introduction of > three "modes" of PyPI hosting for a package, affecting which links are > generated for the ``simple/`` index. These modes are implemented > without requiring changes to installation tools via changes to the > algorithm for generating the machine-readable ``simple/`` index. > > The modes are: > > - ``pypi-scrape-crawl``: no change from the current situation of > generating machine-readable links for installation tools, as > outlined in the history_. > > - ``pypi-scrape``: for a package in this mode, links to be added to > the ``simple/`` index are still scraped from package > metadata. However, the "Home-page" and "Download-url" links are > given ``rel=ext-homepage`` and ``rel=ext-download`` attributes > instead of ``rel=homepage`` and ``rel=download``. The effect of this > (with no change in installation tools necessary) is that these links > will not be followed and scraped for further candidate links by present-day > installation tools: only installable files directly hosted from PYPI or > linked directly from PyPI metadata will be considered for installation. > Installation tools MAY evolve to offer an option to use the new > rel-attribution to crawl external pages but MUST NOT default to it. I'd just like to confirm that the rel="download" / rel="ext-download" switch will not affect the installability of distribution downloads linked directly by download_url. > - ``pypi-explicit``: for a package in this mode, only links to release > files uploaded to PyPI, and external links to release files > explicitly nominated by the package owner (via a new interface > exposed by PyPI) will be added to the ``simple/`` index. The bracketed bit there needs to be emphasised (ie. not just a bracketed afterthought) as it changes the current packaging user experience considerably for those who wish to remain externally hosting files. 
Richard * http://peak.telecommunity.com/DevCenter/EasyInstall#package-index-api From donald at stufft.io Mon Mar 18 20:16:19 2013 From: donald at stufft.io (Donald Stufft) Date: Mon, 18 Mar 2013 15:16:19 -0400 Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI In-Reply-To: References: <20130315092959.GA9677@merlinux.eu> Message-ID: On Mar 18, 2013, at 3:02 PM, Richard Jones wrote: > Some suggested edits; I'm otherwise quite happy with the current draft. > > On 15 March 2013 02:29, holger krekel wrote: >> History and motivations for external hosting > > Could we please have a reference to the Package Index "API"* here? > > >> Today, most packages released on PyPI host their release files on >> PyPI, but a small percentage (XXX need updated data) rely on >> external hosting. > > The above should probably be re-worded since "rely" is loaded and we > don't necessarily know the motivation for projects using external > links. The important numbers though are: > > projects with any external only links: 2581 > projects with only external only links: 1332 > total projects: 29117 > > Whether the projects with links that also have hosted files (ie. the > 1249 project difference between those numbers) *rely* on us retaining > the external links facility is unknown. > > >> Hosting modes >> ------------- >> >> The foundation of the first transition phase is the introduction of >> three "modes" of PyPI hosting for a package, affecting which links are >> generated for the ``simple/`` index. These modes are implemented >> without requiring changes to installation tools via changes to the >> algorithm for generating the machine-readable ``simple/`` index. >> >> The modes are: >> >> - ``pypi-scrape-crawl``: no change from the current situation of >> generating machine-readable links for installation tools, as >> outlined in the history_. 
>> >> - ``pypi-scrape``: for a package in this mode, links to be added to >> the ``simple/`` index are still scraped from package >> metadata. However, the "Home-page" and "Download-url" links are >> given ``rel=ext-homepage`` and ``rel=ext-download`` attributes >> instead of ``rel=homepage`` and ``rel=download``. The effect of this >> (with no change in installation tools necessary) is that these links >> will not be followed and scraped for further candidate links by present-day >> installation tools: only installable files directly hosted from PYPI or >> linked directly from PyPI metadata will be considered for installation. >> Installation tools MAY evolve to offer an option to use the new >> rel-attribution to crawl external pages but MUST NOT default to it. > > I'd just like to confirm that the rel="download" / rel="ext-download" > switch will not affect the installability of distribution downloads > linked directly by download_url. As far as I know all existing tools ignore the rel attribute for purposes of finding direct links. > > >> - ``pypi-explicit``: for a package in this mode, only links to release >> files uploaded to PyPI, and external links to release files >> explicitly nominated by the package owner (via a new interface >> exposed by PyPI) will be added to the ``simple/`` index. > > The bracketed bit there needs to be emphasised (ie. not just a > bracketed afterthought) as it changes the current packaging user > experience considerably for those who wish to remain externally > hosting files. > > > > Richard > > * http://peak.telecommunity.com/DevCenter/EasyInstall#package-index-api > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From aclark at aclark.net Mon Mar 18 22:41:03 2013 From: aclark at aclark.net (Alex Clark) Date: Mon, 18 Mar 2013 17:41:03 -0400 Subject: [Catalog-sig] New PyPI stats available References: Message-ID: On 2013-02-19 12:31:33 +0000, Alex Clark said: > On 2013-02-18 22:06:56 +0000, PJ Eby said: > >> On Mon, Feb 18, 2013 at 9:55 AM, Alex Clark wrote: >>> aclark at Alexs-MacBook-Pro:~/Developer/aclark/resume/ > vanity pydstat >>> pydstat-1.0.0.tar.gz 2012-08-15 2,216 >>> pydstat-1.0.1.tar.gz 2012-08-23 4,367 >>> -------------------------------------------- >>> pydstat has been downloaded 6,583 times! >> >> Nice -- any chance you could add version filtering? "vanity >> setuptools" reports ~8.4 million downloads for setuptools, but the >> current release actually stands at only around 4.8 million. ;-) > > > Sure, can you specify what you want > here? https://github.com/aclark4life/vanity/issues/7. I assume you > mean: allow for easy reporting of the number of downloads for each > release, e.g. the current release. (Vanity currently displays all the > release totals, then the sum.) > > (Of course, as I'm testing this, vanity is not working. Did XML-RPC > on PyPI go away recently? Maybe I should switch to json, > e.g. https://pypi.python.org/pypi/setuptools/json) > > >> (Also, the formatting is off for the most popular downloads, because >> the count column isn't wide enough to show 7 significant figures.) > > > Thanks, reported: https://github.com/aclark4life/vanity/issues/8 And... done in 1.2.5: https://pypi.python.org/pypi/vanity/1.2.5 > > > Alex -- Alex Clark
http://about.me/alex.clark From carl at oddbird.net Tue Mar 19 00:37:53 2013 From: carl at oddbird.net (Carl Meyer) Date: Mon, 18 Mar 2013 16:37:53 -0700 Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI In-Reply-To: References: <20130315092959.GA9677@merlinux.eu> <5143475F.50708@oddbird.net> <51435CEE.3020505@oddbird.net> <5143ABC3.8030603@oddbird.net> Message-ID: <5147A551.4050506@oddbird.net> On 03/18/2013 10:22 AM, PJ Eby wrote: > Actually, setuptools trusts redirects, so that mechanism is available > for splitting the hosted files to another domain. By "trusts redirects" you mean that redirects bypass allow-hosts? This seems to contradict your line of argument up to this point (that allow-hosts must be simple and without exceptions or users will be confused). > As it stands, though, I don't see a way to support this without > introducing confusion. The advantage of using allow-hosts based on > the index host is that it *also* specifies what to do with dependency > links provided by individual packages; the PEP does not provide any > real guidance on this point. I'm updating the PEP to eliminate rel="external" (as it causes this confusion and provides no additional value) and clarify that any link, from anywhere, that is not rel="internal" should be considered an external link. > So, I have to withdraw my support for the PEP with these recent > changes, as it no longer reflects the approach I previously agreed to, > and as yet there have been no alternatives proposed to address the > user confusion issues (which IMO at least are a big part of the point > of having the PEP). I don't think there is any "user confusion" problem for an installer that does not already provide allow-hosts: just use a per-project "I want to trust external links provided by Django" option instead. 
And I don't really even think there is a user confusion problem in providing both allow-hosts and a new option on this model - they are options at different levels of abstraction and with different use cases (though I think the value of allow-hosts is weak if redirects bypass it anyway). IOW, this is not a problem with the PEP, this is a backwards-compatibility question and UI choice for easy_install maintainers. The PEP provides the right metadata, and there are reasonable options (in general) for installer UIs to make use of this metadata. Carl From carl at oddbird.net Tue Mar 19 01:48:52 2013 From: carl at oddbird.net (Carl Meyer) Date: Mon, 18 Mar 2013 17:48:52 -0700 Subject: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI In-Reply-To: References: <20130315092959.GA9677@merlinux.eu> Message-ID: <5147B5F4.3050707@oddbird.net> Hi Richard, On 03/18/2013 12:02 PM, Richard Jones wrote: > Some suggested edits; I'm otherwise quite happy with the current draft. > > On 15 March 2013 02:29, holger krekel wrote: >> History and motivations for external hosting > > Could we please have a reference to the Package Index "API"* here? Added. >> Today, most packages released on PyPI host their release files on >> PyPI, but a small percentage (XXX need updated data) rely on >> external hosting. > > The above should probably be re-worded since "rely" is loaded and we > don't necessarily know the motivation for projects using external > links. The important numbers though are: > > projects with any external only links: 2581 > projects with only external only links: 1332 > total projects: 29117 > > Whether the projects with links that also have hosted files (ie. the > 1249 project difference between those numbers) *rely* on us retaining > the external links facility is unknown.
Done: updated to include the latest numbers, re-worded to remove the word "rely", and added a link to the data and analysis tool source code at https://github.com/dstufft/pypi.linkcheck >> Hosting modes >> ------------- >> >> The foundation of the first transition phase is the introduction of >> three "modes" of PyPI hosting for a package, affecting which links are >> generated for the ``simple/`` index. These modes are implemented >> without requiring changes to installation tools via changes to the >> algorithm for generating the machine-readable ``simple/`` index. >> >> The modes are: >> >> - ``pypi-scrape-crawl``: no change from the current situation of >> generating machine-readable links for installation tools, as >> outlined in the history_. >> >> - ``pypi-scrape``: for a package in this mode, links to be added to >> the ``simple/`` index are still scraped from package >> metadata. However, the "Home-page" and "Download-url" links are >> given ``rel=ext-homepage`` and ``rel=ext-download`` attributes >> instead of ``rel=homepage`` and ``rel=download``. The effect of this >> (with no change in installation tools necessary) is that these links >> will not be followed and scraped for further candidate links by present-day >> installation tools: only installable files directly hosted from PYPI or >> linked directly from PyPI metadata will be considered for installation. >> Installation tools MAY evolve to offer an option to use the new >> rel-attribution to crawl external pages but MUST NOT default to it. > > I'd just like to confirm that the rel="download" / rel="ext-download" > switch will not affect the installability of distribution downloads > linked directly by download_url. It won't. The rel attribute impacts only whether a link to a non-archive (HTML) resource is scraped for further links, it doesn't impact a direct archive link. 
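As a rough illustration of that rule (a sketch only, not code taken from setuptools or pip; the class name, the archive suffix list and the example.com / example.org URLs are all made up for the example), an installer-side link classifier could look like this: direct archive links are kept whatever their rel value, while other links are queued for crawling only when they carry rel=homepage or rel=download, so the new ext-homepage / ext-download values are never followed.

```python
from html.parser import HTMLParser

# Hypothetical suffix list; real installers recognise more formats.
ARCHIVE_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".zip", ".egg")
# rel values that present-day tools follow and scrape for further links.
CRAWLED_RELS = {"homepage", "download"}


class SimpleIndexLinks(HTMLParser):
    """Collect links from a simple/ index page into two buckets."""

    def __init__(self):
        super().__init__()
        self.archives = []   # direct release-file candidates
        self.to_crawl = []   # HTML pages to scrape for more links

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # Ignore any "#md5=..." fragment when looking at the file name.
        path = href.split("#", 1)[0]
        if path.lower().endswith(ARCHIVE_SUFFIXES):
            # Direct archive link: rel plays no part in this decision.
            self.archives.append(href)
        elif attrs.get("rel") in CRAWLED_RELS:
            # Non-archive page: only crawled for the old rel values.
            self.to_crawl.append(href)


def classify(html):
    """Return (archive_links, pages_to_crawl) for one index page."""
    parser = SimpleIndexLinks()
    parser.feed(html)
    parser.close()
    return parser.archives, parser.to_crawl
```

Fed a page holding a direct .tar.gz link marked rel=ext-download next to a rel=ext-homepage link, classify() returns the archive as a candidate and leaves the crawl list empty.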
>> - ``pypi-explicit``: for a package in this mode, only links to release >> files uploaded to PyPI, and external links to release files >> explicitly nominated by the package owner (via a new interface >> exposed by PyPI) will be added to the ``simple/`` index. > > The bracketed bit there needs to be emphasised (ie. not just a > bracketed afterthought) as it changes the current packaging user > experience considerably for those who wish to remain externally > hosting files. Done, and added the requirement that external links must include hashes, as we just discussed in person. All of these updates are in https://bitbucket.org/hpk42/pep-pypi - feel free to sync to python.org at your leisure. Carl From r1chardj0n3s at gmail.com Wed Mar 20 19:26:45 2013 From: r1chardj0n3s at gmail.com (Richard Jones) Date: Wed, 20 Mar 2013 11:26:45 -0700 Subject: [Catalog-sig] PEP 438 implementation on testpypi Message-ID: Thanks to Donald Stufft for his implementation of the PEP 438 changes, I've made them live on testpypi.python.org - specifically the "urls" page of package administration. Please poke and play. Richard From mal at egenix.com Wed Mar 20 20:31:23 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 20 Mar 2013 20:31:23 +0100 Subject: [Catalog-sig] PEP 438 implementation on testpypi In-Reply-To: References: Message-ID: <514A0E8B.5030500@egenix.com> On 20.03.2013 19:26, Richard Jones wrote: > Thanks to Donald Stufft for his implementation of the PEP 438 changes, > I've made them live on testpypi.python.org - specifically the "urls" > page of package administration. Please poke and play. Nice... first tests: * Going to "urls" and then clicking on [Change] gives an error: """ Name and version are required Name and version are required """ It doesn't matter which choice you select.
* Will there be an RPC interface to register URLs with PyPI ? Doing this manually for a large number of files is, well, not ideal :-) * Adding URLs should do some more tests, I think: It's possible to register "test#md5=123" (without http/ftp and without providing the full MD5 sum). It's possible to register "../test/#md5=123", i.e. point to different files on PyPI itself. Not sure whether this is a bug or feature ;-) It's possible to register "test#md5=123&sha1=123". This is actually a good thing, since it allows implementing the hash tag extensions proposed by Christian Heimes. I'm just mentioning this, so that it becomes a supported feature. * I'm missing an option: [ ] Ask tools to scrape only the Download URL. This should result in the download_url being put on the /simple/ index page with rel="download" being set. Reasoning: This is the designated URL where packages should be downloaded from. With the current list of choices, I'd have to select the last option, which includes the old long description links and the homepage URL. Other things: ------------- * Would it be possible to add a link to the corresponding /simple/ index page on the package menu (the one with files, urls, etc.) ? * Could you add a link to the PKG-INFO file from pypi?:action=display_pkginfo to the /simple/ page as -PKG-INFO (to match the other links) ? Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 20 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-03-13: Released eGenix pyOpenSSL 0.13 ... http://egenix.com/go39 ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Wed Mar 20 20:35:26 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 20 Mar 2013 20:35:26 +0100 Subject: [Catalog-sig] PEP 438 implementation on testpypi In-Reply-To: <514A0E8B.5030500@egenix.com> References: <514A0E8B.5030500@egenix.com> Message-ID: <514A0F7E.2090103@egenix.com> On 20.03.2013 20:31, M.-A. Lemburg wrote: > Other things: > ------------- > > * Would it be possible to add a link to the corresponding > /simple/ index page on the package menu (the one with files, > urls, etc.) ? > > * Could you add a link to the PKG-INFO file from > pypi?:action=display_pkginfo to the /simple/ page as > -PKG-INFO (to match the other links) ? Or even better and more suitable for the CDN... Have PyPI publish the PKG-INFO under the /simple/ index URL: /simple/package/-PKG-INFO (instead of just setting a link to the /pypi/ page) -- Marc-Andre Lemburg eGenix.com
From r1chardj0n3s at gmail.com Wed Mar 20 21:16:11 2013 From: r1chardj0n3s at gmail.com (Richard Jones) Date: Wed, 20 Mar 2013 13:16:11 -0700 Subject: [Catalog-sig] PEP 438 implementation on testpypi In-Reply-To: <514A0E8B.5030500@egenix.com> References: <514A0E8B.5030500@egenix.com> Message-ID: On 20 March 2013 12:31, M.-A. Lemburg wrote: > On 20.03.2013 19:26, Richard Jones wrote: >> Thanks to Donald Stufft for his implementation of the PEP 438 changes, >> I've made them live on testpypi.python.org - specifically the "urls" >> page of package administration. Please poke and play. > > Nice... first tests: > > * Going to "urls" and then clicking on [Change] gives an error: > > """ > Name and version are required > > Name and version are required > """ > > It doesn't matter which choice you select. Oops. This is fixed. You'll have to reload the page to get the correct form code. > * Will there be an RPC interface to register URLs with PyPI ? > > Doing this manually for a large number of files is, well, > not ideal :-) It's just a HTTP POST and there's plans for a tool. > * Adding URLs should do some more tests, I think: I thought about it, but didn't see any benefit. It's documented... > * I'm missing an option: > > [ ] Ask tools to scrape only the Download URL. This is not part of the planned implementation. The download_url was never well-specified, and only allows for one URL, hence the implementation we have. > * Would it be possible to add a link to the corresponding > /simple/ index page on the package menu (the one with files, > urls, etc.) ? I guess this could be added, yes. > * Could you add a link to the PKG-INFO file from > pypi?:action=display_pkginfo to the /simple/ page as > -PKG-INFO (to match the other links) ? We could think about it - what's the use-case? Richard From mal at egenix.com Wed Mar 20 21:27:38 2013 From: mal at egenix.com (M.-A.
Lemburg) Date: Wed, 20 Mar 2013 21:27:38 +0100 Subject: [Catalog-sig] PEP 438 implementation on testpypi In-Reply-To: References: <514A0E8B.5030500@egenix.com> Message-ID: <514A1BBA.8010605@egenix.com> On 20.03.2013 21:16, Richard Jones wrote: > On 20 March 2013 12:31, M.-A. Lemburg wrote: >> * Will there be an RPC interface to register URLs with PyPI ? >> >> Doing this manually for a large number of files is, well, >> not ideal :-) > > It's just a HTTP POST and there's plans for a tool. Is this documented somewhere ? I'd like to add support for it to our release process. >> * Adding URLs should do some more tests, I think: > > I thought about it, but didn't see any benefit. It's documented... Hmm, where ? :-) >> * I'm missing an option: >> >> [ ] Ask tools to scrape only the Download URL. > > This is not part of the planned implementation. The download_url was > never well-specified, and only allows for one URL, hence the > implementation we have. I know it's not in PEP 438 at the moment, but it was one of the nits I mentioned to Holger last week. It's specified in the meta-data format 1.1 as "A string containing the URL from which this version of the package can be downloaded.": http://www.python.org/dev/peps/pep-0314/ Having such an option would allow cleaning up the /simple/ index pages a lot, without any changes on the tools side. It would also be needed for my proposal of securing external downloads, where you point to a hashed download page with the download_url. >> * Would it be possible to add a link to the corresponding >> /simple/ index page on the package menu (the one with files, >> urls, etc.) ? > > I guess this could be added, yes. Great. >> * Could you add a link to the PKG-INFO file from >> pypi?:action=display_pkginfo to the /simple/ page as >> -PKG-INFO (to match the other links) ? > > We could think about it - what's the use-case?
This would allow tools to easily and safely access meta-data of a package release without downloading, extracting and running the release files' setup.py. -- Marc-Andre Lemburg eGenix.com From holger at merlinux.eu Wed Mar 20 22:02:50 2013 From: holger at merlinux.eu (holger krekel) Date: Wed, 20 Mar 2013 21:02:50 +0000 Subject: [Catalog-sig] PEP 438 implementation on testpypi In-Reply-To: <514A1BBA.8010605@egenix.com> References: <514A0E8B.5030500@egenix.com> <514A1BBA.8010605@egenix.com> Message-ID: <20130320210250.GL9677@merlinux.eu> On Wed, Mar 20, 2013 at 21:27 +0100, M.-A. Lemburg wrote: > On 20.03.2013 21:16, Richard Jones wrote: > > On 20 March 2013 12:31, M.-A. Lemburg wrote: > >> * I'm missing an option: > >> > >> [ ] Ask tools to scrape only the Download URL. > > > > This is not part of the planned implementation. The download_url was > > never well-specified, and only allows for one URL, hence the > > implementation we have. > > I know it's not in PEP 438 at the moment, but was one of the > nits I mentioned to Holger last week.
It's specified in the > meta-data format 1.1 as "A string containing the URL from > which this version of the package can be downloaded.": > > http://www.python.org/dev/peps/pep-0314/ > > Having such an option would allow cleaning up the /simple/ > index pages a lot, without any changes on the tools side. > > It would also be needed for my proposal of securing > external downloads, where you point to a hashed download > page with the download_url. I think it's better to just go for a tool which a maintainer can use to register external urls (with hashes) from crawling and scraping links once from an external page. This way client installers worldwide do not need to visit and scrape that external page just to obtain release file links. As you have mostly automated your release process, do you foresee any issues with adding an automated step of registering externals and putting your package hosting mode to "pypi-explicit"? holger > >> * Would it be possible to add a link to the corresponding > >> /simple/ index page on the package menu (the one with files, > >> urls, etc.) ? > > > > I guess this could be added, yes. > > Great. > > >> * Could you add a link to the PKG-INFO file from > >> pypi?:action=display_pkginfo to the /simple/ page as > >> -PKG-INFO (to match the other links) ? > > > > We could think about it - what's the use-case? > > This would allow tools to easily and safely access meta-data > of a package release without downloading, extracting and > running the release files' setup.py. > -- > Marc-Andre Lemburg > eGenix.com
From r1chardj0n3s at gmail.com Wed Mar 20 22:17:05 2013 From: r1chardj0n3s at gmail.com (Richard Jones) Date: Wed, 20 Mar 2013 14:17:05 -0700 Subject: [Catalog-sig] PEP 438 implementation on testpypi In-Reply-To: <514A1BBA.8010605@egenix.com> References: <514A0E8B.5030500@egenix.com> <514A1BBA.8010605@egenix.com> Message-ID: On 20 March 2013 13:27, M.-A. Lemburg wrote: > On 20.03.2013 21:16, Richard Jones wrote: >> On 20 March 2013 12:31, M.-A. Lemburg wrote: >>> * Will there be an RPC interface to register URLs with PyPI ? >>> >>> Doing this manually for a large number of files is, well, >>> not ideal :-) >> >> It's just a HTTP POST and there's plans for a tool. > > Is this documented somewhere ? I'd like to add support for it > to our release process. I'll think about adding this to the PEP. >>> * Adding URLs should do some more tests, I think: >> >> I thought about it, but didn't see any benefit. It's documented... > > Hmm, where ? :-) In the HTML page just above the add form :-) Richard From mal at egenix.com Wed Mar 20 22:53:29 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 20 Mar 2013 22:53:29 +0100 Subject: [Catalog-sig] PEP 438 implementation on testpypi In-Reply-To: <20130320210250.GL9677@merlinux.eu> References: <514A0E8B.5030500@egenix.com> <514A1BBA.8010605@egenix.com> <20130320210250.GL9677@merlinux.eu> Message-ID: <514A2FD9.9030702@egenix.com> On 20.03.2013 22:02, holger krekel wrote: > On Wed, Mar 20, 2013 at 21:27 +0100, M.-A.
Lemburg wrote: >> On 20.03.2013 21:16, Richard Jones wrote: >>> On 20 March 2013 12:31, M.-A. Lemburg wrote: >>>> * I'm missing an option: >>>> >>>> [ ] Ask tools to scrape only the Download URL. >>> >>> This is not part of the planned implementation. The download_url was >>> never well-specified, and only allows for one URL, hence the >>> implementation we have. >> >> I know it's not in PEP 438 at the moment, but was one of the >> nits I mentioned to Holger last week. It's specified in the >> meta-data format 1.1 as "A string containing the URL from >> which this version of the package can be downloaded.": >> >> http://www.python.org/dev/peps/pep-0314/ >> >> Having such an option would allow cleaning up the /simple/ >> index pages a lot, without any changes on the tools side. >> >> It would also be needed for my proposal of securing >> external downloads, where you point to a hashed download >> page with the download_url. > > I think it's better to just go for a tool which a maintainer can > use to register external urls (with hashes) from crawling and scraping > links once from an external page. This way client installers worldwide > do not need to visit and scrape that external page just to obtain > release file links. As you have mostly automated your release process > do you foresee any issues with adding an automated step of registering > externals and putting your package hosting mode to "pypi-explicit"? I don't have a problem with adding support to our release process (provided there's some stable way to access the needed API). I'm thinking about other package owners who have the download_url already point to a page with the distribution file(s) and don't have a release process they can easily adapt. For them, it would be a nice possibility to speed up installation of their packages. -- Marc-Andre Lemburg eGenix.com
From mal at egenix.com Wed Mar 20 22:56:54 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 20 Mar 2013 22:56:54 +0100 Subject: [Catalog-sig] PEP 438 implementation on testpypi In-Reply-To: References: <514A0E8B.5030500@egenix.com> <514A1BBA.8010605@egenix.com> Message-ID: <514A30A6.3080505@egenix.com> On 20.03.2013 22:17, Richard Jones wrote: > On 20 March 2013 13:27, M.-A. Lemburg wrote: >> On 20.03.2013 21:16, Richard Jones wrote: >>> On 20 March 2013 12:31, M.-A. Lemburg wrote: >>>> * Will there be an RPC interface to register URLs with PyPI ? >>>> >>>> Doing this manually for a large number of files is, well, >>>> not ideal :-) >>> >>> It's just a HTTP POST and there's plans for a tool. >> >> Is this documented somewhere ? I'd like to add support for it >> to our release process. > > I'll think about adding this to the PEP. > > >>>> * Adding URLs should do some more tests, I think: >>> >>> I thought about it, but didn't see any benefit. It's documented... >> >> Hmm, where ? :-) > > In the HTML page just above the add form :-) Could you change "The URL must end with the MD5 hash of the file contents" to "The URL must include the MD5 hash of the file contents" ? (See my original test report for the reason :-)) -- Marc-Andre Lemburg eGenix.com
From r1chardj0n3s at gmail.com Wed Mar 20 23:01:21 2013 From: r1chardj0n3s at gmail.com (Richard Jones) Date: Wed, 20 Mar 2013 15:01:21 -0700 Subject: [Catalog-sig] PEP 438 implementation on testpypi In-Reply-To: <514A30A6.3080505@egenix.com> References: <514A0E8B.5030500@egenix.com> <514A1BBA.8010605@egenix.com> <514A30A6.3080505@egenix.com> Message-ID: On 20 March 2013 14:56, M.-A. Lemburg wrote: > Could you change "The URL must end with the MD5 hash of the file > contents" to "The URL must include the MD5 hash of the file contents" ? > > (See my original test report for the reason :-)) Hm. The wording was passed by one of the pip maintainers so I'll defer to them on what the URL format should be. Richard From mal at egenix.com Wed Mar 20 23:19:06 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 20 Mar 2013 23:19:06 +0100 Subject: [Catalog-sig] PEP 438 implementation on testpypi In-Reply-To: References: <514A0E8B.5030500@egenix.com> <514A1BBA.8010605@egenix.com> <514A30A6.3080505@egenix.com> Message-ID: <514A35DA.1020400@egenix.com> On 20.03.2013 23:01, Richard Jones wrote: > On 20 March 2013 14:56, M.-A. Lemburg wrote: >> Could you change "The URL must end with the MD5 hash of the file >> contents" to "The URL must include the MD5 hash of the file contents" ? >> >> (See my original test report for the reason :-)) > > Hm.
The wording was passed by one of the pip maintainers so I'll defer > to them on what the URL format should be. The format should be defined in PEP 438. If we adopt the hash tag extensions, then the URL fragment will just start with the md5= part and not necessarily also end with it. pip and easy_install will then have to implement the extension mechanism, and package authors will have to decide whether or not they want to stay compatible with versions of those tools that don't have these implemented. I was just asking for the text on the page to be in line with what PyPI actually checks. -- Marc-Andre Lemburg eGenix.com From r1chardj0n3s at gmail.com Wed Mar 20 23:19:19 2013 From: r1chardj0n3s at gmail.com (Richard Jones) Date: Wed, 20 Mar 2013 15:19:19 -0700 Subject: [Catalog-sig] PEP 438 implementation on testpypi In-Reply-To: References: <514A0E8B.5030500@egenix.com> <514A1BBA.8010605@egenix.com> <514A30A6.3080505@egenix.com> Message-ID: On 20 March 2013 15:01, Richard Jones wrote: > On 20 March 2013 14:56, M.-A. Lemburg wrote: >> Could you change "The URL must end with the MD5 hash of the file >> contents" to "The URL must include the MD5 hash of the file contents" ? >> >> (See my original test report for the reason :-)) > > Hm.
The wording was passed by one of the pip maintainers so I'll defer > to them on what the URL format should be. Having discussed this further offline I've now modified the text as above (with a tweak.) Richard From mal at egenix.com Wed Mar 20 23:20:37 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 20 Mar 2013 23:20:37 +0100 Subject: [Catalog-sig] PEP 438 implementation on testpypi In-Reply-To: References: <514A0E8B.5030500@egenix.com> <514A1BBA.8010605@egenix.com> <514A30A6.3080505@egenix.com> Message-ID: <514A3635.2000804@egenix.com> On 20.03.2013 23:19, Richard Jones wrote: > On 20 March 2013 15:01, Richard Jones wrote: >> On 20 March 2013 14:56, M.-A. Lemburg wrote: >>> Could you change "The URL must end with the MD5 hash of the file >>> contents" to "The URL must include the MD5 hash of the file contents" ? >>> >>> (See my original test report for the reason :-)) >> >> Hm. The wording was passed by one of the pip maintainers so I'll defer >> to them on what the URL format should be. > > Having discussed this further offline I've now modified the text as > above (with a tweak.) Thanks. -- Marc-Andre Lemburg eGenix.com
From r1chardj0n3s at gmail.com Wed Mar 20 23:28:16 2013 From: r1chardj0n3s at gmail.com (Richard Jones) Date: Wed, 20 Mar 2013 15:28:16 -0700 Subject: [Catalog-sig] PEP 438 implementation on testpypi In-Reply-To: References: <514A0E8B.5030500@egenix.com> <514A1BBA.8010605@egenix.com> Message-ID: On 20 March 2013 14:17, Richard Jones wrote: > On 20 March 2013 13:27, M.-A. Lemburg wrote: >> On 20.03.2013 21:16, Richard Jones wrote: >>> On 20 March 2013 12:31, M.-A. Lemburg wrote: >>>> * Will there be an RPC interface to register URLs with PyPI ? >>>> >>>> Doing this manually for a large number of files is, well, >>>> not ideal :-) >>> >>> It's just a HTTP POST and there's plans for a tool. >> >> Is this documented somewhere ? I'd like to add support for it >> to our release process. > > I'll think about adding this to the PEP. This is now in the PEP. Richard From mal at egenix.com Thu Mar 21 00:23:50 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 21 Mar 2013 00:23:50 +0100 Subject: [Catalog-sig] PEP 438 implementation on testpypi In-Reply-To: References: <514A0E8B.5030500@egenix.com> <514A1BBA.8010605@egenix.com> Message-ID: <514A4506.70802@egenix.com> On 20.03.2013 23:28, Richard Jones wrote: > On 20 March 2013 14:17, Richard Jones wrote: >> On 20 March 2013 13:27, M.-A. Lemburg wrote: >>> On 20.03.2013 21:16, Richard Jones wrote: >>>> On 20 March 2013 12:31, M.-A. Lemburg wrote: >>>>> * Will there be an RPC interface to register URLs with PyPI ? >>>>> >>>>> Doing this manually for a large number of files is, well, >>>>> not ideal :-) >>>> >>>> It's just a HTTP POST and there's plans for a tool. >>> >>> Is this documented somewhere ? I'd like to add support for it >>> to our release process. >> >> I'll think about adding this to the PEP. > > This is now in the PEP.
Hmm, looks like the PEP update process isn't working on the site: http://www.python.org/dev/peps/pep-0438/ Last-Modified: 2013-03-15 22:51:25 -- Marc-Andre Lemburg eGenix.com From r1chardj0n3s at gmail.com Thu Mar 21 00:32:03 2013 From: r1chardj0n3s at gmail.com (Richard Jones) Date: Wed, 20 Mar 2013 16:32:03 -0700 Subject: [Catalog-sig] PEP 438 implementation on testpypi In-Reply-To: <514A4506.70802@egenix.com> References: <514A0E8B.5030500@egenix.com> <514A1BBA.8010605@egenix.com> <514A4506.70802@egenix.com> Message-ID: On 20 March 2013 16:23, M.-A. Lemburg wrote: > On 20.03.2013 23:28, Richard Jones wrote: >> On 20 March 2013 14:17, Richard Jones wrote: >>> On 20 March 2013 13:27, M.-A. Lemburg wrote: >>>> On 20.03.2013 21:16, Richard Jones wrote: >>>>> On 20 March 2013 12:31, M.-A. Lemburg wrote: >>>>>> * Will there be an RPC interface to register URLs with PyPI ? >>>>>> >>>>>> Doing this manually for a large number of files is, well, >>>>>> not ideal :-) >>>>> >>>>> It's just a HTTP POST and there's plans for a tool. >>>> >>>> Is this documented somewhere ? I'd like to add support for it >>>> to our release process. >>> >>> I'll think about adding this to the PEP. >> >> This is now in the PEP.
> > Hmm, looks like the PEP update process isn't working on the site: > > http://www.python.org/dev/peps/pep-0438/ > > Last-Modified: 2013-03-15 22:51:25 It's being edited in a separate repos. I've not submitted the latest from Holger's repos to the pep editors (yes, I have commit privs but I'm not fully up to speed on the process so will leave it to those who are.) Richard From ct at gocept.com Thu Mar 21 00:59:21 2013 From: ct at gocept.com (Christian Theune) Date: Wed, 20 Mar 2013 16:59:21 -0700 Subject: [Catalog-sig] Replacement client for pep381client Message-ID: Hi, as you might be aware, I've done my share of bitching about my mirror (f.pypi.python.org) breaking. I picked pep381client apart yesterday and rebuilt it - mostly from the ground up. You can find a working version here: https://bitbucket.org/ctheune/bandersnatch The focus has been on making it a lot more robust and a lot easier to repair a mirror when it's known to be broken. To achieve that I: - refactored the code, trying to make it more intentional, less mechanical - stopped parsing the simple pages' html and made more use of the XML-RPC API - added Tarek's worker/queue approach for parallelizing it - kept as little state as possible on the client - switched from timestamps to serial counters for checking what and how much to update - handled locking of concurrent runs more gracefully I think I have a good grasp of what's going on now so that I can keep maintaining this in the future. I'm currently re-initializing my own mirror. This basically can be run in-place by just removing the existing state data and calling my sync script (bsn-mirror) instead of pep381run with the same parameters. Tomorrow I'll update the documentation, make it use a config file and put some lipstick on the main entry point. After that I should be ready for a release. If you want to give it a try already, you just do this: $ hg clone https://bitbucket.org/ctheune/bandersnatch $ cd bandersnatch $ virtualenv-2.7 .
$ bin/python bootstrap.py $ bin/buildout $ bin/bsn-mirror /my/mirror/path Cheers, Christian -------------- next part -------------- An HTML attachment was scrubbed... URL: From r1chardj0n3s at gmail.com Thu Mar 21 01:30:09 2013 From: r1chardj0n3s at gmail.com (Richard Jones) Date: Wed, 20 Mar 2013 17:30:09 -0700 Subject: [Catalog-sig] Updated PEP 438 Message-ID: I've pushed the latest PEP to the repos. It has all the recent clarifications and the API docs. Just need to wait for the website to rebuild or something. Unless there's any last-minute problems I'll accept the PEP in this form and push the implementation to the production PyPI next week after I fly home. Richard From ct at gocept.com Thu Mar 21 03:27:30 2013 From: ct at gocept.com (Christian Theune) Date: Wed, 20 Mar 2013 19:27:30 -0700 Subject: [Catalog-sig] ResponseNotReady error while trying to do fresh sync References: Message-ID: On 2013-03-14 04:17:35 +0000, Qijiang Fan said: > Hello, > I'm maintaining e.pypi.python.org (with Aron Xu). > We met some issues on our network attached storage, so we decided to > do a fresh sync of pypi. > We met an issue while doing that, If you're interested: check out bandersnatch. I just had it recover my broken index nicely in about 2.5 hours. Christian From ct at gocept.com Thu Mar 21 03:27:53 2013 From: ct at gocept.com (Christian Theune) Date: Wed, 20 Mar 2013 19:27:53 -0700 Subject: [Catalog-sig] Replacement client for pep381client References: Message-ID: On 2013-03-20 23:59:21 +0000, Christian Theune said: > > I'm currently re-initializing my own mirror. This basically can be run > in-place by just removing the existing state data and calling my sync > script (bsn-mirror) instead of pep381run with the same parameters. This worked nicely for me - I'm running my mirror on bandersnatch now. 
Christian From holger at merlinux.eu Thu Mar 21 07:22:37 2013 From: holger at merlinux.eu (holger krekel) Date: Thu, 21 Mar 2013 06:22:37 +0000 Subject: [Catalog-sig] Updated PEP 438 In-Reply-To: References: Message-ID: <20130321062237.GN9677@merlinux.eu> Hi Richard, all, On Wed, Mar 20, 2013 at 17:30 -0700, Richard Jones wrote: > I've pushed the latest PEP to the repos. It has all the recent > clarifications and the API docs. Just need to wait for the website to > rebuild or something. It's online now. Current references to PEP438 (also inlined below): http://www.python.org/dev/peps/pep-0438/ https://bitbucket.org/hpk42/pep-pypi/src/c0cbd3f3508991f5c47eb0fdb036c6e25ef45047/PEP-438.txt?at=default > Unless there's any last-minute problems I'll accept the PEP in this > form and push the implementation to the production PyPI next week > after I fly home. testpypi.python.org keeps 502ing on me - probably makes sense to first have that stable and reviewed for a few days at least. best and thanks everybody, holger PEP: 438 Title: Transitioning to release-file hosting on PyPI Version: $Revision$ Last-Modified: $Date$ Author: Holger Krekel , Carl Meyer BDFL-Delegate: Richard Jones Discussions-To: catalog-sig at python.org Status: Draft Type: Process Content-Type: text/x-rst Created: 15-Mar-2013 Post-History: Abstract ======== This PEP proposes a backward-compatible two-phase transition process to speed up, simplify and robustify installing from the pypi.python.org (PyPI) package index. To ease the transition and minimize client-side friction, **no changes to distutils or existing installation tools are required in order to benefit from the first transition phase, which will result in faster, more reliable installs for most existing packages**. The first transition phase implements easy and explicit means for a package maintainer to control which release file links are served to present-day installation tools. 
The first phase also includes the implementation of analysis tools for present-day packages, to support communication with package maintainers and the automated setting of default modes for controlling release file links. The first phase also will default newly-registered projects on PyPI to only serve links to release files which were uploaded to PyPI. The second transition phase concerns end-user installation tools, which shall default to only install release files that are hosted on PyPI and tell the user if external release files exist, offering a choice to automatically use those external files. External release files shall in the future be registered together with a checksum hash so that installation tools can verify the integrity of the eventual download (PyPI-hosted release files always carry such a checksum). Alternative PyPI server implementations should implement the new simple index serving behaviour of transition phase 1 to avoid installation tools treating their release links as external ones in phase 2. Rationale ========= .. _history: History and motivations for external hosting -------------------------------------------- When PyPI went online, it offered release registration but had no facility to host release files itself. When hosting was added, no automated downloading tool existed yet. When Philip Eby implemented automated downloading (through setuptools), he made the choice to allow people to use download hosts of their choice. The finding of externally-hosted packages was implemented as follows: #. The PyPI ``simple/`` index for a package contains all links found by scraping them from that package's long_description metadata for any release. Links in the "Download-URL" and "Home-page" metadata fields are given ``rel=download`` and ``rel=homepage`` attributes, respectively. #. 
Any of these links whose target is a file whose name appears to be in the form of an installable source or binary distribution, with name in the form "packagename-version.ARCHIVEEXT", is considered a potential installation candidate by installation tools. #. Similarly, any links suffixed with an "#egg=packagename-version" fragment are considered an installation candidate. #. Additionally, the ``rel=homepage`` and ``rel=download`` links are crawled by installation tools and, if HTML, are themselves scraped for release-file links in the above formats. See the easy_install documentation for a complete description of this behavior. [1]_ Today, most packages indexed on PyPI host their release files on PyPI. Out of 29,117 total projects on PyPI, only 2,581 (less than 10%) include any links to installable files that are available only off-PyPI. [2]_ There are many reasons [3]_ why people have chosen external hosting. To cite just a few: - release processes and scripts have been developed already and upload to external sites - it takes too long to upload large files from some places in the world - export restrictions e.g. for crypto-related software - company policies which require offering open source packages through own sites - problems with integrating uploading to PyPI into one's release process (because of release policies) - desiring download statistics different from those maintained by PyPI - perceived bad reliability of PyPI - not aware that PyPI offers file-hosting Irrespective of the present-day validity of these reasons, there clearly is a history why people choose to host files externally and it even was for some time the only way you could do things. This PEP takes the position that there remain some valid reasons for external hosting even today. Problem ------- **Today, python package installers (pip, easy_install, buildout, and others) often need to query many non-PyPI URLs even if there are no externally hosted files**. 
Apart from querying pypi.python.org's simple index pages, also all homepages and download pages ever specified with any release of a package are crawled by an installer. The need for installers to crawl external sites slows down installation and makes for a brittle and unreliable installation process. Those sites and packages also don't take part in the :pep:`381` mirroring infrastructure, further decreasing reliability and speed of automated installation processes around the world. Most packages are hosted directly on pypi.python.org [2]_. Even for these packages, installers still crawl their homepage and download-url, if specified. Many package uploaders are not aware that specifying the "homepage" or "download-url" in their package metadata will needlessly slow down the installation process for all users. Relying on third party sites also opens up more attack vectors for injecting malicious packages into sites using automated installs. A simple attack might just involve getting hold of an old now-unused homepage domain and placing malicious packages there. Moreover, performing a Man-in-The-Middle (MITM) attack between an installation site and any of the download sites can inject malicious packages on the installation site. As many homepages and download locations are using HTTP and not HTTPS, such attacks are not hard to launch. Such MITM attacks can easily happen even for packages which never intended to host files externally as their homepages are contacted by installers anyway. There is currently no way for package maintainers to avoid external-link crawling, other than removing all homepage/download url metadata for all historic releases. While a script [4]_ has been written to perform this action, it is not a good general solution because it removes useful metadata from PyPI releases. 
Even if the sites referenced by "Homepage" and "Download-URL" links were not scraped for further links, there is no obvious way under the current system for a package owner to link to an installable file from a long_description metadata field (which is shown as package documentation on ``/pypi/PKG``) without installation tools automatically considering that file a candidate for installation. Conversely, there is no way to explicitly register multiple external release files without putting them in metadata fields. Goals ----- These are the goals to be achieved by implementation of this PEP: * Package owners should be able to explicitly control which files are presented by PyPI to installer tools as installation candidates. Installation should not be slowed and made less reliable by extensive and unnecessary crawling of links that package owners did not explicitly nominate as installation files. * It should remain possible for package owners to choose to host their release files on their own hosting, external to PyPI. It should be easy for a user to request the installation of such releases using automated installer tools, especially if the external release files were registered together with a checksum hash. * Automated installer tools should not install externally-hosted packages **by default**, but require explicit authorization to do so by the user. When tools refuse to install such a package by default, they should tell the user exactly which external link(s) the installer needs to follow, and what option(s) the user can provide to authorize the tool to follow those links. PyPI should provide all necessary metadata for installer tools to implement this easily and within a single request/reply interaction. * Migration from the status quo to the above points should be gradual and minimize breakage. 
This includes tooling that makes it easy for package owners with an existing release process that uploads to non-PyPI hosting to also upload those release files to PyPI. Solution / two transition phases ================================ The first transition phase introduces a "hosting-mode" field for each project on PyPI, allowing package owners explicit control of which release file links are served to present-day installation tools in the machine-readable ``simple/`` index. The first transition will, after successful hosting-mode manipulations by individual early-adopters, set a default hosting mode for existing packages, based on automated analysis. **Maintainers will be notified one month ahead of any such automated change**. At completion of the first transition phase, **all present-day existing release and installation processes and tools are expected to continue working**. Any remaining errors or problems are expected to only relate to installation of individual packages and can be easily corrected by package maintainers or PyPI admins if maintainers are not reachable. Also in the first phase, each link served in the ``simple/`` index will be explicitly marked as ``rel="internal"`` if it is hosted by the index itself (even if on a separate domain, which may be the case if the index uses a CDN for file-serving). Any link not so marked will be considered an external link. In the second transition phase, PyPI client installation tools shall be updated to default to only install ``rel="internal"`` packages unless a user specifies option(s) to permit installing from external links. See `second transition phase`_ for details on how installers should behave. Maintainers of packages which currently host release files on non-PyPI sites shall receive instructions and tools to ease "re-hosting" of their historic and future package release files. This re-hosting tool MUST be available before automated hosting-mode changes are announced to package maintainers. 
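
As an illustration of the ``rel="internal"`` marking described above, here is a minimal client-side sketch that splits the links of a ``simple/`` page into index-hosted and external ones. It assumes a modern Python 3 standard library; the class name, the function name and the sample page are hypothetical and are not part of PyPI or of any existing installer.

```python
from html.parser import HTMLParser


class SimpleIndexLinks(HTMLParser):
    """Collect (href, rel-tokens) pairs from the anchors of a simple/ page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href:
            # rel may hold several space-separated tokens, or be absent
            rel = set((attr.get("rel") or "").split())
            self.links.append((href, rel))


def split_links(page_html):
    """Return (internal, external) lists of hrefs found in page_html."""
    parser = SimpleIndexLinks()
    parser.feed(page_html)
    internal = [href for href, rel in parser.links if "internal" in rel]
    external = [href for href, rel in parser.links if "internal" not in rel]
    return internal, external


# Hypothetical page content, for illustration only.
SAMPLE_PAGE = """
<html><body>
<a href="../../packages/source/e/example/example-1.0.tar.gz#md5=0123" rel="internal">example-1.0.tar.gz</a>
<a href="http://example.com/" rel="homepage">home</a>
<a href="http://example.com/dl/example-1.1.tar.gz">example-1.1.tar.gz</a>
</body></html>
"""

internal, external = split_links(SAMPLE_PAGE)
print("internal:", internal)
print("external:", external)
```

Anything not explicitly marked ``rel="internal"`` ends up in the external list, which matches the "any link not so marked" rule stated above.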
Implementation ============== Hosting modes ------------- The foundation of the first transition phase is the introduction of three "modes" of PyPI hosting for a package, affecting which links are generated for the ``simple/`` index. These modes are implemented without requiring changes to installation tools via changes to the algorithm for generating the machine-readable ``simple/`` index. The modes are: - ``pypi-scrape-crawl``: no change from the current situation of generating machine-readable links for installation tools, as outlined in the history_. - ``pypi-scrape``: for a package in this mode, links to be added to the ``simple/`` index are still scraped from package metadata. However, the "Home-page" and "Download-url" links are given ``rel=ext-homepage`` and ``rel=ext-download`` attributes instead of ``rel=homepage`` and ``rel=download``. The effect of this (with no change in installation tools necessary) is that these links will not be followed and scraped for further candidate links by present-day installation tools: only installable files directly hosted from PyPI or linked directly from PyPI metadata will be considered for installation. Installation tools MAY evolve to offer an option to use the new rel-attribution to crawl external pages but MUST NOT default to it. - ``pypi-explicit``: for a package in this mode, only links to release files uploaded to PyPI, and external links to release files explicitly nominated by the package owner, will be added to the ``simple/`` index. PyPI will provide a new interface for package owners to supply external release-file URLs. These URLs MUST include a URL fragment in the form "#hashtype=hashvalue" specifying a hash of the externally-linked file which installer tools MUST use to validate that they have downloaded the intended file. 
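
The "#hashtype=hashvalue" fragment can be checked with a few lines of standard-library code. The following is only a sketch under stated assumptions: it takes the whole fragment to be a single ``hashtype=hashvalue`` pair, the function names and the example URL are hypothetical, and it uses modern Python 3 (``urllib.parse``, ``hashlib``) rather than the code of any existing installer.

```python
import hashlib
from urllib.parse import urldefrag


def split_hash_fragment(url):
    """Return (url_without_fragment, hashtype, hashvalue).

    Assumes the fragment is exactly 'hashtype=hashvalue'.
    """
    base, fragment = urldefrag(url)
    hashtype, sep, hashvalue = fragment.partition("=")
    if not sep or not hashtype or not hashvalue:
        raise ValueError("no '#hashtype=hashvalue' fragment in %r" % url)
    return base, hashtype.lower(), hashvalue.lower()


def verify_download(url, data):
    """Return True if the bytes in data match the hash named in the URL fragment."""
    _, hashtype, expected = split_hash_fragment(url)
    # hashlib.new raises ValueError for hash types it does not know
    digest = hashlib.new(hashtype, data).hexdigest()
    return digest == expected


# Usage with made-up data: build a URL whose fragment matches the payload.
payload = b"example release file"
link = ("http://example.com/dl/example-1.0.tar.gz#sha256="
        + hashlib.sha256(payload).hexdigest())
print(verify_download(link, payload))          # matching content
print(verify_download(link, payload + b"x"))   # tampered content
```

A tool would download the URL without its fragment and refuse to install the file when ``verify_download`` returns False.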
Thus the hope is that eventually all projects on PyPI can be migrated to the ``pypi-explicit`` mode, while preserving the ability to install release files hosted externally via installer tools. Deprecation of hosting modes to eventually only allow the ``pypi-explicit`` mode is NOT REGULATED by this PEP but is expected to become feasible some time after successful implementation of the transition phases described in this PEP. It is expected that deprecation requires **a new process to deal with abandoned packages** because of unreachable maintainers for still popular packages. First transition phase (PyPI) ----------------------------- The proposed solution consists of multiple implementation and communication steps: #. Implement in PyPI the three modes described above, with an interface for package owners to select the mode for each package and register explicit external file URLs. #. For packages in all modes, label links in the ``simple/`` index to index-hosted files with ``rel="internal"``, to make it easier for client tools to distinguish these links in the second phase. #. Add an HTML tag ``<meta name="api-version" value="2" />`` to all ``simple/`` index pages, to allow clients to distinguish between indexes providing the ``rel="internal"`` metadata and older ones that do not. #. Default all newly-registered packages to ``pypi-explicit`` mode (package owners can still switch to the other modes as desired). #. Determine (via automated analysis [2]_) which packages have all installable files available on PyPI itself (group A), which have all installable files on PyPI or linked directly from PyPI metadata (group B), and which have installable versions available that are linked only from external homepage/download HTML pages (group C). #.
Send mail to maintainers of projects in group A that their project will be automatically configured to ``pypi-explicit`` mode in one month, and similarly to maintainers of projects in group B that their project will be automatically configured to ``pypi-scrape`` mode. Inform them that this change is not expected to affect installability of their project at all, but will result in faster and safer installs for their users. Encourage them to set this mode themselves sooner to benefit their users. #. Send mail to maintainers of packages in group C that their package hosting mode is ``pypi-scrape-crawl``, list the URLs which currently are crawled, and suggest that they either re-host their packages directly on PyPI and switch to ``pypi-explicit``, or at least provide direct links to release files in PyPI metadata and switch to ``pypi-scrape``. Provide instructions and tools to help with these transitions. .. _`second transition phase`: Second transition phase (installer tools) ----------------------------------------- For the second transition phase, maintainers of installation tools are asked to release two updates. The first update shall provide clear warnings if externally-hosted release files (that is, files whose link does not include ``rel="internal"``) are selected for download, for which projects and URLs exactly this happens, and warn that in future versions externally-hosted downloads will be disabled by default. The second update should change the default mode to allow only installation of ``rel="internal"`` package files, and allow installation of externally-hosted packages only when the user supplies an option. The installer should distinguish between verifiable and non-verifiable external links. A verifiable external link is a direct link to an installable file from the PyPI ``simple/`` index that includes a hash in the URL fragment ("#hashtype=hashvalue") which can be used to verify the integrity of the downloaded file. 
A non-verifiable external link is any link (other than those explicitly supplied by the user of an installer tool) without a hash, scraped from external HTML, or injected into the search via some other non-PyPI source (e.g. setuptools' ``dependency_links`` feature). Installers should provide a blanket option to allow installing any verifiable external link. Non-verifiable external links should only be installed if the user-provided option specifies exactly which external domains can be used or for which specific package names external links can be used. When download of an externally-hosted package is disallowed by the default configuration, the user should be notified, with instructions for how to make the install succeed and warnings about the implication (that a file will be downloaded from a site that is not part of the package index). The warning given for non-verifiable links should clearly state that the installer cannot verify the integrity of the downloaded file. The warning given for verifiable external links should simply note that the file will be downloaded from an external URL, but that the file integrity can be verified by checksum. Alternative PyPI-compatible index implementations should upgrade to begin providing the ``rel="internal"`` metadata and the ``<meta name="api-version" value="2" />`` tag as soon as possible. For alternative indexes which do not yet provide the meta tag in their ``simple/`` pages, installation tools should provide backwards-compatible fallback behavior (treat links as internal as in pre-PEP times and provide a warning).
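
The installer defaults described in this section can be read as a small decision function. The sketch below is one possible reading, not the behaviour of pip or easy_install: the function name and the ``allow_external`` and ``allow_unverified_hosts`` options are hypothetical, and only the per-domain form of the non-verifiable opt-in is modelled (the per-package-name form is left out).

```python
from urllib.parse import urlparse


def may_download(url, internal, has_hash,
                 allow_external=False, allow_unverified_hosts=()):
    """Return (allowed, message) for one candidate release-file link.

    internal  -- the link carried rel="internal" in the simple/ index
    has_hash  -- the link carries a '#hashtype=hashvalue' fragment
    """
    if internal:
        return True, "hosted on the index itself"
    if has_hash:
        # verifiable external link: a blanket option is enough
        if allow_external:
            return True, "external file, integrity can be verified by checksum"
        return False, ("external file %s: rerun with allow_external "
                       "to download it" % url)
    # non-verifiable external link: the host must be named explicitly
    host = urlparse(url).hostname or ""
    if host in allow_unverified_hosts:
        return True, ("external file from %s, integrity cannot be verified"
                      % host)
    return False, ("non-verifiable external file %s: name host %r in "
                   "allow_unverified_hosts to download it" % (url, host))


# Usage with made-up links.
print(may_download("http://example.com/x-1.0.tar.gz#sha256=ab", False, True))
print(may_download("http://example.com/x-1.0.tar.gz", False, False,
                   allow_unverified_hosts=("example.com",)))
```

Returning the message together with the decision keeps the "tell the user exactly which link and which option" requirement in one place.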
API For Submitting External Distribution URLs
---------------------------------------------

New distribution URLs may be submitted by performing a HTTP POST to
the URL:

    https://pypi.python.org/pypi

With the following form-encoded data:

==============  ================================
Name            Value
--------------  --------------------------------
:action         The string "urls"
name            The package name as a string
version         The release version as a string
new-url         The new URL to store
submit_new_url  The string "yes"
==============  ================================

The POST must be accompanied by an HTTP Basic Auth header encoding the
username and password of the user authorized to maintain the package
on PyPI.

The HTTP response to this request will be one of:

=======  ============  ================================================
Code     Meaning       URL submission implications
-------  ------------  ------------------------------------------------
200      OK            Everything worked just fine
400      Bad request   Data provided for submission was malformed
401      Unauthorised  The username or password supplied were incorrect
403      Forbidden     User does not have permission to update the
                       package information (not Owner or Maintainer)
=======  ============  ================================================

References
==========

.. [1] Philip Eby, easy_install 'Package Index "API"' documentation,
   http://peak.telecommunity.com/DevCenter/EasyInstall#package-index-api

.. [2] Donald Stufft, automated analysis of PyPI project links,
   https://github.com/dstufft/pypi.linkcheck

.. [3] Marc-Andre Lemburg, reasons for external hosting,
   http://mail.python.org/pipermail/catalog-sig/2013-March/005626.html

.. [4] Holger Krekel, script to remove homepage/download metadata for
   all releases
   http://mail.python.org/pipermail/catalog-sig/2013-February/005423.html

Acknowledgments
===============

Philip Eby for precise information and the basic ideas to implement the transition via server-side changes only.
Donald Stufft for pushing away from external hosting and offering to implement both a Pull Request for the necessary PyPI changes and the analysis tool to drive the transition phase 1. Marc-Andre Lemburg, Nick Coghlan and catalog-sig in general for thinking through issues regarding getting rid of "external hosting". Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: From r1chardj0n3s at gmail.com Thu Mar 21 07:45:56 2013 From: r1chardj0n3s at gmail.com (Richard Jones) Date: Wed, 20 Mar 2013 23:45:56 -0700 Subject: [Catalog-sig] Updated PEP 438 In-Reply-To: <20130321062237.GN9677@merlinux.eu> References: <20130321062237.GN9677@merlinux.eu> Message-ID: On 20 March 2013 23:22, holger krekel wrote: > On Wed, Mar 20, 2013 at 17:30 -0700, Richard Jones wrote: >> I've pushed the latest PEP to the repos. It has all the recent >> clarifications and the API docs. Just need to wait for the website to >> rebuild or something. > > It's online now. Current references to PEP438 (also inlined below): > > http://www.python.org/dev/peps/pep-0438/ > https://bitbucket.org/hpk42/pep-pypi/src/c0cbd3f3508991f5c47eb0fdb036c6e25ef45047/PEP-438.txt?at=default > >> Unless there's any last-minute problems I'll accept the PEP in this >> form and push the implementation to the production PyPI next week >> after I fly home. > > testpypi.python.org keeps 502ing on me - probably makes sense to first have > that stable and reviewed for a few days at least. 
Dammit, I don't know why but uwsgi just keeps bloody dying :-( Richard From holger at merlinux.eu Thu Mar 21 11:28:24 2013 From: holger at merlinux.eu (holger krekel) Date: Thu, 21 Mar 2013 10:28:24 +0000 Subject: [Catalog-sig] Replacement client for pep381client In-Reply-To: References: Message-ID: <20130321102824.GQ9677@merlinux.eu> On Wed, Mar 20, 2013 at 19:27 -0700, Christian Theune wrote: > On 2013-03-20 23:59:21 +0000, Christian Theune said: > > > >I'm currently re-initializing my own mirror. This basically can be > >run in-place by just removing the existing state data and calling > >my sync script (bsn-mirror) instead of pep381run with the same > >parameters. > > This worked nicely for me - I'm running my mirror on bandersnatch now.

So far I got 3 errors like this one::

    2013-03-21 14:23:19,759 bandersnatch.package INFO: Downloading: https://pypi.python.org/packages/source/C/Clay/Clay-0.13.tar.gz
    2013-03-21 14:23:20,384 bandersnatch.package ERROR: Error syncing package: Coopr
    Traceback (most recent call last):
      File "/home/hpk/bandersnatch/src/bandersnatch/package.py", line 50, in sync
        self.sync_release_files()
      File "/home/hpk/bandersnatch/src/bandersnatch/package.py", line 68, in sync_release_files
        self.download_file(release_file['url'], release_file['md5_digest'])
      File "/home/hpk/bandersnatch/src/bandersnatch/package.py", line 144, in download_file
        url, existing_hash, md5sum))
    ValueError: https://pypi.python.org/packages/source/C/Coopr/Coopr-1.1.zip has hash 97cb7ae47656df10d243533c4f0c63c1 instead of 7ed6916702b2afccd254b423450ac4af

and the command terminates. I can restart fine, though. Will continue to do so and see how far I get.
Seems to perform quickly, btw :) holger > Christian From christian at python.org Thu Mar 21 13:06:07 2013 From: christian at python.org (Christian Heimes) Date: Thu, 21 Mar 2013 13:06:07 +0100 Subject: [Catalog-sig] Access to Windows' cert store Message-ID: <514AF7AF.7000304@python.org> Hi, this message is slightly off-topic but it might be interesting for pip, setuptools and other developers that are working on HTTPS for PyPI. A while ago I found C++ example code that shows how to dump CA and CRL certs from Windows' system cert store. The system cert store contains the certificates used by Windows, IE etc. Yesterday I reimplemented the C++ code with Python and ctypes. I have tested it with Python 2.6 to 3.3 (x86 and x86_64) on Windows 7. It should work with Windows XP / Windows Server 2003 and all newer versions of Windows. The output is usable by Python's SSL module but you have to dump the certs to a file first. I'm planning to add the feature to Python 3.4, too. http://bugs.python.org/issue17134 You can download the code from https://bitbucket.org/tiran/wincertstore Regards, Christian From mal at egenix.com Thu Mar 21 13:58:34 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 21 Mar 2013 13:58:34 +0100 Subject: [Catalog-sig] Access to Windows' cert store In-Reply-To: <514AF7AF.7000304@python.org> References: <514AF7AF.7000304@python.org> Message-ID: <514B03FA.7060605@egenix.com> On 21.03.2013 13:06, Christian Heimes wrote: > Hi, > > this message is slightly off-topic but it might be interesting for pip, > setuptools and other developers that are working on HTTPS for PyPI. > > A while ago I found C++ example code that shows how to dump CA and CRL > certs from Windows' system cert store. The system cert store contains > the certificates used by Windows, IE etc.
Why not simply use the Firefox certs ? We started adding these to our pyOpenSSL distribution with the last release: https://cms.egenix.com/products/python/pyOpenSSL/doc/#Module_OpenSSL.ca_bundle > Yesterday I reimplemented the C++ code with Python and ctypes. I have > tested it with Python 2.6 to 3.3 (x86 and x86_64) on Windows 7. It > should work with Windows XP / Windows Server 2003 and all newer versions > of Windows. The output is usabl by Python's SSL module but you have to > dump the certs to a file first. You can setup OpenSSL Contexts to validate based in-memory certificate as well: just add the certs one by one to the Context using the X509Store object you can obtain using context.get_cert_store(). > I'm planing to add the feature to Python 3.4, too. > http://bugs.python.org/issue17134 > > You can download the code from > > https://bitbucket.org/tiran/wincertstore I think this would be useful addition for pyOpenSSL as well - if it's possible to extract the Windows certificates without admin rights. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 21 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-03-13: Released eGenix pyOpenSSL 0.13 ... http://egenix.com/go39 ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From christian at python.org Thu Mar 21 14:32:24 2013 From: christian at python.org (Christian Heimes) Date: Thu, 21 Mar 2013 14:32:24 +0100 Subject: [Catalog-sig] Access to Windows' cert store In-Reply-To: <514B03FA.7060605@egenix.com> References: <514AF7AF.7000304@python.org> <514B03FA.7060605@egenix.com> Message-ID: <514B0BE8.6010006@python.org> Am 21.03.2013 13:58, schrieb M.-A. Lemburg: > Why not simply use the Firefox certs ? > > We started adding these to our pyOpenSSL distribution with the last release: > https://cms.egenix.com/products/python/pyOpenSSL/doc/#Module_OpenSSL.ca_bundle Sure, that's another viable option. But IIRC some people have raised license concerns. > You can setup OpenSSL Contexts to validate based in-memory > certificate as well: just add the certs one by one to the > Context using the X509Store object you can obtain using > context.get_cert_store(). I assume you are talking about pyOpenSSL? I was referring to Python's SSL module. It can only load CA certs from a file or directory. It would be a useful feature for Python's SSL module, too. > I think this would be useful addition for pyOpenSSL as well - if > it's possible to extract the Windows certificates without admin > rights. The code works without special privileges. The MSDN references don't mention any restrictions, too. The code is rather simple -- I'm only using four functions and three structs. Christian From donald at stufft.io Thu Mar 21 14:40:15 2013 From: donald at stufft.io (Donald Stufft) Date: Thu, 21 Mar 2013 09:40:15 -0400 Subject: [Catalog-sig] Access to Windows' cert store In-Reply-To: <514B0BE8.6010006@python.org> References: <514AF7AF.7000304@python.org> <514B03FA.7060605@egenix.com> <514B0BE8.6010006@python.org> Message-ID: On Mar 21, 2013, at 9:32 AM, Christian Heimes wrote: > Am 21.03.2013 13:58, schrieb M.-A. 
Lemburg:
>> Why not simply use the Firefox certs ?
>>
>> We started adding these to our pyOpenSSL distribution with the last release:
>> https://cms.egenix.com/products/python/pyOpenSSL/doc/#Module_OpenSSL.ca_bundle
>
> Sure, that's another viable option. But IIRC some people have raised
> license concerns.

The Firefox bundle is released under the MPL, which only applies to the
individual files and not the entire project.

>
>> You can setup OpenSSL Contexts to validate based in-memory
>> certificate as well: just add the certs one by one to the
>> Context using the X509Store object you can obtain using
>> context.get_cert_store().
>
> I assume you are talking about pyOpenSSL? I was referring to Python's
> SSL module. It can only load CA certs from a file or directory. It would
> be a useful feature for Python's SSL module, too.
>
>> I think this would be useful addition for pyOpenSSL as well - if
>> it's possible to extract the Windows certificates without admin
>> rights.
>
> The code works without special privileges. The MSDN references don't
> mention any restrictions, too. The code is rather simple -- I'm only
> using four functions and three structs.

I would love to see this added to Python Core. As it is right now, if
OpenSSL is configured correctly you can do
`urllib.request.urlopen("?", cadefault=True)` and things will just work.
This breaks down on Windows though.

>
> Christian

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From mal at egenix.com Thu Mar 21 15:01:08 2013
From: mal at egenix.com (M.-A.
Lemburg) Date: Thu, 21 Mar 2013 15:01:08 +0100 Subject: [Catalog-sig] Access to Windows' cert store In-Reply-To: <514B0BE8.6010006@python.org> References: <514AF7AF.7000304@python.org> <514B03FA.7060605@egenix.com> <514B0BE8.6010006@python.org> Message-ID: <514B12A4.4090105@egenix.com> On 21.03.2013 14:32, Christian Heimes wrote: > Am 21.03.2013 13:58, schrieb M.-A. Lemburg: >> Why not simply use the Firefox certs ? >> >> We started adding these to our pyOpenSSL distribution with the last release: >> https://cms.egenix.com/products/python/pyOpenSSL/doc/#Module_OpenSSL.ca_bundle > > Sure, that's another viable option. But IIRC some people have raised > license concerns. I think the more problematic aspect is not being able to easily update the CA list. Firefox and Windows do this automatically for you, but for Python, this could only be done with patch level releases. Still, it's better than not having access to any such CA list, so would be a good fallback solution. >> You can setup OpenSSL Contexts to validate based in-memory >> certificate as well: just add the certs one by one to the >> Context using the X509Store object you can obtain using >> context.get_cert_store(). > > I assume you are talking about pyOpenSSL? I was referring to Python's > SSL module. It can only load CA certs from a file or directory. It would > be a useful feature for Python's SSL module, too. Ah, right. >> I think this would be useful addition for pyOpenSSL as well - if >> it's possible to extract the Windows certificates without admin >> rights. > > The code works without special privileges. The MSDN references don't > mention any restrictions, too. The code is rather simple -- I'm only > using four functions and three structs. Nice. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 21 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... 
http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-03-13: Released eGenix pyOpenSSL 0.13 ... http://egenix.com/go39 ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From solipsis at pitrou.net Thu Mar 21 15:12:12 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 21 Mar 2013 14:12:12 +0000 (UTC) Subject: [Catalog-sig] Access to Windows' cert store References: <514AF7AF.7000304@python.org> Message-ID: Christian Heimes python.org> writes: > > I'm planing to add the feature to Python 3.4, too. > http://bugs.python.org/issue17134 > > You can download the code from > > https://bitbucket.org/tiran/wincertstore This is nice, but can you follow up on the bug tracker? It would be much more appropriate than catalog-sig. Also you shouldn't need to encode the certs into PEM format. AFAICT, SSL_CTX_get_cert_store(), d2i_X509_AUX() and X509_STORE_add_cert() should be sufficient. Regards Antoine. From pje at telecommunity.com Thu Mar 21 16:29:21 2013 From: pje at telecommunity.com (PJ Eby) Date: Thu, 21 Mar 2013 11:29:21 -0400 Subject: [Catalog-sig] Access to Windows' cert store In-Reply-To: <514AF7AF.7000304@python.org> References: <514AF7AF.7000304@python.org> Message-ID: On Thu, Mar 21, 2013 at 8:06 AM, Christian Heimes wrote: > Hi, > > the message is slightly off-topic but it might be interesting for pip, > setuptools and other developers that are working on HTTPS for PyPI. > > I while ago I found C++ example code that shows how to dump CA and CRL > certs from Windows's system cert store. The system cert store contains > the certificates used by Windows, IE etc. 
> > Yesterday I reimplemented the C++ code with Python and ctypes. I have > tested it with Python 2.6 to 3.3 (x86 and x86_64) on Windows 7. It > should work with Windows XP / Windows Server 2003 and all newer versions > of Windows. The output is usabl by Python's SSL module but you have to > dump the certs to a file first. > > I'm planing to add the feature to Python 3.4, too. > http://bugs.python.org/issue17134 > > You can download the code from > > https://bitbucket.org/tiran/wincertstore > Very nice! I definitely would like to use this for setuptools, but I actually want it for versions 2.3-2.5, which can't use requests or urllib3 or anything like that. So I hacked on the code a bit and got it to work (or at least got the __main__ stub to spit out a bunch of data) with Python 2.3 and ctypes 1.0.2 (the last standalone release for which Windows binaries are available). Would you like a patch? (Note: absolute_import, decorators, and the actual use of "with:" and generator expressions had to go, but this doesn't change any API or semantics as far as I can tell, just a bit of appearance here and there, and the code still runs with 2.4, 2.5, 2.7, 3.1, and 3.2 that I tried.) From christian at python.org Thu Mar 21 17:11:45 2013 From: christian at python.org (Christian Heimes) Date: Thu, 21 Mar 2013 17:11:45 +0100 Subject: [Catalog-sig] Access to Windows' cert store In-Reply-To: References: <514AF7AF.7000304@python.org> Message-ID: <514B3141.8040504@python.org> Am 21.03.2013 16:29, schrieb PJ Eby: > Very nice! I definitely would like to use this for setuptools, but I > actually want it for versions 2.3-2.5, which can't use requests or > urllib3 or anything like that. So I hacked on the code a bit and got > it to work (or at least got the __main__ stub to spit out a bunch of > data) with Python 2.3 and ctypes 1.0.2 (the last standalone release > for which Windows binaries are available). Would you like a patch? 
> > (Note: absolute_import, decorators, and the actual use of "with:" and > generator expressions had to go, but this doesn't change any API or > semantics as far as I can tell, just a bit of appearance here and > there, and the code still runs with 2.4, 2.5, 2.7, 3.1, and 3.2 that I > tried.) Sure, send me your patch and I'll add it later. Feel free to include a copy of the code in setuptools if you like. I don't mind as long as it keeps our users happy. ;) Christian From christian at python.org Thu Mar 21 17:22:19 2013 From: christian at python.org (Christian Heimes) Date: Thu, 21 Mar 2013 17:22:19 +0100 Subject: [Catalog-sig] Access to Windows' cert store In-Reply-To: References: <514AF7AF.7000304@python.org> Message-ID: <514B33BB.4070609@python.org> Am 21.03.2013 15:12, schrieb Antoine Pitrou: > This is nice, but can you follow up on the bug tracker? It would be much > more appropriate than catalog-sig. > > Also you shouldn't need to encode the certs into PEM format. AFAICT, > SSL_CTX_get_cert_store(), d2i_X509_AUX() and X509_STORE_add_cert() should > be sufficient. The code is a proof-of-concept. I want to test the feature and provide something that works without modification of Python stdlib code or a C extension. It's the only viable option for PIP and setuptools as it works out of the box. For Python 3.4 I don't want to use ctypes or PEM. The crypt32 API provides the certificates and CRLs either as PKCS#7 or DER binary data. I'll update the ticket as soon as I'm done with testing. Christian From richard at python.org Thu Mar 21 18:31:20 2013 From: richard at python.org (Richard Jones) Date: Thu, 21 Mar 2013 10:31:20 -0700 Subject: [Catalog-sig] Replacement client for pep381client In-Reply-To: References: Message-ID: On 20 March 2013 19:27, Christian Theune wrote: > On 2013-03-20 23:59:21 +0000, Christian Theune said: >> >> >> I'm currently re-initializing my own mirror. 
This basically can be run >> in-place by just removing the existing state data and calling my sync script >> (bsn-mirror) instead of pep381run with the same parameters. > > > This worked nicely for me - I'm running my mirror on bandersnatch now. Nice work, Christian, thanks! Richard From ct at gocept.com Thu Mar 21 18:18:15 2013 From: ct at gocept.com (Christian Theune) Date: Thu, 21 Mar 2013 10:18:15 -0700 Subject: [Catalog-sig] Replacement client for pep381client In-Reply-To: <20130321102824.GQ9677@merlinux.eu> References: <20130321102824.GQ9677@merlinux.eu> Message-ID: <23104ED9-7614-4165-8CE8-452553979BAC@gocept.com> On Mar 21, 2013, at 3:28 AM, holger krekel wrote: > On Wed, Mar 20, 2013 at 19:27 -0700, Christian Theune wrote: >> On 2013-03-20 23:59:21 +0000, Christian Theune said: >>> >>> I'm currently re-initializing my own mirror. This basically can be >>> run in-place by just removing the existing state data and calling >>> my sync script (bsn-mirror) instead of pep381run with the same >>> parameters. >> >> This worked nicely for me - I'm running my mirror on bandersnatch now. > > I got so far 3 errors like this one:: > > 2013-03-21 14:23:19,759 bandersnatch.package INFO: Downloading: https://pypi.python.org/packages/source/C/Clay/Clay-0.13.tar.gz > 2013-03-21 14:23:20,384 bandersnatch.package ERROR: Error syncing package: Coopr > Traceback (most recent call last): > File "/home/hpk/bandersnatch/src/bandersnatch/package.py", line 50, in sync > self.sync_release_files() > File "/home/hpk/bandersnatch/src/bandersnatch/package.py", line 68, in sync_release_files > self.download_file(release_file['url'], release_file['md5_digest']) > File "/home/hpk/bandersnatch/src/bandersnatch/package.py", line 144, in download_file > url, existing_hash, md5sum)) > ValueError: https://pypi.python.org/packages/source/C/Coopr/Coopr-1.1.zip has hash 97cb7ae47656df10d243533c4f0c63c1 instead of 7ed6916702b2afccd254b423450ac4af > > and the command terminates. 
I can restart fine, though. Will continue
> to do continue and see how far i get. Seems to perform quickly, btw :)

This is an interesting case: the data was downloaded from PyPI but didn't
actually match the md5sum that was announced. This kind of thing "should
never happen" - but a subsequent run will retry gracefully.

Good to hear that it feels fast. :)

Christian

--
Christian Theune · ct at gocept.com
gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
http://gocept.com · Tel +49 345 1229889-7
Python, Pyramid, Plone, Zope · consulting, development, hosting, operations

From ct at gocept.com Thu Mar 21 23:15:34 2013
From: ct at gocept.com (Christian Theune)
Date: Thu, 21 Mar 2013 15:15:34 -0700
Subject: [Catalog-sig] Replacement client for pep381client
References:
Message-ID:

Hi,

I'm slowly wrapping up my sprint. Here's what happened today:

- fixed some errors reported by users
- allow running a non-deleting mirror (with the hint that official ones
  must not do this)
- add config file handling to avoid complicated command lines including
  some documentation how to handle them
- add test coverage
- add jenkins integration

I got one error regarding filesystem encoding where I noticed that
we expect that the mirror runs with UTF-8 as the filesystem
encoding. I'm not sure whether just simply encoding the filenames
myself is the right thing or whether I need to ask operators to tune
their environment accordingly.

I *guess* that just encoding manually to UTF-8 would be the right
thing here. Can someone agree or disagree with this?

If you already started using bandersnatch then you need to adapt your
command line calls once again (the last time) and create a config file.
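[Editorial sketch] One reading of the "encode manually" option raised above is to build each path as bytes, encoding the package file name as UTF-8 explicitly instead of relying on the process's filesystem encoding. The Python 3 sketch below assumes a POSIX filesystem that accepts arbitrary byte names. `utf8_path` and `write_release_file` are hypothetical helpers for illustration, not bandersnatch functions.

```python
import os


def utf8_path(directory, name):
    """Return a bytes path whose last component is `name` encoded as UTF-8."""
    # The directory comes from local configuration, so it is converted
    # with the filesystem encoding; only the file name is forced to UTF-8.
    return os.path.join(os.fsencode(directory), name.encode("utf-8"))


def write_release_file(directory, name, data):
    """Write `data` under a UTF-8 encoded file name and return the path."""
    path = utf8_path(directory, name)
    with open(path, "wb") as handle:
        handle.write(data)
    return path
```

With this approach the on-disk name is the same whatever locale the mirror process runs under, so operators would not need to tune their environment. Whether that is the right trade-off is the open question of this thread.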
Christian From lists at zopyx.com Fri Mar 22 05:44:29 2013 From: lists at zopyx.com (Andreas Jung) Date: Fri, 22 Mar 2013 05:44:29 +0100 Subject: [Catalog-sig] Replacement client for pep381client In-Reply-To: References: Message-ID: <514BE1AD.2040202@zopyx.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Christian Theune wrote: > Hi, > > I'm slowly wrapping up my sprint. Here's what happened today: > > - fixed some errors reported by users - allow running a non-deleting > mirror (with the hint that official ones must not do this) - add > config file handling to avoid complicated command lines including > some documentation how to handle them - add test coverage - add > jenkins integration > > I got one error regarding filesystem encoding where I noticed that > we expect that the mirror runs with UTF-8 as the filesystem > encoding. I'm not sure whether just simply encoding the filenames > myself is the right thing or whether I need to ask operators to tune > their environment accordingly. > > I *guess* that just encoding manually to UTF-8 would be the right > thing here. Can someone agree or disagree with this? I don't know much about filesystem encodings but if a FS encoding like >>> sys.getfilesystemencoding() 'ANSI_X3.4-1968' is a system-wide setting then it is unlikely that you make an encoding change a mandatory requirement. 'ANSI_X3.4-1968' is at least returned on my CentOS and Ubuntu box. 
Andreas -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (Darwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQGUBAEBAgAGBQJRS+GtAAoJEADcfz7u4AZjs4QLwKo6fPhQQLEwy5LeMQ/BY8Ow Efh8ERnHxX+PJs684ie4w1ZUwj0hDx/TlK6NHVNIZarNKYo88M3+YJKD2NgHl2O+ FmFo3Pii/Lc0Wj5cX3wdl06Xn/YmDGFmxoBNOd9e2xnBkBhk9r6KtlJMAW1gnfAv qIsAN37uWsGnfDFyDvQTDbkjr7HxRoQ8PFNL66DzhDntgrBSHwX3U7dGraVFPSlD mRvxt+r+IlJEeE5GrD75t1N0MlrNZmcvGHyag1PSnmm1AAAqpflJKxAPZ8sV1KG2 BlxLRB0i4WboNWs0/OoNIH7fNdY0nng1mOCwNA5v5DEaWx1Gy59bK4LkbpNyB4kQ yRUnjf340b4qUNr/KGb2A4ePoV4TNzSB3eli1JMxGpEdJzdm2nVfICEjRIDc3m2K cRYjVC5FgGENPeQZ4kDteHmgA/Iu4Pxw6nFrxArKBBz9F6C9OWrPf6jiqWsKxZ0v fYLssMbGT8XkQb38TOn5yEharguEXBk= =SPdt -----END PGP SIGNATURE----- -------------- next part -------------- A non-text attachment was scrubbed... Name: lists.vcf Type: text/x-vcard Size: 353 bytes Desc: not available URL: From techtonik at gmail.com Fri Mar 22 08:37:18 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 22 Mar 2013 10:37:18 +0300 Subject: [Catalog-sig] API for uploading packages to PyPI Message-ID: Hi, I understand that this will make PyPI a potential target for automated spam bots, but still it will be awesome to have an API to upload packages to PyPI. For example, I have a code that extract all necessary meta data for the package from the source file itself. It is even able to generate setup.py from this data. https://bitbucket.org/techtonik/astdump The next logical step in this chain is to teach it to upload stuff to PyPI. Now I thought that this setup.py is an unnecessary complication. What I need, ideally is just upload single .py file, or a JSON and a .tar.gz FWIW. Is there a straightforward API for things like that? Please, CC. -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From ronaldoussoren at mac.com Fri Mar 22 09:16:34 2013
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Fri, 22 Mar 2013 09:16:34 +0100
Subject: [Catalog-sig] API for uploading packages to PyPI
In-Reply-To:
References:
Message-ID: <2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com>

On 22 Mar, 2013, at 8:37, anatoly techtonik wrote:

> Hi,
>
> I understand that this will make PyPI a potential target for automated spam bots, but still it will be awesome to have an API to upload packages to PyPI.
>
> For example, I have a code that extract all necessary meta data for the package from the source file itself. It is even able to generate setup.py from this data. https://bitbucket.org/techtonik/astdump The next logical step in this chain is to teach it to upload stuff to PyPI.
>
> Now I thought that this setup.py is an unnecessary complication. What I need, ideally is just upload single .py file, or a JSON and a .tar.gz FWIW. Is there a straightforward API for things like that?

Several APIs are documented on pages linked directly from the PyPI homepage (the Infrastructure box)

Ronald

>
> Please, CC.
> --
> anatoly t.

From techtonik at gmail.com Fri Mar 22 09:26:25 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Fri, 22 Mar 2013 11:26:25 +0300
Subject: [Catalog-sig] PyPI web interface UX (Was: API for uploading packages to PyPI)
Message-ID:

OMG. I didn't even look at the boxes.

IMHO somebody should reduce the amount of duplication and choices between menu and boxes. It is really-really overburdened. For example, no need to say "use search above" when it is evident that you need to find the button, or to use that "browse all packages" link when it is actually the first item on the menu.
RSS should be moved out of scope of main menu. It is just "yikes!". The whole menu section with links to main Python web site should be moved out of place (is there a quick way to measure how many users follow these links)? Yes, I can send a patch if everyone agrees. -- anatoly t. On Fri, Mar 22, 2013 at 11:16 AM, Ronald Oussoren wrote: > > On 22 Mar, 2013, at 8:37, anatoly techtonik wrote: > > Hi, > > I understand that this will make PyPI a potential target for automated > spam bots, but still it will be awesome to have an API to upload packages > to PyPI. > > For example, I have a code that extract all necessary meta data for the > package from the source file itself. It is even able to generate setup.py > from this data. https://bitbucket.org/techtonik/astdump The next logical > step in this chain is to teach it to upload stuff to PyPI. > > Now I thought that this setup.py is an unnecessary complication. What I > need, ideally is just upload single .py file, or a JSON and a .tar.gz FWIW. > Is there a straightforward API for things like that? > > > Several APIs are documented on pages linked directly from the PyPI > homepage (the Infrastructure box) > > Ronald > > > Please, CC. > -- > anatoly t. > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From techtonik at gmail.com Fri Mar 22 09:32:00 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 22 Mar 2013 11:32:00 +0300 Subject: [Catalog-sig] PyPI Crediting Message-ID: Does anybody think that PyPI source code base should include the names of the people who contributed to its development? -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From techtonik at gmail.com Fri Mar 22 09:58:35 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 22 Mar 2013 11:58:35 +0300 Subject: [Catalog-sig] API for uploading packages to PyPI In-Reply-To: <2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com> References: <2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com> Message-ID: On Fri, Mar 22, 2013 at 11:16 AM, Ronald Oussoren wrote: > > On 22 Mar, 2013, at 8:37, anatoly techtonik wrote: > > Hi, > > I understand that this will make PyPI a potential target for automated > spam bots, but still it will be awesome to have an API to upload packages > to PyPI. > > For example, I have a code that extract all necessary meta data for the > package from the source file itself. It is even able to generate setup.py > from this data. https://bitbucket.org/techtonik/astdump The next logical > step in this chain is to teach it to upload stuff to PyPI. > > Now I thought that this setup.py is an unnecessary complication. What I > need, ideally is just upload single .py file, or a JSON and a .tar.gz FWIW. > Is there a straightforward API for things like that? > > > Several APIs are documented on pages linked directly from the PyPI > homepage (the Infrastructure box) > Thanks for the pointer. Some links are broken. I added redirects for wiki pages, but it will be better to fix links too. https://bitbucket.org/loewis/pypi/pull-request/4 Among those it seems that only OAuth API can be used to upload stuff. -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ronaldoussoren at mac.com Fri Mar 22 10:04:24 2013 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Fri, 22 Mar 2013 10:04:24 +0100 Subject: [Catalog-sig] API for uploading packages to PyPI In-Reply-To: References: <2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com> Message-ID: <1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com> On 22 Mar, 2013, at 9:58, anatoly techtonik wrote: > On Fri, Mar 22, 2013 at 11:16 AM, Ronald Oussoren wrote: > > On 22 Mar, 2013, at 8:37, anatoly techtonik wrote: > >> Hi, >> >> I understand that this will make PyPI a potential target for automated spam bots, but still it will be awesome to have an API to upload packages to PyPI. >> >> For example, I have a code that extract all necessary meta data for the package from the source file itself. It is even able to generate setup.py from this data. https://bitbucket.org/techtonik/astdump The next logical step in this chain is to teach it to upload stuff to PyPI. >> >> Now I thought that this setup.py is an unnecessary complication. What I need, ideally is just upload single .py file, or a JSON and a .tar.gz FWIW. Is there a straightforward API for things like that? > > Several APIs are documented on pages linked directly from the PyPI homepage (the Infrastructure box) > > Thanks for the pointer. > > Some links are broken. I added redirects for wiki pages, but it will be better to fix links too. The OAuth link appears to be broken, and that's likely part of the fallout of the wiki.python.org breakin. > https://bitbucket.org/loewis/pypi/pull-request/4 > > Among those it seems that only OAuth API can be used to upload stuff. I haven't looked at the code yet, but that's unlikely as distutils uses the HTTP API to upload files and AFAIK distutils doesn't implement OAuth. IIRC OAuth was added fairly recently to make it possible for users to delegate some permissions to external web applications (such as pythonpackages.com) without storing their password in those applications. 
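[Editorial sketch] For orientation, the form that distutils' upload command submits can be pictured as a plain dictionary of fields. The field names below (`:action`, `protocol_version`, `filetype`, `pyversion`, `md5_digest`, `content`) are given from memory of the distutils source and should be checked against it before use. `build_upload_fields` is a hypothetical helper that only assembles the fields and sends nothing.

```python
import hashlib
import os


def build_upload_fields(name, version, filename, content):
    """Assemble the form fields for one sdist upload (nothing is sent)."""
    return {
        ":action": "file_upload",   # assumed action name
        "protocol_version": "1",    # assumed protocol marker
        "name": name,
        "version": version,
        "filetype": "sdist",        # a source distribution
        "pyversion": "",            # assumed to be empty for sdists
        "md5_digest": hashlib.md5(content).hexdigest(),
        # The file itself travels as a (basename, bytes) pair.
        "content": (os.path.basename(filename), content),
    }
```

As far as I know, distutils then sends these fields as a multipart/form-data POST with HTTP Basic authentication. That step is left out here because it depends on the index URL and on credentials.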
Ronald -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Fri Mar 22 10:14:15 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 22 Mar 2013 10:14:15 +0100 Subject: [Catalog-sig] API for uploading packages to PyPI In-Reply-To: <1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com> References: <2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com> <1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com> Message-ID: <514C20E7.5040101@egenix.com> On 22.03.2013 10:04, Ronald Oussoren wrote: > > On 22 Mar, 2013, at 9:58, anatoly techtonik wrote: >> Some links are broken. I added redirects for wiki pages, but it will be better to fix links too. > The OAuth link appears to be broken, and that's likely part of the fallout of the wiki.python.org breakin. It is broken because of Anatoly's renaming. The new name is http://wiki.python.org/moin/PyPiOauth Anatoly: I don't consider such renaming for some perceived level of consistency important enough to warrant the breakage you are introducing to external links. Please don't ! -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 22 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-03-13: Released eGenix pyOpenSSL 0.13 ... http://egenix.com/go39 ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Fri Mar 22 10:16:10 2013 From: mal at egenix.com (M.-A. 
Lemburg) Date: Fri, 22 Mar 2013 10:16:10 +0100 Subject: [Catalog-sig] API for uploading packages to PyPI In-Reply-To: <514C20E7.5040101@egenix.com> References: <2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com> <1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com> <514C20E7.5040101@egenix.com> Message-ID: <514C215A.9010107@egenix.com> On 22.03.2013 10:14, M.-A. Lemburg wrote: > On 22.03.2013 10:04, Ronald Oussoren wrote: >> >> On 22 Mar, 2013, at 9:58, anatoly techtonik wrote: >>> Some links are broken. I added redirects for wiki pages, but it will be better to fix links too. >> The OAuth link appears to be broken, and that's likely part of the fallout of the wiki.python.org breakin. > > It is broken because of Anatoly's renaming. > > The new name is http://wiki.python.org/moin/PyPiOauth Sorry, that was the old name, which is now gone. The new name is http://wiki.python.org/moin/PyPIOAuth > Anatoly: I don't consider such renaming for some perceived level of > consistency important enough to warrant the breakage you are introducing > to external links. Please don't ! > -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 22 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-03-13: Released eGenix pyOpenSSL 0.13 ... http://egenix.com/go39 ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Fri Mar 22 11:01:43 2013 From: mal at egenix.com (M.-A. 
Lemburg) Date: Fri, 22 Mar 2013 11:01:43 +0100 Subject: [Catalog-sig] API for uploading packages to PyPI In-Reply-To: References: <2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com> Message-ID: <514C2C07.5090209@egenix.com> On 22.03.2013 09:58, anatoly techtonik wrote: > On Fri, Mar 22, 2013 at 11:16 AM, Ronald Oussoren wrote: > >> >> On 22 Mar, 2013, at 8:37, anatoly techtonik wrote: >> >> Hi, >> >> I understand that this will make PyPI a potential target for automated >> spam bots, but still it will be awesome to have an API to upload packages >> to PyPI. >> >> For example, I have a code that extract all necessary meta data for the >> package from the source file itself. It is even able to generate setup.py >> from this data. https://bitbucket.org/techtonik/astdump The next logical >> step in this chain is to teach it to upload stuff to PyPI. >> >> Now I thought that this setup.py is an unnecessary complication. What I >> need, ideally is just upload single .py file, or a JSON and a .tar.gz FWIW. >> Is there a straightforward API for things like that? Yes: The distutils upload command implements the API. It essentially uses the same HTML form interface as the PyPI UI. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 22 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-03-13: Released eGenix pyOpenSSL 0.13 ... http://egenix.com/go39 ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Fri Mar 22 11:10:10 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 22 Mar 2013 11:10:10 +0100 Subject: [Catalog-sig] API for uploading packages to PyPI In-Reply-To: <514C215A.9010107@egenix.com> References: <2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com> <1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com> <514C20E7.5040101@egenix.com> <514C215A.9010107@egenix.com> Message-ID: <514C2E02.5060707@egenix.com> On 22.03.2013 10:16, M.-A. Lemburg wrote: > > > On 22.03.2013 10:14, M.-A. Lemburg wrote: >> On 22.03.2013 10:04, Ronald Oussoren wrote: >>> >>> On 22 Mar, 2013, at 9:58, anatoly techtonik wrote: >>>> Some links are broken. I added redirects for wiki pages, but it will be better to fix links too. >>> The OAuth link appears to be broken, and that's likely part of the fallout of the wiki.python.org breakin. >> >> It is broken because of Anatoly's renaming. >> >> The new name is http://wiki.python.org/moin/PyPiOauth > > Sorry, that was the old name, which is now gone. The new name is > http://wiki.python.org/moin/PyPIOAuth I added a redirect now to keep the old URL working. >> Anatoly: I don't consider such renaming for some perceived level of >> consistency important enough to warrant the breakage you are introducing >> to external links. Please don't ! -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 22 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-03-13: Released eGenix pyOpenSSL 0.13 ... http://egenix.com/go39 ::::: Try our mxODBC.Connect Python Database Interface for free ! 
:::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From techtonik at gmail.com Fri Mar 22 11:25:35 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 22 Mar 2013 13:25:35 +0300 Subject: [Catalog-sig] API for uploading packages to PyPI In-Reply-To: <514C20E7.5040101@egenix.com> References: <2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com> <1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com> <514C20E7.5040101@egenix.com> Message-ID: On Fri, Mar 22, 2013 at 12:14 PM, M.-A. Lemburg wrote: > On 22.03.2013 10:04, Ronald Oussoren wrote: > > > > On 22 Mar, 2013, at 9:58, anatoly techtonik wrote: > >> Some links are broken. I added redirects for wiki pages, but it will be > better to fix links too. > > The OAuth link appears to be broken, and that's likely part of the > fallout of the wiki.python.org breakin. > > It is broken because of Anatoly's renaming. > > The new name is http://wiki.python.org/moin/PyPiOauth > > Anatoly: I don't consider such renaming for some perceived level of > consistency important enough to warrant the breakage you are introducing > to external links. Please don't ! > I've renamed PyPIOAuth this long before today and fixed all link on the wiki. I don't have any tools to monitor any external links in MoinMoin. It will be nice if you add this request to the internal backlog of tasks for the next order to pydotorg redesign from PSF. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ronaldoussoren at mac.com Fri Mar 22 11:31:12 2013 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Fri, 22 Mar 2013 11:31:12 +0100 Subject: [Catalog-sig] API for uploading packages to PyPI In-Reply-To: References: <2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com> <1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com> <514C20E7.5040101@egenix.com> Message-ID: <07251834-8243-418F-BFCC-2D01565CAF0F@mac.com> On 22 Mar, 2013, at 11:25, anatoly techtonik wrote: > On Fri, Mar 22, 2013 at 12:14 PM, M.-A. Lemburg wrote: > On 22.03.2013 10:04, Ronald Oussoren wrote: > > > > On 22 Mar, 2013, at 9:58, anatoly techtonik wrote: > >> Some links are broken. I added redirects for wiki pages, but it will be better to fix links too. > > The OAuth link appears to be broken, and that's likely part of the fallout of the wiki.python.org breakin. > > It is broken because of Anatoly's renaming. > > The new name is http://wiki.python.org/moin/PyPiOauth > > Anatoly: I don't consider such renaming for some perceived level of > consistency important enough to warrant the breakage you are introducing > to external links. Please don't ! > > I've renamed PyPIOAuth this long before today and fixed all link on the wiki. I don't have any tools to monitor any external links in MoinMoin. It will be nice if you add this request to the internal backlog of tasks for the next order to pydotorg redesign from PSF. How would the PSF change links on other websites? Changing page names shouldn't be done lightly because this can, and for projects as popular as python almost certainly will, break links on other websites. Ronald -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From techtonik at gmail.com Fri Mar 22 11:31:15 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 22 Mar 2013 13:31:15 +0300 Subject: [Catalog-sig] API for uploading packages to PyPI In-Reply-To: <514C2C07.5090209@egenix.com> References: <2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com> <514C2C07.5090209@egenix.com> Message-ID: On Fri, Mar 22, 2013 at 1:01 PM, M.-A. Lemburg wrote: > On 22.03.2013 09:58, anatoly techtonik wrote: > > On Fri, Mar 22, 2013 at 11:16 AM, Ronald Oussoren < > ronaldoussoren at mac.com>wrote: > > > >> > >> On 22 Mar, 2013, at 8:37, anatoly techtonik > wrote: > >> > >> Hi, > >> > >> I understand that this will make PyPI a potential target for automated > >> spam bots, but still it will be awesome to have an API to upload > packages > >> to PyPI. > >> > >> For example, I have a code that extract all necessary meta data for the > >> package from the source file itself. It is even able to generate > setup.py > >> from this data. https://bitbucket.org/techtonik/astdump The next > logical > >> step in this chain is to teach it to upload stuff to PyPI. > >> > >> Now I thought that this setup.py is an unnecessary complication. What I > >> need, ideally is just upload single .py file, or a JSON and a .tar.gz > FWIW. > >> Is there a straightforward API for things like that? > > Yes: The distutils upload command implements the API. It essentially > uses the same HTML form interface as the PyPI UI. And where is this API defined? -------------- next part -------------- An HTML attachment was scrubbed... 
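As a rough illustration of the form-based upload interface mentioned in the reply quoted above: the sketch below builds, but does not send, a multipart/form-data body of the kind the distutils upload command posts. The field names (`:action`, `protocol_version`, `filetype`, `md5_digest`, `content`) are recalled from the distutils source and should be treated as assumptions to verify against `distutils/command/upload.py`; the helper names `build_upload_fields` and `encode_multipart` are hypothetical.

```python
import hashlib


def build_upload_fields(name, version, filename, content):
    """Collect form fields resembling those sent by distutils' upload command.

    Assumption: field names follow distutils/command/upload.py.
    """
    return {
        ":action": "file_upload",
        "protocol_version": "1",
        "name": name,
        "version": version,
        "filetype": "sdist",
        "md5_digest": hashlib.md5(content).hexdigest(),
        # File fields are (filename, bytes) tuples; all others are text.
        "content": (filename, content),
    }


def encode_multipart(fields, boundary):
    """Encode the fields as a multipart/form-data request body (bytes)."""
    sep = b"--" + boundary.encode("ascii")
    chunks = []
    for key, value in fields.items():
        header = 'Content-Disposition: form-data; name="%s"' % key
        if isinstance(value, tuple):
            header += '; filename="%s"' % value[0]
            value = value[1]
        else:
            value = value.encode("utf-8")
        # Each part: boundary, header line, blank line, payload.
        chunks += [sep, header.encode("utf-8"), b"", value]
    # Closing boundary, followed by a final CRLF.
    chunks += [sep + b"--", b""]
    return b"\r\n".join(chunks)


fields = build_upload_fields("demo", "0.1", "demo-0.1.tar.gz", b"abc")
body = encode_multipart(fields, "XYZ")
```

Sending `body` would additionally need a `Content-Type: multipart/form-data; boundary=XYZ` header and authentication, both left out here on purpose.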
URL: From techtonik at gmail.com Fri Mar 22 11:42:45 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 22 Mar 2013 13:42:45 +0300 Subject: [Catalog-sig] API for uploading packages to PyPI In-Reply-To: <07251834-8243-418F-BFCC-2D01565CAF0F@mac.com> References: <2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com> <1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com> <514C20E7.5040101@egenix.com> <07251834-8243-418F-BFCC-2D01565CAF0F@mac.com> Message-ID: On Fri, Mar 22, 2013 at 1:31 PM, Ronald Oussoren wrote: > > On 22 Mar, 2013, at 11:25, anatoly techtonik wrote: > > On Fri, Mar 22, 2013 at 12:14 PM, M.-A. Lemburg wrote: > >> On 22.03.2013 10:04, Ronald Oussoren wrote: >> > >> > On 22 Mar, 2013, at 9:58, anatoly techtonik >> wrote: >> >> Some links are broken. I added redirects for wiki pages, but it will >> be better to fix links too. >> > The OAuth link appears to be broken, and that's likely part of the >> fallout of the wiki.python.org breakin. >> >> It is broken because of Anatoly's renaming. >> >> The new name is http://wiki.python.org/moin/PyPiOauth >> >> Anatoly: I don't consider such renaming for some perceived level of >> consistency important enough to warrant the breakage you are introducing >> to external links. Please don't ! >> > > I've renamed PyPIOAuth this long before today and fixed all link on the > wiki. I don't have any tools to monitor any external links in MoinMoin. It > will be nice if you add this request to the internal backlog of tasks for > the next order to pydotorg redesign from PSF. > > > How would the PSF change links on other websites? Changing page names > shouldn't be done lightly because this can, and for projects as popular as > python almost certainly will, break links on other websites. > 1. I am the editor of my changes. Not that PSF guy who owns all the stuff out there and makes himself important by "taking responsibility" over what I do. 
If I had had the information about external sources linking to this page, I'd have considered contacting those sources for an update. I mean, written a letter here earlier. =) 2. The change requested is to enable tracking of incoming links for MoinMoin pages. It is exactly for the purpose you mentioned - to remove any fear, uncertainty and despair from people editing the wiki about whether their change may or may not break anything. Many wiki pages don't have any external references at all and should be reorganized to make a somewhat logical structure out of that pile of data. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Fri Mar 22 11:49:12 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 22 Mar 2013 11:49:12 +0100 Subject: [Catalog-sig] API for uploading packages to PyPI In-Reply-To: References: <2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com> <1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com> <514C20E7.5040101@egenix.com> Message-ID: <514C3728.5090106@egenix.com> On 22.03.2013 11:25, anatoly techtonik wrote: > On Fri, Mar 22, 2013 at 12:14 PM, M.-A. Lemburg wrote: > >> On 22.03.2013 10:04, Ronald Oussoren wrote: >>> >>> On 22 Mar, 2013, at 9:58, anatoly techtonik wrote: >>>> Some links are broken. I added redirects for wiki pages, but it will be >> better to fix links too. >>> The OAuth link appears to be broken, and that's likely part of the >> fallout of the wiki.python.org breakin. >> >> It is broken because of Anatoly's renaming. >> >> The new name is http://wiki.python.org/moin/PyPiOauth >> >> Anatoly: I don't consider such renaming for some perceived level of >> consistency important enough to warrant the breakage you are introducing >> to external links. Please don't ! >> > > I've renamed PyPIOAuth this long before today and fixed all link on the > wiki. I don't have any tools to monitor any external links in MoinMoin.
It > will be nice if you add this request to the internal backlog of tasks for > the next order to pydotorg redesign from PSF. There's no point in adding more work for everyone just because you feel there's an inconsistency in naming. It's also quite impossible to change all the links on the Internet pointing to our wiki pages, even if you knew who to contact. Again: Please don't do this. Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 22 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-03-13: Released eGenix pyOpenSSL 0.13 ... http://egenix.com/go39 ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ronaldoussoren at mac.com Fri Mar 22 11:49:20 2013 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Fri, 22 Mar 2013 11:49:20 +0100 Subject: [Catalog-sig] API for uploading packages to PyPI In-Reply-To: References: <2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com> <1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com> <514C20E7.5040101@egenix.com> <07251834-8243-418F-BFCC-2D01565CAF0F@mac.com> Message-ID: <8E7AAF1A-E3E0-4A32-B45F-8EEBF8413633@mac.com> On 22 Mar, 2013, at 11:42, anatoly techtonik wrote: >> >> >> I've renamed PyPIOAuth this long before today and fixed all link on the wiki. I don't have any tools to monitor any external links in MoinMoin. It will be nice if you add this request to the internal backlog of tasks for the next order to pydotorg redesign from PSF. > > > How would the PSF change links on other websites? 
Changing page names shouldn't be done lightly because this can, and for projects as popular as python almost certainly will, break links on other websites. > > 1. I am the editor of my changes. Not that PSF guy who owns all the stuff out there and makes himself important by "taking responsibility" over what I do. If I had the information about external sources linking to this page, I'd considered contacting these source for update. I mean written the letter here earlier. =) You do know how the internet works, don't you? It is possible to scrape logs for referer URLs, but that's a guesstimate at best and won't find referrals from locations that aren't websites (such as books pointing to a URL for more information, or links from desktop applications). Finding contact information for websites is non-trivial as well. Ronald -------------- next part -------------- An HTML attachment was scrubbed... URL: From techtonik at gmail.com Fri Mar 22 13:20:15 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 22 Mar 2013 15:20:15 +0300 Subject: [Catalog-sig] API for uploading packages to PyPI In-Reply-To: <514C3728.5090106@egenix.com> References: <2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com> <1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com> <514C20E7.5040101@egenix.com> <514C3728.5090106@egenix.com> Message-ID: On Fri, Mar 22, 2013 at 1:49 PM, M.-A. Lemburg wrote: > On 22.03.2013 11:25, anatoly techtonik wrote: > > On Fri, Mar 22, 2013 at 12:14 PM, M.-A. Lemburg wrote: > > > >> On 22.03.2013 10:04, Ronald Oussoren wrote: > >>> > >>> On 22 Mar, 2013, at 9:58, anatoly techtonik > wrote: > >>>> Some links are broken. I added redirects for wiki pages, but it will > be > >> better to fix links too. > >>> The OAuth link appears to be broken, and that's likely part of the > >> fallout of the wiki.python.org breakin. > >> > >> It is broken because of Anatoly's renaming.
> >> > >> The new name is http://wiki.python.org/moin/PyPiOauth > >> > >> Anatoly: I don't consider such renaming for some perceived level of > >> consistency important enough to warrant the breakage you are introducing > >> to external links. Please don't ! > >> > > > > I've renamed PyPIOAuth this long before today and fixed all link on the > > wiki. I don't have any tools to monitor any external links in MoinMoin. > It > > will be nice if you add this request to the internal backlog of tasks for > > the next order to pydotorg redesign from PSF. > > There's no point in adding more work for everyone just because > you feel there's an inconsistency in naming. It's also quite impossible > to change all the links on the Internet pointing to our wiki > pages, even if you knew who to contact. >
    hg clone https://bitbucket.org/loewis/pypi
    cd pypi
    hg pull https://bitbucket.org/techtonik/pypi-contents
    hg push
For changing these links it should be proven that they exist first. Anyway, I don't want to fix all the links on the internet, but since I've already fixed those on PyPI, all it takes to apply the fix is to copy/paste these 4 commands into the console. Not much work, really. ;) > Again: Please don't do this. > I think you're not against renaming pages, but against renaming without redirects. In fact, if MoinMoin could automatically insert #REDIRECT directives when a page is renamed, then there wouldn't be any problem like this at all. I hope that pydotorg@ or infrastructure@ have this item on their feature lists. OT: Speaking of the links and leaving them as-is. IMHO having a clean outlook has a direct influence on the attractiveness of the project. Having some obvious stuff to fix on the main page is a motivation for me (as a bad coder who likes to hack) to go download and fix the stuff.
But inconsistency in URL design and nits in overall site image (no credits, no license, no "fork me on github", strange layout and no solid design, no reference to framework used in the footer and many more other subjective factors) have direct influence on desire to contribute from somebody more serious. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Fri Mar 22 13:26:30 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 22 Mar 2013 13:26:30 +0100 Subject: [Catalog-sig] API for uploading packages to PyPI In-Reply-To: References: <2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com> <1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com> <514C20E7.5040101@egenix.com> <514C3728.5090106@egenix.com> Message-ID: <514C4DF6.7050803@egenix.com> On 22.03.2013 13:20, anatoly techtonik wrote: > On Fri, Mar 22, 2013 at 1:49 PM, M.-A. Lemburg wrote: >> Again: Please don't do this. >> > > I think you're not against renaming pages, but against renaming without > redirects. In fact, if MoinMoin could automatically insert #REDIRECT > directives when a page is renamed, then there won't be any problem like > this at all. I hope that pydotorg@ or infrastructure@ have this item on > their feature lists. You can add redirects from the page names you think are more correct to the existing ones, but please don't rename the pages themselves. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 22 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-03-13: Released eGenix pyOpenSSL 0.13 ... http://egenix.com/go39 ::::: Try our mxODBC.Connect Python Database Interface for free ! 
:::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From techtonik at gmail.com Fri Mar 22 13:38:51 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 22 Mar 2013 15:38:51 +0300 Subject: [Catalog-sig] API for uploading packages to PyPI In-Reply-To: <514C4DF6.7050803@egenix.com> References: <2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com> <1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com> <514C20E7.5040101@egenix.com> <514C3728.5090106@egenix.com> <514C4DF6.7050803@egenix.com> Message-ID: On Fri, Mar 22, 2013 at 3:26 PM, M.-A. Lemburg wrote: > On 22.03.2013 13:20, anatoly techtonik wrote: > > On Fri, Mar 22, 2013 at 1:49 PM, M.-A. Lemburg wrote: > >> Again: Please don't do this. > >> > > > > I think you're not against renaming pages, but against renaming without > > redirects. In fact, if MoinMoin could automatically insert #REDIRECT > > directives when a page is renamed, then there won't be any problem like > > this at all. I hope that pydotorg@ or infrastructure@ have this item on > > their feature lists. > > You can add redirects from the page names you think are more > correct to the existing ones, but please don't rename the pages > themselves. You need to expand that, because I don't get it. Why do you want the canonical pages about PyPI JSON API to bear the name of PyPiJson? This name is hard to synthesize if you want to type in directly into the URL without waiting for the page to load to click a link or use search field. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Fri Mar 22 14:17:04 2013 From: mal at egenix.com (M.-A. 
Lemburg) Date: Fri, 22 Mar 2013 14:17:04 +0100 Subject: [Catalog-sig] API for uploading packages to PyPI In-Reply-To: References: <2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com> <1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com> <514C20E7.5040101@egenix.com> <514C3728.5090106@egenix.com> <514C4DF6.7050803@egenix.com> Message-ID: <514C59D0.4040607@egenix.com> On 22.03.2013 13:38, anatoly techtonik wrote: > On Fri, Mar 22, 2013 at 3:26 PM, M.-A. Lemburg wrote: > >> On 22.03.2013 13:20, anatoly techtonik wrote: >>> On Fri, Mar 22, 2013 at 1:49 PM, M.-A. Lemburg wrote: >>>> Again: Please don't do this. >>>> >>> >>> I think you're not against renaming pages, but against renaming without >>> redirects. In fact, if MoinMoin could automatically insert #REDIRECT >>> directives when a page is renamed, then there won't be any problem like >>> this at all. I hope that pydotorg@ or infrastructure@ have this item on >>> their feature lists. >> >> You can add redirects from the page names you think are more >> correct to the existing ones, but please don't rename the pages >> themselves. > > > You need to expand that, because I don't get it. Why do you want the > canonical pages about PyPI JSON API to bear the name of PyPiJson? This name > is hard to synthesize if you want to type in directly into the URL without > waiting for the page to load to click a link or use search field. It's not about which name I want. It's about the name of the page that was used to add content and which has been around long enough to assume that others have linked to it. With the redirect from the new name to the existing one, you get what you want and all others can continue to use the existing name. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 22 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ 2013-03-13: Released eGenix pyOpenSSL 0.13 ... http://egenix.com/go39 ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From techtonik at gmail.com Fri Mar 22 23:33:13 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Sat, 23 Mar 2013 01:33:13 +0300 Subject: [Catalog-sig] API for uploading packages to PyPI In-Reply-To: <514C59D0.4040607@egenix.com> References: <2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com> <1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com> <514C20E7.5040101@egenix.com> <514C3728.5090106@egenix.com> <514C4DF6.7050803@egenix.com> <514C59D0.4040607@egenix.com> Message-ID: On Fri, Mar 22, 2013 at 4:17 PM, M.-A. Lemburg wrote: > On 22.03.2013 13:38, anatoly techtonik wrote: > > On Fri, Mar 22, 2013 at 3:26 PM, M.-A. Lemburg wrote: > > > >> On 22.03.2013 13:20, anatoly techtonik wrote: > >>> On Fri, Mar 22, 2013 at 1:49 PM, M.-A. Lemburg wrote: > >>>> Again: Please don't do this. > >>>> > >>> > >>> I think you're not against renaming pages, but against renaming without > >>> redirects. In fact, if MoinMoin could automatically insert #REDIRECT > >>> directives when a page is renamed, then there won't be any problem like > >>> this at all. I hope that pydotorg@ or infrastructure@ have this item > on > >>> their feature lists. > >> > >> You can add redirects from the page names you think are more > >> correct to the existing ones, but please don't rename the pages > >> themselves. > > > > > > You need to expand that, because I don't get it. Why do you want the > > canonical pages about PyPI JSON API to bear the name of PyPiJson? 
This > name > > is hard to synthesize if you want to type in directly into the URL > without > > waiting for the page to load to click a link or use search field. > > It's not about which name I want. It's about the name of the page > that was used to add content and which has been around long enough > to assume that others have linked to it. > > With the redirect from the new name to the existing one, > you get what you want and all others can continue to use > the existing name. All right. So it is the matter of using old name or the new name. But both names lead to the same page. So the point of conflict here is what should be the end name of this page. If you say that it is not about which name do you want, then say why this name should not be the name I want? I want canonical names for pages. Names that are consistent, which capitalization is easy to remember and reproduce, and I want that people linked to these names directly to avoid double redirects. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Sat Mar 23 13:11:17 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Sat, 23 Mar 2013 13:11:17 +0100 Subject: [Catalog-sig] API for uploading packages to PyPI In-Reply-To: References: <2ABAC39D-08E9-4EBD-94FB-E47AE6573A35@mac.com> <1DD1B39E-4A44-4749-A0CE-602D47D137DF@mac.com> <514C20E7.5040101@egenix.com> <514C3728.5090106@egenix.com> <514C4DF6.7050803@egenix.com> <514C59D0.4040607@egenix.com> Message-ID: <514D9BE5.5090200@egenix.com> On 22.03.2013 23:33, anatoly techtonik wrote: > On Fri, Mar 22, 2013 at 4:17 PM, M.-A. Lemburg wrote: > >> On 22.03.2013 13:38, anatoly techtonik wrote: >>> On Fri, Mar 22, 2013 at 3:26 PM, M.-A. Lemburg wrote: >>> >>>> On 22.03.2013 13:20, anatoly techtonik wrote: >>>>> On Fri, Mar 22, 2013 at 1:49 PM, M.-A. Lemburg wrote: >>>>>> Again: Please don't do this. >>>>>> >>>>> >>>>> I think you're not against renaming pages, but against renaming without >>>>> redirects. 
In fact, if MoinMoin could automatically insert #REDIRECT >>>>> directives when a page is renamed, then there won't be any problem like >>>>> this at all. I hope that pydotorg@ or infrastructure@ have this item >> on >>>>> their feature lists. >>>> >>>> You can add redirects from the page names you think are more >>>> correct to the existing ones, but please don't rename the pages >>>> themselves. >>> >>> >>> You need to expand that, because I don't get it. Why do you want the >>> canonical pages about PyPI JSON API to bear the name of PyPiJson? This >> name >>> is hard to synthesize if you want to type in directly into the URL >> without >>> waiting for the page to load to click a link or use search field. >> >> It's not about which name I want. It's about the name of the page >> that was used to add content and which has been around long enough >> to assume that others have linked to it. >> >> With the redirect from the new name to the existing one, >> you get what you want and all others can continue to use >> the existing name. > > > All right. So it is the matter of using old name or the new name. But both > names lead to the same page. So the point of conflict here is what should > be the end name of this page. If you say that it is not about which name > do you want, then say why this name should not be the name I want? The person who created the pages gets to choose. There's nothing much to argue here. > I want canonical names for pages. Names that are consistent, which > capitalization is easy to remember and reproduce, and I want that people > linked to these names directly to avoid double redirects. That's fine: for pages that you create, you get to choose. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 23 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ...
http://python.egenix.com/ ________________________________________________________________________ 2013-03-13: Released eGenix pyOpenSSL 0.13 ... http://egenix.com/go39 ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ct at gocept.com Mon Mar 25 18:45:26 2013 From: ct at gocept.com (Christian Theune) Date: Mon, 25 Mar 2013 18:45:26 +0100 Subject: [Catalog-sig] Replacement client for pep381client In-Reply-To: <514BE1AD.2040202@zopyx.com> References: <514BE1AD.2040202@zopyx.com> Message-ID: <8C6E3EE5-4B41-456F-BD1C-1FC9B191EA01@gocept.com> Hi, On Mar 22, 2013, at 5:44 AM, Andreas Jung wrote: > > I don't know much about filesystem encodings but if a FS encoding like > >>>> sys.getfilesystemencoding() > 'ANSI_X3.4-1968' > > is a system-wide setting then it is unlikely that you make an encoding > change a mandatory requirement. 'ANSI_X3.4-1968' is at least returned > on my CentOS and Ubuntu box. Reading up on the VFS unicode handling it appears that we just need to treat everything as bytestrings and encode it ourselves. The locale setting is really just an environment variable influencing library behaviour (like glib) - the kernel doesn't seem to care except for '/' and '\0'. However, you may also need to make sure that your web server treats the unicode URLs correctly and uses UTF-8 as the encoding for looking up the filenames. I have applied a fix forcing the filenames to always be encoded as UTF-8. Christian -- Christian Theune · ct at gocept.com gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany http://gocept.com · Tel +49 345 1229889-7 Python, Pyramid, Plone, Zope · consulting, development, hosting, operations -------------- next part -------------- A non-text attachment was scrubbed...
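A minimal sketch of the approach described in the message above: treat file names as byte strings and always encode them as UTF-8, regardless of what `sys.getfilesystemencoding()` reports. This is not the actual fix applied to the mirror client; `utf8_path` is a hypothetical helper used only for illustration.

```python
import os


def utf8_path(directory, name):
    """Join directory and name as bytes, encoding any text as UTF-8."""
    parts = [
        part.encode("utf-8") if isinstance(part, str) else part
        for part in (directory, name)
    ]
    return os.path.join(*parts)


# The result is bytes, so os.* calls pass it to the kernel verbatim
# instead of re-encoding it with the locale's filesystem encoding.
path = utf8_path("packages", "caf\u00e9-1.0.tar.gz")
```

Passing bytes paths keeps the on-disk names identical on hosts with an ASCII locale and on hosts with a UTF-8 locale, which is the property a mirror needs.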
Name: smime.p7s Type: application/pkcs7-signature Size: 4334 bytes Desc: not available URL: From chris at simplistix.co.uk Tue Mar 26 15:02:21 2013 From: chris at simplistix.co.uk (Chris Withers) Date: Tue, 26 Mar 2013 14:02:21 +0000 Subject: [Catalog-sig] error trying to upload by package Message-ID: <5151AA6D.5080704@simplistix.co.uk> Hi All, I have a package called files: https://github.com/Simplistix/files ...but I get a 403 when I try to register it on PyPI. Why is that? cheers, Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From donald at stufft.io Tue Mar 26 15:04:50 2013 From: donald at stufft.io (Donald Stufft) Date: Tue, 26 Mar 2013 10:04:50 -0400 Subject: [Catalog-sig] error trying to upload by package In-Reply-To: <5151AA6D.5080704@simplistix.co.uk> References: <5151AA6D.5080704@simplistix.co.uk> Message-ID: On Mar 26, 2013, at 10:02 AM, Chris Withers wrote: > Hi All, > > I have a package called files: https://github.com/Simplistix/files > > ...but I get a 403 when I try to register it on PyPI. > > Why is that? > > cheers, > > Chris > > -- > Simplistix - Content Management, Batch Processing & Python Consulting > - http://www.simplistix.co.uk > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig Someone already has a package by that name. https://pypi.python.org/pypi/files ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From jcea at jcea.es Tue Mar 26 21:07:43 2013 From: jcea at jcea.es (Jesus Cea) Date: Tue, 26 Mar 2013 21:07:43 +0100 Subject: [Catalog-sig] Suscribing to PYPI projects Message-ID: <5152000F.6050308@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I wonder if would be too difficult to be able to subscribe to projects in PYPI, to be notified if a new version is available. An option to PIP & family to verify local versions with PYPI versions, and report old version would be useful too. - -- Jesús Cea Avión _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ Twitter: @jcea _/_/ _/_/ _/_/_/_/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQCVAwUBUVIAD5lgi5GaxT1NAQJtfwQAhTeby09fEx/0smKy+FKKP+YacAHyfvY1 HvuxsipLanFaiCcRaxWzyzN9+2hUqD88BtUGzgNdqGS52ePxDg5dTC8u4IC0grMU vk96tl0zMg3R4GraCzsShKGJm8arpdUfWJZXGy+FxMh7XYnrHWZkItUAHTWLuf7A beTiCnZhuGY= =s0VI -----END PGP SIGNATURE----- From richard at python.org Wed Mar 27 05:26:49 2013 From: richard at python.org (Richard Jones) Date: Wed, 27 Mar 2013 15:26:49 +1100 Subject: [Catalog-sig] Suscribing to PYPI projects In-Reply-To: <5152000F.6050308@jcea.es> References: <5152000F.6050308@jcea.es> Message-ID: This does come up a fair bit but is not something that's planned for the current incarnation of PyPI. Richard On 27 March 2013 07:07, Jesus Cea wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > I wonder if would be too difficult to be able to subscribe to projects > in PYPI, to be notified if a new version is available.
> > An option for pip & family to check local versions against PyPI versions, > and report old versions, would be useful too. > > - -- > Jesús Cea Avión _/_/ _/_/_/ _/_/_/ > jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ > Twitter: @jcea _/_/ _/_/ _/_/_/_/_/ > jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/ _/_/ _/_/ > "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ > "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ > "El amor es poner tu felicidad en la felicidad de otro" - Leibniz > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.10 (GNU/Linux) > Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ > > iQCVAwUBUVIAD5lgi5GaxT1NAQJtfwQAhTeby09fEx/0smKy+FKKP+YacAHyfvY1 > HvuxsipLanFaiCcRaxWzyzN9+2hUqD88BtUGzgNdqGS52ePxDg5dTC8u4IC0grMU > vk96tl0zMg3R4GraCzsShKGJm8arpdUfWJZXGy+FxMh7XYnrHWZkItUAHTWLuf7A > beTiCnZhuGY= > =s0VI > -----END PGP SIGNATURE----- From aclark at aclark.net Wed Mar 27 16:27:50 2013 From: aclark at aclark.net (Alex Clark) Date: Wed, 27 Mar 2013 11:27:50 -0400 Subject: [Catalog-sig] Suscribing to PYPI projects References: <5152000F.6050308@jcea.es> Message-ID: On 2013-03-26 20:07:43 +0000, Jesus Cea said: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > I wonder if it would be too difficult to be able to subscribe to projects > in PyPI, to be notified if a new version is available. Have you seen: https://bundlescout.com/ > > An option for pip & family to check local versions against PyPI versions, > and report old versions, would be useful too.
> > - -- > Jesús Cea Avión _/_/ _/_/_/ _/_/_/ > jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ > Twitter: @jcea _/_/ _/_/ _/_/_/_/_/ > jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/ _/_/ _/_/ > "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ > "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ > "El amor es poner tu felicidad en la felicidad de otro" - Leibniz > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.10 (GNU/Linux) > Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ > > iQCVAwUBUVIAD5lgi5GaxT1NAQJtfwQAhTeby09fEx/0smKy+FKKP+YacAHyfvY1 > HvuxsipLanFaiCcRaxWzyzN9+2hUqD88BtUGzgNdqGS52ePxDg5dTC8u4IC0grMU > vk96tl0zMg3R4GraCzsShKGJm8arpdUfWJZXGy+FxMh7XYnrHWZkItUAHTWLuf7A > beTiCnZhuGY= > =s0VI > -----END PGP SIGNATURE----- -- Alex Clark · http://about.me/alex.clark From lists at zopyx.com Wed Mar 27 16:54:19 2013 From: lists at zopyx.com (Andreas Jung) Date: Wed, 27 Mar 2013 16:54:19 +0100 Subject: [Catalog-sig] c.pypi.python.org - IP address change Message-ID: <5153162B.5030103@zopyx.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi there, I moved my c.pypi.python.org mirror to a new faster machine. Please update the DNS entry to 176.9.146.29. This mirror is running on top of Christian Theune's bandersnatch implementation.
Andreas - -- ZOPYX Limited | Python | Zope | Plone | MongoDB Hundskapfklinge 33 | Consulting & Development D-72074 Tübingen | Electronic Publishing Solutions www.zopyx.com | Scalable Web Solutions - -------------------------------------------------- Produce & Publish - www.produce-and-publish.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (Darwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQGUBAEBAgAGBQJRUxYrAAoJEADcfz7u4AZj+/kLwLo1gWKQfaVXMjJxN+6Ouens 0+ODnkdGhXFBAcwHM+VpBumFCodST8Cc3iEIT6EGK9HVZEMh7w9cBBO/jKrFX87K 7FYNdybEu81BLa1DxuZh3ux8xDC/bDj4lArYJLF3VcjSL2ZtQTaNyScb/u3n5VR2 pWFKppwF6VQ3P1n5RdmzAHIzF6XGixlR7kpKRJVS37ADfl8yR7ZB7frXzhux6qDn f5c32QccT5RLKUk6R46GQU8+nHRVRVqum/hep5hX2wXVTeKfuEa8+MZOa/Ooot9r P8Z1nBIbteivg0hpmX5b0G00h+DQkd29TP7wF/JZwwzu1bc5wXNCVpnNeXDV4Bi3 ON9uZnKFCSxLEKznPQaf3ZiPagxwX8fs/RrK/isO0MyW3HKqaDb77N0biAdWipt5 Mnv6XSyKK5CHte5JVtnpT4UqbLFKMEydQoK8JhYEBwgJABaNuHkYfKNVVDNN8G6K X7mHX5f7ykirQXwUSfneLY2JYz7jn7o= =+lq1 -----END PGP SIGNATURE----- From r1chardj0n3s at gmail.com Thu Mar 28 01:29:54 2013 From: r1chardj0n3s at gmail.com (Richard Jones) Date: Thu, 28 Mar 2013 11:29:54 +1100 Subject: [Catalog-sig] PEP 438 progress update Message-ID: Hi all, It was my intention to formally accept the PEP and deploy the implementation to the production PyPI when I got back home this week, but things have been quite hectic and I've not found the time to perform the pre-deployment tasks needed (specifically steps 5, 6 and 7 of the first transition phase; determining the various email lists on which I need to inform people of the changes.) At the moment I'm not sure when I'll have time to do that; hopefully next week some time but it could be as long as four weeks before I get sufficient tuits(round).
Richard From lists at zopyx.com Thu Mar 28 11:33:43 2013 From: lists at zopyx.com (Andreas Jung) Date: Thu, 28 Mar 2013 11:33:43 +0100 Subject: [Catalog-sig] c.pypi.python.org - IP address change In-Reply-To: <5153162B.5030103@zopyx.com> References: <5153162B.5030103@zopyx.com> Message-ID: <51541C87.6090508@zopyx.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Will someone change the DNS entry? Andreas Andreas Jung wrote: > Hi there, > > I moved my c.pypi.python.org mirror to a new faster machine. > > Please update the DNS entry to 176.9.146.29. > > This mirror is running on top of Christian Theune's bandersnatch > implementation. > > Andreas - -- ZOPYX Limited | Python | Zope | Plone | MongoDB Hundskapfklinge 33 | Consulting & Development D-72074 Tübingen | Electronic Publishing Solutions www.zopyx.com | Scalable Web Solutions - -------------------------------------------------- Produce & Publish - www.produce-and-publish.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (Darwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQGUBAEBAgAGBQJRVByHAAoJEADcfz7u4AZjFKQLwNIbXmIzphbmQvYGwDHouwVp G2hblX/OHB7kTrrQHVwa+KnacIOL37dwEkjAqI7aK8l4UF3Prizn7P3XoS0KKvhS 43A4uGgJTvD4d3c6k+pkAQeHbDgqdojQ6jTZf4s2ogWp8lQXuZkETXBpqx8vPpJ3 Y9dUfjP/EhjhsBuZuJNApC/9xHYe+MfdgpYLHXrqk2QQQ2QxyuoMR+W9FR4GWh1U KLAXVKp7lTXvZGrQ1cayZQo7IA5U5f8+N3HyISZ6bD+AvNKaKRaWgNSggYs4y5tQ fwqlQp08BoDj6Xni2JzbCJ7ZkzsHbkG0IJ9ZZpDyBTeOWFQBXV2AFSZ8Zx1nPmGm Z2Mbp4lLrUvp6WVCjSQ/rvOEe6yk2OxaWvlBiJPJRmfzlco0XNX93bRnxiKPkcpH eGvgRXQ2nNJEYWUD6nBeBUA4bJen59/4b+Pm4AMoOo+fhHwd7kjIBK1/e8PUqSqT r07IChjG+jwp8vjclD35GS9PMH0KQwY= =GcLZ -----END PGP SIGNATURE-----
From donald at stufft.io Thu Mar 28 19:22:59 2013 From: donald at stufft.io (Donald Stufft) Date: Thu, 28 Mar 2013 14:22:59 -0400 Subject: [Catalog-sig] Merge catalog-sig and distutils-sig Message-ID: Is there much point in keeping catalog-sig and distutils-sig separate? It seems to me that most of the same people are on both lists, and the topics almost always have consequences for both sides of the coin. So much so that it's often hard to pick *which* of the two (or both) lists you post to. Further confused by the fact that distutils is hopefully someday going to go away :) Not sure if there's some official process for requesting it or not, but I think we should merge the two lists and just make packaging-sig the umbrella for all packaging topics. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From jacob at jacobian.org Thu Mar 28 19:26:05 2013 From: jacob at jacobian.org (Jacob Kaplan-Moss) Date: Thu, 28 Mar 2013 13:26:05 -0500 Subject: [Catalog-sig] Merge catalog-sig and distutils-sig In-Reply-To: References: Message-ID: As a mostly-lurker on both who would love to cut down on the number of lists I have to follow: a hearty +1! Jacob On Thu, Mar 28, 2013 at 1:22 PM, Donald Stufft wrote: > Is there much point in keeping catalog-sig and distutils-sig separate? > > It seems to me that most of the same people are on both lists, and the topics almost always have consequences for both sides of the coin. So much so that it's often hard to pick *which* of the two (or both) lists you post to.
Further confused by the fact that distutils is hopefully someday going to go away :) > > Not sure if there's some official process for requesting it or not, but I think we should merge the two lists and just make packaging-sig to umbrella the entire packaging topics. > > ----------------- > Donald Stufft > PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA > > > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig > From jim at zope.com Thu Mar 28 19:28:35 2013 From: jim at zope.com (Jim Fulton) Date: Thu, 28 Mar 2013 14:28:35 -0400 Subject: [Catalog-sig] Merge catalog-sig and distutils-sig In-Reply-To: References: Message-ID: On Thu, Mar 28, 2013 at 2:22 PM, Donald Stufft wrote: > Is there much point in keeping catalog-sig and distutils-sig separate? Not IMO. > It seems to me that most of the same people are on both lists, and the topics almost always have consequences to both sides of the coin. So much so that it's often hard to pick *which* of the two (or both) lists you post too. Further confused by the fact that distutils is hopefully someday going to go away :) > > Not sure if there's some official process for requesting it or not, but I think we should merge the two lists and just make packaging-sig to umbrella the entire packaging topics. +1 Jim -- Jim Fulton http://www.linkedin.com/in/jimfulton From holger at merlinux.eu Thu Mar 28 20:11:44 2013 From: holger at merlinux.eu (holger krekel) Date: Thu, 28 Mar 2013 19:11:44 +0000 Subject: [Catalog-sig] Merge catalog-sig and distutils-sig In-Reply-To: References: Message-ID: <20130328191144.GL9677@merlinux.eu> On Thu, Mar 28, 2013 at 14:22 -0400, Donald Stufft wrote: > Is there much point in keeping catalog-sig and distutils-sig separate? > > It seems to me that most of the same people are on both lists, and the topics almost always have consequences to both sides of the coin. 
So much so that it's often hard to pick *which* of the two (or both) lists you post too. Further confused by the fact that distutils is hopefully someday going to go away :) +1 > Not sure if there's some official process for requesting it or not, but I think we should merge the two lists and just make packaging-sig to umbrella the entire packaging topics. > > ----------------- > Donald Stufft > PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA > > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig From fred at fdrake.net Thu Mar 28 20:14:24 2013 From: fred at fdrake.net (Fred Drake) Date: Thu, 28 Mar 2013 15:14:24 -0400 Subject: [Catalog-sig] Merge catalog-sig and distutils-sig In-Reply-To: References: Message-ID: On Thu, Mar 28, 2013 at 2:22 PM, Donald Stufft wrote: > Is there much point in keeping catalog-sig and distutils-sig separate? No. The last time this was brought up, there were objections, but I don't remember what they were. I'll let people who think there's a point worry about that. > Not sure if there's some official process for requesting it or not, but > I think we should merge the two lists and just make packaging-sig to > umbrella the entire packaging topics. There is the meta-sig, but the description is out-dated: http://mail.python.org/mailman/listinfo/meta-sig and the last message in the archives is dated 2011, and sparked no discussion: http://mail.python.org/pipermail/meta-sig/2011-June.txt +1 on merging the lists. -Fred -- Fred L. Drake, Jr. "A storm broke loose in my mind." --Albert Einstein From qwcode at gmail.com Thu Mar 28 20:25:59 2013 From: qwcode at gmail.com (Marcus Smith) Date: Thu, 28 Mar 2013 12:25:59 -0700 Subject: [Catalog-sig] Merge catalog-sig and distutils-sig In-Reply-To: References: Message-ID: +1 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pje at telecommunity.com Thu Mar 28 20:39:38 2013 From: pje at telecommunity.com (PJ Eby) Date: Thu, 28 Mar 2013 15:39:38 -0400 Subject: [Catalog-sig] Merge catalog-sig and distutils-sig In-Reply-To: References: Message-ID: On Thu, Mar 28, 2013 at 3:14 PM, Fred Drake wrote: > On Thu, Mar 28, 2013 at 2:22 PM, Donald Stufft wrote: >> Is there much point in keeping catalog-sig and distutils-sig separate? > > No. > > The last time this was brought up, there were objections, but I don't > remember what they were. I'll let people who think there's a point > worry about that. > >> Not sure if there's some official process for requesting it or not, but >> I think we should merge the two lists and just make packaging-sig to >> umbrella the entire packaging topics. > > There is the meta-sig, but the description is out-dated: > > http://mail.python.org/mailman/listinfo/meta-sig > > and the last message in the archives is dated 2011, and sparked no > discussion: > > http://mail.python.org/pipermail/meta-sig/2011-June.txt > > +1 on merging the lists. Can we do it by just dropping catalog-sig and keeping distutils-sig? I'm afraid we might lose some important distutils-sig population if the process involves renaming the list, resubscribing, etc. I also *really* don't want to invalidate archive links to the distutils-sig archive. All in all, +1 on not having two lists, but I'm really worried about "breaking" distutils-sig. We're still going to be talking about "distribution utilities", after all. 
From donald at stufft.io Thu Mar 28 20:42:07 2013 From: donald at stufft.io (Donald Stufft) Date: Thu, 28 Mar 2013 15:42:07 -0400 Subject: [Catalog-sig] Merge catalog-sig and distutils-sig In-Reply-To: References: Message-ID: <3BF298C9-293D-40FF-A86F-76206A88D162@stufft.io> On Mar 28, 2013, at 3:39 PM, PJ Eby wrote: > On Thu, Mar 28, 2013 at 3:14 PM, Fred Drake wrote: >> On Thu, Mar 28, 2013 at 2:22 PM, Donald Stufft wrote: >>> Is there much point in keeping catalog-sig and distutils-sig separate? >> >> No. >> >> The last time this was brought up, there were objections, but I don't >> remember what they were. I'll let people who think there's a point >> worry about that. >> >>> Not sure if there's some official process for requesting it or not, but >>> I think we should merge the two lists and just make packaging-sig to >>> umbrella the entire packaging topics. >> >> There is the meta-sig, but the description is out-dated: >> >> http://mail.python.org/mailman/listinfo/meta-sig >> >> and the last message in the archives is dated 2011, and sparked no >> discussion: >> >> http://mail.python.org/pipermail/meta-sig/2011-June.txt >> >> +1 on merging the lists. > > Can we do it by just dropping catalog-sig and keeping distutils-sig? > I'm afraid we might lose some important distutils-sig population if > the process involves renaming the list, resubscribing, etc. I also > *really* don't want to invalidate archive links to the distutils-sig > archive. > > All in all, +1 on not having two lists, but I'm really worried about > "breaking" distutils-sig. We're still going to be talking about > "distribution utilities", after all. Don't care how it's done. I don't know Mailman enough to know what is possible or how easy things are. I thought packaging-sig sounded nice but if you can't rename + redirect or merge or something in mailman I'm down for whatever. 
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From donald at stufft.io Thu Mar 28 20:43:07 2013 From: donald at stufft.io (Donald Stufft) Date: Thu, 28 Mar 2013 15:43:07 -0400 Subject: [Catalog-sig] Merge catalog-sig and distutils-sig In-Reply-To: References: Message-ID: <3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io> On Mar 28, 2013, at 3:39 PM, PJ Eby wrote: > On Thu, Mar 28, 2013 at 3:14 PM, Fred Drake wrote: >> On Thu, Mar 28, 2013 at 2:22 PM, Donald Stufft wrote: >>> Is there much point in keeping catalog-sig and distutils-sig separate? >> >> No. >> >> The last time this was brought up, there were objections, but I don't >> remember what they were. I'll let people who think there's a point >> worry about that. >> >>> Not sure if there's some official process for requesting it or not, but >>> I think we should merge the two lists and just make packaging-sig to >>> umbrella the entire packaging topics. >> >> There is the meta-sig, but the description is out-dated: >> >> http://mail.python.org/mailman/listinfo/meta-sig >> >> and the last message in the archives is dated 2011, and sparked no >> discussion: >> >> http://mail.python.org/pipermail/meta-sig/2011-June.txt >> >> +1 on merging the lists. > > Can we do it by just dropping catalog-sig and keeping distutils-sig? > I'm afraid we might lose some important distutils-sig population if > the process involves renaming the list, resubscribing, etc. I also > *really* don't want to invalidate archive links to the distutils-sig > archive. > > All in all, +1 on not having two lists, but I'm really worried about > "breaking" distutils-sig. We're still going to be talking about > "distribution utilities", after all. 
Worst case I'm sure subscribers can be transferred and the existing archive kept intact. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From holger at merlinux.eu Thu Mar 28 21:04:19 2013 From: holger at merlinux.eu (holger krekel) Date: Thu, 28 Mar 2013 20:04:19 +0000 Subject: [Catalog-sig] Merge catalog-sig and distutils-sig In-Reply-To: <3BF298C9-293D-40FF-A86F-76206A88D162@stufft.io> References: <3BF298C9-293D-40FF-A86F-76206A88D162@stufft.io> Message-ID: <20130328200419.GM9677@merlinux.eu> On Thu, Mar 28, 2013 at 15:42 -0400, Donald Stufft wrote: > On Mar 28, 2013, at 3:39 PM, PJ Eby wrote: > > > On Thu, Mar 28, 2013 at 3:14 PM, Fred Drake wrote: > >> On Thu, Mar 28, 2013 at 2:22 PM, Donald Stufft wrote: > >>> Is there much point in keeping catalog-sig and distutils-sig separate? > >> > >> No. > >> > >> The last time this was brought up, there were objections, but I don't > >> remember what they were. I'll let people who think there's a point > >> worry about that. > >> > >>> Not sure if there's some official process for requesting it or not, but > >>> I think we should merge the two lists and just make packaging-sig to > >>> umbrella the entire packaging topics. > >> > >> There is the meta-sig, but the description is out-dated: > >> > >> http://mail.python.org/mailman/listinfo/meta-sig > >> > >> and the last message in the archives is dated 2011, and sparked no > >> discussion: > >> > >> http://mail.python.org/pipermail/meta-sig/2011-June.txt > >> > >> +1 on merging the lists. > > > > Can we do it by just dropping catalog-sig and keeping distutils-sig? > > I'm afraid we might lose some important distutils-sig population if > > the process involves renaming the list, resubscribing, etc. 
I also > > *really* don't want to invalidate archive links to the distutils-sig > > archive. > > > > All in all, +1 on not having two lists, but I'm really worried about > > "breaking" distutils-sig. We're still going to be talking about > > "distribution utilities", after all. > > Don't care how it's done. I don't know Mailman enough to know what is possible or how easy things are. I thought packaging-sig sounded nice but if you can't rename + redirect or merge or something in mailman I'm down for whatever. I've moved lists even from external sites to python.org and renamed them (latest was pytest-dev). That part works nicely and people can continue to use the old ML address. Merging two lists however makes it harder to get redirects for the old archives. But why not just keep distutils-sig and catalog-sig archives, but have all their mail arrive at a new packaging-sig and begin a new archive for the latter? holger > ----------------- > Donald Stufft > PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA > > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig From dholth at gmail.com Thu Mar 28 21:08:44 2013 From: dholth at gmail.com (Daniel Holth) Date: Thu, 28 Mar 2013 16:08:44 -0400 Subject: [Catalog-sig] Merge catalog-sig and distutils-sig In-Reply-To: <20130328200419.GM9677@merlinux.eu> References: <3BF298C9-293D-40FF-A86F-76206A88D162@stufft.io> <20130328200419.GM9677@merlinux.eu> Message-ID: That should work. Sounds like a plan. On Thu, Mar 28, 2013 at 4:04 PM, holger krekel wrote: > On Thu, Mar 28, 2013 at 15:42 -0400, Donald Stufft wrote: >> On Mar 28, 2013, at 3:39 PM, PJ Eby wrote: >> >> > On Thu, Mar 28, 2013 at 3:14 PM, Fred Drake wrote: >> >> On Thu, Mar 28, 2013 at 2:22 PM, Donald Stufft wrote: >> >>> Is there much point in keeping catalog-sig and distutils-sig separate? >> >> >> >> No. 
>> >> >> >> The last time this was brought up, there were objections, but I don't >> >> remember what they were. I'll let people who think there's a point >> >> worry about that. >> >> >> >>> Not sure if there's some official process for requesting it or not, but >> >>> I think we should merge the two lists and just make packaging-sig to >> >>> umbrella the entire packaging topics. >> >> >> >> There is the meta-sig, but the description is out-dated: >> >> >> >> http://mail.python.org/mailman/listinfo/meta-sig >> >> >> >> and the last message in the archives is dated 2011, and sparked no >> >> discussion: >> >> >> >> http://mail.python.org/pipermail/meta-sig/2011-June.txt >> >> >> >> +1 on merging the lists. >> > >> > Can we do it by just dropping catalog-sig and keeping distutils-sig? >> > I'm afraid we might lose some important distutils-sig population if >> > the process involves renaming the list, resubscribing, etc. I also >> > *really* don't want to invalidate archive links to the distutils-sig >> > archive. >> > >> > All in all, +1 on not having two lists, but I'm really worried about >> > "breaking" distutils-sig. We're still going to be talking about >> > "distribution utilities", after all. >> >> Don't care how it's done. I don't know Mailman enough to know what is possible or how easy things are. I thought packaging-sig sounded nice but if you can't rename + redirect or merge or something in mailman I'm down for whatever. > > I've moved lists even from external sites to python.org and renamed them > (latest was pytest-dev). That part works nicely and people can continue > to use the old ML address. Merging two lists however makes it harder > to get redirects for the old archives. But why not just keep distutils-sig > and catalog-sig archives, but have all their mail arrive at > a new packaging-sig and begin a new archive for the latter? 
> > holger > > >> ----------------- >> Donald Stufft >> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From nawkboy at gmail.com Thu Mar 28 20:57:09 2013 From: nawkboy at gmail.com (James Carpenter) Date: Thu, 28 Mar 2013 14:57:09 -0500 Subject: [Catalog-sig] How to determine if archive is an sdist or bdist Message-ID: Is there an easy way to programmatically tell if an archive (tar.gz, zip, etc.) in the dist directory is a binary distribution or an sdist? I would like to post-process the contents of a dist directory and classify each build artifact there (egg, sdist, bdist, etc.). Currently the only approach I know of is to have my own command that is run along with the relevant build command. For example: python setup.py sdist be_funky or: python setup.py sdist bdist bdist_egg be_funky Using this approach the tuples in self.distribution.dist_files provide the command, Python version and file created. Unfortunately this solution is slightly more complicated in my use case than simply having an easy way to classify each build artifact and extract its PKG-INFO.
From pje at telecommunity.com Thu Mar 28 21:32:26 2013 From: pje at telecommunity.com (PJ Eby) Date: Thu, 28 Mar 2013 16:32:26 -0400 Subject: [Catalog-sig] Merge catalog-sig and distutils-sig In-Reply-To: <3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io> References: <3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io> Message-ID: On Thu, Mar 28, 2013 at 3:43 PM, Donald Stufft wrote: > On Mar 28, 2013, at 3:39 PM, PJ Eby wrote: >> Can we do it by just dropping catalog-sig and keeping distutils-sig? >> I'm afraid we might lose some important distutils-sig population if >> the process involves renaming the list, resubscribing, etc. I also >> *really* don't want to invalidate archive links to the distutils-sig >> archive. >> >> All in all, +1 on not having two lists, but I'm really worried about >> "breaking" distutils-sig. We're still going to be talking about >> "distribution utilities", after all. > > Worst case I'm sure subscribers can be transferred and the existing archive kept intact. That's a great way to have a bunch of people complaining that they never subscribed to packaging-sig, not to mention the part where it breaks everyone's mail filters. I really don't see any gain from renaming the list. It's not like we can go and scrub the entire internet of references to distutils-sig.
From donald at stufft.io Thu Mar 28 21:32:16 2013 From: donald at stufft.io (Donald Stufft) Date: Thu, 28 Mar 2013 16:32:16 -0400 Subject: [Catalog-sig] Merge catalog-sig and distutils-sig In-Reply-To: <20130328200419.GM9677@merlinux.eu> References: <3BF298C9-293D-40FF-A86F-76206A88D162@stufft.io> <20130328200419.GM9677@merlinux.eu> Message-ID: <4BDF3B12-B394-4823-9186-73D1E742E78F@stufft.io> On Mar 28, 2013, at 4:04 PM, holger krekel wrote: > On Thu, Mar 28, 2013 at 15:42 -0400, Donald Stufft wrote: >> On Mar 28, 2013, at 3:39 PM, PJ Eby wrote: >> >>> On Thu, Mar 28, 2013 at 3:14 PM, Fred Drake wrote: >>>> On Thu, Mar 28, 2013 at 2:22 PM, Donald Stufft wrote: >>>>> Is there much point in keeping catalog-sig and distutils-sig separate? >>>> >>>> No. >>>> >>>> The last time this was brought up, there were objections, but I don't >>>> remember what they were. I'll let people who think there's a point >>>> worry about that. >>>> >>>>> Not sure if there's some official process for requesting it or not, but >>>>> I think we should merge the two lists and just make packaging-sig to >>>>> umbrella the entire packaging topics. >>>> >>>> There is the meta-sig, but the description is out-dated: >>>> >>>> http://mail.python.org/mailman/listinfo/meta-sig >>>> >>>> and the last message in the archives is dated 2011, and sparked no >>>> discussion: >>>> >>>> http://mail.python.org/pipermail/meta-sig/2011-June.txt >>>> >>>> +1 on merging the lists. >>> >>> Can we do it by just dropping catalog-sig and keeping distutils-sig? >>> I'm afraid we might lose some important distutils-sig population if >>> the process involves renaming the list, resubscribing, etc. I also >>> *really* don't want to invalidate archive links to the distutils-sig >>> archive. >>> >>> All in all, +1 on not having two lists, but I'm really worried about >>> "breaking" distutils-sig. We're still going to be talking about >>> "distribution utilities", after all. >> >> Don't care how it's done. 
I don't know Mailman enough to know what is possible or how easy things are. I thought packaging-sig sounded nice but if you can't rename + redirect or merge or something in mailman I'm down for whatever. > > I've moved lists even from external sites to python.org and renamed them > (latest was pytest-dev). That part works nicely and people can continue > to use the old ML address. Merging two lists however makes it harder > to get redirects for the old archives. But why not just keep distutils-sig > and catalog-sig archives, but have all their mail arrive at > a new packaging-sig and begin a new archive for the latter? > > holger > > >> ----------------- >> Donald Stufft >> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA >> > > > >> _______________________________________________ >> Catalog-SIG mailing list >> Catalog-SIG at python.org >> http://mail.python.org/mailman/listinfo/catalog-sig > sounds good to me. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From pje at telecommunity.com Thu Mar 28 21:36:22 2013 From: pje at telecommunity.com (PJ Eby) Date: Thu, 28 Mar 2013 16:36:22 -0400 Subject: [Catalog-sig] How to determine if archive is an sdist or bdist In-Reply-To: References: Message-ID: On Thu, Mar 28, 2013 at 3:57 PM, James Carpenter wrote: > Is there an easy way to programmatically tell if an archive (tar.gz, zip, > etc.) in the dist directory is a binary or sdist? I would like to > post-process the contents of a dist directory and classify each build > artifact there (egg, sdist, bdist, etc.). An sdist always has a single subdirectory in the archive's root directory, named for the package+version, and containing a PKG-INFO and setup.py (plus a bunch of other stuff). 
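A rough Python sketch of that sdist rule of thumb, in case it helps. It assumes the archive is a zip or a (possibly compressed) tar file that the standard library can read; the function names archive_member_names and looks_like_sdist are made up for this example, and the check is only a heuristic, not something distutils or setuptools provides.

```python
import tarfile
import zipfile
from pathlib import PurePosixPath


def archive_member_names(path):
    """Return the member names of a zip or tar archive (hypothetical helper)."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            return zf.namelist()
    # Default mode "r" lets tarfile detect gzip/bz2 compression on its own.
    with tarfile.open(path) as tf:
        return tf.getnames()


def looks_like_sdist(names):
    """Heuristic only: exactly one root directory that directly
    contains both PKG-INFO and setup.py."""
    parts = [PurePosixPath(name).parts for name in names]
    roots = {p[0] for p in parts if p}
    if len(roots) != 1:
        return False
    # Entries exactly one level below the single root directory.
    first_level = {p[1] for p in parts if len(p) == 2}
    return {"PKG-INFO", "setup.py"} <= first_level
```

Running looks_like_sdist(archive_member_names(path)) over each file in dist/ would flag the likely sdists; anything it rejects needs some other test.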
A bdist_dumb will not have such a subdirectory in the archive root; instead it will have one or more directories like /usr, /opt, /Program Files. Other bdist formats? Hard to say. From jacob at jacobian.org Thu Mar 28 22:15:56 2013 From: jacob at jacobian.org (Jacob Kaplan-Moss) Date: Thu, 28 Mar 2013 16:15:56 -0500 Subject: [Catalog-sig] Merge catalog-sig and distutils-sig In-Reply-To: References: <3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io> Message-ID: C'mon, folks, we're arguing about a name. That's about as close to literal bikeshedding as we could get. How about we just let whoever has the keys make the change in whatever way's easiest and most logical for them? Jacob From richard at python.org Thu Mar 28 22:42:06 2013 From: richard at python.org (Richard Jones) Date: Fri, 29 Mar 2013 08:42:06 +1100 Subject: [Catalog-sig] [Distutils] Merge catalog-sig and distutils-sig In-Reply-To: References: <3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io> Message-ID: I think I'm the only one on the list who probably would have objected but I'm on both now so whatever :-) Richard On 29 March 2013 07:32, PJ Eby wrote: > On Thu, Mar 28, 2013 at 3:43 PM, Donald Stufft wrote: >> On Mar 28, 2013, at 3:39 PM, PJ Eby wrote: >>> Can we do it by just dropping catalog-sig and keeping distutils-sig? >>> I'm afraid we might lose some important distutils-sig population if >>> the process involves renaming the list, resubscribing, etc. I also >>> *really* don't want to invalidate archive links to the distutils-sig >>> archive. >>> >>> All in all, +1 on not having two lists, but I'm really worried about >>> "breaking" distutils-sig. We're still going to be talking about >>> "distribution utilities", after all. >> >> Worst case I'm sure subscribers can be transferred and the existing archive kept intact. > > That's a great way to have a bunch of people complaining that they > never subscribed to packaging-sig, not to mention the part where it > breaks everyone's mail filters. 
> > I really don't see any gains for renaming the list. It's not like we > can go and scrub the entire internet of references to distutils-sig. > _______________________________________________ > Distutils-SIG maillist - Distutils-SIG at python.org > http://mail.python.org/mailman/listinfo/distutils-sig From donald at stufft.io Thu Mar 28 22:57:11 2013 From: donald at stufft.io (Donald Stufft) Date: Thu, 28 Mar 2013 17:57:11 -0400 Subject: [Catalog-sig] [Distutils] Merge catalog-sig and distutils-sig In-Reply-To: References: <3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io> Message-ID: On Mar 28, 2013, at 5:42 PM, Tres Seaver wrote: > Signed PGP part > On 03/28/2013 04:32 PM, PJ Eby wrote: > > On Thu, Mar 28, 2013 at 3:43 PM, Donald Stufft > > wrote: > >> On Mar 28, 2013, at 3:39 PM, PJ Eby wrote: > >>> Can we do it by just dropping catalog-sig and keeping > >>> distutils-sig? I'm afraid we might lose some important > >>> distutils-sig population if the process involves renaming the > >>> list, resubscribing, etc. I also *really* don't want to > >>> invalidate archive links to the distutils-sig archive. > >>> > >>> All in all, +1 on not having two lists, but I'm really worried > >>> about "breaking" distutils-sig. We're still going to be talking > >>> about "distribution utilities", after all. > >> > >> Worst case I'm sure subscribers can be transferred and the existing > >> archive kept intact. > > > > That's a great way to have a bunch of people complaining that they > > never subscribed to packaging-sig, not to mention the part where it > > breaks everyone's mail filters. > > > > I really don't see any gains for renaming the list. It's not like we > > can go and scrub the entire internet of references to distutils-sig. > > Not to mention breaking the gmane.org gateway, and those of us who sip > the firehose there instead of trying to swallow it via e-mail. > > > Tres. 
> - -- > =================================================================== > Tres Seaver +1 540-429-0999 tseaver at palladion.com > Palladion Software "Excellence by Design" http://palladion.com > > > _______________________________________________ > Distutils-SIG maillist - Distutils-SIG at python.org > http://mail.python.org/mailman/listinfo/distutils-sig This problem is inherent no matter what name is picked. GMane will need updated and some messages will need sent to tell people about the new name. No matter what at least one name isn't going to be used anymore. It's not that big of a deal. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From aclark at aclark.net Fri Mar 29 00:01:45 2013 From: aclark at aclark.net (Alex Clark) Date: Thu, 28 Mar 2013 19:01:45 -0400 Subject: [Catalog-sig] [Distutils] Merge catalog-sig and distutils-sig References: <3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io> Message-ID: On 2013-03-28 21:57:11 +0000, Donald Stufft said: > > On Mar 28, 2013, at 5:42 PM, Tres Seaver wrote: > >> Signed PGP part >> On 03/28/2013 04:32 PM, PJ Eby wrote: >>> On Thu, Mar 28, 2013 at 3:43 PM, Donald Stufft >>> wrote: >>>> On Mar 28, 2013, at 3:39 PM, PJ Eby wrote: >>>>> Can we do it by just dropping catalog-sig and keeping >>>>> distutils-sig? I'm afraid we might lose some important >>>>> distutils-sig population if the process involves renaming the >>>>> list, resubscribing, etc. I also *really* don't want to >>>>> invalidate archive links to the distutils-sig archive. >>>>> >>>>> All in all, +1 on not having two lists, but I'm really worried >>>>> about "breaking" distutils-sig. We're still going to be talking >>>>> about "distribution utilities", after all. 
>>>> >>>> Worst case I'm sure subscribers can be transferred and the existing >>>> archive kept intact. >>> >>> That's a great way to have a bunch of people complaining that they >>> never subscribed to packaging-sig, not to mention the part where it >>> breaks everyone's mail filters. >>> >>> I really don't see any gains for renaming the list. It's not like we >>> can go and scrub the entire internet of references to distutils-sig. >> >> Not to mention breaking the gmane.org gateway, and those of us who sip >> the firehose there instead of trying to swallow it via e-mail. >> >> >> Tres. >> - -- >> ==================================================================> >> Tres Seaver +1 540-429-0999 tseaver at palladion.com >> Palladion Software "Excellence by Design" http://palladion.com >> >> >> _______________________________________________ >> Distutils-SIG maillist - Distutils-SIG at python.org >> http://mail.python.org/mailman/listinfo/distutils-sig > > This problem is inherent no matter what name is picked. GMane will need > updated and some messages will need sent to tell people about the new > name. No matter what at least one name isn't going to be used anymore. > > It's not that big of a deal. FWIW: I am a GMANE-sipper and I'm willing to rejoin a new packaging-sig list (as well as register the new list with GMANE if no one else does). Seems to me another viable option is to simply turn off catalog-sig and distutils-sig (while preserving the archives forever, of course) and just start chatting on packaging-sig. Send an email to both lists "Last post, please join packaging-sig" and you are done. > > ----------------- > Donald Stufft > PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA > > > > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig -- Alex Clark ? 
http://about.me/alex.clark From pje at telecommunity.com Fri Mar 29 00:28:14 2013 From: pje at telecommunity.com (PJ Eby) Date: Thu, 28 Mar 2013 19:28:14 -0400 Subject: [Catalog-sig] Merge catalog-sig and distutils-sig In-Reply-To: References: <3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io> Message-ID: On Thu, Mar 28, 2013 at 5:15 PM, Jacob Kaplan-Moss wrote: > C'mon, folks, we're arguing about a name. That's about as close to > literal bikeshedding as we could get. I'm not arguing about the *name*. I just don't see the point in making everybody subscribe to a new list and change their mail filters (and update every book and webpage out there that mentions the distutils-sig), because a few people want to *change* the name -- a change that AFAICT doesn't actually provide any tangible benefit to anybody whatsoever. > How about we just let whoever has the keys make the change in whatever way's easiest and most logical for them? Because it's not up to just the person with the keys. Neither SIG is a mere mailing list, it's a Python special interest group, and SIGs have their own formation and termination processes. In particular, if you're going to start a new SIG, one of the requirements to be met is "in particular, no other SIG nor the general Python newsgroup is already more suitable" (per the Python SIG Creation Guidelines). It's hard to argue that distutils-sig isn't already more suitable than whatever is being proposed to take its place. From donald at stufft.io Fri Mar 29 00:45:55 2013 From: donald at stufft.io (Donald Stufft) Date: Thu, 28 Mar 2013 19:45:55 -0400 Subject: [Catalog-sig] Merge catalog-sig and distutils-sig In-Reply-To: References: <3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io> Message-ID: On Mar 28, 2013, at 7:28 PM, PJ Eby wrote: > On Thu, Mar 28, 2013 at 5:15 PM, Jacob Kaplan-Moss wrote: >> C'mon, folks, we're arguing about a name. That's about as close to >> literal bikeshedding as we could get. > > I'm not arguing about the *name*. 
I just don't see the point in > making everybody subscribe to a new list and change their mail filters > (and update every book and webpage out there that mentions the > distutils-sig), because a few people want to *change* the name -- a > change that AFAICT doesn't actually provide any tangible benefit to > anybody whatsoever. > > >> How about we just let whoever has the keys make the change in whatever way's easiest and most logical for them? > > Because it's not up to just the person with the keys. Neither SIG is > a mere mailing list, it's a Python special interest group, and SIGs > have their own formation and termination processes. > > In particular, if you're going to start a new SIG, one of the > requirements to be met is "in particular, no other SIG nor the general > Python newsgroup is already more suitable" (per the Python SIG > Creation Guidelines). It's hard to argue that distutils-sig isn't > already more suitable than whatever is being proposed to take its > place. A requirement for a SIG is also that it has a clear goal and a start and end date. distutils-sig's goal is the distutils module. And the "end date" requirements seems to be completely ignored anymore so arguing strict adherence to the rules seems to be a wash. I suggested packaging-sig because discussion jumps back and forth between distutils-sig and catalog-sig and neither name nor stated goal really reflected what the sig was actually about which was packaging in python in general. I also suggested packaging because it matched the other current sigs which are generic topics and not about a single module. But whatever, I hate the pointless duplication and just want to kill the overlap. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From dennis.coldwell at gmail.com Fri Mar 29 01:19:54 2013 From: dennis.coldwell at gmail.com (Dennis Coldwell) Date: Thu, 28 Mar 2013 17:19:54 -0700 Subject: [Catalog-sig] [Distutils] Merge catalog-sig and distutils-sig In-Reply-To: References: <3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io> Message-ID: > But whatever, I hate the pointless duplication and just want to kill the overlap. Agree, +1 to merging into one list. On Thu, Mar 28, 2013 at 4:45 PM, Donald Stufft wrote: > > On Mar 28, 2013, at 7:28 PM, PJ Eby wrote: > > > On Thu, Mar 28, 2013 at 5:15 PM, Jacob Kaplan-Moss > wrote: > >> C'mon, folks, we're arguing about a name. That's about as close to > >> literal bikeshedding as we could get. > > > > I'm not arguing about the *name*. I just don't see the point in > > making everybody subscribe to a new list and change their mail filters > > (and update every book and webpage out there that mentions the > > distutils-sig), because a few people want to *change* the name -- a > > change that AFAICT doesn't actually provide any tangible benefit to > > anybody whatsoever. > > > > > >> How about we just let whoever has the keys make the change in whatever > way's easiest and most logical for them? > > > > Because it's not up to just the person with the keys. Neither SIG is > > a mere mailing list, it's a Python special interest group, and SIGs > > have their own formation and termination processes. > > > > In particular, if you're going to start a new SIG, one of the > > requirements to be met is "in particular, no other SIG nor the general > > Python newsgroup is already more suitable" (per the Python SIG > > Creation Guidelines). It's hard to argue that distutils-sig isn't > > already more suitable than whatever is being proposed to take its > > place. > > A requirement for a SIG is also that it has a clear goal and a start and > end date. 
distutils-sig's goal is the distutils module. And the "end date" > requirements seems to be completely ignored anymore so arguing strict > adherence to the rules seems to be a wash. > > I suggested packaging-sig because discussion jumps back and forth between > distutils-sig and catalog-sig and neither name nor stated goal really > reflected what the sig was actually about which was packaging in python in > general. I also suggested packaging because it matched the other current > sigs which are generic topics and not about a single module. But whatever, > I hate the pointless duplication and just want to kill the overlap. > > > ----------------- > Donald Stufft > PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 > DCFA > > > _______________________________________________ > Distutils-SIG maillist - Distutils-SIG at python.org > http://mail.python.org/mailman/listinfo/distutils-sig > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Fri Mar 29 03:59:11 2013 From: barry at python.org (Barry Warsaw) Date: Thu, 28 Mar 2013 22:59:11 -0400 Subject: [Catalog-sig] [Distutils] Merge catalog-sig and distutils-sig In-Reply-To: References: Message-ID: <20130328225911.513250fe@anarchist> On Mar 28, 2013, at 02:22 PM, Donald Stufft wrote: >Is there much point in keeping catalog-sig and distutils-sig separate? Without yet reading the whole thread, I'll just mention that it's probably easier to just retire one or the other mailing lists and divert all discussion to the other one. Of course, the archives for the retired list would be retained for historical purposes. In fact, sigs are *supposed* to be periodically reviewed for renewal or retirement, though I think practically speaking we haven't done that in a very long time. If there's consensus on what you want to do, please contact postmaster@ and let them know. 
Let's say you just want to retire catalog-sig: we can set up forwards to distutils-sig and let the former be an "acceptable alias" to the latter so postings will be accepted when addressed to either. Of course, folks on the defunct list should manually subscribe to the good list (i.e. opt-in). -Barry From tseaver at palladion.com Fri Mar 29 04:45:52 2013 From: tseaver at palladion.com (Tres Seaver) Date: Thu, 28 Mar 2013 23:45:52 -0400 Subject: [Catalog-sig] [Distutils] Merge catalog-sig and distutils-sig In-Reply-To: References: <3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 03/28/2013 05:57 PM, Donald Stufft wrote: > > On Mar 28, 2013, at 5:42 PM, Tres Seaver > wrote: > >> On 03/28/2013 04:32 PM, PJ Eby wrote: >>> I really don't see any gains for renaming the list. It's not like >>> we can go and scrub the entire internet of references to >>> distutils-sig. >> >> Not to mention breaking the gmane.org gateway, and those of us who >> sip the firehose there instead of trying to swallow it via e-mail. > This problem is inherent no matter what name is picked. GMane will > need updated and some messages will need sent to tell people about the > new name. No matter what at least one name isn't going to be used > anymore. > > It's not that big of a deal. If we leave the main list the 'distutils-sig', and just announce that 'catalog-sig' is retired, folks who want to follow the new list just switch over. All the archives (mailman / gmane / etc.) stay valid, but the list goes into moderated mode. Creating a third list and retiring both the existing ones is extra hassle for no value, aside for a "cleanliness" issue on its name. Tres. 
- -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with undefined - http://www.enigmail.net/ iEYEARECAAYFAlFVDmoACgkQ+gerLs4ltQ4yDQCfZCEUpSTIhQKNNDilIYIRc6Jj Fu0AoM6RKaflwbeek0VFGsX1USIzUhlC =gWkJ -----END PGP SIGNATURE----- From richard at python.org Fri Mar 29 10:47:48 2013 From: richard at python.org (Richard Jones) Date: Fri, 29 Mar 2013 20:47:48 +1100 Subject: [Catalog-sig] [Distutils] Merge catalog-sig and distutils-sig In-Reply-To: References: <3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io> Message-ID: On 29 March 2013 14:45, Tres Seaver wrote: > If we leave the main list the 'distutils-sig', and just announce that > 'catalog-sig' is retired, folks who want to follow the new list just > switch over. All the archives (mailman / gmane / etc.) stay valid, but > the list goes into moderated mode. Whoever has the power to do this, do it please. Richard From pje at telecommunity.com Fri Mar 29 19:54:22 2013 From: pje at telecommunity.com (PJ Eby) Date: Fri, 29 Mar 2013 14:54:22 -0400 Subject: [Catalog-sig] How to determine if archive is an sdist or bdist In-Reply-To: References: Message-ID: On Fri, Mar 29, 2013 at 11:00 AM, James Carpenter wrote: > Looks like the idea of using a custom command is a better approach then. I'm not sure why you think that. The only kinds of archives whose file types are ambiguous from the name, are sdist, bdist_dumb, and random raw source dumps. Everything else has a unique extension like .egg, .exe, .msi, rpm, etc. If you have a .zip, .tar.gz, .tgz, or some other archive name, you can find out if it's an sdist by inspecting its contents as I described. And if it's not an sdist, you can usually tell if it's a raw source dump by checking for a setup.py in the archive root or a depth-1 subdirectory off the root. 
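As a rough illustration of the checks described above (not code from this thread), the sketch below classifies an archive from its member names. The function names `classify_members` and `list_members` are hypothetical, and only the sdist and raw-source-dump rules stated here are implemented; a bdist_dumb or anything else is simply reported as unknown.

```python
import tarfile
import zipfile


def _normalise(name):
    """Strip a leading './' and surrounding slashes from an archive member name."""
    name = name.replace("\\", "/")
    while name.startswith("./"):
        name = name[2:]
    return name.strip("/")


def classify_members(names):
    """Classify an archive from its member names (hypothetical helper).

    Returns "sdist", "source dump", or "unknown".
    """
    paths = [p for p in (_normalise(n) for n in names) if p and p != "."]
    files = set(paths)
    roots = {p.split("/")[0] for p in paths}

    # sdist: exactly one root directory holding both PKG-INFO and setup.py.
    if len(roots) == 1:
        root = next(iter(roots))
        if root + "/PKG-INFO" in files and root + "/setup.py" in files:
            return "sdist"

    # Raw source dump: setup.py in the root or one directory below it.
    for p in paths:
        parts = p.split("/")
        if parts[-1] == "setup.py" and len(parts) <= 2:
            return "source dump"

    # bdist_dumb and anything else are not distinguished here.
    return "unknown"


def list_members(path):
    """Return member names of a tar or zip archive using the standard library."""
    if tarfile.is_tarfile(path):
        with tarfile.open(path) as tf:
            return tf.getnames()
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            return zf.namelist()
    raise ValueError("not a tar or zip archive: %r" % path)
```

A call such as `classify_members(list_members("dist/example-1.0.tar.gz"))` (the path is a placeholder) would combine the two helpers. The sketch does not verify that the root directory is named for the package and version; that check could be added if stricter matching is wanted.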
(That's what easy_install does, anyway, when it's given an archive it doesn't know what to do with.) > > Is a custom command my only choice or can I register pre/post hooks to any > given command? > > > On Thu, Mar 28, 2013 at 3:36 PM, PJ Eby wrote: >> >> On Thu, Mar 28, 2013 at 3:57 PM, James Carpenter >> wrote: >> > Is there an easy way to programmatically tell if an archive (tar.gz, >> > zip, >> > etc.) in the dist directory is a binary or sdist? I would like to >> > post-process the contents of a dist directory and classify each build >> > artifact there (egg, sdist, bdist, etc.). >> >> An sdist always has a single subdirectory in the archive's root >> directory, named for the package+version, and containing a PKG-INFO >> and setup.py (plus a bunch of other stuff). >> >> A bdist_dumb will not have such a subdirectory in the archive root; >> instead it will have one or more directories like /usr, /opt, /Program >> Files. >> >> Other bdist formats? Hard to say. > > From ncoghlan at gmail.com Fri Mar 29 20:40:58 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 30 Mar 2013 05:40:58 +1000 Subject: [Catalog-sig] [Distutils] Merge catalog-sig and distutils-sig In-Reply-To: References: <3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io> Message-ID: On Fri, Mar 29, 2013 at 7:47 PM, Richard Jones wrote: > On 29 March 2013 14:45, Tres Seaver wrote: >> If we leave the main list the 'distutils-sig', and just announce that >> 'catalog-sig' is retired, folks who want to follow the new list just >> switch over. All the archives (mailman / gmane / etc.) stay valid, but >> the list goes into moderated mode. > > Whoever has the power to do this, do it please. +1 distutils-sig it is. We're expanding the charter to "the distutils standard library module, the Python Package Index and associated interoperabilty standards", but that's a lot easier than forcing everyone to rewrite their mail filters. 
Besides, it's gonna be a *long* time before the default build system in the standard library is anything other than distutils. Coupling the build system to the language release cycle has proven to be a *bad idea*, because the addition of new platform support needs to happen in a more timely fashion than language releases. The incorporation of pip bootstrapping into 3.4 will also make it a lot easier to recommend more readily upgraded alternatives. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From donald at stufft.io Fri Mar 29 20:43:06 2013 From: donald at stufft.io (Donald Stufft) Date: Fri, 29 Mar 2013 15:43:06 -0400 Subject: [Catalog-sig] [Distutils] Merge catalog-sig and distutils-sig In-Reply-To: References: <3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io> Message-ID: <7C8C91B8-46D7-4E74-9FEE-0E52F30F1132@stufft.io> On Mar 29, 2013, at 3:40 PM, Nick Coghlan wrote: > On Fri, Mar 29, 2013 at 7:47 PM, Richard Jones wrote: >> On 29 March 2013 14:45, Tres Seaver wrote: >>> If we leave the main list the 'distutils-sig', and just announce that >>> 'catalog-sig' is retired, folks who want to follow the new list just >>> switch over. All the archives (mailman / gmane / etc.) stay valid, but >>> the list goes into moderated mode. >> >> Whoever has the power to do this, do it please. > > +1 > > distutils-sig it is. We're expanding the charter to "the distutils > standard library module, the Python Package Index and associated > interoperabilty standards", but that's a lot easier than forcing > everyone to rewrite their mail filters. > > Besides, it's gonna be a *long* time before the default build system > in the standard library is anything other than distutils. Coupling the > build system to the language release cycle has proven to be a *bad > idea*, because the addition of new platform support needs to happen in > a more timely fashion than language releases. 
The incorporation of pip > bootstrapping into 3.4 will also make it a lot easier to recommend > more readily upgraded alternatives. > > Cheers, > Nick. > > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Distutils-SIG maillist - Distutils-SIG at python.org > http://mail.python.org/mailman/listinfo/distutils-sig Sounds good to me, whoever please to doing the needful. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From r1chardj0n3s at gmail.com Sat Mar 30 23:16:38 2013 From: r1chardj0n3s at gmail.com (Richard Jones) Date: Sun, 31 Mar 2013 09:16:38 +1100 Subject: [Catalog-sig] Shutting down catalog-sig Message-ID: Hi all, We're about to merge the catalog-sig and distutils-sig by just removing the catalog-sig mailing list. If you wish to remain in the discussions regarding Python package cataloging then please subscribe to the distutils SIG. The catalog SIG archives will remain, but the mailing list will be deleted and the SIG will be retired. There's no real timeframe but it will be happening imminently. Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard at python.org Sat Mar 30 23:20:15 2013 From: richard at python.org (Richard Jones) Date: Sun, 31 Mar 2013 09:20:15 +1100 Subject: [Catalog-sig] [Distutils] Merge catalog-sig and distutils-sig In-Reply-To: <7C8C91B8-46D7-4E74-9FEE-0E52F30F1132@stufft.io> References: <3280C8A6-FF28-4AE5-B509-B6C543371538@stufft.io> <7C8C91B8-46D7-4E74-9FEE-0E52F30F1132@stufft.io> Message-ID: I've set the wheels in motion. I just need a little help from the pydotorg volunteers (and some hits from the mailman cluebat). 
Richard On 30 March 2013 06:43, Donald Stufft wrote: > > On Mar 29, 2013, at 3:40 PM, Nick Coghlan wrote: > > > On Fri, Mar 29, 2013 at 7:47 PM, Richard Jones > wrote: > >> On 29 March 2013 14:45, Tres Seaver wrote: > >>> If we leave the main list the 'distutils-sig', and just announce that > >>> 'catalog-sig' is retired, folks who want to follow the new list just > >>> switch over. All the archives (mailman / gmane / etc.) stay valid, but > >>> the list goes into moderated mode. > >> > >> Whoever has the power to do this, do it please. > > > > +1 > > > > distutils-sig it is. We're expanding the charter to "the distutils > > standard library module, the Python Package Index and associated > > interoperabilty standards", but that's a lot easier than forcing > > everyone to rewrite their mail filters. > > > > Besides, it's gonna be a *long* time before the default build system > > in the standard library is anything other than distutils. Coupling the > > build system to the language release cycle has proven to be a *bad > > idea*, because the addition of new platform support needs to happen in > > a more timely fashion than language releases. The incorporation of pip > > bootstrapping into 3.4 will also make it a lot easier to recommend > > more readily upgraded alternatives. > > > > Cheers, > > Nick. > > > > > > -- > > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > > _______________________________________________ > > Distutils-SIG maillist - Distutils-SIG at python.org > > http://mail.python.org/mailman/listinfo/distutils-sig > > Sounds good to me, whoever please to doing the needful. > > ----------------- > Donald Stufft > PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 > DCFA > >