From chris at simplistix.co.uk Tue Jun 1 12:14:29 2010 From: chris at simplistix.co.uk (Chris Withers) Date: Tue, 01 Jun 2010 11:14:29 +0100 Subject: [Catalog-sig] PyPI down? Message-ID: <4C04DD85.8050501@simplistix.co.uk> Hi All, PyPI appears to not be responding. Anyone know why that is and when normal service might be resumed? cheers, Chris From simon at ikanobori.jp Tue Jun 1 13:20:41 2010 From: simon at ikanobori.jp (Simon de Vlieger) Date: Tue, 1 Jun 2010 13:20:41 +0200 Subject: [Catalog-sig] PyPI down? In-Reply-To: <4C04DD85.8050501@simplistix.co.uk> References: <4C04DD85.8050501@simplistix.co.uk> Message-ID: <36F7B98F-B26D-4451-8A06-BA5E86884E2E@ikanobori.jp> Chris, PyPi seemed to be unresponsive earlier during the day but currently it looks like normal service is resumed. Regards, Simon. On 1 jun 2010, at 12:14, Chris Withers wrote: > PyPI appears to not be responding. > > Anyone know why that is and when normal service might be resumed? From chris at simplistix.co.uk Tue Jun 1 13:41:36 2010 From: chris at simplistix.co.uk (Chris Withers) Date: Tue, 01 Jun 2010 12:41:36 +0100 Subject: [Catalog-sig] PyPI down? In-Reply-To: <36F7B98F-B26D-4451-8A06-BA5E86884E2E@ikanobori.jp> References: <4C04DD85.8050501@simplistix.co.uk> <36F7B98F-B26D-4451-8A06-BA5E86884E2E@ikanobori.jp> Message-ID: <4C04F1F0.2020309@simplistix.co.uk> Simon de Vlieger wrote: > PyPi seemed to be unresponsive earlier during the day but currently it > looks like normal service is resumed. Indeed, it would be good to know what was done to resolve it and by whom ;-) Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From jannis at leidel.info Tue Jun 1 13:08:17 2010 From: jannis at leidel.info (Jannis Leidel) Date: Tue, 1 Jun 2010 13:08:17 +0200 Subject: [Catalog-sig] PyPI down? 
In-Reply-To: <4C04DD85.8050501@simplistix.co.uk> References: <4C04DD85.8050501@simplistix.co.uk> Message-ID: On 01.06.2010 at 12:14, Chris Withers wrote: > Hi All, > > PyPI appears to not be responding. > > Anyone know why that is and when normal service might be resumed? It would certainly be nice to know that, indeed. What is the official policy with regard to maintenance and failover? I know there is a bigger issue (PEP 381) but in case it's just a matter of manpower to restart the server or kick apache once in a while, I'd happily volunteer to help out. Jannis From martin at v.loewis.de Tue Jun 1 22:59:46 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 01 Jun 2010 22:59:46 +0200 Subject: [Catalog-sig] PyPI down? In-Reply-To: <4C04F1F0.2020309@simplistix.co.uk> References: <4C04DD85.8050501@simplistix.co.uk> <36F7B98F-B26D-4451-8A06-BA5E86884E2E@ikanobori.jp> <4C04F1F0.2020309@simplistix.co.uk> Message-ID: <4C0574C2.4070804@v.loewis.de> On 01.06.2010 at 13:41, Chris Withers wrote: > Simon de Vlieger wrote: >> PyPi seemed to be unresponsive earlier during the day but currently it >> looks like normal service is resumed. > > Indeed, it would be good to know what was done to resolve it and by whom > ;-) I restarted Apache. Regards, Martin From chris at simplistix.co.uk Wed Jun 2 09:31:48 2010 From: chris at simplistix.co.uk (Chris Withers) Date: Wed, 02 Jun 2010 08:31:48 +0100 Subject: [Catalog-sig] PyPI down? In-Reply-To: <4C0574C2.4070804@v.loewis.de> References: <4C04DD85.8050501@simplistix.co.uk> <36F7B98F-B26D-4451-8A06-BA5E86884E2E@ikanobori.jp> <4C04F1F0.2020309@simplistix.co.uk> <4C0574C2.4070804@v.loewis.de> Message-ID: <4C0608E4.3000005@simplistix.co.uk> Martin v. Löwis wrote: >> Indeed, it would be good to know what was done to resolve it and by whom >> ;-) > > I restarted Apache. Any idea what had brought it down? Were there lots of worker threads? High CPU usage? Memory starvation?
Does the database that backs PyPI live on the same machine? cheers, Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Fri Jun 11 12:44:07 2010 From: chris at simplistix.co.uk (Chris Withers) Date: Fri, 11 Jun 2010 11:44:07 +0100 Subject: [Catalog-sig] PyPI down again... Message-ID: <4C121377.4000008@simplistix.co.uk> ...would be good to know what brought it down before and what has brought it down again. As an interim solution, what do I need to do to get access to the box running PyPI so I can get in and investigate/restart Apache? cheers, Chris From mal at egenix.com Fri Jun 11 12:48:55 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 11 Jun 2010 12:48:55 +0200 Subject: [Catalog-sig] PyPI down again... In-Reply-To: <4C121377.4000008@simplistix.co.uk> References: <4C121377.4000008@simplistix.co.uk> Message-ID: <4C121497.5040806@egenix.com> Chris Withers wrote: > ...would be good to know what brought it down before and what has > brought it down again. It works for me. > As an interim solution, what do I need to do to get access to the box > running PyPI so I can get in and investigate/restart Apache? Since PyPI is a rather essential Python resource, is there some monitoring in place to automatically notify the webmasters ? Something like e.g. a Zenoss instance checking whether PyPI is pingable. If not, we'd need to address this in the PSF infrastructure committee. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 11 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 37 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! 
:::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From chris at simplistix.co.uk Fri Jun 11 12:50:46 2010 From: chris at simplistix.co.uk (Chris Withers) Date: Fri, 11 Jun 2010 11:50:46 +0100 Subject: [Catalog-sig] PyPI down again... In-Reply-To: <4C121497.5040806@egenix.com> References: <4C121377.4000008@simplistix.co.uk> <4C121497.5040806@egenix.com> Message-ID: <4C121506.9070708@simplistix.co.uk> M.-A. Lemburg wrote: > Chris Withers wrote: >> ...would be good to know what brought it down before and what has >> brought it down again. > > It works for me. Yes, I guess someone went in and did something. Given that the topic in #python says its down and a couple of those "down for me or everyone" websites all confirmed it when I was having problems... >> As an interim solution, what do I need to do to get access to the box >> running PyPI so I can get in and investigate/restart Apache? > > Since PyPI is a rather essential Python resource, is there some > monitoring in place to automatically notify the webmasters ? Good question... > Something like e.g. a Zenoss instance checking whether PyPI is > pingable. Hmm, not enough. I suspect the box would have been pingable, it's just the web app that is getting wedged... Chris From marrakis at gmail.com Fri Jun 11 12:50:52 2010 From: marrakis at gmail.com (Mathieu Leduc-Hamel) Date: Fri, 11 Jun 2010 12:50:52 +0200 Subject: [Catalog-sig] PyPI down again... In-Reply-To: <4C121377.4000008@simplistix.co.uk> References: <4C121377.4000008@simplistix.co.uk> Message-ID: Up again... For sure some monitoring and logging informations would be great. I'm working right now to test the code validity with unittests and after that I would like implemented a couple of new functionalities like that. Who is responsible of the project and the maintenance ? 
I was starting to work on pypi with tarek ziade, to implement distutils 2 new metadata, and I'm completely focused on quality and things like that. On Fri, Jun 11, 2010 at 12:44 PM, Chris Withers wrote: > ...would be good to know what brought it down before and what has brought > it down again. > > As an interim solution, what do I need to do to get access to the box > running PyPI so I can get in and investigate/restart Apache? > > cheers, > > Chris From martin at v.loewis.de Fri Jun 11 20:17:56 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Fri, 11 Jun 2010 20:17:56 +0200 Subject: [Catalog-sig] PyPI down again... In-Reply-To: References: <4C121377.4000008@simplistix.co.uk> Message-ID: <4C127DD4.5010801@v.loewis.de> > Who is responsible of the project and the maintenance ? I am. Regards, Martin From justin.ryan at reliefgarden.org Fri Jun 11 22:09:30 2010 From: justin.ryan at reliefgarden.org (Justin Ryan) Date: Fri, 11 Jun 2010 13:09:30 -0700 Subject: [Catalog-sig] PyPI down again... In-Reply-To: <4C127DD4.5010801@v.loewis.de> References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> Message-ID: Is it possible it's time to designate a team? I'm sure everyone appreciates the hard work of a lone volunteer, but having been one myself at times, the feeling that others may not do the job right is often eclipsed by their availability to try. It seems like for months if not years, those of us relying on PyPI for day-to-day use, esp for deployments and developer environments like buildout, run into issues where we simply can't work for a significant part of a day. What's up with this years-old PEP for expanding the PyPI infrastructure?
Are there resources, relationships, volunteers lacking? What can we do to help? :) Best, J On Fri, Jun 11, 2010 at 11:17 AM, "Martin v. Löwis" wrote: >> Who is responsible of the project and the maintenance ? > > I am. > From martin at v.loewis.de Fri Jun 11 22:56:04 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 11 Jun 2010 22:56:04 +0200 Subject: [Catalog-sig] PyPI down again... In-Reply-To: References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> Message-ID: <4C12A2E4.2090305@v.loewis.de> > Is it possible it's time to designate a team? I'm sure everyone > appreciates the hard work of a lone volunteer, but having been one > myself at times, the feeling that others may not do the job right is > often eclipsed by their availability to try. Help is certainly appreciated. The type of help depends on the volunteer, of course. E.g. I wouldn't want to give root accounts to the first person that comes along and asks for them (except when the first person is Jannis Leidel, who (I believe) did the Apache restart today). > What's up with this years-old PEP for expanding the PyPI > infrastructure? Are there resources, relationships, volunteers > lacking? > > What can we do to help? :) If you are willing to invest *a lot* of time, then it seems that rewriting PyPI in Django would make a lot of people happy, because they claim they can't contribute to the current code base because they don't understand that. I don't want to do such a rewrite on my own because I *do* understand the code base (despite not having written it in the first place, so I think that if you really want to contribute, you can learn how it works); it also violates Joel Spolsky's principle of never ever doing rewrites. It will be a lot of work because it must implement full compatibility with the current code, which I can promise will keep you busy.
Full compatibility is primarily defined in terms of URLs that people may have put on the web and into Google, and URLs and API that setuptools back to very old releases may use. That said, I have no idea what is causing the current outages. There must be some secret ping of death or something that somebody discovered. For a smaller project, start putting mirror support into setuptools or distribute; this would make short (several hours) outages less severe for the class of users that want permanent availability for downloading. It's unlikely that the mirrors would break when the master goes down; they just stop mirroring. Regards, Martin From justin.ryan at reliefgarden.org Fri Jun 11 23:05:19 2010 From: justin.ryan at reliefgarden.org (Justin Ryan) Date: Fri, 11 Jun 2010 14:05:19 -0700 Subject: [Catalog-sig] PyPI down again... In-Reply-To: <4C12A2E4.2090305@v.loewis.de> References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> Message-ID: On Fri, Jun 11, 2010 at 1:56 PM, "Martin v. Löwis" wrote: ... > Help is certainly appreciated. The type of help depends on the volunteer, of > course. E.g. I wouldn't want to give root accounts to > the first person that comes along and asks for them (except when the first > person is Jannis Leidel, who (I believe) did the Apache restart > today). > Thanks, Jannis. :) >> What's up with this years-old PEP for expanding the PyPI >> infrastructure? Are there resources, relationships, volunteers >> lacking? >> >> What can we do to help? :) > > If you are willing to invest *a lot* of time, then it seems that rewriting > PyPI in Django would make a lot of people happy, because > they claim they can't contribute to the current code base because > they don't understand that.
I don't want to do such a rewrite on > my own because I *do* understand the code base (despite not having written > it in the first place, so I think that if you really want > to contribute, you can learn how it works); it also violates Joel > Spolsky's principle of never ever doing rewrites. I'll avoid deep comments about my general feelings about Django here. ;) What is it now, just a straight WSGI app? > It will be a lot of work because it must implement full compatibility with > the current code, which I can promise will keep you busy. Full > compatibility is primarily defined in terms of URLs that people may > have put on the web and into Google, and URLs and API that setuptools > back to very old releases may use. Sure.. > That said, I have no idea what is causing the current outages. There must be > some secret ping of death or something that somebody discovered. If you want to give me a shell that can just access ps and top for now, read-only access to log files, I can try and put some time into keeping an eye. > For a smaller project, start putting mirror support into setuptools or > distribute; this would make short (several hours) outages less severe for > the class of users that want permanent availability for downloading. > It's unlikely that the mirrors would break when the master goes down; > they just stop mirroring. That's a really great idea. I try to use egg caches in buildout and the -N option to not look for the newest of everything all the time, but I think it needs a bit of work as well. We also have support for alternate download targets in buildout, but it seems the failure mode when PyPI is down is weak. So, there's definitely two sides to this, we all need to be gentler and calmer users of PyPI, and we all need it to work more. And anyone putting time into restarting Apache probably wants to stop doing that. :) Anyone interested in helping to add mirror support to distribute? 
I suspect it is distribute / setuptools which are tied to the poor failure mode I'm encountering with zc.buildout. Best! J From mal at egenix.com Fri Jun 11 23:06:21 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 11 Jun 2010 23:06:21 +0200 Subject: [Catalog-sig] PyPI down again... In-Reply-To: <4C12A2E4.2090305@v.loewis.de> References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> Message-ID: <4C12A54D.1070406@egenix.com> "Martin v. Löwis" wrote: > For a smaller project, start putting mirror support into setuptools or > distribute; this would make short (several hours) outages less severe > for the class of users that want permanent availability for downloading. > It's unlikely that the mirrors would break when the master goes down; > they just stop mirroring. A better and cleaner strategy is to put the static PyPI information up on Amazon Cloudscape and have DNS take care of providing local mirrors (edge servers) to setuptools et al. Such a setup won't require any complicated mirror logic in any of the existing client tools. By moving the PyPI installation to Amazon AWS, we could also get the RPC access distributed to more than just one server. As I said before, the PSF infrastructure committee needs to get on with the job of getting this implemented (including funding this development). If someone wants to volunteer helping with the setup, please contact the PSF at psf at python.org. Thanks, -- Marc-Andre Lemburg
From ziade.tarek at gmail.com Fri Jun 11 23:07:03 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Fri, 11 Jun 2010 23:07:03 +0200 Subject: [Catalog-sig] PyPI down again... In-Reply-To: <4C12A2E4.2090305@v.loewis.de> References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> Message-ID: On Fri, Jun 11, 2010 at 10:56 PM, "Martin v. Löwis" wrote: >> Is it possible it's time to designate a team? I'm sure everyone >> appreciates the hard work of a lone volunteer, but having been one >> myself at times, the feeling that others may not do the job right is >> often eclipsed by their availability to try. > > Help is certainly appreciated. The type of help depends on the volunteer, of > course. E.g. I wouldn't want to give root accounts to > the first person that comes along and asks for them (except when the first > person is Jannis Leidel, who (I believe) did the Apache restart > today). > >> What's up with this years-old PEP for expanding the PyPI >> infrastructure? Are there resources, relationships, volunteers >> lacking? >> >> What can we do to help? :) > > If you are willing to invest *a lot* of time, then it seems that rewriting > PyPI in Django would make a lot of people happy, because > they claim they can't contribute to the current code base because > they don't understand that. I don't want to do such a rewrite on > my own because I *do* understand the code base (despite not having written > it in the first place, so I think that if you really want > to contribute, you can learn how it works); it also violates Joel > Spolsky's principle of never ever doing rewrites. -1 PyPI code is evolving.
I've added with the help of Mathieu PEP 345 support, and we have more stuff coming up. Mathieu has also invested quite some time lately to write functional tests in PyPI and split the huge web.py module in several modules for clarity. I was planning to review it and ask you before I would merge it to trunk. I think we need to make a difference here between the development of the PyPI codebase and the sysadmin work Tarek -- Tarek Ziadé | http://ziade.org From ziade.tarek at gmail.com Fri Jun 11 23:11:35 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Fri, 11 Jun 2010 23:11:35 +0200 Subject: [Catalog-sig] PyPI down again... In-Reply-To: <4C12A54D.1070406@egenix.com> References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> Message-ID: On Fri, Jun 11, 2010 at 11:06 PM, M.-A. Lemburg wrote: > "Martin v. Löwis" wrote: >> For a smaller project, start putting mirror support into setuptools or >> distribute; this would make short (several hours) outages less severe >> for the class of users that want permanent availability for downloading. >> It's unlikely that the mirrors would break when the master goes down; >> they just stop mirroring. > > A better and cleaner strategy is to put the static PyPI information > up on Amazon Cloudscape and have DNS take care of providing local > mirrors (edge servers) to setuptools et al. > > Such a setup won't require any complicated mirror logic in any > of the existing client tools. > > By moving the PyPI installation to Amazon AWS, we could also > get the RPC access distributed to more than just one server. > > As I said before, the PSF infrastructure committee needs to get on > with the job of getting this implemented (including funding this > development). > > If someone wants to volunteer helping with the setup, please contact > the PSF at psf at python.org. What about continuing the work that was started last year ?
(and not finished due to a lack of time) There's a PEP we have started about a mirroring infrastructure: http://www.python.org/dev/peps/pep-0381/ Some of its parts are already implemented in PyPI, and what we need now is to work on the client side (pip, distribute, etc) and bootstrap one or two mirrors using the protocol. Regards Tarek From martin at v.loewis.de Sat Jun 12 00:27:46 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 12 Jun 2010 00:27:46 +0200 Subject: [Catalog-sig] PyPI down again... In-Reply-To: References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> Message-ID: <4C12B862.9000503@v.loewis.de> > What is it now, just a straight WSGI app? No, FCGI. > If you want to give me a shell that can just access ps and top for > now, read-only access to log files, I can try and put some time into > keeping an eye. Sorry, no: I don't know you at all. Regards, Martin From martin at v.loewis.de Sat Jun 12 00:38:25 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 12 Jun 2010 00:38:25 +0200 Subject: [Catalog-sig] PyPI down again... In-Reply-To: References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> Message-ID: <4C12BAE1.3040206@v.loewis.de> >> If you are willing to invest *a lot* of time, then it seems that rewriting >> PyPI in Django would make a lot of people happy > > -1 > > PyPI code is evolving. I've added with the help of Mathieu PEP 345 support, > and we have more stuff coming up. I can understand why you are opposed: for the same reason I don't want to lead such a project. We both have invested time into the PyPI code base, and I disagree with all the complaints I heard about it being incomprehensible. 
The fact remains that people continue to consider the code incomprehensible, and that those very people claimed that they would prefer if some other web framework was used, specifically Django. I know Richard Jones is also in favor of a rewrite in Django. I can also understand that Zope fans might be upset by the prospect of having to use Django; to those, I'd say "get over it". > I think we need to make a difference here between the development of > the PyPI codebase and the sysadmin work Most definitely. Regards, Martin From justin.ryan at reliefgarden.org Sat Jun 12 00:39:50 2010 From: justin.ryan at reliefgarden.org (Justin Ryan) Date: Fri, 11 Jun 2010 15:39:50 -0700 Subject: [Catalog-sig] PyPI down again... In-Reply-To: <4C12B862.9000503@v.loewis.de> References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12B862.9000503@v.loewis.de> Message-ID: On Fri, Jun 11, 2010 at 3:27 PM, "Martin v. Löwis" wrote: >> If you want to give me a shell that can just access ps and top for >> now, read-only access to log files, I can try and put some time into >> keeping an eye. > > Sorry, no: I don't know you at all. > I know that, I wasn't asking for today. I had access to the main plone.org box at some point, siggraph.org, acm.org, so maybe we should get to know each other. From justin.ryan at reliefgarden.org Sat Jun 12 21:48:23 2010 From: justin.ryan at reliefgarden.org (Justin Ryan) Date: Sat, 12 Jun 2010 12:48:23 -0700 Subject: [Catalog-sig] PyPI down again... In-Reply-To: <4C13450A.9050104@v.loewis.de> References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12B862.9000503@v.loewis.de> <4C12C07F.6000706@v.loewis.de> <4C1338A6.1020601@v.loewis.de> <4C13450A.9050104@v.loewis.de> Message-ID: Thanks, Martin, for taking the conversation offline to be a real jerk. ;) On Sat, Jun 12, 2010 at 1:27 AM, "Martin v. Löwis" wrote:
>> The question, I think, is what steps can we take to begin alleviating >> each other's worries? >> >> I understand I'm a bit vague, I'm just trying to raise my hand and say >> hey, let me volunteer. > > Ok, so write code. > I was looking for guidance as to what to do. This common response really pisses me off. It's unclear exactly what code needs to be written. >> I think Chris Withers and some others have been doing same. People >> say to email psf, and then when I do, that it's inappropriate, I think >> we just want some direction on contribution we can make that won't >> disappear into politics. > > I'm not quite sure what specific problem you want to solve. Or, if it's not > a specific problem, what general problem you want to solve. > PyPI is fucking down all the time you nincompoop. > My observation over the years is this: everything works fine for some time, > and there are *zero* contributors. Then, a small problem occurs, > and people offer help and demand drastic changes. Then the problem gets > solved, and people disappear again. > So, turn away help, because we can all go to hell. I've seen a problem for a year and I joined the catalog-sig for other reasons, but find that people ask questions which aren't answered. Your attitude is very much like a senior employee who is about to get fired for being unpalatable. But you can't be fired by the community, so you'll continue to reign and no one should offer to help because someone you thought would help in the past didn't. >> Question is, I guess: >> >> What, exactly, should I do? >> >> Why should I be directing the PyPI leader? > > I thought you had a proposal on how to solve the problem at hand. > So I wasn't asking for direction, but for advice. > Like most people in all general e-mail communication of the world, you didn't read the thread closely enough to determine who said what. I responded to a proposal by someone who you completely ignored. I can see what is wrong with PyPI, Martin.
I think it's painfully clear. Anyway, I won't offer to help ever again, promise. I'll just complain until you fix things. Peace, Love, and Go to Hell. From martin at v.loewis.de Sun Jun 13 01:29:03 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 13 Jun 2010 01:29:03 +0200 Subject: [Catalog-sig] PyPI down again... In-Reply-To: References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12B862.9000503@v.loewis.de> <4C12C07F.6000706@v.loewis.de> <4C1338A6.1020601@v.loewis.de> <4C13450A.9050104@v.loewis.de> Message-ID: <4C14183F.7010006@v.loewis.de> >> Ok, so write code. >> > > I was looking for guidance as to what to do. > > This common response really pisses me off. It's unclear exactly what > code needs to be written. The one I proposed to write: add mirroring support to setuptools and distribute. > PyPI is fucking down all the time you nincompoop. Never heard that term before... In any case, I don't think this is factually correct. > But you can't be fired by the community, so you'll continue to reign > and noone should offer to help because someone you thought would help > in the past didn't. So prove me wrong. Actually do start helping, instead of insulting. Regards, Martin From guido at python.org Sun Jun 13 06:32:56 2010 From: guido at python.org (Guido van Rossum) Date: Sat, 12 Jun 2010 21:32:56 -0700 Subject: [Catalog-sig] PyPI down again... In-Reply-To: References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12B862.9000503@v.loewis.de> <4C12C07F.6000706@v.loewis.de> <4C1338A6.1020601@v.loewis.de> <4C13450A.9050104@v.loewis.de> Message-ID: On Sat, Jun 12, 2010 at 12:48 PM, Justin Ryan wrote: > Thanks, Martin, for taking the conversation offline to be a real jerk. ;) (I won't quote more. Everyone who read it is still reeling from the sudden outburst.) Justin, go wash your mouth with soap. 
You may be used to this kind of language in other places, but it is inappropriate here and you will not get the respect or guidance you are seeking by swearing or insulting people. BTW there is no way I can understand your use of the smiley here. -- --Guido van Rossum (python.org/~guido) From ubernostrum at gmail.com Sun Jun 13 07:10:26 2010 From: ubernostrum at gmail.com (James Bennett) Date: Sun, 13 Jun 2010 00:10:26 -0500 Subject: [Catalog-sig] PyPI down again... In-Reply-To: <4C12B862.9000503@v.loewis.de> References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12B862.9000503@v.loewis.de> Message-ID: On Fri, Jun 11, 2010 at 5:27 PM, "Martin v. Löwis" wrote: >> What is it now, just a straight WSGI app? > > No, FCGI. Statements like this lead me to believe that ignoring Joel Spolsky would be the right thing to do. Right now the PyPI codebase seems to have a bus number[1] of one: Martin, who is apparently the only person who really understands the code well enough to do significant work on it. This is something which could be remedied by having more people learn the code and get familiar enough with it to make contributions, but that's complicated by the fact that PyPI still does so much basically from scratch -- it doesn't even use the standard gateway interface Python web developers are expected to be familiar with, much less any well-known libraries. As such, just having people learn the code doesn't seem like a great option; for one thing, existing knowledge of Python web development isn't transferrable to PyPI, and working on PyPI isn't transferrable to anything else a Python web developer would be doing, and so it's unlikely that many, if any, people would be sufficiently motivated. Which points to rewriting as the best option, resulting in greater innate maintainability and a larger community of potential contributors.
As to *what* it should be rewritten with, I frankly don't care so long as it's something reasonably well-known and well-understood within the broader Python web community, and speaks WSGI (which is essentially the same thing, but it needs to be said). That gives all sorts of options, from a lightweight stack on something like Werkzeug all the way up to a full framework solution with something like Pylons. To avoid the perennial holy wars that choice seems to engender, though, I'd suggest just asking Martin to pick something he feels he'd be comfortable with, and having everyone else who wants to help shut up and go with his choice. -- "Bureaucrat Conrad, you are technically correct -- the best kind of correct." From martin at v.loewis.de Sun Jun 13 11:18:36 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 13 Jun 2010 11:18:36 +0200 Subject: [Catalog-sig] PyPI down again... In-Reply-To: References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12B862.9000503@v.loewis.de> Message-ID: <4C14A26C.2050804@v.loewis.de> > Right now the PyPI codebase seems to have a bus number[1] of one: > Martin, who is apparently the only person who really understands the > code well enough to do significant work on it. This is something which > could be remedied by having more people learn the code and get > familiar enough with it to make contributions, but that's complicated > by the fact that PyPI still does so much basically from scratch -- it > doesn't even use the standard gateway interface Python web developers > are expected to be familiar with, much less any well-known libraries. There are several ways to run PyPI, including WSGI, FCGI, CGI, and a stand-alone server. The mode which is used on PyPI just happens to be FCGI. I'm not sure how the integration with Apache matters - the actual code generating web pages is the same all the time, no matter what gateway interface is being used. 
As for the bus number: Richard Jones is also familiar with the code, as he wrote it in the first place. He just hasn't contributed much lately. I believe Tarek is also knowledgeable. So the bus factor is rather 3. > As to *what* it should be rewritten with, I frankly don't care so long > as it's something reasonably well-known and well-understood within the > broader Python web community, and speaks WSGI (which is essentially > the same thing, but it needs to be said). I don't really want to "sell" the code base, but just for the record: It's written "in" WSGI, Zope Page Templates, and Postgres. These are all things that are well-understood in the Python web community. > That gives all sorts of > options, from a lightweight stack on something like Werkzeug all the > way up to a full framework solution with something like Pylons. To > avoid the perennial holy wars that choice seems to engender, though, > I'd suggest just asking Martin to pick something he feels he'd be > comfortable with, and having everyone else who wants to help shut up > and go with his choice. It would be really up to Richard Jones, and he said he would prefer Django; so do I. Regards, Martin From g.brandl at gmx.net Sun Jun 13 12:09:32 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 13 Jun 2010 12:09:32 +0200 Subject: [Catalog-sig] PyPI down again... In-Reply-To: References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12B862.9000503@v.loewis.de> Message-ID: On 13.06.2010 07:10, James Bennett wrote: > On Fri, Jun 11, 2010 at 5:27 PM, "Martin v. Löwis" wrote: >>> What is it now, just a straight WSGI app? >> >> No, FCGI. > > Statements like this lead me to believe that ignoring Joel Spolsky > would be the right thing to do. > > Right now the PyPI codebase seems to have a bus number[1] of one: > Martin, who is apparently the only person who really understands the > code well enough to do significant work on it.
JFTR, I had a look at the code at PyCon last year and I could find my way around it quite quickly. It's not like PyPI is such a big codebase that you need a year to get familiar with it. This is of course not an argument against a rewrite, but the situation is certainly not as gloomy as it is painted here from time to time. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From solipsis at pitrou.net Sun Jun 13 13:20:05 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 13 Jun 2010 11:20:05 +0000 (UTC) Subject: [Catalog-sig] PyPI down again... References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12B862.9000503@v.loewis.de> <4C14A26C.2050804@v.loewis.de> Message-ID: Martin v. Löwis <martin at v.loewis.de> writes: > > I don't really want to "sell" the code base, but just for the record: > It's written "in" WSGI, Zope Page Templates, and Postgres. These are > all things that are well-understood in the Python web community. > [...] > > It would be really up to Richard Jones, and he said he would prefer > Django; so do I. I'm saying this from (far) outside the playground and am not intending to contribute, so just take this as a suggestion, but: if it has to be rewritten, how about doing it in Python 3? Regards Antoine. From mal at egenix.com Sun Jun 13 14:49:57 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Sun, 13 Jun 2010 14:49:57 +0200 Subject: [Catalog-sig] PyPI down again... In-Reply-To: References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12B862.9000503@v.loewis.de> Message-ID: <4C14D3F5.9030001@egenix.com> James Bennett wrote: > On Fri, Jun 11, 2010 at 5:27 PM, "Martin v.
Löwis" wrote: >>> What is it now, just a straight WSGI app? >> >> No, FCGI. > > Statements like this lead me to believe that ignoring Joel Spolsky > would be the right thing to do. > > Right now the PyPI codebase seems to have a bus number[1] of one: > Martin, who is apparently the only person who really understands the > code well enough to do significant work on it. This is something which > could be remedied by having more people learn the code and get > familiar enough with it to make contributions, but that's complicated > by the fact that PyPI still does so much basically from scratch -- it > doesn't even use the standard gateway interface Python web developers > are expected to be familiar with, much less any well-known libraries. > > As such, just having people learn the code doesn't seem like a great > option; for one thing, existing knowledge of Python web development > isn't transferrable to PyPI, and working on PyPI isn't transferrable > to anything else a Python web developer would be doing, and so it's > unlikely that many, if any, people would be sufficiently motivated. > Which points to rewriting as the best option, resulting in greater > innate maintainability and a larger community of potential > contributors. Why don't you just start such a project, flesh out the details, use the existing PyPI as reference for the APIs and then propose that we use the new code for running PyPI? I think that if someone wants to do a rewrite it's best to just let them decide about the choice of technology. Even if it doesn't get used for PyPI in the end, it will still be an alternative choice for local PyPI-style indexes for projects like Zope or Plone to use, so work is not lost. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 13 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ...
http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 35 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Sun Jun 13 14:58:38 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Sun, 13 Jun 2010 14:58:38 +0200 Subject: [Catalog-sig] PyPI down again... In-Reply-To: References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12B862.9000503@v.loewis.de> <4C12C07F.6000706@v.loewis.de> <4C1338A6.1020601@v.loewis.de> <4C13450A.9050104@v.loewis.de> Message-ID: <4C14D5FE.1070909@egenix.com> Justin Ryan wrote: > [...lots of disrespectful and rude words...] Justin, you just disqualified yourself from being accepted as a respected member of this group. I think Martin deserves a public apology from you. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 13 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 35 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From tseaver at palladion.com Sun Jun 13 15:05:41 2010 From: tseaver at palladion.com (Tres Seaver) Date: Sun, 13 Jun 2010 09:05:41 -0400 Subject: [Catalog-sig] PyPI down again... In-Reply-To: References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12B862.9000503@v.loewis.de> <4C14A26C.2050804@v.loewis.de> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Antoine Pitrou wrote: > Martin v. L?wis v.loewis.de> writes: >> I don't really want to "sell" the code base, but just for the record: >> It's written "in" WSGI, Zope Page Templates, and Postgres. These are >> all things that are well-understood in the Python web community. >> > [...] >> It would be really up to Richard Jones, and he said he would prefer >> Django; so do I. > > I'm saying this from (far) outside the playground and am not intending to > contribute, so just take this as a suggestion, but: if it has to be rewritten, > how about doing in Python 3? Such a choice would be contrary to the goal of keeping it in the "well known Python web technologies" swimlane, to ease support by folks already familiar with WSGI, etc.: none of the libraries / frameworks are ported yet. Tres. - -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkwU16UACgkQ+gerLs4ltQ6opQCgqZj2gM6W/2YxJYYx8rO6Tb1Q 0/kAn07E7MPnUu3sCmFIIW+u+a2GXf3c =1q8f -----END PGP SIGNATURE----- From mal at egenix.com Sun Jun 13 15:11:04 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Sun, 13 Jun 2010 15:11:04 +0200 Subject: [Catalog-sig] PyPI down again... 
In-Reply-To: References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> Message-ID: <4C14D8E8.4010903@egenix.com> Tarek Ziadé wrote: > On Fri, Jun 11, 2010 at 11:06 PM, M.-A. Lemburg wrote: >> "Martin v. Löwis" wrote: >>> For a smaller project, start putting mirror support into setuptools or >>> distribute; this would make short (several hours) outages less severe >>> for the class of users that want permanent availability for downloading. >>> It's unlikely that the mirrors would break when the master goes down; >>> they just stop mirroring. >> >> A better and cleaner strategy is to put the static PyPI information >> up on Amazon Cloudscape and have DNS take care of providing local >> mirrors (edge servers) to setuptools et al. >> >> Such a setup won't require any complicated mirror logic in any >> of the existing client tools. >> >> By moving the PyPI installation to Amazon AWS, we could also >> get the RPC access distributed to more than just one server. >> >> As I said before, the PSF infrastructure committee needs to get on >> of the job of getting this implemented (including funding this >> development). >> >> If someone wants to volunteer helping with the setup, please contact >> the PSF at psf at python.org. > > What about continuing the work that was started last year ? > (and not finished due to a lack of time) > > There's a PEP we have started about a mirroring infrastructure: > http://www.python.org/dev/peps/pep-0381/ > > Some of its parts are already implemented in PyPI, and > what we need now is to work on the client side (pip, distribute, etc) > and bootstrap one or two mirrors using the protocol. We've had some private discussions about this, so I'm just going to summarize...
The idea here is not to override the mirror PEP ideas, but to use the existing PyPI installation and put the content needed for the most widely distributed package tool (currently setuptools and zc.buildout) on a content delivery network (CDN) in order to have it highly available on a managed edge network. Amazon Cloudfront is such a CDN and has Python interfaces, hence the idea to use Cloudfront. I asked for volunteers, because I didn't know enough about Amazon Cloudfront to write up a proposal and don't have the cycles available to implement such a setup myself. In the meantime, I've done some research and now know enough to write a proposal for the PSF board to consider. If the board thinks it's a good idea, we'll need to pursue finding volunteers to implement it. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 13 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 35 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From solipsis at pitrou.net Sun Jun 13 15:11:52 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 13 Jun 2010 13:11:52 +0000 (UTC) Subject: [Catalog-sig] PyPI down again... 
References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12B862.9000503@v.loewis.de> <4C14A26C.2050804@v.loewis.de> Message-ID: Tres Seaver <tseaver at palladion.com> writes: > > > I'm saying this from (far) outside the playground and am not intending to > > contribute, so just take this as a suggestion, but: if it has to be rewritten, > > how about doing in Python 3? > > Such a choice would be contrary to the goal of keeping it in the "well > known Python web technologies" swimlane, to ease support by folks > already familiar with WSGI, etc.: none of the libraries / frameworks > are ported yet. SQLAlchemy and other libraries have been ported (as well as mod_wsgi). No major framework appears to have been ported, though. Regards Antoine. From ziade.tarek at gmail.com Sun Jun 13 17:14:13 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Sun, 13 Jun 2010 17:14:13 +0200 Subject: [Catalog-sig] PyPI down again... In-Reply-To: References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12B862.9000503@v.loewis.de> Message-ID: On Sun, Jun 13, 2010 at 12:09 PM, Georg Brandl wrote: > On 13.06.2010 07:10, James Bennett wrote: >> On Fri, Jun 11, 2010 at 5:27 PM, "Martin v. Löwis" wrote: >>>> What is it now, just a straight WSGI app? >>> >>> No, FCGI. >> >> Statements like this lead me to believe that ignoring Joel Spolsky >> would be the right thing to do. >> >> Right now the PyPI codebase seems to have a bus number[1] of one: >> Martin, who is apparently the only person who really understands the >> code well enough to do significant work on it. > > JFTR, I had a look at the code at PyCon last year and I could find my > way around it quite quickly. It's not like PyPI is such a big codebase > that you need a year to get familiar with it.
> > This is of course not an argument against a rewrite, but the situation > is certainly not as gloomy as it is painted here from time to time. +1 I've written several patches and didn't have a problem understanding it > > Georg > > > -- > Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. > Four shall be the number of spaces thou shalt indent, and the number of thy > indenting shall be four. Eight shalt thou not indent, nor either indent thou > two, excepting that thou then proceed to four. Tabs are right out. > > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig > -- Tarek Ziadé | http://ziade.org From ziade.tarek at gmail.com Sun Jun 13 17:26:47 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Sun, 13 Jun 2010 17:26:47 +0200 Subject: [Catalog-sig] PyPI down again... In-Reply-To: <4C14D8E8.4010903@egenix.com> References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> Message-ID: On Sun, Jun 13, 2010 at 3:11 PM, M.-A. Lemburg wrote: ... > > We've had some private discussions about this, so I'm just > going to summarize... > > The idea here is not to override the mirror PEP ideas, > but to use the existing PyPI installation and put the > content needed for the most widely distributed package tool > (currently setuptools and zc.buildout) on a content > delivery network (CDN) in order to have it highly available > on a managed edge network. I think it overlaps a bit with the PEP goal, which is to set up a network of mirrors, and have them listed in the PyPI DNS so clients can switch from one mirror to another (and even do geoloc!). Right now we already have "unofficial mirrors" and the idea of the PEP would be to list them officially at PyPI and to have them collect the stats so we can count download hits.
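The client-side switching described above could look roughly like the following. This is only a sketch under stated assumptions, not the protocol from the PEP: the function names fetch_with_fallback and http_fetch are hypothetical, the "/simple/" path is assumed as the index location, and the two mirror host names are the ones mentioned later in this thread. The fetcher is passed in as a parameter so the fallback logic can be exercised without network access.

```python
import urllib.request

# Assumed host list: the main index first, then the mirrors named in the thread.
INDEX_HOSTS = [
    "pypi.python.org",
    "a.mirrors.pypi.python.org",
    "b.mirrors.pypi.python.org",
]


def http_fetch(url, timeout=10):
    """Default fetcher: plain HTTP GET returning the response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_fallback(path, hosts=INDEX_HOSTS, fetch=http_fetch):
    """Try each host in order and return (host, data) from the first success."""
    errors = []
    for host in hosts:
        url = "http://%s%s" % (host, path)
        try:
            return host, fetch(url)
        except OSError as exc:  # URLError and socket timeouts derive from OSError
            errors.append((host, exc))
    raise RuntimeError("all index hosts failed: %r" % (errors,))
```

Returning the host along with the data lets a tool tell the user when a mirror was used instead of the main index.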
> Amazon Cloudfront is such a CDN and has Python interfaces, > hence the idea to use Cloudfront. > > I asked for volunteers, because I didn't know enough about > Amazon Cloudfront to write up a proposal and don't have > the cycles available to implement such a setup myself. > > In the meantime, I've done some research and now know > enough to write a proposal for the PSF board to consider. > If the board thinks it's a good idea, we'll need to > pursue finding volunteers to implement it. Well maybe this is the best path to follow right now, as it will be done faster, without having to interact with many people to set it up, so it's a quick win. But it will probably kill the mirroring protocol idea from the PEP in the process, which I think is superior in the long term since it provides a standardized ground for the community to set up mirrors independently from pypi.python.org. Regards Tarek -- Tarek Ziadé | http://ziade.org From ziade.tarek at gmail.com Sun Jun 13 17:36:00 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Sun, 13 Jun 2010 17:36:00 +0200 Subject: [Catalog-sig] PyPI down again... In-Reply-To: <4C14A26C.2050804@v.loewis.de> References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12B862.9000503@v.loewis.de> <4C14A26C.2050804@v.loewis.de> Message-ID: On Sun, Jun 13, 2010 at 11:18 AM, "Martin v. Löwis" wrote: >> Right now the PyPI codebase seems to have a bus number[1] of one: >> Martin, who is apparently the only person who really understands the >> code well enough to do significant work on it. This is something which >> could be remedied by having more people learn the code and get >> familiar enough with it to make contributions, but that's complicated >> by the fact that PyPI still does so much basically from scratch -- it >> doesn't even use the standard gateway interface Python web developers >> are expected to be familiar with, much less any well-known libraries.
> > There are several ways to run PyPI, including WSGI, FCGI, CGI, and a > stand-alone server. The mode which is used on PyPI just happens to be FCGI. > > I'm not sure how the integration with Apache matters - the actual code > generating web pages is the same all the time, no matter what gateway > interface is being used. > > As for the bus number: Richard Jones is also familiar with the code, as he > wrote it in the first place. He just didn't contribute much lately. > I believe Tarek is also knowledgable. So the bus factor is rather 3. I am pretty confident now with the code, and I don't think it's very complex. It just grew big in some parts, like webui.py which needs to be split up. Frankly, I think it just needs a bit of cleanup, maybe a migration to SQLAlchemy, but that's it. As a matter of fact, some folks in the Montreal Python user group are working on refactoring it right now, because they wanted to provide some new features. So I would be 0- on writing it from scratch. I'd suggest moving it to a DVCS (hg.python.org?) to make the contributions easier. Regards Tarek -- Tarek Ziadé | http://ziade.org From g.brandl at gmx.net Sun Jun 13 17:53:29 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 13 Jun 2010 17:53:29 +0200 Subject: [Catalog-sig] PyPI down again... In-Reply-To: References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12B862.9000503@v.loewis.de> <4C14A26C.2050804@v.loewis.de> Message-ID: On 13.06.2010 15:11, Antoine Pitrou wrote: > Tres Seaver <tseaver at palladion.com> writes: >> >> > I'm saying this from (far) outside the playground and am not intending to >> > contribute, so just take this as a suggestion, but: if it has to be rewritten, >> > how about doing in Python 3?
>> >> Such a choice would be contrary to the goal of keeping it in the "well >> known Python web technologies" swimlane, to ease support by folks >> already familiar with WSGI, etc.: none of the libraries / frameworks >> are ported yet. > > SQLAlchemy and other libraries have been ported (as well as mod_wsgi). > No major framework appears to have been ported, though. That's also because last I heard there was no consensus yet how WSGI would look like on Python 3. But that would be on-topic for web-SIG. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From martin at v.loewis.de Sun Jun 13 19:33:56 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sun, 13 Jun 2010 19:33:56 +0200 Subject: [Catalog-sig] PyPI down again... In-Reply-To: References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12B862.9000503@v.loewis.de> <4C14A26C.2050804@v.loewis.de> Message-ID: <4C151684.3070808@v.loewis.de> > I'm saying this from (far) outside the playground and am not intending to > contribute, so just take this as a suggestion, but: if it has to be rewritten, > how about doing in Python 3? It wouldn't really matter, so: +0. Regards, Martin From martin at v.loewis.de Sun Jun 13 19:40:20 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 13 Jun 2010 19:40:20 +0200 Subject: [Catalog-sig] PyPI down again... 
In-Reply-To: References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> Message-ID: <4C151804.6050903@v.loewis.de> > I think it overlaps a bit the PEP goal, which is to set up a network of mirrors, > and have them listed in the PyPI DNS so clients can switch from one mirror > to another.(and even do geoloc!) JFTR, this already exists. a.mirrors.pypi.python.org and b.mirrors.pypi.python.org are already there and could be used by clients. > Well maybe this is the best path to follow right now, as it will be done faster, > without having to interact with much people to set it up, so it's a quick win. My main worry (besides the client integration) is statistics: I do want to get download statistics. So anybody implementing it would have to find a way of fetching the download numbers from Amazon. > But it will probably kill the mirroring protocol idea from the PEP in > the process, > which I think is superior in the long term since it provides a > standardized ground > for the community to set up mirrors independently from pypi.python.org. I also remain skeptical that this cloud idea is useful at all. Amazon Cloudfront is a *beta* service. So they aren't sure themselves whether it works correctly - and there have been reports about two-day outages of EC2, for bitbucket.org. There also have been complaints about the available bandwidth. So I'm not sure whether replacing a single point of failure with a different one is actually improving anything. Regards, Martin From ziade.tarek at gmail.com Sun Jun 13 22:06:10 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Sun, 13 Jun 2010 22:06:10 +0200 Subject: [Catalog-sig] PyPI down again... 
In-Reply-To: <4C151804.6050903@v.loewis.de> References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C151804.6050903@v.loewis.de> Message-ID: 2010/6/13 "Martin v. Löwis" : >> I think it overlaps a bit the PEP goal, which is to set up a network of >> mirrors, >> and have them listed in the PyPI DNS so clients can switch from one >> mirror >> to another.(and even do geoloc!) > > JFTR, this already exists. a.mirrors.pypi.python.org and > b.mirrors.pypi.python.org are already there and could be used by clients. I wasn't aware of these mirrors. Do you maintain them? How are they synchronized? Do you get the statistics if we use them? If so, we could start to use them in all clients asap (as fallbacks if PyPI goes down). > >> Well maybe this is the best path to follow right now, as it will be done >> faster, >> without having to interact with much people to set it up, so it's a quick >> win. > > My main worry (besides the client integration) is statistics: I do want to > get download statistics. So anybody implementing it would have to find a way > of fetching the download numbers from Amazon. > >> But it will probably kill the mirroring protocol idea from the PEP in >> the process, >> which I think is superior in the long term since it provides a >> standardized ground >> for the community to set up mirrors independently from pypi.python.org. > > I also remain skeptical that this cloud idea is useful at all. Amazon > Cloudfront is a *beta* service. So they aren't sure themselves whether it > works correctly - and there have been reports about two-day outages of EC2, > for bitbucket.org. There also have been complaints about the available > bandwidth. So I'm not sure whether replacing a single point of failure with > a different one is actually improving anything.
ISTM that the workload is the same, whether a cloud or a regular mirror is used, because of the statistics. FYI, the work to be done for the mirrors, besides the PEP editing, consists of: - implementing the extra pages generation + stats builder in a package like z3c.pypimirror (http://pypi.python.org/pypi/z3c.pypimirror) which is used by several mirrors. - adding the client-side code in a project like Distribute or pip Regards Tarek -- Tarek Ziadé | http://ziade.org From martin at v.loewis.de Sun Jun 13 22:19:09 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 13 Jun 2010 22:19:09 +0200 Subject: [Catalog-sig] PyPI down again... In-Reply-To: References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C151804.6050903@v.loewis.de> Message-ID: <4C153D3D.7050604@v.loewis.de> >> JFTR, this already exists. a.mirrors.pypi.python.org and >> b.mirrors.pypi.python.org are already there and could be used by clients. > > I wasn't aware of these mirrors. Do you maintain them ? how are they > synchronized ? Yes, using pep381client. Notice that a.mirrors is dinsdale itself, so there is only a single mirror. > Do you get the statistics if we use them ? Not yet; that's not implemented yet. More specifically, I get the statistics, but the log files are not yet processed. > If so, we could start to use them in all clients asap. (as fallbacks > if PyPI gets down) As I said: setuptools and distribute should start supporting PEP 381, as an experimental feature. > - adding the client-side code in a project like Distribute or pip I think MAL believes that this would not be necessary if the Amazon service were used; I remain skeptical. I'd rather have the clients try explicitly (and indicate to the user that they are using a mirror). Regards, Martin From mal at egenix.com Mon Jun 14 11:12:07 2010 From: mal at egenix.com (M.-A.
Lemburg) Date: Mon, 14 Jun 2010 11:12:07 +0200 Subject: [Catalog-sig] PyPI down again... In-Reply-To: <4C151804.6050903@v.loewis.de> References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C151804.6050903@v.loewis.de> Message-ID: <4C15F267.6040108@egenix.com> "Martin v. Löwis" wrote: >> I think it overlaps a bit the PEP goal, which is to set up a network >> of mirrors, >> and have them listed in the PyPI DNS so clients can switch from one >> mirror >> to another.(and even do geoloc!) > > JFTR, this already exists. a.mirrors.pypi.python.org and > b.mirrors.pypi.python.org are already there and could be used by clients. > >> Well maybe this is the best path to follow right now, as it will be >> done faster, >> without having to interact with much people to set it up, so it's a >> quick win. > > My main worry (besides the client integration) is statistics: I do want > to get download statistics. So anybody implementing it would have to > find a way of fetching the download numbers from Amazon. Download statistics are readily available from Amazon Cloudfront, so no worries: you'll get statistics for all edge server downloads. http://docs.amazonwebservices.com/AmazonCloudFront/latest/DeveloperGuide/index.html?AccessLogs.html >> But it will probably kill the mirroring protocol idea from the PEP in >> the process, >> which I think is superior in the long term since it provides a >> standardized ground >> for the community to set up mirrors independently from pypi.python.org. > > I also remain skeptical that this cloud idea is useful at all. Amazon > Cloudfront is a *beta* service. So they aren't sure themselves whether > it works correctly - and there have been reports about two-day outages > of EC2, for bitbucket.org. There also have been complaints about the > available bandwidth.
So I'm not sure whether replacing a single point of > failure with a different one is actually improving anything. Amazon Cloudfront uses S3 as the basis for the service; S3 has been around for years and has a very stable uptime: http://www.readwriteweb.com/archives/amazon_s3_exceeds_9999_percent_uptime.php Cloudfront itself has been around since Nov 2008. You can check their current online status using this panel: http://status.aws.amazon.com/ Apart from the gained availability and outsourced management, we'd also get faster downloads in most parts of the world, due to the local caching Cloudfront is applying (and this can be used to further increase the availability, since we can control the expiry time of those local copies). So in summary we are replacing a single point of failure with N points of failure (with N being the number of edge caching servers they use). Regarding the bitbucket problem you mentioned: EC2 is their virtual server service, which we don't use. The bitbucket problems originated from a) a DDoS attack on their virtual servers running on EC2 and b) a problem with the Amazon EBS, which is their virtualized SAN, and was related to the way the DDoS was done (EBS and the DDoS attack both used UDP): http://blog.bitbucket.org/2009/10/04/on-our-extended-downtime-amazon-and-whats-coming/ """ And to re-iterate, the problem wasn't really Amazon EC2 or EBS, it was isolated to our case, due to the nature of the attack. """ -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 14 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 34 days to go ::: Try our new mxODBC.Connect Python Database Interface for free !
::::

eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From mal at egenix.com Mon Jun 14 11:27:15 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 14 Jun 2010 11:27:15 +0200
Subject: [Catalog-sig] PyPI down again...
In-Reply-To:
References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com>
Message-ID: <4C15F5F3.40501@egenix.com>

Tarek Ziadé wrote:
> On Sun, Jun 13, 2010 at 3:11 PM, M.-A. Lemburg wrote:
> ...
>>
>> We've had some private discussions about this, so I'm just
>> going to summarize...
>>
>> The idea here is not to override the mirror PEP ideas,
>> but to use the existing PyPI installation and put the
>> content needed for the most widely distributed package tool
>> (currently setuptools and zc.buildout) on a content
>> delivery network (CDN) in order to have it highly available
>> on a managed edge network.
>
> I think it overlaps a bit the PEP goal, which is to set up a network of mirrors,
> and have them listed in the PyPI DNS so clients can switch from one mirror
> to another.(and even do geoloc!)
>
> Right now we already have "unofficial mirrors" and the idea of the PEP
> would be to list them officially at PyPI and to have them collect the
> stats so we can count download hits.

Note that the CDN does not mirror the content of PyPI, it
just takes care of delivering the requested data to the
various edge servers and caching it there for a while.

This is a different concept than that of a full mirror that
doesn't work like a cache, but instead provides a fully
functional standalone server.

I still think that the concept of being able to mirror PyPI
servers is a useful one.

>> Amazon Cloudfront is such a CDN and has Python interfaces,
>> hence the idea to use Cloudfront.
>> >> I asked for volunteers, because I didn't know enough about >> Amazon Cloudfront to write up a proposal and don't have >> the cycles available to implement such a setup myself. >> >> In the meantime, I've done some research and now know >> enough to write a proposal for the PSF board to consider. >> If the board thinks it's a good idea, we'll need to >> pursue finding volunteers to implement it. > > Well maybe this is the best path to follow right now, as it will be done faster, > without having to interact with much people to set it up, so it's a quick win. > > But it will probably kill the mirroring protocol idea from the PEP in > the process, > which I think is superior in the long term since it provides a > standardized ground > for the community to set up mirrors independently from pypi.python.org. We'll have to see. Note that the CDN will only deal with the static data on PyPI, not the RPC or the web GUI access. Since static data is all that setuptools et al. currently use for fetching the data, we'll see an improved uptime for easy_install and esp. zc.buildout which by nature of their concepts rely on having a high availability of the PyPI static data resources. If, in the future, package tools start to rely on RPC for fetching data, the situation will shift towards needing full functional mirrors again. OTOH, we could also provide a snapshot copy of the database data in form of a SQLite database on the CDN for those tools to download and use locally... there are lot's of things package tools could do :-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 14 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/
________________________________________________________________________
2010-07-19: EuroPython 2010, Birmingham, UK                34 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From marrakis at gmail.com Mon Jun 14 12:35:08 2010
From: marrakis at gmail.com (Mathieu Leduc-Hamel)
Date: Mon, 14 Jun 2010 12:35:08 +0200
Subject: [Catalog-sig] PyPI down again...
In-Reply-To: <4C15F5F3.40501@egenix.com>
References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C15F5F3.40501@egenix.com>
Message-ID:

To continue the discussion about a rewrite or a cleanup of the PyPI
codebase: I'm from the Montreal-Python user group, and I'd say that yes,
at first the current PyPI codebase seems very unclear and difficult to
maintain. But it's not an impossible mission, and we are currently in
the process of:

- Adding functional tests. The test coverage is now around 40%.
- Once we reach more complete coverage, replacing the psycopg API with
  SQLAlchemy.
- Replacing much of the manual manipulation of the metadata with a more
  robust and straightforward way of dealing with it (distutils2 might be
  the option there).

At first I was thinking about rewriting everything using the chishop
project (an implementation of PyPI using Django). But keeping control of
the source code and not depending on any framework is maybe a better
idea. Moreover, despite the frequent outages, PyPI is working today, so
just modernizing the code base seems to be the best idea.

By the way, after a code review from Tarek, a very useful thing might be
to find a better way to handle and integrate contributions coming from
the community. Right now Tarek is responsible for making the link
between our effort and the work of Martin, but we don't have any
official public mirror of the source code or any roadmap.

From mark at geek.net Mon Jun 14 13:50:56 2010
From: mark at geek.net (Mark Ramm)
Date: Mon, 14 Jun 2010 07:50:56 -0400
Subject: [Catalog-sig] PyPI down again...
In-Reply-To: <4C15F5F3.40501@egenix.com>
References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C15F5F3.40501@egenix.com>
Message-ID:

> If, in the future, package tools start to rely on RPC for
> fetching data, the situation will shift towards needing full
> functional mirrors again.

Ideally we move some of this to be accessible via a more REST style
interface where http GET requests (which would be by far the most
common case) are still cacheable via all the standard mechanisms.

I'm not a REST evangelist in most cases, but when scale and
availability really do matter, REST buys you quite a bit by allowing
you to scale and cache in all the ways that the web does.

--Mark Ramm

From ametaireau at gmail.com Mon Jun 14 16:02:50 2010
From: ametaireau at gmail.com (Alexis Métaireau)
Date: Mon, 14 Jun 2010 16:02:50 +0200
Subject: [Catalog-sig] PyPI down again...
In-Reply-To:
References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C15F5F3.40501@egenix.com>
Message-ID:

Hi all,

Distutils2 will bring two APIs to query PyPI: the "simple" API and
the XML-RPC one.
The fact is that the Simple API (it's just HTML pages, in a REST style as
pointed out by Mark) does not provide all the information we need,
especially about distribution dependencies, or if we want to query other
things contained in the metadata.

I'm working on two simple APIs for that, and I'll probably make a wrapper
around both, which could choose the right one to use depending on the
needs (e.g. not always relying on RPC or on "REST").

As we are talking about refactoring PyPI, it would probably be nice to
have a real REST API that talks JSON or XML, replacing the HTML pages
currently served on http://pypi.python.org/simple/ :)

Cheers,
Alexis

From mal at egenix.com Mon Jun 14 16:12:49 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 14 Jun 2010 16:12:49 +0200
Subject: [Catalog-sig] PyPI down again...
In-Reply-To:
References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C15F5F3.40501@egenix.com>
Message-ID: <4C1638E1.102@egenix.com>

When designing such interfaces, please consider that the PyPI information
is mostly static. If there's information missing, it should be easy to add
it to e.g. a new info file placed into the package's "simple" directory
that package tools could pick up in REST style.

Static directories just scale a lot better than any kind of (true)
RPC interface and offloading some work to the client is certainly
a good strategy as well.

Thanks,
--
Marc-Andre Lemburg
eGenix.com

From ametaireau at gmail.com Mon Jun 14 17:00:58 2010
From: ametaireau at gmail.com (Alexis Métaireau)
Date: Mon, 14 Jun 2010 17:00:58 +0200
Subject: [Catalog-sig] PyPI down again...
In-Reply-To: <4C1638E1.102@egenix.com>
References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C15F5F3.40501@egenix.com> <4C1638E1.102@egenix.com>
Message-ID:

On Mon, Jun 14, 2010 at 4:12 PM, M.-A. Lemburg wrote:

> When designing such interfaces, please consider that the PyPI information
> is mostly static. If there's information missing, it should be easy to add
> it to e.g. a new info file placed into the package's "simple" directory
> that package tools could pick up in REST style.
>

Yes, it can solve some problems pointed out here, and I'll consider that.
*but* it's not a solution to all problems, and RPC calls will be of great
help in some cases, as it could take very long to fetch all the metadata,
process it on the client side and return information about e.g. a search
on fields other than the name (give me all distributions from this author).

But definitely, yes, I'll consider that.

Thanks !
--
Alexis

From mal at egenix.com Mon Jun 14 17:06:50 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 14 Jun 2010 17:06:50 +0200
Subject: [Catalog-sig] PyPI down again...
In-Reply-To:
References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C15F5F3.40501@egenix.com> <4C1638E1.102@egenix.com>
Message-ID: <4C16458A.4070001@egenix.com>

Alexis Métaireau wrote:
> On Mon, Jun 14, 2010 at 4:12 PM, M.-A. Lemburg wrote:
>
>> When designing such interfaces, please consider that the PyPI information
>> is mostly static. If there's information missing, it should be easy to add
>> it to e.g. a new info file placed into the package's "simple" directory
>> that package tools could pick up in REST style.
>>
>
> Yes, it can solve some problems pointed out here, and I'll consider that.
> *but* it's not a solution to all problems, and RPC calls will be of a great
> help in some cases, as it could be very long to fetch all metadata, process
> it on the client side and return information about eg. a search by other
> fields than name (give me all distributions from this author).

Agreed, that's why I think it would be useful to simply put
all meta data into a SQLite database file and ship that as
a static file as well. Local clients could then download the
database file (probably only a few MB) and work on it locally.

This would also make searches in PyPI a lot faster... not only
because searches could be done locally, but also because the
server wouldn't have to handle the load of those searches from
hundreds of clients.

> But, definitively, yes, I'll consider that.

Thanks !
--
Marc-Andre Lemburg
eGenix.com

From marrakis at gmail.com Mon Jun 14 17:14:11 2010
From: marrakis at gmail.com (Mathieu Leduc-Hamel)
Date: Mon, 14 Jun 2010 17:14:11 +0200
Subject: [Catalog-sig] PyPI down again...
In-Reply-To: <4C16458A.4070001@egenix.com>
References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C15F5F3.40501@egenix.com> <4C1638E1.102@egenix.com> <4C16458A.4070001@egenix.com>
Message-ID:

>
> Agreed, that's why I think it would be useful to simply put
> all meta data into a SQLite database file and ship that as
> static file as well. Local clients could then download the
> database file (probably only a few MB) and work on it locally.
>

I don't think it would be easy to do that right now, since the database
stores more information than just the metadata of all the packages; you
don't want to expose all the information about user accounts, for
example...

And I don't understand how it could work with constantly updated
information like that on the PyPI website: your database is always being
updated, so it's not possible to have a completely up-to-date copy...

Search might not be the big deal if the data and a good cache interface
are in place; the number of parallel connections is something else...

From mal at egenix.com Mon Jun 14 17:22:48 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 14 Jun 2010 17:22:48 +0200
Subject: [Catalog-sig] PyPI down again...
In-Reply-To: References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C15F5F3.40501@egenix.com> <4C1638E1.102@egenix.com> <4C16458A.4070001@egenix.com> Message-ID: <4C164948.6040602@egenix.com> Mathieu Leduc-Hamel wrote: >> >> Agreed, that's why I think it would be useful to simply put >> all meta data into a SQLite database file and ship that as >> static file as well. Local clients could then download the >> database file (probably only a few MB) and work on it locally. >> > I don't think it would be easy to do that right now since the database > store more informations than only the metadata of the all packages, you > don't wanna give all the informations about users accounts by example... PyPI uses PostgreSQL as database backend, so the SQLite database file would be a (partial) copy of that database. Of course, it would have to only contain meta-data that is also visible via the web GUI. > And, I don't understand how it can be perform with always updated > informations like ones on the pypi website, your database is always updated, > than it's not possible to have a completely updated one... True, but I think only very few users are really after real-time data from PyPI. Those can use the true RPC interfaces. For the others, a static copy created and updated every 10-20 minutes or so, is likely good enough. Anyway, it's an idea based on the 80/20 rule :-) > Search might not by the big deal if the data and a good cache interface is > implemented, the number of parallel connexion is something else... > >> This would also make searches in PyPI a lot faster... not only >> because searches could be done locally, but also because the >> server wouldn't have to handle the load of those searches from >> hundreds of clients. >> >>> But, definitively, yes, I'll consider that. >> >> Thanks ! 
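
The static snapshot idea in this message can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PyPI's actual schema or export code: the `releases` table, its columns, the sample rows and the helper names `build_snapshot` and `find_by_author` are all hypothetical. It shows a snapshot file that carries only publicly visible metadata, and the "all distributions from this author" search running locally against that file.

```python
import sqlite3

# Hypothetical public metadata rows. In the idea discussed above these would
# be exported from the PostgreSQL backend, leaving out account data.
PUBLIC_ROWS = [
    ("example-pkg", "1.0", "Alice Example", "An example distribution"),
    ("other-pkg", "0.3", "Bob Example", "Another distribution"),
    ("example-tools", "2.1", "Alice Example", "Helper tools"),
]


def build_snapshot(path, rows):
    """Write the public metadata into a fresh SQLite file at `path`."""
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("DROP TABLE IF EXISTS releases")
            conn.execute(
                "CREATE TABLE releases ("
                "name TEXT, version TEXT, author TEXT, summary TEXT)"
            )
            conn.executemany("INSERT INTO releases VALUES (?, ?, ?, ?)", rows)
    finally:
        conn.close()


def find_by_author(path, author):
    """Run the 'all distributions from this author' search locally."""
    conn = sqlite3.connect(path)
    try:
        cursor = conn.execute(
            "SELECT name, version FROM releases WHERE author = ? ORDER BY name",
            (author,),
        )
        return cursor.fetchall()
    finally:
        conn.close()
```

A server-side job would call something like `build_snapshot` every 10-20 minutes and publish the resulting file as static data; a client would download it once and call `find_by_author` without touching the server again.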
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 14 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 34 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From chris at simplistix.co.uk Mon Jun 14 17:27:50 2010 From: chris at simplistix.co.uk (Chris Withers) Date: Mon, 14 Jun 2010 16:27:50 +0100 Subject: [Catalog-sig] PyPI down again... In-Reply-To: References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12B862.9000503@v.loewis.de> <4C12C07F.6000706@v.loewis.de> <4C1338A6.1020601@v.loewis.de> <4C13450A.9050104@v.loewis.de> Message-ID: <4C164A76.2050006@simplistix.co.uk> Guido van Rossum wrote: > On Sat, Jun 12, 2010 at 12:48 PM, Justin Ryan > wrote: >> Thanks, Martin, for taking the conversation offline to be a real jerk. ;) > > (I won't quote more. Everyone who read it is still reeling from the > sudden outburst.) Sadly, it appears some people never change: https://mail.zope.org/pipermail/zope-web/2006-October/004226.html https://mail.zope.org/pipermail/zope-web/2006-October/date.html cheers, Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Mon Jun 14 19:15:24 2010 From: chris at simplistix.co.uk (Chris Withers) Date: Mon, 14 Jun 2010 18:15:24 +0100 Subject: [Catalog-sig] [Fwd: Re: PyPI down again...] 
Message-ID: <4C1663AC.6090708@simplistix.co.uk> Apologies for forwarding this mail onto the list, please do not reply, I would just like this and the following message archived publicly so people can avoid this invdividual... -------- Original Message -------- Subject: Re: [Catalog-sig] PyPI down again... Date: Mon, 14 Jun 2010 09:19:45 -0700 From: Justin Ryan To: Chris Withers CC: Guido van Rossum Passion is better than the rampant apathy you show. Frankly, Chris, I believe that you may represent the sort of half-cocked volunteer that turned Martin off to my offers of volunteering. And, Chris, I started by standing up for you, asking why a long standing member of this list showing concern over the constant system failure was not being answered, was being ignored. I'm not interested in being a part of any group or organization not trying to GET SHIT DONE. And that clearly includes the PSF. If it wasn't clear to the list, you guys can make it so, that was a farewell message from someone offering lots and lots of free time. Anyway, you guys are all going to the killfile, so don't bother responding. Also, grow a fucking sense of humor. Boy was that one of my most polite fuck-yous ever. On Mon, Jun 14, 2010 at 8:27 AM, Chris Withers wrote: > Guido van Rossum wrote: >> >> On Sat, Jun 12, 2010 at 12:48 PM, Justin Ryan >> wrote: >>> >>> Thanks, Martin, for taking the conversation offline to be a real jerk. ;) >> >> (I won't quote more. Everyone who read it is still reeling from the >> sudden outburst.) > > Sadly, it appears some people never change: > > https://mail.zope.org/pipermail/zope-web/2006-October/004226.html > > https://mail.zope.org/pipermail/zope-web/2006-October/date.html > > cheers, > > Chris > > -- > Simplistix - Content Management, Batch Processing & Python Consulting > - http://www.simplistix.co.uk > ______________________________________________________________________ This email has been scanned by the MessageLabs Email Security System. 
For more information please visit http://www.messagelabs.com/email ______________________________________________________________________ -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Mon Jun 14 19:15:51 2010 From: chris at simplistix.co.uk (Chris Withers) Date: Mon, 14 Jun 2010 18:15:51 +0100 Subject: [Catalog-sig] [Fwd: Re: PyPI down again...] Message-ID: <4C1663C7.3000406@simplistix.co.uk> More, again, please do not reply... -------- Original Message -------- Subject: Re: [Catalog-sig] PyPI down again... Date: Mon, 14 Jun 2010 09:33:51 -0700 From: Justin Ryan To: Chris Withers CC: Guido van Rossum And, for what it's worth, what set me off is Martin's attitude that "Something Else" should be fixed. He's clearly pissy because everyone wants to rewrite the thing in Djangass, which is Stupid(tm), and I'll grant him that, but in engineering, when your system is fragile, it is not a strong reflection on oneself to say: "Yes, but if you turned the doorknob oh so gently, it wouldn't fall off." We don't fucking need mirrors, we fucking need to stop counting downloads. Apt doesn't work that way. Yum doesn't. Microsoft and Apple almost definitely do. What kind of ridiculous software distribution mechanism requires postgres for read-only operations. This design would not be acceptable at Google, Mr. Rossum, I know that because I've interviewed with those fucking narcissists so many times I now tell recruiters anyone but Google. The characteristics of scaling PyPI currently is like scaling AdSense. Planning on charging per download soon? Anyway, you guys have lost the only person with time to dedicate to this apparently. Go buy Martin a "World's Greatest Dad" T-Shirt and remind him that he's important because he, periodically, for a few minutes at a time, does things that you would never, ever bother yourself with. 
--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk

From jannis at leidel.info Mon Jun 14 19:58:38 2010
From: jannis at leidel.info (Jannis Leidel)
Date: Mon, 14 Jun 2010 19:58:38 +0200
Subject: [Catalog-sig] PyPI down again...
In-Reply-To: <4C12A2E4.2090305@v.loewis.de>
References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de>
Message-ID: <67072407-83C3-4654-B7C0-33AA3D310370@leidel.info>

Hi all,

Apologies for the late reply, I was traveling.

>> Is it possible it's time to designate a team? I'm sure everyone
>> appreciates the hard work of a lone volunteer, but having been one
>> myself at times, the feeling that others may not do the job right is
>> often eclipsed by their availability to try.
>
> Help is certainly appreciated. The type of help depends on the volunteer, of course. E.g. I wouldn't want to give root accounts to
> the first person that comes along and asks for them (except when the first person is Jannis Leidel, who (I believe) did the Apache restart
> today).

Yes, I restarted Apache after getting a failure report on IRC. I'll look into the reasons later today.

Jannis

From mal at egenix.com Tue Jun 15 13:49:03 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 15 Jun 2010 13:49:03 +0200
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
Message-ID: <4C1768AF.9040606@egenix.com>

As mentioned, I've been working on a proposal text for the cloud idea.
Here's a first draft. Please have a look and let me know whether I've
missed any important facts. Thanks.

I intend to post the proposal to the PSF board (of which I'm a member,
in case you didn't know) and to have it vote on the proposal in one
of the next board meetings.

"""
PSF-Proposal: 100
Title: Move PyPI static data to the cloud for better availability
Version: Draft 1
Last-Modified: 2010-06-15
Author: mal at lemburg.com (Marc-André Lemburg)
Discussions-To: catalog-sig at python.org
Status: Draft
Type: Informational
Created: 2010-06-14
Post-History:


Proposal: Move PyPI static data to the cloud for better availability
========================================================================

Motivation
----------

PyPI has in recent months seen several outages, with the index being
unavailable both to users of the web GUI and to package administration
tools such as easy_install from setuptools.

As more and more Python applications rely on tools such as
easy_install for direct installation, or zc.buildout to manage the
complete software configuration cycle, the PyPI infrastructure
receives more and more attention from the Python community.

In order to maintain its credibility as a software repository, to
support the many different projects relying on the PyPI infrastructure
and the many users who rely on the simplified installation process
enabled by PyPI, the PSF needs to take action and move the essential
parts of PyPI to a more robust infrastructure that provides:

 * scalability
 * 24/7 system administration management
 * geo-localized fast and reliable access

Current Situation
-----------------

PyPI is currently run from a single server hosted in The Netherlands
(ximinez.python.org). This server is run by a very small team of
sys admins.

PyPI itself has in recent months been mostly maintained by one
developer: Martin von Loewis.

Projects are underway to enhance PyPI in various ways, including
a proposal to add external mirroring (PEP 381), but these are all
far from being finalized or implemented.
Usage
-----

PyPI provides four different mechanisms for accessing the stored information:

 * a web GUI that is meant for use by humans
 * an RPC interface which is mostly used for uploading new content
 * a semi-static /simple package listing, used by setuptools
 * a static area /packages for package download files and documentation, used by both the web GUI and setuptools

The /simple package listing is a dump of all packages in PyPI using a simple HTML page with links to sub-pages for each package. These sub-pages provide links to download files and external references.

External tools like easy_install only use the /simple package listing together with the hosted package download files.

While the /simple package listing is currently dynamically created from the database in real-time, this is not really needed for normal operation. A static copy created every 10-20 minutes would provide the same level of service in much the same way.

Moving static data to a CDN
---------------------------

Under the proposal the static information stored in PyPI (meta-information as well as package download files and documentation) is moved to a content delivery network (CDN).

For this purpose, the /simple package listing is replaced with a static copy that is recreated every 10-20 minutes using a cronjob on the PyPI server.

At the same intervals, another script will scan the package and documentation files under /packages for updates and upload any changes to the CDN for near-time availability.

By using a CDN the PSF will gain:

 * high availability of the static PyPI content
 * management offloaded to the CDN
 * geo-localized downloads, i.e. the files are hosted on a nearby server
 * faster downloads
 * more reliability and scalability
 * a move away from a single point of failure setup

Note that the proposal does not cover distribution of the dynamic parts of PyPI. As a result uploads to PyPI may still fail if the PyPI server goes down.
However, these dynamic parts are currently not being used by the existing package installation tools.

Choice of CDN: Amazon Cloudfront
--------------------------------

To keep the costs low for the PSF, Amazon Cloudfront appears to be the best choice for CDN.

Cloudfront is supported by a set of Python libraries (e.g. Amazon S3 lib and boto); upload scripts are readily available and can easily be customized.

    http://www.saltycrane.com/blog/2008/12/card-store-project-4-notes-using-amazons-cloudfront/

Other CDNs, such as Akamai, are either more expensive or require custom integration. Python-based tools are not always available for them; in fact, even finding such information is difficult for most of the proprietary CDNs.

Cloudfront: quality of service
------------------------------

Amazon Cloudfront uses S3 as basis for the service. S3 has been around for years and has a very stable uptime:

    http://www.readwriteweb.com/archives/amazon_s3_exceeds_9999_percent_uptime.php

Cloudfront itself has been around since Nov 2008. You can check their current online status using this panel:

    http://status.aws.amazon.com/

Apart from the gained availability and outsourced management, we'd also get faster downloads in most parts of the world, due to the local caching Cloudfront is applying. This caching can be used to further increase the availability, since we can control the expiry time of those local copies.

So in summary we are replacing a single point of failure with N points of failure (with N being the number of edge caching servers they use).

How Cloudfront works
--------------------

Cloudfront uses Amazon's S3 storage system which is based on "buckets". These can store any number of files in a directory-like structure. The only limit is a 5GB per file limit - more than enough for any PyPI package file.

Cloudfront provides a domain for each registered S3 bucket via a "distribution" which is then made available through local cache servers in various locations around the world.
The management of which server to use for an incoming request is transparently handled by Amazon. Once uploaded to the S3 bucket, the files will be distributed to the cache servers on demand and as necessary.

Each edge server maintains a cache of requested files and refetches the files after an expiry time which can be defined when uploading the file to the bucket.

To simplify things on our side, we'll set up a CNAME DNS alias for the Cloudfront domain issued by Amazon to our bucket:

    pypi-static.python.org. IN CNAME d32z1yuk7jeryy.cloudfront.net.

For more details, please see the Cloudfront documentation:

    http://aws.amazon.com/documentation/cloudfront/

Integration
-----------

In order to keep the number of changes to existing client side tools and PyPI itself to a minimum, the installation will try to be as transparent to both the server and the client side as possible.

This requires on the server side:

 * few, if any changes to the PyPI code base
 * simple scripts, driven by cronjobs
 * a simple distributed redirection setup to avoid having to change client side tools

On the client side:

 * no need to change the existing URL http://pypi.python.org/simple to access PyPI
 * redirects are already supported by setuptools via urllib2

Server side: upload cronjobs
----------------------------

Since the /simple index tree is currently being created dynamically, we'd need to create static copies of it at regular intervals in order to upload the content to the S3 bucket. This can easily be done using tools such as wget or curl.

Both the static copy of the /simple tree and the static files uploaded to /packages then need to be uploaded or updated in the S3 bucket by a cronjob running every 10-20 minutes.
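
As a rough illustration of the change detection such an upload cronjob might perform, here is a minimal sketch. It assumes the static copy of /simple and /packages lives in a local directory tree; the function names build_manifest and plan_sync are hypothetical and not part of the PyPI code base. The actual transfer to the S3 bucket (for example with boto) is deliberately left out, since its details depend on the library chosen.

```python
import hashlib
import os


def build_manifest(root):
    """Map each file below root ('/'-separated relative path) to a SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as fh:
                manifest[rel] = hashlib.sha256(fh.read()).hexdigest()
    return manifest


def plan_sync(previous, current):
    """Compare the manifests of two cron runs.

    Returns (to_upload, to_delete): paths that are new or whose content
    changed, and paths that disappeared since the previous run.
    """
    to_upload = sorted(k for k, v in current.items() if previous.get(k) != v)
    to_delete = sorted(k for k in previous if k not in current)
    return to_upload, to_delete
```

A cronjob built on this idea would store the manifest from the previous run, call plan_sync against a fresh manifest, and hand only the resulting paths to whatever upload tool is used, so that unchanged package files are not transferred again every 10-20 minutes.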
Server side: downloads statistics
---------------------------------

The next step would then be to configure access logs:

    http://docs.amazonwebservices.com/AmazonCloudFront/latest/DeveloperGuide/index.html?AccessLogs.html

and add a cronjob to download them to the PyPI server. Since the format is a bit different than the Apache log format used by the PyPI software, we'd have two options:

 1. convert the Cloudfront format to Apache format and simply append the converted logs to the local log files

 2. write a Cloudfront log file reader and add it to the apache_count_dist.py script that updates the download counts on the web GUI

Both options require no more than a few hours to implement and test.

Server side: redirection setup
------------------------------

Since PyPI wasn't designed to be put on a CDN, it mixes static file URL paths with dynamic access ones, e.g.

 dynamic:
    http://pypi.python.org/pypi
    (and a few others)

 static:
    http://pypi.python.org/simple
    http://pypi.python.org/packages

To move part of the URL path tree to a CDN, which works based on domains, we will need to provide a URL redirection setup that redirects client side tools to the new location.

As Martin von Loewis mentioned, this will require distributing the redirection setup to more than just one server as well.
Fortunately, this is not difficult to do: it requires a preconfigured lighttpd (*) setup running on N different servers which then all provide the necessary redirections (and nothing more):

 dynamic:
    http://pypi.python.org/ -> http://ximinez.python.org/pypi
    http://pypi.python.org/pypi -> http://ximinez.python.org/pypi
    (and possibly a few others)

 static:
    http://pypi.python.org/simple -> http://pypi-static.python.org/simple
    http://pypi.python.org/packages -> http://pypi-static.python.org/packages
    http://pypi.python.org/documentation -> http://pypi-static.python.org/documentation

(note: pypi-static.python.org is a CNAME alias for the Cloudfront domain issued to the S3 bucket where we upload the data)

The pypi.python.org domain would then have to be set up to map to multiple IP addresses via DNS round-robin, one entry for each redirection server, e.g.

    pypi.python.org. IN A 123.123.123.1
    pypi.python.org. IN A 123.123.123.2
    pypi.python.org. IN A 123.123.123.3
    pypi.python.org. IN A 123.123.123.4

Redirection servers could be run on all PSF server machines, and, to increase availability, on PSF partner servers as well.

(*) lighttpd is a lightweight and fast HTTP server. It's easy to set up, doesn't require a lot of resources on the server machine and runs stably.

Long-term changes
-----------------

While enabling the above redirection setup, we should also start working on changing PyPI and the client tools to use two new domains which then cleanly separate the static CDN file access from the dynamic PyPI server access:

    pypi.python.org
    pypi-static.python.org

Such a transition on the client side is expected to take at least a few years.

After that, the redirection service can be shut down or used to distribute and scale the dynamic PyPI service parts.

Side-effects
------------

Restarts of the PyPI server, network outages, or hardware failures would not affect the static copies of the PyPI on the CDN. setuptools, easy_install, pip, zc.buildout, etc. would continue to work.
The S3 bucket would serve as additional backup for the files on PyPI.

Later integration with Amazon EC2 (their virtual server offering) would easily be possible for more scalability and reduced system administration load.

Costs
-----

Amazon charges for S3 and Cloudfront storage, transfer and access. The costs vary depending on location.

    http://aws.amazon.com/cloudfront/#pricing
    http://aws.amazon.com/s3/#pricing

To get an idea of the costs, we'd have to take a closer look at the PyPI web stats:

    http://pypi.python.org/webstats/usage_201005.html

In May 2010, PyPI transferred 819GB of data and had to handle 22 million requests.

Using the AWS monthly calculator this gives roughly (I used 37KB as average object size and 35% US, 35% EU, 10% HK, 10% JP as basis): USD 132 per month, or about USD 1,600 per year.

Refinancing the costs
---------------------

Since PyPI is being used as an essential resource by many important Python projects (Zope, Plone, Django, etc.), it's fair to ask the respective foundations and the general Python community for donations to help refinance the administration costs.

A prominent donation button should go on the PyPI page with a text explaining how PyPI is being hosted and why donations are necessary.

We may also be able to directly ask for donations from the above foundations. Details of this are currently being evaluated by the PSF board (there are some issues related to our non-profit status that make this more complicated than it appears at first).

Effort
------

Given that most of the tools are readily available, setting up the servers shouldn't take more than 2-3 developer days for developers who've worked with Amazon S3 and Cloudfront before, including testing.

It is expected that we'll find volunteers to implement the necessary changes.
"""

-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 15 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ...
http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 33 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Tue Jun 15 14:02:28 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 15 Jun 2010 14:02:28 +0200 Subject: [Catalog-sig] PyPI down again... In-Reply-To: References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C15F5F3.40501@egenix.com> Message-ID: <4C176BD4.3080909@egenix.com> Mathieu Leduc-Hamel wrote: > To continue the discussion about a rewrite or a cleanup of the Pypi > codebase, I'm from Montreal-Python usergroup and I'm say that yes at the > first the current codebase of pypi seem to be very unclear and difficult to > maintain. > > But it's not an impossible mission and we are currently in the process of: > > - Adding functional test. The test coverage is now around 40% percent. > - When we'll reach a more complete coverage, we want to replace the psycopg > api by SQLAlchemy > - Replace many manual manipulation of the metadata by a more robust and > straightforward way of dealing with (distutils2 might be the option there) > > At first I was thinking about rewriting everything using the chishop project > (an implementation of PyPi using django). But having the control of the code > source and not dependent of any framework is maybe a better idea. > > More than, despite the frequent outage, pypi is working today, then just a > modernization of code base seem to be best idea. 
> > By the wat, after a code review of tarek, a very useful thing might be to > find a better way to deal and implement contributions coming from community. > Right now Tarek is responsible of making the link between our effert and the > work of Martin but we don't have any official public mirror of the source > code and any roadmap. You should be able to get access to the Python sandbox repository and add your project there: http://svn.python.org/projects/sandbox/trunk/ If that's not an option, I'd suggest you have a look at one of the other public repo sites such as launchpad. Note that working on PyPI needs a somewhat different development approach since any changes will be run on a live system. In my experience the best way to do this is by gradually changing things (rather than introduce big structural changes such as using SA instead of a native adapter) and keeping a close eye on the log files for any problems. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 15 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 33 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mcrute at gmail.com Tue Jun 15 14:20:43 2010 From: mcrute at gmail.com (Michael Crute) Date: Tue, 15 Jun 2010 08:20:43 -0400 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: <4C1768AF.9040606@egenix.com> References: <4C1768AF.9040606@egenix.com> Message-ID: On Tue, Jun 15, 2010 at 7:49 AM, M.-A. Lemburg wrote: > As mentioned, I've been working on a proposal text for the cloud idea. > Here's a first draft. Please have a look and let me know whether I've > missed any important facts. Thanks. What about a set of volunteer mirrors of PyPi similar to the way CPAN and Linux distributions handle this problem. pypi.python.org? That approach eliminates any cost for the PSF and might ultimately result in better reliability. With the volunteer mirror system you would still statically generate the files and just make them available for rsync then setup a page to allow mirrors to register (see CPAN). If you take this approach I would be happy to donate a mirror to the pool. -- Michael E. Crute http://mike.crute.org It is a mistake to think you can solve any major problem just with potatoes. --Douglas Adams From marrakis at gmail.com Tue Jun 15 14:27:13 2010 From: marrakis at gmail.com (Mathieu Leduc-Hamel) Date: Tue, 15 Jun 2010 14:27:13 +0200 Subject: [Catalog-sig] PyPI down again... 
In-Reply-To: <4C176BD4.3080909@egenix.com> References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C15F5F3.40501@egenix.com> <4C176BD4.3080909@egenix.com> Message-ID: Hi Martin, > You should be able to get access to the Python sandbox repository and > add your project there: > > http://svn.python.org/projects/sandbox/trunk/ > > If that's not an option, I'd suggest you have a look at one of the > other public repo sites such as launchpad. > Right now I'm working with Tarek Ziade on a clone of the PyPi repository sourcecode on bitbucket, that way, it allowed tarek to keep an eye the modifications I made on the source code since double checking any changes is very important, as you said, for this type of project. > > Note that working on PyPI needs a somewhat different development > approach since any changes will be run on a live system. > > In my experience the best way to do this is by gradually changing things > (rather than introduce big structural changes such as using SA > instead of a native adapter) and keeping a close eye on the log > files for any problems. > > That's why I was working to implement a better unit testing coverage. I would like to modernize a little bit the source code of pypi cause i think in the future there will some major structural changes of the code. Having a great test coverage will allow us to change the code and be less afraid of making mistakes. You know implementing SA is one of the many goal I would like to achieve, but I think the structural change you were proposing might need too some major changes to code base if we want to it properly. Maybe it would be easier to switch to the official mercurial repository ( hg.python.org), it would allow a better collaboration between everybody who would like to contribute. 
And if you want to see the changes I've proposed, you can find them at: http://bitbucket.org/mtlpython/pypi (they will be merged into Tarek's repo soon) From ben+python at benfinney.id.au Tue Jun 15 14:44:36 2010 From: ben+python at benfinney.id.au (Ben Finney) Date: Tue, 15 Jun 2010 22:44:36 +1000 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability References: <4C1768AF.9040606@egenix.com> Message-ID: <871vc81n4b.fsf@benfinney.id.au> Michael Crute writes: > On Tue, Jun 15, 2010 at 7:49 AM, M.-A. Lemburg wrote: > > As mentioned, I've been working on a proposal text for the cloud > > idea. Here's a first draft. Please have a look and let me know > > whether I've missed any important facts. Thanks. If “the cloud” in this proposal means “some single organisation or individual”, I don't think the situation is thereby improved much. > What about a set of volunteer mirrors of PyPi similar to the way CPAN > and Linux distributions handle this problem. pypi.python.org? That > approach eliminates any cost for the PSF and might ultimately result > in better reliability. +1.
A distributed system of mirrors administrated by disparate organisations and/or individuals also greatly reduces the reliance on any individual or organisation, helping reduce the inherent risks of both conflict of interest and single-point-of-failure. -- \ “Rightful liberty is unobstructed action, according to our | `\ will, within limits drawn around us by the equal rights of | _o__) others.” -- Thomas Jefferson | Ben Finney From fuzzyman at voidspace.org.uk Tue Jun 15 15:18:27 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 15 Jun 2010 14:18:27 +0100 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: References: <4C1768AF.9040606@egenix.com> Message-ID: On 15 June 2010 13:20, Michael Crute wrote: > On Tue, Jun 15, 2010 at 7:49 AM, M.-A. Lemburg wrote: > > As mentioned, I've been working on a proposal text for the cloud idea. > > Here's a first draft. Please have a look and let me know whether I've > > missed any important facts. Thanks. > > What about a set of volunteer mirrors of PyPi similar to the way CPAN > and Linux distributions handle this problem. pypi.python.org? That > approach eliminates any cost for the PSF and might ultimately result > in better reliability. With the volunteer mirror system you would > still statically generate the files and just make them available for > rsync then setup a page to allow mirrors to register (see CPAN). If > you take this approach I would be happy to donate a mirror to the > pool. > From the document: "Projects are underway to enhance PyPI in various ways, including a proposal to add external mirroring (PEP 381), but these are all far from being finalized or implemented." Just saying "mirroring" is not a solution in itself - that also takes time and effort. Michael > -- > Michael E. Crute > http://mike.crute.org > > It is a mistake to think you can solve any major problem just with > potatoes.
--Douglas Adams -- http://www.voidspace.org.uk From ametaireau at gmail.com Tue Jun 15 15:48:23 2010 From: ametaireau at gmail.com (Alexis Métaireau) Date: Tue, 15 Jun 2010 15:48:23 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: <4C1768AF.9040606@egenix.com> References: <4C1768AF.9040606@egenix.com> Message-ID: Hello, Firstly, as Tarek said in another thread, I'm afraid this kills PEP 381 about building a mirroring infrastructure. Having an infrastructure hosted on a cloud platform may be comfortable, and is probably needed to have a system running 24/7, but we need to take care to keep it possible to create new public mirrors outside of the Amazon (or whatever) cloud infrastructure. On Tue, Jun 15, 2010 at 1:49 PM, M.-A. Lemburg wrote: > > PyPI is currently run from a single server hosted in The Netherlands > (ximinez.python.org). This server is run by a very small team of sys > admin. > As Martin von Löwis said, this already exists. "a.mirrors.pypi.python.org and b.mirrors.pypi.python.org are already there and could be used by clients". Maybe Martin, can you explain to us (apologies if this is already done somewhere) how things are working right now? Is it possible to rely on the existing work rather than using a cloud system? What is the infrastructure currently in place? Alexis -------------- next part -------------- An HTML attachment was scrubbed...
URL: From steve at pearwood.info Tue Jun 15 16:33:45 2010 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 16 Jun 2010 00:33:45 +1000 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: <4C1768AF.9040606@egenix.com> References: <4C1768AF.9040606@egenix.com> Message-ID: <201006160033.46095.steve@pearwood.info> On Tue, 15 Jun 2010 09:49:03 pm M.-A. Lemburg wrote: > As mentioned, I've been working on a proposal text for the cloud > idea. Here's a first draft. Please have a look and let me know > whether I've missed any important facts. Thanks. I think the most important missed fact is, just how unreliable is PyPI currently? Does anyone know? I know there's a number of people complaining that it's down "all the time", or even occasionally, but I think that we need to know the magnitude of the problem that needs solving. What's the average length of time between outages? What's the average length of the outage? Just saying that there's been several outages in recent months is awfully hand-wavy. [...] > Amazon Cloudfront uses S3 as basis for the service, S3 has been > around for years and has a very stable uptime: > > http://www.readwriteweb.com/archives/amazon_s3_exceeds_9999_percent_uptime.php Is there anyone here who has personal experience with Cloudfront and is willing to vouch for it? Or argue against it? We can only go so far based on Amazon's marketing material. One thing that does worry me: > So in summary we are replacing a single point of failure with N > points of failure (with N being the number of edge caching servers > they use). I don't think this means what you seem to think it means. If you replace a single point of failure with N points of failure, your overall reliability goes down, not up, since there are now more things to go wrong. Assuming that they're independent points of failure, that means your total number of failures will increase by a factor of N.
For example, if a single edge server in (say) Australia goes down, Amazon might not count it as an outage for the purpose of calculating their 99.99% reliability since the system as a whole is still up, but conceivably Australian users might see an outage (or at least a slow-down). With N servers, I'd expect N times the number of individual outages, with Amazon presumably only counting it as "system down" if all N servers go down at the same time. -- Steven D'Aprano From marrakis at gmail.com Tue Jun 15 16:42:53 2010 From: marrakis at gmail.com (Mathieu Leduc-Hamel) Date: Tue, 15 Jun 2010 16:42:53 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: <201006160033.46095.steve@pearwood.info> References: <4C1768AF.9040606@egenix.com> <201006160033.46095.steve@pearwood.info> Message-ID: > > I think the most important missed fact is, just how unreliable is PyPI > currently? Does anyone know? > Exactly my point, right now, since the code is not completely clear and not tested we don't really know what's supposed to worked and how. It's really a problem when the only way you have to know if something goes wrong is when your users start complaining... > I don't think this means what you seem to think it means. If you replace > a single point of failure with N points of failure, your overall > reliability goes down, not up, since there are now more things to go > wrong. Assuming that they're independent points of failure, that means > your total number of failures will increase by a factor of N. > > This is why we should work on the heart the problem problem, pypi itself and why it's down sometime. Nobody know exactly what happen, maybe it's not a performance problems. As you said, we may have the same problem in the future on all mirroring nodes ... -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Tue Jun 15 17:55:30 2010 From: mal at egenix.com (M.-A. 
Lemburg) Date: Tue, 15 Jun 2010 17:55:30 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: <201006160033.46095.steve@pearwood.info> References: <4C1768AF.9040606@egenix.com> <201006160033.46095.steve@pearwood.info> Message-ID: <4C17A272.9070808@egenix.com> Steven D'Aprano wrote: > On Tue, 15 Jun 2010 09:49:03 pm M.-A. Lemburg wrote: >> As mentioned, I've been working on a proposal text for the cloud >> idea. Here's a first draft. Please have a look and let me know >> whether I've missed any important facts. Thanks. > > I think the most important missed fact is, just how unreliable is PyPI > currently? Does anyone know? > > I know there's a number of people complaining that it's down "all the > time", or even occasionally, but I think that we need to know the > magnitude of the problem that needs solving. What's the average length > of time between outages? What's the average length of the outage? Just > saying that there's been several outages in recent months is awfully > hand-wavy. I'm sorry, but I can't provide any numbers since there doesn't appear to be any monitoring in place to pull those numbers from. What I can say is that from reading the various mailing lists, PyPI is down often enough to let people start discussions about it and that's the point I want to address: """ In order to maintain its credibility as software repository, to support the many different projects relying on the PyPI infrastructure and the many users who rely on the simplified installation process enabled by PyPI, the PSF needs to take action and move the essential parts of PyPI to a more robust infrastructur that provides: * scalability * 24/7 system administration management * geo-localized fast and reliable access """ Setting up some Zenoss or Nagios monitoring system to take care of monitoring the PyPI server (and our other servers) would be a separate project. > [...] 
>> Amazon Cloudfront uses S3 as basis for the service, S3 has been >> around for years and has a very stable uptime: >> >> http://www.readwriteweb.com/archives/amazon_s3_exceeds_9999_percent_u >> ptime.php > > Is there anyone here who has personal experience with Cloudfront and is > willing to vouch for it? Or argue against it? We can only go so far > based on Amazon's marketing material. I don't have personal experience with Cloudfront, but have advised companies to use Amazon EC2 and S3 as disaster recovery and backup solution. So far, none of them has ever complained. While doing research for the proposal, I've read a lot of posts about people using Amazon S3 and Cloudfront. The overall feedback is very positive. If things still don't work out for us, we can always go back to the single server setup. The proposal doesn't bind us to Cloudfront or the CDN setup in any way. > One thing that does worry me: > >> So in summary we are replacing a single point of failure with N >> points of failure (with N being the number of edge caching servers >> they use). > > I don't think this means what you seem to think it means. If you replace > a single point of failure with N points of failure, your overall > reliability goes down, not up, since there are now more things to go > wrong. Assuming that they're independent points of failure, that means > your total number of failures will increase by a factor of N. > > For example, if a single edge server in (say) Australia goes down, > Amazon might not count it as an outage for the purpose of calculating > their 99.99% reliability since the system as a whole is still up, but > conceivably Australian users might see an outage (or at least a > slow-down). With N servers, I'd expect N times the number of individual > outages, with Amazon presumably only counting it as "system down" if > all N servers go down at the same time. It's poor wording, I agree. Thanks for pointing this out. The math is correct, though, I believe... 
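To put rough numbers on the two readings of "N points of failure", here is a minimal Python sketch. It assumes each server is unavailable with probability q, independently of the others. The function names and the values q = 0.01 and N = 8 are chosen purely for illustration and are not measurements of PyPI or Cloudfront.

```python
def expected_servers_down(q: float, n: int) -> float:
    """Expected number of servers that are down at a given moment (n * q)."""
    return n * q


def prob_any_down(q: float, n: int) -> float:
    """Probability that at least one of n independent servers is down."""
    return 1.0 - (1.0 - q) ** n


def prob_all_down(q: float, n: int) -> float:
    """Probability that all n independent servers are down at the same time."""
    return q ** n


if __name__ == "__main__":
    q, n = 0.01, 8  # illustrative assumptions, not measured values
    print("expected servers down:", expected_servers_down(q, n))
    print("P(at least one down): ", prob_any_down(q, n))
    print("P(all down):          ", prob_all_down(q, n))
```

The first two quantities grow with N, which is the sense in which more servers mean more individual failures. The last one shrinks with N, which is what matters if a client can be switched to any server that is still up. Servers behind a single provider are unlikely to be fully independent, so q^N is best read as an optimistic bound.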
Let's say all servers have a probability of being unavailable of

    P("Server down") = q    (with q in [0,1])

Let's further assume that all servers are independent of each other. The probability of none of the servers being available then is

    P("System down") = q^N <= q

Cloudfront uses a DNS round-robin system with a TTL of 60 seconds, and returns more than just one cache server per edge node, e.g. in Germany I get 8 cache servers:

> dig d1ylr6sba64qi3.cloudfront.net
;; ANSWER SECTION:
d1ylr6sba64qi3.cloudfront.net.       57 IN CNAME d1ylr6sba64qi3.ams1.cloudfront.net.
d1ylr6sba64qi3.ams1.cloudfront.net.  57 IN A     216.137.59.184
d1ylr6sba64qi3.ams1.cloudfront.net.  57 IN A     216.137.59.250
d1ylr6sba64qi3.ams1.cloudfront.net.  57 IN A     216.137.59.84
d1ylr6sba64qi3.ams1.cloudfront.net.  57 IN A     216.137.59.106
d1ylr6sba64qi3.ams1.cloudfront.net.  57 IN A     216.137.59.15
d1ylr6sba64qi3.ams1.cloudfront.net.  57 IN A     216.137.59.102
d1ylr6sba64qi3.ams1.cloudfront.net.  57 IN A     216.137.59.40
d1ylr6sba64qi3.ams1.cloudfront.net.  57 IN A     216.137.59.118

;; AUTHORITY SECTION:
ams1.cloudfront.net. 141251 IN NS ns-ams1-01.cloudfront.net.
ams1.cloudfront.net. 141251 IN NS ns-ams1-02.cloudfront.net.

The probability of all 8 servers being down is

    P("Edge node down") = q^8 <= q

Assuming that Amazon's system monitoring is fast enough to detect the edge node down state, it will likely switch me over to a different edge within those 60 seconds, where I'll see another 8 or so servers:

    P("2 edge nodes unavailable") = q^8 * q^8 = q^16

and so on.

Now compare all this to the probability of the single PyPI server being down:

    P("PyPI server down") = q >> q^N = P("Cloudfront down")

In other words, the probability for PyPI on the CDN being unreachable for more than say 5 minutes (assuming the switchover to all edge nodes takes at most 5 minutes), is q^N.

In numbers: Let's assume that q=0.01, i.e. 99% uptime, with N=32 (the true number is likely higher):

    P("PyPI server down") = 0.01 >> P("Cloudfront down") = 0.01^32 = 1e-64

Of course, you'd have to add an offset of the Amazon infrastructure or network connectivity being down, human error, inherent system failures and DDoS attacks, so the actual numbers are higher.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Jun 15 2010)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
2010-07-19: EuroPython 2010, Birmingham, UK 33 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From mal at egenix.com Tue Jun 15 18:02:33 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 15 Jun 2010 18:02:33 +0200
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
In-Reply-To:
References: <4C1768AF.9040606@egenix.com>
Message-ID: <4C17A419.4060602@egenix.com>

Alexis Métaireau wrote:
> Hello,
>
> Firstly, as Tarek said in another thread, I'm afraid this kill the PEP381
> about making a mirroring infrastructure.
> Having a infrastructure hosted on a cloud platform may be confortable, and
> probably needed to have a 24/7 running system, but
> we need to take care of letting possible the creation of new public mirrors,
> outside from the Amazon (or whatever) cloud infrastructure.

The proposal doesn't prevent that. However, please note that setting up public mirrors not under PSF control has its own set of (legal) problems, which the PSF hosted cloud setup avoids.

> On Tue, Jun 15, 2010 at 1:49 PM, M.-A. Lemburg wrote: >> >> PyPI is currently run from a single server hosted in The Netherlands >> (ximinez.python.org). This server is run by a very small team of sys >> admin. >> > > As Martin von L?wis said, this already exists. "a.mirrors.pypi.python.org > and b.mirrors.pypi.python.org are already there and could be used by > clients". Maybe Martin can you explain us (apologies if this is already done > somewhere) how things are working from now ? Is this possible to rely on the > existing work rather than using a cloud system ? What's the in place > infrastructure ? In order to use those two servers, you'd still need to implement the redirection changes or client side tool changes and, what's more important, you'd need to administer and monitor those servers 24/7 to achieve similar uptime. The latter is what the proposal is all about: we're outsourcing the administration and monitoring to a service provider. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 15 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 33 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Tue Jun 15 18:10:31 2010 From: mal at egenix.com (M.-A. 
Lemburg) Date: Tue, 15 Jun 2010 18:10:31 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: References: <4C1768AF.9040606@egenix.com> Message-ID: <4C17A5F7.7080808@egenix.com> Michael Crute wrote: > On Tue, Jun 15, 2010 at 7:49 AM, M.-A. Lemburg wrote: >> As mentioned, I've been working on a proposal text for the cloud idea. >> Here's a first draft. Please have a look and let me know whether I've >> missed any important facts. Thanks. > > What about a set of volunteer mirrors of PyPi similar to the way CPAN > and Linux distributions handle this problem. pypi.python.org? That > approach eliminates any cost for the PSF and might ultimately result > in better reliability. With the volunteer mirror system you would > still statically generate the files and just make them available for > rsync then setup a page to allow mirrors to register (see CPAN). If > you take this approach I would be happy to donate a mirror to the > pool. Thanks for the offer. Setting up such a network based on PSF partner organizations (to avoid the legal problems) would work indeed, but it would both take longer to setup and require more work on the administration side. I still think that the cloud proposal is more cost effective and faster to setup. If it doesn't work out, we can always go back to such a network of servers that we administer on our own. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 15 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 33 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! 
:::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ziade.tarek at gmail.com Tue Jun 15 19:02:05 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Tue, 15 Jun 2010 19:02:05 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: <4C17A419.4060602@egenix.com> References: <4C1768AF.9040606@egenix.com> <4C17A419.4060602@egenix.com> Message-ID: On Tue, Jun 15, 2010 at 6:02 PM, M.-A. Lemburg wrote: > Alexis M?taireau wrote: >> Hello, >> >> Firstly, as Tarek said in another thread, I'm afraid this kill the PEP381 >> about making a mirroring infrastructure. >> Having a infrastructure hosted on a cloud platform may be confortable, and >> probably needed to have a 24/7 running system, but >> we need to take care of letting possible the creation of new public mirrors, >> outside from the Amazon (or whatever) cloud infrastructure. > > The proposal doesn't prevent that. However, please note that > setting up public mirrors not under PSF control has its own > set of (legal) problems, which the PSF hosted cloud setup avoids. Mirrors already exists out there, so unless you ban them (which would be a really bad idea) setting up a cloud will not fix any legal issue if you think there's a legal issue. In any case, you can't prevent people from creating mirrors even if you would say its illegal. Moreover, having mirrors provided by the community is way better than relying on one single entity (the PSF) for this. (if we think "decentralized") So I think it would be better to focus on PEP 381, and make those existing mirrors comply with it. And maybe work on the legal issues you've mentioned > >> On Tue, Jun 15, 2010 at 1:49 PM, M.-A. 
Lemburg wrote: >>> >>> PyPI is currently run from a single server hosted in The Netherlands >>> (ximinez.python.org). ?This server is run by a very small team of sys >>> admin. >>> >> >> As Martin von L?wis said, this already exists. "a.mirrors.pypi.python.org >> ?and b.mirrors.pypi.python.org are already there and could be used by >> clients". Maybe Martin can you explain us (apologies if this is already done >> somewhere) how things are working from now ? Is this possible to rely on the >> existing work rather than using a cloud system ? What's the in place >> infrastructure ? > > In order to use those two servers, you'd still need to implement > the redirection changes or client side tool changes and, what's > more important, you'd need to administer and monitor those servers > 24/7 to achieve similar uptime. Not at all because the registered mirrors would be in the DNS round robin, and the clients would just have to switch to another mirror if a mirror is down. (that's explained in PEP 381) Such a decentralized system is far more reliable than any centralized system, and won't cost anything to the PSF. > > The latter is what the proposal is all about: we're outsourcing > the administration and monitoring to a service provider. Having a better PyPI server is of course a good idea, don't get me wrong. But it doesn't really solve anything at this point. A simple, documented protocol, and a list of registered mirrors backed up by the community is the way to go imho. And that's what unofficially happened already ! When PyPI is down, you'll see some tweet messages saying "go to this url, it's my mirror!" So I would trust the community and finish the PEP and provide a library that would allow anyone to run a PEP 381-compatible mirror. Regards Tarek -- Tarek Ziad? 
| http://ziade.org From ziade.tarek at gmail.com Tue Jun 15 19:09:30 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Tue, 15 Jun 2010 19:09:30 +0200 Subject: [Catalog-sig] PyPI down again... In-Reply-To: References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C15F5F3.40501@egenix.com> <4C176BD4.3080909@egenix.com> Message-ID: On Tue, Jun 15, 2010 at 2:27 PM, Mathieu Leduc-Hamel wrote: [..] > Maybe it would be easier to switch to the official mercurial repository > (hg.python.org), it would allow a better collaboration between everybody who > would like to contribute. Yes that's what I was proposing earlier in the thread. Having the repo at hg.python.org would facilitate contributions. We can have a process where they are reviewed by Martin and/or myself for example, and pulled from anyone's clone. I am volunteering to import it into hg.python.org, if Martin agrees for this switch. Regards Tarek -- Tarek Ziad? | http://ziade.org From ronaldoussoren at mac.com Tue Jun 15 19:15:00 2010 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Tue, 15 Jun 2010 19:15:00 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: References: <4C1768AF.9040606@egenix.com> <4C17A419.4060602@egenix.com> Message-ID: On 15 Jun, 2010, at 19:02, Tarek Ziad? wrote: > On Tue, Jun 15, 2010 at 6:02 PM, M.-A. Lemburg wrote: >> Alexis M?taireau wrote: >>> Hello, >>> >>> Firstly, as Tarek said in another thread, I'm afraid this kill the PEP381 >>> about making a mirroring infrastructure. >>> Having a infrastructure hosted on a cloud platform may be confortable, and >>> probably needed to have a 24/7 running system, but >>> we need to take care of letting possible the creation of new public mirrors, >>> outside from the Amazon (or whatever) cloud infrastructure. 
>> >> The proposal doesn't prevent that. However, please note that >> setting up public mirrors not under PSF control has its own >> set of (legal) problems, which the PSF hosted cloud setup avoids. > > Mirrors already exists out there, so unless you ban them (which would > be a really bad idea) > setting up a cloud will not fix any legal issue if you think there's a > legal issue. > > In any case, you can't prevent people from creating mirrors even if you > would say its illegal. Moreover, having mirrors provided by the community > is way better than relying on one single entity (the PSF) for this. > (if we think "decentralized") Why is having community mirrors better than one managed by the PSF? Even with community mirrors the contents of PyPI are still controlled by the PSF, because they control the master server, there is not much decentralization in that respect. AFAIK the goal of this exercise is to improve the uptime of the PyPI download service as used by existing installation, MAL's proposal seems like an easy way to accomplish that with minimal effort. Ronald -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3567 bytes Desc: not available URL: From jcea at jcea.es Tue Jun 15 19:22:22 2010 From: jcea at jcea.es (Jesus Cea) Date: Tue, 15 Jun 2010 19:22:22 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: <4C1768AF.9040606@egenix.com> References: <4C1768AF.9040606@egenix.com> Message-ID: <4C17B6CE.20209@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 15/06/10 13:49, M.-A. Lemburg wrote: > Server side: upload cronjobs > ---------------------------- > > Since the /simple index tree is currently being created dynamically, > we'd need to create static copies of it at regular intervals in order > to upload the content to the S3 bucket. This can easily be done using > tools such as wget or curl. 
> > Both the static copy of the /simple tree and the static files uploaded > to /packages then need to be uploaded or updated in the S3 bucket by a > cronjob running every 10-20 minutes. I don't comment about the convenience to migrate or not. But having to wait 20 minutes to deploy my just released package to my datacenter is a bit inconvenient to me :-). Would be nice to change PYPI code just to dump "simple" each time the database changes. Perusing the RSS, the load should be low and actually less demanding to CPU and database server (if you only update "simple" with the changes, not rebuilding everything each time). - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTBe2zplgi5GaxT1NAQLZIAP+JHe5dAVN27FTMD+gMzKntFEbEA3t9gqh gblEFPc5bigEAvfXxJTm2p+A0meeH7dVNT2akyYU4Cn+DmdV9+LkXY1c+beV7bpY BD2ROBvmFJ05FXPPkFD/La4Z0Bqb9JuZy7PV2kTQagzMsn3VjLJRDWt5K0kpIwcw Fntro0K/dRs= =G2bd -----END PGP SIGNATURE----- From ziade.tarek at gmail.com Tue Jun 15 19:24:29 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Tue, 15 Jun 2010 19:24:29 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: References: <4C1768AF.9040606@egenix.com> <4C17A419.4060602@egenix.com> Message-ID: On Tue, Jun 15, 2010 at 7:15 PM, Ronald Oussoren wrote: > > On 15 Jun, 2010, at 19:02, Tarek Ziad? wrote: > >> On Tue, Jun 15, 2010 at 6:02 PM, M.-A. 
Lemburg wrote: >>> Alexis M?taireau wrote: >>>> Hello, >>>> >>>> Firstly, as Tarek said in another thread, I'm afraid this kill the PEP381 >>>> about making a mirroring infrastructure. >>>> Having a infrastructure hosted on a cloud platform may be confortable, and >>>> probably needed to have a 24/7 running system, but >>>> we need to take care of letting possible the creation of new public mirrors, >>>> outside from the Amazon (or whatever) cloud infrastructure. >>> >>> The proposal doesn't prevent that. However, please note that >>> setting up public mirrors not under PSF control has its own >>> set of (legal) problems, which the PSF hosted cloud setup avoids. >> >> Mirrors already exists out there, so unless you ban them (which would >> be a really bad idea) >> setting up a cloud will not fix any legal issue if you think there's a >> legal issue. >> >> In any case, you can't prevent people from creating mirrors even if you >> would say its illegal. Moreover, having mirrors provided by the community >> is way better than relying on one single entity (the PSF) for this. >> (if we think "decentralized") > > Why is having community mirrors better than one managed by the PSF? Because it's not controlled anymore by one single entity. For example, if something is broken in the system and need a human intervention, and the sysadmin people are not available, we get a downtime. Lots of mirrors back by more people in the community greatly reduces this problem > Even with community mirrors the contents of PyPI are still controlled by the PSF, because they control the master server, there is not much decentralization in that respect. Once the DNS is set to accept other servers, the PyPI 'main' server is just the master that gets the content first which is then replicated. So, yes, the PSF controls the DNS, but will not control the downtime/uptime issues anymore. 
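A minimal sketch of the client-side switch between mirrors discussed in this thread, in Python. It is not the PEP 381 protocol: `fetch_with_fallback` and `http_get` are hypothetical helper names, and the mirror host names are simply the two quoted earlier in the thread.

```python
import urllib.request
from typing import Callable, Iterable


def http_get(url: str, timeout: float = 10.0) -> bytes:
    """Fetch a URL and return the response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_fallback(
    path: str,
    mirrors: Iterable[str],
    fetch: Callable[[str], bytes] = http_get,
) -> bytes:
    """Try each mirror in order and return the first successful response."""
    failures = []
    for base in mirrors:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            return fetch(url)
        except OSError as exc:
            # URLError, HTTPError and socket timeouts are OSError subclasses.
            failures.append(f"{url}: {exc}")
    raise OSError("all mirrors failed: " + "; ".join(failures))


if __name__ == "__main__":
    # Simulated fetcher (no network): the first mirror is treated as down.
    def fake_fetch(url: str) -> bytes:
        if "a.mirrors" in url:
            raise OSError("connection refused (simulated)")
        return b"<html>simple index</html>"

    mirrors = [
        "http://a.mirrors.pypi.python.org",
        "http://b.mirrors.pypi.python.org",
    ]
    print(fetch_with_fallback("/simple/", mirrors, fetch=fake_fetch))
```

This only addresses availability. It does nothing about how fresh a mirror's data is or whether the files it serves can be trusted.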
> AFAIK the goal of this exercise is to improve the uptime of the PyPI download service as used by existing installation, MAL's proposal seems like an easy way to accomplish that with minimal effort. Again, mirrors already exists out there. and they are getting updated every day. We are not far from what we want. So after more thoughts, I really don't think the cloud thing will be a minimal effort. > > Ronald -- Tarek Ziad? | http://ziade.org From mal at egenix.com Tue Jun 15 19:34:42 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 15 Jun 2010 19:34:42 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: References: <4C1768AF.9040606@egenix.com> <4C17A419.4060602@egenix.com> Message-ID: <4C17B9B2.10006@egenix.com> Tarek Ziad? wrote: > On Tue, Jun 15, 2010 at 6:02 PM, M.-A. Lemburg wrote: >> Alexis M?taireau wrote: >>> Hello, >>> >>> Firstly, as Tarek said in another thread, I'm afraid this kill the PEP381 >>> about making a mirroring infrastructure. >>> Having a infrastructure hosted on a cloud platform may be confortable, and >>> probably needed to have a 24/7 running system, but >>> we need to take care of letting possible the creation of new public mirrors, >>> outside from the Amazon (or whatever) cloud infrastructure. >> >> The proposal doesn't prevent that. However, please note that >> setting up public mirrors not under PSF control has its own >> set of (legal) problems, which the PSF hosted cloud setup avoids. > > Mirrors already exists out there, so unless you ban them (which would > be a really bad idea) > setting up a cloud will not fix any legal issue if you think there's a > legal issue. > > In any case, you can't prevent people from creating mirrors even if you > would say its illegal. Moreover, having mirrors provided by the community > is way better than relying on one single entity (the PSF) for this. 
> (if we think "decentralized") > > So I think it would be better to focus on PEP 381, and make those > existing mirrors comply with it. And maybe work on the legal issues > you've mentioned That can all happen in parallel. >>> On Tue, Jun 15, 2010 at 1:49 PM, M.-A. Lemburg wrote: >>>> >>>> PyPI is currently run from a single server hosted in The Netherlands >>>> (ximinez.python.org). This server is run by a very small team of sys >>>> admin. >>>> >>> >>> As Martin von L?wis said, this already exists. "a.mirrors.pypi.python.org >>> and b.mirrors.pypi.python.org are already there and could be used by >>> clients". Maybe Martin can you explain us (apologies if this is already done >>> somewhere) how things are working from now ? Is this possible to rely on the >>> existing work rather than using a cloud system ? What's the in place >>> infrastructure ? >> >> In order to use those two servers, you'd still need to implement >> the redirection changes or client side tool changes and, what's >> more important, you'd need to administer and monitor those servers >> 24/7 to achieve similar uptime. > > Not at all because the registered mirrors would be in the DNS round robin, > and the clients would just have to switch to another mirror if a mirror > is down. (that's explained in PEP 381) Someone would still have to provide the system administration for those servers and also make sure that the servers do actually provide up-to-date snapshots. DNS round-robin will help with finding the servers, not with the other aspects. Something the PEP should focus a bit more on is the freshness guarantee of the mirror data. It currently puts this important detail into the hands of the client software, so every package tool will have to find it's own way of determining whether to use a mirror or not. Another important feature missing from the PEP is data consistency. 
Since a client tool would only communicate with one mirror, it will ultimately have to trust the information on that server, including the MD5 sums. This makes it rather easy to manipulate data on the servers (not by the admins, but by hackers manipulating those servers). Having digitally signed packages, like you do on many Linux repository servers, would solve this issue, but also require a complete verification infrastructure on the client side. You don't need any of this with the cloud caching approach. > Such a decentralized system is far more reliable than any centralized > system, and won't cost anything to the PSF. We'll see :-) >> >> The latter is what the proposal is all about: we're outsourcing >> the administration and monitoring to a service provider. > > Having a better PyPI server is of course a good idea, don't get me wrong. > > But it doesn't really solve anything at this point. Obviously I have a different opinion, otherwise I wouldn't have written the proposal :-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 15 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 33 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Tue Jun 15 19:43:31 2010 From: mal at egenix.com (M.-A. 
Lemburg) Date: Tue, 15 Jun 2010 19:43:31 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: References: <4C1768AF.9040606@egenix.com> <4C17A419.4060602@egenix.com> Message-ID: <4C17BBC3.3050205@egenix.com> Tarek Ziad? wrote: > On Tue, Jun 15, 2010 at 7:15 PM, Ronald Oussoren wrote: >> >> On 15 Jun, 2010, at 19:02, Tarek Ziad? wrote: >> >>> On Tue, Jun 15, 2010 at 6:02 PM, M.-A. Lemburg wrote: >>>> Alexis M?taireau wrote: >>>>> Hello, >>>>> >>>>> Firstly, as Tarek said in another thread, I'm afraid this kill the PEP381 >>>>> about making a mirroring infrastructure. >>>>> Having a infrastructure hosted on a cloud platform may be confortable, and >>>>> probably needed to have a 24/7 running system, but >>>>> we need to take care of letting possible the creation of new public mirrors, >>>>> outside from the Amazon (or whatever) cloud infrastructure. >>>> >>>> The proposal doesn't prevent that. However, please note that >>>> setting up public mirrors not under PSF control has its own >>>> set of (legal) problems, which the PSF hosted cloud setup avoids. >>> >>> Mirrors already exists out there, so unless you ban them (which would >>> be a really bad idea) >>> setting up a cloud will not fix any legal issue if you think there's a >>> legal issue. >>> >>> In any case, you can't prevent people from creating mirrors even if you >>> would say its illegal. Moreover, having mirrors provided by the community >>> is way better than relying on one single entity (the PSF) for this. >>> (if we think "decentralized") >> >> Why is having community mirrors better than one managed by the PSF? > > Because it's not controlled anymore by one single entity. For example, > if something is broken in the system > and need a human intervention, and the sysadmin people are not > available, we get a downtime. 
I'm not sure I understand: if the PyPI server goes down, the
data will still be readily available on Amazon S3 and Cloudfront
caches - the cronjobs copy over the PyPI server content to S3
and Cloudfront serves it up from there.

And if Cloudfront or S3 goes down, client tools could still
try to access the PyPI server. (I'll add a note about that to
the proposal.)

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Jun 15 2010)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
2010-07-19: EuroPython 2010, Birmingham, UK 33 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From jcea at jcea.es Tue Jun 15 19:44:05 2010
From: jcea at jcea.es (Jesus Cea)
Date: Tue, 15 Jun 2010 19:44:05 +0200
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
In-Reply-To:
References: <4C1768AF.9040606@egenix.com>
Message-ID: <4C17BBE5.4010901@jcea.es>

On 15/06/10 14:20, Michael Crute wrote:
> What about a set of volunteer mirrors of PyPI similar to the way CPAN
> and Linux distributions handle this problem. pypi.python.org? That
> approach eliminates any cost for the PSF and might ultimately result
> in better reliability. With the volunteer mirror system you would
> still statically generate the files and just make them available for
> rsync then setup a page to allow mirrors to register (see CPAN). If
> you take this approach I would be happy to donate a mirror to the
> pool.

I would prefer this approach, actually. With the following
changes in current code:

1. setuptools & friends: Support for retrying several mirrors if first
try fails.

2. Packages MUST be digitally signed. Ideally by the owner, but at least
by PyPI central node (current pypi server). That way, a "rogue" mirror
can't distribute trojans.

3. Trusting the stats is not possible :(, if there are "rogue" mirrors.

--
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
jabber / xmpp:jcea at jabber.org         _/_/    _/_/          _/_/_/_/_/
.                              _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"Love is placing your happiness in the happiness of another" - Leibniz

From mal at egenix.com Tue Jun 15 19:45:28 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 15 Jun 2010 19:45:28 +0200
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
In-Reply-To: <4C17B6CE.20209@jcea.es>
References: <4C1768AF.9040606@egenix.com> <4C17B6CE.20209@jcea.es>
Message-ID: <4C17BC38.6090208@egenix.com>

Jesus Cea wrote:
> On 15/06/10 13:49, M.-A. Lemburg wrote:
>> Server side: upload cronjobs
>> ----------------------------
>
>> Since the /simple index tree is currently being created dynamically,
>> we'd need to create static copies of it at regular intervals in order
>> to upload the content to the S3 bucket. This can easily be done using
>> tools such as wget or curl.
>
>> Both the static copy of the /simple tree and the static files uploaded
>> to /packages then need to be uploaded or updated in the S3 bucket by a
>> cronjob running every 10-20 minutes.
>
> I won't comment on whether migrating is convenient or not.
>
> But having to wait 20 minutes to deploy my just released package to my
> datacenter is a bit inconvenient to me :-).
>
> Would be nice to change PyPI code just to dump "simple" each time the
> database changes. Perusing the RSS, the load should be low and actually
> less demanding to CPU and database server (if you only update "simple"
> with the changes, not rebuilding everything each time).

I'll leave that for a version 2.0 of the cloud idea :-) My main interest
now is getting something done with only requiring minimal changes
to the PyPI software.

Note that with community servers that only mirror once a day,
you'd have to wait up to a whole day for your package updates
to become visible worldwide.

--
Marc-Andre Lemburg
eGenix.com

From jcea at jcea.es Tue Jun 15 19:53:19 2010
From: jcea at jcea.es (Jesus Cea)
Date: Tue, 15 Jun 2010 19:53:19 +0200
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
In-Reply-To: <201006160033.46095.steve@pearwood.info>
References: <4C1768AF.9040606@egenix.com> <201006160033.46095.steve@pearwood.info>
Message-ID: <4C17BE0F.5090509@jcea.es>

On 15/06/10 16:33, Steven D'Aprano wrote:
> For example, if a single edge server in (say) Australia goes down,
> Amazon might not count it as an outage for the purpose of calculating
> their 99.99% reliability since the system as a whole is still up, but
> conceivably Australian users might see an outage (or at least a
> slow-down). With N servers, I'd expect N times the number of individual
> outages, with Amazon presumably only counting it as "system down" if
> all N servers go down at the same time.

I don't know, but if I were Amazon, I would (automatically) update the
DNS to serve Australian users from any other edge server :).

--
Jesus Cea Avion
jcea at jcea.es - http://www.jcea.es/

From jcea at jcea.es Tue Jun 15 20:21:41 2010
From: jcea at jcea.es (Jesus Cea)
Date: Tue, 15 Jun 2010 20:21:41 +0200
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
In-Reply-To: <4C17BC38.6090208@egenix.com>
References: <4C1768AF.9040606@egenix.com> <4C17B6CE.20209@jcea.es> <4C17BC38.6090208@egenix.com>
Message-ID: <4C17C4B5.3000801@jcea.es>

On 15/06/10 19:45, M.-A. Lemburg wrote:
> Note that with community servers that only mirror once a day,
> you'd have to wait up to a whole day for your package updates
> to become visible worldwide.

But TODAY mirror use is voluntary and per-user. That is, you use a
mirror because you want to, not because pypi is pushing you around
transparently. I don't use mirrors so far, because pypi instability
hasn't hit me so far, and because I don't "trust" mirrors (see next
paragraph).

I read pep 381 a long time ago and I don't remember how/when a mirror
would update, but I do remember it doesn't mandate digital signatures
(signed by pypi central node, verified by setuptools&friends). That is a
big gap, in my opinion.

--
Jesus Cea Avion
jcea at jcea.es - http://www.jcea.es/

From ziade.tarek at gmail.com Tue Jun 15 20:38:58 2010
From: ziade.tarek at gmail.com (Tarek Ziadé)
Date: Tue, 15 Jun 2010 20:38:58 +0200
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
In-Reply-To: <4C17B9B2.10006@egenix.com>
References: <4C1768AF.9040606@egenix.com> <4C17A419.4060602@egenix.com> <4C17B9B2.10006@egenix.com>
Message-ID:

On Tue, Jun 15, 2010 at 7:34 PM, M.-A. Lemburg wrote:
[..]
>> So I think it would be better to focus on PEP 381, and make those
>> existing mirrors comply with it. And maybe work on the legal issues
>> you've mentioned
>
> That can all happen in parallel.

I really doubt it.

You have come with a cloud proposal and want it to be funded by the PSF.

Your proposal is basically a proprietary mirroring system, and it competes
with the mirroring protocol we wanted to build, based on the existing
mirrors the community has.

So far I don't see any advantage in a cloud-based mirror managed by the PSF,
compared to a round of community mirrors.

Given the lack of time and resources we had to finish the work, this means
that if your proposal is accepted, it will be done whereas PEP 381 will stay
as it is today.

So if you want this to happen in parallel, funding should also be
granted to build the implementation of PEP 381 (in z3c.pypimirror I guess)

[..]
>> Not at all because the registered mirrors would be in the DNS round robin,
>> and the clients would just have to switch to another mirror if a mirror
>> is down. (that's explained in PEP 381)
>
> Someone would still have to provide the system administration for
> those servers and also make sure that the servers do actually provide
> up-to-date snapshots. DNS round-robin will help with finding the
> servers, not with the other aspects.
>
> Something the PEP should focus a bit more on is the freshness
> guarantee of the mirror data. It currently puts this
> important detail into the hands of the client software,
> so every package tool will have to find its own way of
> determining whether to use a mirror or not.
>
> Another important feature missing from the PEP is data consistency.
> Since a client tool would only communicate with one mirror, it
> will ultimately have to trust the information on that server,
> including the MD5 sums. This makes it rather easy to manipulate
> data on the servers (not by the admins, but by hackers manipulating
> those servers).

Your PyPI cloud infrastructure could be hacked as well. The mirrors are
trusted because they are registered manually and they are managed by
people in the community we trust.

[..]
>> Such a decentralized system is far more reliable than any centralized
>> system, and won't cost anything to the PSF.
>
> We'll see :-)

Hehe, not sure what you mean here. Did the PSF vote yes on your
proposal already ? ;)

>>>
>>> The latter is what the proposal is all about: we're outsourcing
>>> the administration and monitoring to a service provider.
>>
>> Having a better PyPI server is of course a good idea, don't get me wrong.
>>
>> But it doesn't really solve anything at this point.
>
> Obviously I have a different opinion, otherwise I wouldn't have
> written the proposal :-)

Well technically the problem is already solved by the existing mirrors we
have in the community: when PyPI is down, other servers can take over.

I have no doubt you can enhance the PyPI main server and bring its uptime
close to 100% by putting money and time into it. But having a documented
protocol and a library anyone who has a spare server can use to provide a
mirror will always beat your cloud system for the reasons I've already
mentioned.

Putting all the eggs in the same basket (PSF+Amazon?) can't be as reliable
as a distributed network of mirrors.

Regards
Tarek

--
Tarek Ziadé | http://ziade.org

From ziade.tarek at gmail.com Tue Jun 15 20:52:14 2010
From: ziade.tarek at gmail.com (Tarek Ziadé)
Date: Tue, 15 Jun 2010 20:52:14 +0200
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
In-Reply-To: <4C17C4B5.3000801@jcea.es>
References: <4C1768AF.9040606@egenix.com> <4C17B6CE.20209@jcea.es> <4C17BC38.6090208@egenix.com> <4C17C4B5.3000801@jcea.es>
Message-ID:

On Tue, Jun 15, 2010 at 8:21 PM, Jesus Cea wrote:
> On 15/06/10 19:45, M.-A. Lemburg wrote:
>> Note that with community servers that only mirror once a day,
>> you'd have to wait up to a whole day for your package updates
>> to become visible worldwide.
>
> But TODAY mirror use is voluntary and per-user. That is, you use a
> mirror because you want to, not because pypi is pushing you around
> transparently. I don't use mirrors so far, because pypi instability
> hasn't hit me so far, and because I don't "trust" mirrors (see next
> paragraph).
>
> I read pep 381 a long time ago and I don't remember how/when a mirror
> would update, but I do remember it doesn't mandate digital signatures
> (signed by pypi central node, verified by setuptools&friends). That is a
> big gap, in my opinion.

You don't trust mirrors right now, but if they are listed at PyPI as
official mirrors, they are managed by people who can be trusted as much
as you can trust the PyPI sysadmin for instance, and much much more than
the packages you can download at PyPI.

Do you trust the package you are installing more than an "official" mirror ?
if so, why ?

Anyone can upload a package at PyPI with os.system('rm -rf /') in its
setup.py...

Regards
Tarek

--
Tarek Ziadé | http://ziade.org

From ziade.tarek at gmail.com Tue Jun 15 20:47:52 2010
From: ziade.tarek at gmail.com (Tarek Ziadé)
Date: Tue, 15 Jun 2010 20:47:52 +0200
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
In-Reply-To: <4C17BBC3.3050205@egenix.com>
References: <4C1768AF.9040606@egenix.com> <4C17A419.4060602@egenix.com> <4C17BBC3.3050205@egenix.com>
Message-ID:

On Tue, Jun 15, 2010 at 7:43 PM, M.-A. Lemburg wrote:
> Tarek Ziadé wrote:
>> On Tue, Jun 15, 2010 at 7:15 PM, Ronald Oussoren wrote:
>>>
>>> On 15 Jun, 2010, at 19:02, Tarek Ziadé wrote:
>>>
>>>> On Tue, Jun 15, 2010 at 6:02 PM, M.-A. Lemburg wrote:
>>>>> Alexis Métaireau wrote:
>>>>>> Hello,
>>>>>>
>>>>>> Firstly, as Tarek said in another thread, I'm afraid this kills PEP 381
>>>>>> about making a mirroring infrastructure.
>>>>>> Having an infrastructure hosted on a cloud platform may be comfortable, and
>>>>>> probably needed to have a 24/7 running system, but
>>>>>> we need to take care to keep it possible to create new public mirrors,
>>>>>> outside of the Amazon (or whatever) cloud infrastructure.
>>>>>
>>>>> The proposal doesn't prevent that. However, please note that
>>>>> setting up public mirrors not under PSF control has its own
>>>>> set of (legal) problems, which the PSF hosted cloud setup avoids.
>>>>
>>>> Mirrors already exist out there, so unless you ban them (which would
>>>> be a really bad idea)
>>>> setting up a cloud will not fix any legal issue if you think there's a
>>>> legal issue.
>>>>
>>>> In any case, you can't prevent people from creating mirrors even if you
>>>> would say it's illegal. Moreover, having mirrors provided by the community
>>>> is way better than relying on one single entity (the PSF) for this.
>>>> (if we think "decentralized")
>>>
>>> Why is having community mirrors better than one managed by the PSF?
>>
>> Because it's not controlled anymore by one single entity. For example,
>> if something is broken in the system
>> and needs human intervention, and the sysadmin people are not
>> available, we get a downtime.
>
> I'm not sure I understand: if the PyPI server goes down, the
> data will still be readily available on Amazon S3 and Cloudfront
> caches - the cronjobs copy over the PyPI server content to S3
> and Cloudfront serves it up from there.
>
> And if Cloudfront or S3 goes down, client tools could still
> try to access the PyPI server. (I'll add a note about that to
> the proposal.)

This can't beat a distributed network of mirrors that are not depending
on a single provider like Amazon.

We have suffered from this at bitbucket.org as a matter of fact:
Amazon was having problems, so bitbucket was slow and sometimes down.
If Bitbucket had back then a distributed network of mirrors hosted at
different providers, that wouldn't have happened.

What I have learned lately in this area is that a lot of cheap servers
spread all over the world in different datacenters is more reliable.

And we happen to have this network already: lots of people will host a PyPI
mirror as soon as it's easy to set one up imho.

Regards
Tarek

--
Tarek Ziadé | http://ziade.org

From martin at v.loewis.de Tue Jun 15 21:02:45 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 15 Jun 2010 21:02:45 +0200
Subject: [Catalog-sig] PyPI down again...
In-Reply-To:
References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C15F5F3.40501@egenix.com> <4C176BD4.3080909@egenix.com>
Message-ID: <4C17CE55.5000601@v.loewis.de>

> Hi Martin,

Notice that you actually replied to Marc-Andre Lemburg.

> You should be able to get access to the Python sandbox repository and
> add your project there:
>
> http://svn.python.org/projects/sandbox/trunk/
>
> If that's not an option, I'd suggest you have a look at one of the
> other public repo sites such as launchpad.
>
> Right now I'm working with Tarek Ziade on a clone of the PyPI repository
> sourcecode on bitbucket, that way, it allowed tarek to keep an eye on the
> modifications I made on the source code since double checking any
> changes is very important, as you said, for this type of project.

Most certainly. However, before I add the code to PyPI, I'd review it,
anyway, so no worries. Just be prepared to provide the code as
separately-reviewable chunks of modifications.

> That's why I was working to implement a better unit testing coverage. I
> would like to modernize a little bit the source code of pypi because I
> think in the future there will be some major structural changes of the
> code. Having a great test coverage will allow us to change the code and
> be less afraid of making mistakes.

Alternatively, you could start submitting patches.

> Maybe it would be easier to switch to the official mercurial repository
> (hg.python.org), it would allow a better
> collaboration between everybody who would like to contribute.

I'm not quite sure why that would be. You still couldn't write to the
repository, could you? So what would be the difference?

Regards,
Martin

From martin at v.loewis.de Tue Jun 15 21:46:06 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 15 Jun 2010 21:46:06 +0200
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
In-Reply-To: <4C1768AF.9040606@egenix.com>
References: <4C1768AF.9040606@egenix.com>
Message-ID: <4C17D87E.2050609@v.loewis.de>

> PyPI itself has in recent months been mostly maintained by one
> developer: Martin von Loewis. Projects are underway to enhance PyPI
> in various ways, including a proposal to add external mirroring (PEP
> 381), but these are all far from being finalized or implemented.

That's not at all accurate: PEP 381 is almost completely implemented
in the mirroring tools. Client-side support is missing, but isn't
strictly necessary as users could manually point their setuptools
installation to a mirror.

> While the /simple package listing is currently dynamically created
> from the database in real-time, this is not really needed for normal
> operation. A static copy created every 10-20 minutes would provide the
> same level of service in much the same way.

For normal operation (i.e. on the master copy), this would be really
insufficient. Users expect, in automated build processes, that the
packages they upload are available for *immediate* download.

> Under the proposal the static information stored in PyPI
> (meta-information as well as package download files and documentation)
> is moved to a content delivery network (CDN).

There is a good chance that, before that proposal is implemented,
the PEP 381 implementation is completed.

> At the same intervals, another script will scan the package and
> documentation files under /packages for updates and upload any changes
> to the CDN for near-real-time availability.

Not sure why you wouldn't push every change immediately to the CDN, though.

> Cloudfront itself has been around since Nov 2008.

Please add that Amazon considers Cloudfront as a beta service.

> The pypi.python.org domain would then have to be set up to map to
> multiple IP addresses via DNS round-robin, one entry for each
> redirection server, e.g.
>
> pypi.python.org. IN A 123.123.123.1
> pypi.python.org. IN A 123.123.123.2
> pypi.python.org. IN A 123.123.123.3
> pypi.python.org. IN A 123.123.123.4

I don't think this works if one of the servers fails (or, worse,
produces a hanging connection). What piece of software would implement
the fallback to the next machine?
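
One possible client-side answer to that question can be sketched in Python. This is only a minimal sketch under assumptions, not how setuptools or distutils behaved at the time: `resolve_all` and `connect_with_failover` are hypothetical helper names, and the addresses would be whatever A records the round-robin entry returns. The explicit timeout is what turns a hanging connection into an ordinary failure, so the loop can move on to the next machine.

```python
import socket


def resolve_all(host, port):
    """Return every distinct address the resolver reports for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for family, socktype, proto, canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def connect_with_failover(addresses, port, timeout=5.0,
                          connect=socket.create_connection):
    """Try each address in turn and return (address, connection).

    `connect` is injectable (an assumption made for testability); by
    default it is the standard library's socket.create_connection.
    """
    errors = []
    for address in addresses:
        try:
            return address, connect((address, port), timeout)
        except OSError as exc:  # refused, unreachable, or timed out
            errors.append((address, str(exc)))
    raise OSError("no server reachable: %r" % (errors,))
```

A caller would combine the two, for example `connect_with_failover(resolve_all("pypi.python.org", 80), 80)`. Without such a loop in the client, the user has to rerun the command by hand to get a different address.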

Regards,
Martin

From martin at v.loewis.de Tue Jun 15 21:48:38 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 15 Jun 2010 21:48:38 +0200
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
In-Reply-To: <4C17BBE5.4010901@jcea.es>
References: <4C1768AF.9040606@egenix.com> <4C17BBE5.4010901@jcea.es>
Message-ID: <4C17D916.8030502@v.loewis.de>

> 1. setuptools & friends: Support for retrying several mirrors if first
> try fails.

That's the part that still needs to be implemented.

> 2. Packages MUST be digitally signed. Ideally by the owner, but at least
> by PyPI central node (current pypi server). That way, a "rogue" mirror
> can't distribute trojans.

That is already part of the mirroring infrastructure (although not
explained in PEP 381 yet).

> 3. Trusting the stats is not possible :(, if there are "rogue" mirrors.

That's true.

Regards,
Martin

From martin at v.loewis.de Tue Jun 15 21:52:48 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 15 Jun 2010 21:52:48 +0200
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
In-Reply-To:
References: <4C1768AF.9040606@egenix.com>
Message-ID: <4C17DA10.8000508@v.loewis.de>

> As Martin von Löwis said, this already exists.
> "a.mirrors.pypi.python.org and
> b.mirrors.pypi.python.org are already
> there and could be used by clients". Maybe Martin can explain to us
> (apologies if this is already done somewhere) how things are working
> right now ? Is it possible to rely on the existing work rather than
> using a cloud system ? What's the in place infrastructure ?

Primarily, client support is missing: i.e. distutils won't fall back
from one mirror to the next. As a minor issue, the download stats
collection is also not implemented yet.
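
The missing client piece described here, falling back from one named mirror to the next at the HTTP level, could look roughly like the following. It is only a sketch under assumptions: `fetch_from_mirrors` is a hypothetical helper, not part of distutils or setuptools, and the mirror host names are simply the two mentioned in the quoted text.

```python
import urllib.request


def fetch_from_mirrors(path, mirrors, timeout=10.0,
                       opener=urllib.request.urlopen):
    """Return (mirror, body) from the first mirror that serves `path`.

    `opener` is injectable (an assumption made for testability); by
    default it is the standard library's urllib.request.urlopen.
    """
    failures = []
    for mirror in mirrors:
        url = "http://%s%s" % (mirror, path)
        try:
            response = opener(url, timeout=timeout)
            try:
                return mirror, response.read()
            finally:
                response.close()
        except OSError as exc:  # URLError and socket timeouts derive from OSError
            failures.append((mirror, str(exc)))
    raise RuntimeError("all mirrors failed: %r" % (failures,))
```

Called as `fetch_from_mirrors("/simple/example/", ["a.mirrors.pypi.python.org", "b.mirrors.pypi.python.org"])`, it would only move on to the second host when the first one fails or times out. Whether the data returned by a mirror should then be trusted is a separate question, covered by the signature discussion elsewhere in this thread.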

As for timeliness: it would be reasonable to set up the mirrors so that
they won't be behind more than one minute (by polling for changes every
minute). On the one hand, some people claim that this would be much too
frequent, and that 10 minutes or more would be frequent enough. Others
claim that changes should be propagated instantaneously. This would also
be possible (given that the master knows the list of all mirrors), but
would need to be implemented as well.

Regards,
Martin

From martin at v.loewis.de Tue Jun 15 22:02:45 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 15 Jun 2010 22:02:45 +0200
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
In-Reply-To: <4C17BC38.6090208@egenix.com>
References: <4C1768AF.9040606@egenix.com> <4C17B6CE.20209@jcea.es> <4C17BC38.6090208@egenix.com>
Message-ID: <4C17DC65.7010707@v.loewis.de>

> Note that with community servers that only mirror once a day,
> you'd have to wait up to a whole day for your package updates
> to become visible worldwide.

However, the community mirrors would mirror every ten minutes, or more
often. Implementing a push model would be fairly simple.

Regards,
Martin

From martin at v.loewis.de Tue Jun 15 22:04:55 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 15 Jun 2010 22:04:55 +0200
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
In-Reply-To: <4C17C4B5.3000801@jcea.es>
References: <4C1768AF.9040606@egenix.com> <4C17B6CE.20209@jcea.es> <4C17BC38.6090208@egenix.com> <4C17C4B5.3000801@jcea.es>
Message-ID: <4C17DCE7.6090802@v.loewis.de>

> I read pep 381 a long time ago and I don't remember how/when a mirror
> would update, but I do remember it doesn't mandate digital signatures
> (signed by pypi central node, verified by setuptools&friends). That is a
> big gap, in my opinion.

The PEP doesn't explain the digital signing that is going on in
mirroring. See

http://mail.python.org/pipermail/catalog-sig/2009-March/002018.html

This is fully implemented (except that clients would need to verify the
signatures, and except that key rollover hasn't happened yet).

Regards,
Martin

From mal at egenix.com Tue Jun 15 22:14:02 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 15 Jun 2010 22:14:02 +0200
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
In-Reply-To:
References: <4C1768AF.9040606@egenix.com> <4C17A419.4060602@egenix.com> <4C17B9B2.10006@egenix.com>
Message-ID: <4C17DF0A.3090008@egenix.com>

Tarek Ziadé wrote:
> On Tue, Jun 15, 2010 at 7:34 PM, M.-A. Lemburg wrote:
> [..]
>>> So I think it would be better to focus on PEP 381, and make those
>>> existing mirrors comply with it. And maybe work on the legal issues
>>> you've mentioned
>>
>> That can all happen in parallel.
>
> I really doubt it.
>
> You have come with a cloud proposal and want it to be funded by the PSF.
>
> Your proposal is basically a proprietary mirroring system, and it competes
> with the mirroring protocol we wanted to build, based on the existing
> mirrors the community has.

I'm not trying to compete with your mirror PEP, just trying
to solve a problem.

> So far I don't see any advantage in a cloud-based mirror managed by the PSF,
> compared to a round of community mirrors.

We can have it up and running in a few days and it doesn't
require any changes to existing client tools, that's the main
argument.

The proposal solves a problem we have now and doesn't get in the
way of PEP 381. Instead it buys it more time to get finalized,
implemented and deployed on the client side.

If you need funding for PEP 381, please write a proposal.
This would then also need to address the problem of added administration
overhead (screening mirror server providers, getting them registered or
removed, monitored and verified for correct operation, etc.).

--
Marc-Andre Lemburg
eGenix.com

From mal at egenix.com Tue Jun 15 22:33:15 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 15 Jun 2010 22:33:15 +0200
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
In-Reply-To: <4C17DCE7.6090802@v.loewis.de>
References: <4C1768AF.9040606@egenix.com> <4C17B6CE.20209@jcea.es> <4C17BC38.6090208@egenix.com> <4C17C4B5.3000801@jcea.es> <4C17DCE7.6090802@v.loewis.de>
Message-ID: <4C17E38B.7050103@egenix.com>

"Martin v. Löwis" wrote:
>> I read pep 381 a long time ago and I don't remember how/when a mirror
>> would update, but I do remember it doesn't mandate digital signatures
>> (signed by pypi central node, verified by setuptools&friends). That is a
>> big gap, in my opinion.
>
> The PEP doesn't explain the digital signing that is going on in
> mirroring. See
>
> http://mail.python.org/pipermail/catalog-sig/2009-March/002018.html
>
> This is fully implemented (except that clients would need to verify the
> signatures, and except that key rollover hasn't happened yet).

That's good to know, but I think some parts of this will have
to be discussed some more:

"""
/serverkey

Public DSA key of the server, in the PEM format as generated by
"openssl dsa -pubout" (i.e. RFC 3280 SubjectPublicKeyInfo, with the
algorithm 1.3.14.3.2.12). This URL must *not* be mirrored, and
clients must fetch the official serverkey from PyPI directly. The serverkey
"""

* How will clients be sure that they are getting the correct key ?

* What would a client do if the PyPI server is down ?

* How would clients protect their local cached copy of the
  server key against manipulation ?

* Without access to OpenSSL and M2Crypto, how would clients
  apply the check ?

Also, please consider that access to crypto code is restricted
in some parts of the world. Users in those countries would have
to be able to turn off verification.

--
Marc-Andre Lemburg
eGenix.com

From ziade.tarek at gmail.com Tue Jun 15 22:46:46 2010
From: ziade.tarek at gmail.com (Tarek Ziadé)
Date: Tue, 15 Jun 2010 22:46:46 +0200
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
In-Reply-To: <4C17DF0A.3090008@egenix.com>
References: <4C1768AF.9040606@egenix.com> <4C17A419.4060602@egenix.com> <4C17B9B2.10006@egenix.com> <4C17DF0A.3090008@egenix.com>
Message-ID:

On Tue, Jun 15, 2010 at 10:14 PM, M.-A. Lemburg wrote:
> Tarek Ziadé wrote:
>> On Tue, Jun 15, 2010 at 7:34 PM, M.-A. Lemburg wrote:
>> [..]
>>>> So I think it would be better to focus on PEP 381, and make those
>>>> existing mirrors comply with it. And maybe work on the legal issues
>>>> you've mentioned
>>>
>>> That can all happen in parallel.
>>
>> I really doubt it.
>>
>> You have come with a cloud proposal and want it to be funded by the PSF.
>>
>> Your proposal is basically a proprietary mirroring system, and it competes
>> with the mirroring protocol we wanted to build, based on the existing
>> mirrors the community has.
>
> I'm not trying to compete with your mirror PEP, just trying
> to solve a problem.

We are trying to solve the same problem, aren't we ? That is : avoiding
any downtime when PyPI is used by setuptools and derived tools.

So if you solve this problem by implementing a cloud system backed by
PSF funding, and managed by the PSF, and if you claim that there will be no
more downtime, then PEP 381 will be useless.

I am just arguing that I don't think it's the best solution, compared
to what was started, i.e. a community network of mirrors.

>
>> So far I don't see any advantage in a cloud-based mirror managed by the PSF,
>> compared to a round of community mirrors.
>
> We can have it up and running in a few days and it doesn't
> require any changes to existing client tools, that's the main
> argument.

The global uptime of PyPI in this last year was probably around 99.9%,
so I don't think we are in such a rush to set up something in any case.

The problem occurred in the past, and was fixed in a matter of hours.
every. time.

It's just that every time it happens it makes us all want to improve the
system.

So why don't we implement the best solution ? Maybe we could use a
wiki page and work on a synthetic overview of the pros and cons.

>
> The proposal solves a problem we have now and doesn't get in the
> way of PEP 381. Instead it buys it more time to get finalized,
> implemented and deployed on the client side.
>
> If you need funding for PEP 381, please write a proposal.

I won't. I think we should decide here, all together, what is the best
technical solution to set up mirrors (e.g. cloud vs community)

Then, ask for its funding from the PSF.

> This would then also need to address the problem of added administration
> overhead (screening mirror server providers, getting them registered or
> removed, monitored and verified for correct operation, etc.).

This overhead is minimal compared to an in-house administration of a
full mirroring system based on a cloud imho.

Regards
Tarek

--
Tarek Ziadé | http://ziade.org

From martin at v.loewis.de Tue Jun 15 22:48:00 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 15 Jun 2010 22:48:00 +0200
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
In-Reply-To: <4C17E38B.7050103@egenix.com>
References: <4C1768AF.9040606@egenix.com> <4C17B6CE.20209@jcea.es> <4C17BC38.6090208@egenix.com> <4C17C4B5.3000801@jcea.es> <4C17DCE7.6090802@v.loewis.de> <4C17E38B.7050103@egenix.com>
Message-ID: <4C17E700.1090107@v.loewis.de>

> * How will clients be sure that they are getting the correct key ?

They should initially download it from the master server (when that
is online) and cache it.

> * What would a client do if the PyPI server is down ?

Isn't that straight-forward?

> * How would clients protect their local cached copy of the
> server key against manipulation ?

Using standard operating system access control.

> * Without access to OpenSSL and M2Crypto, how would clients
> apply the check ?

distribute could include a pure-python checking function. The API was
specifically designed to make this possible.

> Also, please consider that access to crypto code is restricted
> in some parts of the world. Users in those countries would have
> to be able to turn off verification.

Most certainly. The simplest approach would be to turn off mirror usage
in the first place. If you do use mirrors, it is then a matter of your
own risk evaluation whether you want the mirror result verified.

Notice that none of this protects against the master server being
tampered with; the only way to protect against that is to use the PGP
signing feature in PyPI (which, of course, package authors must use).

Regards,
Martin

From mal at egenix.com Tue Jun 15 23:03:29 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 15 Jun 2010 23:03:29 +0200
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
In-Reply-To: <4C17D87E.2050609@v.loewis.de>
References: <4C1768AF.9040606@egenix.com> <4C17D87E.2050609@v.loewis.de>
Message-ID: <4C17EAA1.5090609@egenix.com>

"Martin v. Löwis" wrote:
>> PyPI itself has in recent months been mostly maintained by one
>> developer: Martin von Loewis. Projects are underway to enhance PyPI
>> in various ways, including a proposal to add external mirroring (PEP
>> 381), but these are all far from being finalized or implemented.
>
> That's not at all accurate: PEP 381 is almost completely implemented
> in the mirroring tools.

Which parts of PEP 381 are implemented ?

> Client-side support is missing, but isn't
> strictly necessary as users could manually point their setuptools
> installation to a mirror.

That's not a good argument.
Users like setuptools because they can run: "easy_install stuff" and let it do whatever it needs to do. It's important not to require changes on the client side. >> While the /simple package listing is currently dynamically created >> from the database in real-time, this is not really needed for normal >> operation. A static copy created every 10-20 minutes would provide the >> same level of service in much the same way. > > For normal operation (i.e. on the master copy), this would be really > insufficient. Users expect, in automated build processes, that the > packages they upload are available for *immediate* download. Power users and developers will probably want that, but those can hook up to the PyPI server directly if they have such a need. For the majority, waiting 10-20 minutes should be fine. Note that the push idea is part of the plan, but won't happen in the initial rollout. >> Under the proposal the static information stored in PyPI >> (meta-information as well as package download files and documentation) >> is moved to a content delivery network (CDN). > > There is a good chance that, before that proposal is implemented, > the PEP 381 implementation is completed. Including getting all client side package tools updated and deployed to the existing users ? >> At the same intervals, another script will scan the package and >> documentation files under /packages for updates and upload any changes >> to the CDN for neartime availability. > > Not sure why you wouldn't push every change immediately to the CDN, though. The proposal wants to do without changing PyPI code where possible. This is planned for a later release. If this can be had without any major changes, we can also add it to phase one. >> Cloudfront itself has been around since Nov 2008. > > Please add that Amazon considers Cloudfront as a beta service. I don't think that makes a difference. The "beta" term is a web 2.0 marketing term, nothing more. But I'll add it anyway. 
>> The pypi.python.org domain would then have to be set up to map to >> multiple IP addresses via DNS round-robin, one entry for each >> redirection server, e.g. >> >> pypi.python.org. IN A 123.123.123.1 >> pypi.python.org. IN A 123.123.123.2 >> pypi.python.org. IN A 123.123.123.3 >> pypi.python.org. IN A 123.123.123.4 > > I don't think this works if one of the servers fails (or, worse, > produces a hanging connection). What piece of software would implement > the fallback to the next machine? AFAIK, the package tools don't currently implement any kind of fail-over. While this would be good to have and provide a better user experience, it's not required. The user would just need to restart the command and then get a new server IP address to try - just like you do in a web browser if a page doesn't load. That's still a lot better than not being able to download anything at all. The alternative would be a proxy setup, which then again introduces a single point of failure (unless you set up an HA cluster). The mirror PEP shares this problem with the cloud proposal. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 15 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 33 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ziade.tarek at gmail.com Tue Jun 15 23:09:22 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Tue, 15 Jun 2010 23:09:22 +0200 Subject: [Catalog-sig] PyPI down again...
In-Reply-To: <4C17CE55.5000601@v.loewis.de> References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C15F5F3.40501@egenix.com> <4C176BD4.3080909@egenix.com> <4C17CE55.5000601@v.loewis.de> Message-ID: On Tue, Jun 15, 2010 at 9:02 PM, "Martin v. Löwis" wrote: [..] > Alternatively, you could start submitting patches. Some work Mathieu did is already integrated via the branch I worked on for PEP 345. And we were considering using the same workflow since I can commit. Of course, after a while, I wanted to propose Mathieu as a PyPI committer. >> Maybe it would be easier to switch to the official mercurial repository >> (hg.python.org), it would allow a better >> collaboration between everybody who would like to contribute. > > I'm not quite sure why that would be. You still couldn't write to the > repository, could you? So what would be the difference? Not answering for Mathieu, but with a DVCS he will be able to write to the repository, and have the same privileges any other committers have, as long a offer are great. As a maintainer of the PyPI project, it makes your workflow simpler, - contributors can clone the repo, change the code and ask you for a pull - you can pull changes by direct hg commands, and merge them > > Regards, > Martin -- Tarek Ziadé | http://ziade.org From ziade.tarek at gmail.com Tue Jun 15 23:13:52 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Tue, 15 Jun 2010 23:13:52 +0200 Subject: [Catalog-sig] PyPI down again...
In-Reply-To: References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C15F5F3.40501@egenix.com> <4C176BD4.3080909@egenix.com> <4C17CE55.5000601@v.loewis.de> Message-ID: 2010/6/15 Tarek Ziadé : > On Tue, Jun 15, 2010 at 9:02 PM, "Martin v. Löwis" wrote: > [..] >> Alternatively, you could start submitting patches. > > Some work Mathieu did is already integrated via the branch I worked > on for PEP 345. > And we were considering using the same workflow since I can commit. > Of course, after a while, I wanted to propose Mathieu as a PyPI committer. > >>> Maybe it would be easier to switch to the official mercurial repository >>> (hg.python.org), it would allow a better >>> collaboration between everybody who would like to contribute. >> >> I'm not quite sure why that would be. You still couldn't write to the >> repository, could you? So what would be the difference? Oops, sent it too early. Please scratch my previous answer, here's the finished one ;) Not answering for Mathieu, but with a DVCS he will be able to write to the repository, and have the same privileges and changeset granularity any other committer has, as long as you or any direct committer pulls his changes (after a review you can do on your side with simple hg commands). So the difference is the same, I guess, as the difference for Python itself with its switch to Mercurial.
Regards Tarek From martin at v.loewis.de Tue Jun 15 23:24:23 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 15 Jun 2010 23:24:23 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: <4C17EAA1.5090609@egenix.com> References: <4C1768AF.9040606@egenix.com> <4C17D87E.2050609@v.loewis.de> <4C17EAA1.5090609@egenix.com> Message-ID: <4C17EF87.7090302@v.loewis.de> >> That's not at all accurate: PEP 381 is almost completely implemented >> in the mirroring tools. > > Which parts of PEP 381 are implemented ? For the mirrors themselves: everything except for the propagation of download counters. > It's important not to require changes on the client side. I disagree. It's the only way to provide reliable protection against server failures. The client code must initiate the fallback, e.g. after a timeout. >> For normal operation (i.e. on the master copy), this would be really >> insufficient. Users expect, in automated build processes, that the >> packages they upload are available for *immediate* download. > > Power users and developers will probably want that, but those > can hook up to the PyPI server directly if they have such a > need. Under your proposal, how precisely would they do that? >> There is a good chance that, before that proposal is implemented, >> the PEP 381 implementation is completed. > > Including getting all client side package tools updated and > deployed to the existing users ? That depends on how long the proposal requires to get implemented. However, I don't think it is necessary to have the tools updated and deployed to all existing users. Instead, it is sufficient that people who worry about server outages get the tools deployed; for this, the answer is "yes". >> Not sure why you wouldn't push every change immediately to the CDN, though. > > The proposal wants to do without changing PyPI code where > possible. -1000.
What's the rationale for not modifying PyPI code? Are you, by any chance, proposing that this CDN propagation tool does a full PyPI traversal every 20 minutes??? > While this would be good to have and provide a better > user experience, it's not required. The user would just need > to restart the command and then get a new server IP address > to try - just like you do in a web browser if a page doesn't > load. That's still a lot better than not being able to download > anything at all. I think this depends a lot on the client setup. For example, on my machine, I don't get a different IP address for www.google.com each time, using the DNS server in my Fritzbox router. > The mirror PEP shares this problem with the cloud proposal. Except that it gives the client the explicit choice of which copy to get the data from. Regards, Martin From mal at egenix.com Tue Jun 15 23:24:51 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 15 Jun 2010 23:24:51 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: References: <4C1768AF.9040606@egenix.com> <4C17A419.4060602@egenix.com> <4C17B9B2.10006@egenix.com> <4C17DF0A.3090008@egenix.com> Message-ID: <4C17EFA3.6050204@egenix.com> Tarek Ziadé wrote: > On Tue, Jun 15, 2010 at 10:14 PM, M.-A. Lemburg wrote: >> >> I'm not trying to compete with your mirror PEP, just trying >> to solve a problem. > > We are trying to solve the same problem, aren't we ? Sure, but the intent is not to compete with the PEP. Even with the cloud proposal implemented, we can still have a mirror setup like the one you propose. > That is : avoiding any downtime when PyPI is used by setuptools and > derived tools. > > So if you solve this problem by implementing a cloud system backed by > PSF funding, > and managed by the PSF, and if you claim that there will be no more > downtime, then PEP 381 > will be useless. No, not at all. The PSF would not be the only user of the PEP and the client tools.
If all client tools implement the things you suggested in the PEP, we'd have a lot more possibilities. > I am just arguing that I don't think it's the best solution, compared to what > was started e.g. a community network of mirrors. I've heard you, but still disagree. I think we'll just have to leave it at that. >>> So far I don't see any advantage in a cloud-based mirror managed by the PSF, >>> compared to a round of community mirrors. >> >> We can have it up and running in a few days and it doesn't >> require any changes to existing client tools, that's the main >> argument. > > The global uptime of PyPI in this last year was probably around 99.9%, > so I don't think we are in such a rush to set up something in any case. > > The problem occurred in the past, and was fixed in a matter of hours. > every. time. > > It's just that every time it happens it makes us all want to improve the system. > > So why don't we implement the best solution ? Maybe we could use a wiki page > and work on a synthetic overview of the pros and cons. Again: I don't want to compete against the PEP. I'm looking for a solution that's easy to implement and doesn't get in the way. That's all. Nothing more. If you can come up with a solution that's ready in a month or two, I'll happily wait. >> The proposal solves a problem we have now and doesn't get in the >> way of PEP 381. Instead it buys it more time to get finalized, >> implemented and deployed on the client side. >> >> If you need funding for PEP 381, please write a proposal. > > I won't. > > I think we should decide here, all together, what is the best technical solution > to set up mirrors (e.g. cloud vs community) > > Then, ask for its funding from the PSF. > > >> This would then also need to address the problem of added administration >> overhead (screening mirror server providers, getting them registered or >> removed, monitored and verified for correct operation, etc.).
> > This overhead is minimal compared to an in-house administration of a > full mirroring system based on a cloud imho. YMMV, but my experience with these systems is that they cause a lot less overhead than anything you administer yourself. -- Marc-Andre Lemburg From mal at egenix.com Tue Jun 15 23:26:58 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 15 Jun 2010 23:26:58 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: <4C17E700.1090107@v.loewis.de> References: <4C1768AF.9040606@egenix.com> <4C17B6CE.20209@jcea.es> <4C17BC38.6090208@egenix.com> <4C17C4B5.3000801@jcea.es> <4C17DCE7.6090802@v.loewis.de> <4C17E38B.7050103@egenix.com> <4C17E700.1090107@v.loewis.de> Message-ID: <4C17F022.7050707@egenix.com> "Martin v. Löwis" wrote: >> * How will clients be sure that they are getting the correct key ? > > They should initially download it from the master server (when that is > online) and cache it. So they'll use HTTPS and check the server certificate as well ? >> * What would a client do if the PyPI server is down ? > > Isn't that straightforward?
If the local cache doesn't have the server key, the tools would have to download it from somewhere and if the main server is down, that's not possible, so you reintroduce a single point of failure. >> * How would clients protect their local cached copy of the >> server key against manipulation ? > > Using standard operating system access control. So clients will have to be careful to get this right. >> * Without access to OpenSSL and M2Crypto, how would clients >> apply the check ? > > distribute could include a pure-python checking function. The API > was specifically designed to make this possible. Do you have a pure-Python DSA and PEM/DER parsing function available ? Wouldn't a set of hex dumps be easier to parse ? >> Also, please consider that access to crypto code is restricted >> in some parts of the world. Users in those countries would have >> to be able to turn off verification. > > Most certainly. The simplest approach would be to turn off mirror usage > in the first place. If you do use mirrors, it is then a matter of your > own risk evaluation whether you want the mirror result verified. > > Notice that none of this protects against the master server being > tampered with; the only way to protect against that is to use the PGP signing > feature in PyPI (which, of course, package authors must use). Right, it's just an end-to-end authentication. -- Marc-Andre Lemburg
From martin at v.loewis.de Tue Jun 15 23:28:05 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 15 Jun 2010 23:28:05 +0200 Subject: [Catalog-sig] PyPI down again... In-Reply-To: References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C15F5F3.40501@egenix.com> <4C176BD4.3080909@egenix.com> <4C17CE55.5000601@v.loewis.de> Message-ID: <4C17F065.7070309@v.loewis.de> > As a maintainer of the PyPI project, it makes your workflow simpler, > > - contributors can clone the repo, change the code and ask you for a pull > - you can pull changes by direct hg commands, and merge them After using Mercurial in one project, I'm skeptical that this really makes things simpler. I find it very hard to find out what changes a specific clone has that I still need to integrate. Also, when merging with conflicts, I find it very difficult to determine whether I merged all the conflicts correctly (since the diff will show all changes, not just the conflicts). So I rather expect things to become more difficult when switching to hg. Regards, Martin From martin at v.loewis.de Tue Jun 15 23:39:05 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 15 Jun 2010 23:39:05 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: <4C17F022.7050707@egenix.com> References: <4C1768AF.9040606@egenix.com> <4C17B6CE.20209@jcea.es> <4C17BC38.6090208@egenix.com> <4C17C4B5.3000801@jcea.es> <4C17DCE7.6090802@v.loewis.de> <4C17E38B.7050103@egenix.com> <4C17E700.1090107@v.loewis.de> <4C17F022.7050707@egenix.com> Message-ID: <4C17F2F9.6020401@v.loewis.de> >>> * How will clients be sure that they are getting the correct key ?
>> >> They should initially download it from the master server (when that is >> online) and cache it. > > So they'll use HTTPS and check the server certificate > as well ? No. But they trust that the package contents is untampered when they download from the central copy, so they should also trust that the server key is untampered. If some attack could arrange to modify the server key (either during transmission, or afterwards), the same threat applies to the actual packages. So this doesn't add any new risk. >>> * What would a client do if the PyPI server is down ? >> >> Isn't that straight-forward? > > If the local cache doesn't have the server key, the tools > would have to download it from somewhere and if the main server > is down, that's not possible, so you reintroduce a single > point of failure. That wouldn't be a problem, since one copy of the server key could ship with setuptools/distribute itself. So people who have never used it before could still validate the mirrors. >>> * How would clients protect their local cached copy of the >>> server key against manipulation ? >> >> Using standard operating system access control. > > So clients will have to be careful to get this right. Not anymore than they do for the actual package data. >>> * Without access to OpenSSL and M2Crypto, how would clients >>> apply the check ? >> >> distribute could include a pure-python checking function. The API >> was specifically designed to make this possible. > > Do you have a pure-Python DSA and PEM/DER parsing function > available ? Wouldn't a set of hex dumps be easier to parse ? See tools/verify.py. Regards, Martin From ziade.tarek at gmail.com Tue Jun 15 23:39:35 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Tue, 15 Jun 2010 23:39:35 +0200 Subject: [Catalog-sig] PyPI down again... 
In-Reply-To: <4C17F065.7070309@v.loewis.de> References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C15F5F3.40501@egenix.com> <4C176BD4.3080909@egenix.com> <4C17CE55.5000601@v.loewis.de> <4C17F065.7070309@v.loewis.de> Message-ID: 2010/6/15 "Martin v. Löwis" : >> As a maintainer of the PyPI project, it makes your workflow simpler, >> >> - contributors can clone the repo, change the code and ask you for a pull >> - you can pull changes by direct hg commands, and merge them > > After using Mercurial in one project, I'm skeptical that this really makes > things simpler. I find it very hard to find out what changes a specific > clone has that I still need to integrate. If the clone is used as an unsynced copy of the repository for various works, you are right, it can become a nightmare ! I think the best practice is to make sure the clone is a fresh synced one, containing only the commits you want to push in the "main" repo so the reviewer has a clean understanding when you ask for the pull. An alternative approach is to use the queue system Mercurial has, which are commands that create patches you can then send to the reviewers. You can even use a tool like CodeReview in that case. Then I guess it doesn't really matter if the main repo is svn or hg. > Also, when merging with conflicts, > I find it very difficult to determine whether I merged all the conflicts > correctly (since the diff will show all changes, not just the conflicts). > > So I rather expect things to become more difficult when switching to hg. Well I guess it's up to you anyway :) > Regards, > Martin > -- Tarek Ziadé
| http://ziade.org From simon at ikanobori.jp Tue Jun 15 23:50:43 2010 From: simon at ikanobori.jp (Simon de Vlieger) Date: Tue, 15 Jun 2010 23:50:43 +0200 Subject: [Catalog-sig] Fwd: [Distutils] Proposal: Move PyPI static data to the cloud for better availability References: <4C17DBD2.20509@v.loewis.de> Message-ID: <6E682DB9-5750-4D42-813F-4872FF42565D@ikanobori.jp> I am forwarding this message as I initially posted my message to the wrong mailing list (distutils) and Martin has responded on that list. Begin forwarded message: > From: "Martin v. Löwis" > Date: 15 juni 2010 22:00:18 GMT+02:00 > To: Simon de Vlieger > Cc: Mathieu Leduc-Hamel , distutils-sig at python.org > Subject: Re: [Distutils] [Catalog-sig] Proposal: Move PyPI static > data to the cloud for better availability > >> Is there any Nagios monitoring in place or is there the need to have >> some external reliability monitoring in place? > > There is no external monitoring in place that I know of. I know ZC > had some monitoring that was supposed to send me an email, but that > was set up a few years ago, and recently didn't report the downtime. > > My own mirroring reported the downtime (indirectly, by reporting > that it couldn't mirror anymore); this is how I noticed one of the > recent outages. > >> I can set up a Nagios machine to check the HTTP status of PyPI. > > If it's easy to set up: why not? What exactly would that check? > >>> As you said, we may have the same problem in the future on all >>> mirroring nodes ... >> >> Yes, some more investigative work should be done on the >> reason >> for the apparent unreliability. > > The pep381mirror software produces a set of static files on the > mirror, so you don't need to run PyPI itself. I merely use Apache to > serve the PyPI mirrors. > > Regards, > Martin
From marrakis at gmail.com Tue Jun 15 23:55:07 2010 From: marrakis at gmail.com (Mathieu Leduc-Hamel) Date: Tue, 15 Jun 2010 23:55:07 +0200 Subject: [Catalog-sig] PyPI down again... In-Reply-To: <4C17CE55.5000601@v.loewis.de> References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C15F5F3.40501@egenix.com> <4C176BD4.3080909@egenix.com> <4C17CE55.5000601@v.loewis.de> Message-ID: > > Just be prepared to provide the code as separately-reviewable chunks > > of modifications. That's exactly the point. I may be wrong, but other people and I want to contribute, and that's exactly what projects like Bitbucket and code review tools allow. I have worked with people of a very wide range of experience at our local Python user group, and one common complaint is that it's always difficult to contribute. Using a DVCS is exactly one good way to deal with merges and code review. I'm not asking to have committer access right away. I just want to be able to contribute because I'm open to working on something that needs to be done. > > Alternatively, you could start submitting patches. > > I'm not quite sure why that would be. You still couldn't write to the > repository, could you? So what would be the difference? > For sure. Right now I work on Tarek's repo, and he is responsible for merging into the main svn repo and the production PyPI server. Having a complete Mercurial workflow would be easier...
From jcea at jcea.es Tue Jun 15 23:55:32 2010 From: jcea at jcea.es (Jesus Cea) Date: Tue, 15 Jun 2010 23:55:32 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: References: <4C1768AF.9040606@egenix.com> <4C17B6CE.20209@jcea.es> <4C17BC38.6090208@egenix.com> <4C17C4B5.3000801@jcea.es> Message-ID: <4C17F6D4.2050504@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 15/06/10 20:52, Tarek Ziadé wrote: > Do you trust the package you are installing more than an "official" > mirror ? if so, why ? If a package is signed by the author, I only need to "trust" the author. If a package is not signed in PyPI, I must "trust" the author, PyPI admins and PyPI machines' security. If I download from a mirror, with no digital signature, I must trust the author, PyPI admins, PyPI machines' security, mirror admins, mirror machine security and the mirror replication protocol. And all network connections and hard disks in between. It is just me, call me paranoid, but I pay close attention to where the package being installed by "easy_install" is pulled from. I have documented where each package used to live and I check carefully when I see an unexpected URL. And I freak out when a package upgrade includes new dependencies I haven't seen before. > Anyone can upload a package at PyPI with > > os.system('rm -rf /') > > in its setup.py... True. And SCARY. Fortunately I only install packages I am interested in, check signatures, etc. Of course, I can be hacked if the original author put a trojan in the package, or he/she was hacked before. But my exposure is smaller than if I must also trust every link in a LONG chain of mirrors. Just check this link, for a recent example: The trojan was not in the original source code, but in an altered mirror version. Asking the PyPI central node to add signatures is a trivial way of avoiding this issue.
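[Editor's sketch, not part of the original message.] The trust argument above can be illustrated with the integrity half of the check: comparing a file fetched from an untrusted mirror against a digest taken from a link. This assumes the link came from an index page whose signature was already verified (that step is not shown), and that the link carries a digest fragment in the `#algo=hex` style; the function name `matches_link_digest` and the `mirror.example` URL are hypothetical.

```python
import hashlib
from urllib.parse import urlparse

# Digest algorithms this sketch accepts (an illustrative allow-list).
ALLOWED_ALGOS = {"md5", "sha256"}


def matches_link_digest(data: bytes, link: str) -> bool:
    """Return True if `data` hashes to the digest in the link's '#algo=hex' fragment."""
    algo, sep, expected = urlparse(link).fragment.partition("=")
    if not sep or algo not in ALLOWED_ALGOS:
        return False  # no usable digest in the link: treat the file as unverified
    return hashlib.new(algo, data).hexdigest() == expected.lower()


# Usage: build a link for some payload, then check genuine and altered bytes.
payload = b"pretend these are the bytes of example-1.0.tar.gz"
link = ("https://mirror.example/packages/example-1.0.tar.gz#sha256="
        + hashlib.sha256(payload).hexdigest())
print(matches_link_digest(payload, link))
print(matches_link_digest(payload + b"trojan", link))
```

A digest alone proves nothing if the mirror can also rewrite the link, which is why the signature on the index page matters.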
The question is not to trust or not to trust mirrors, but that we have technology to be safe even if the mirrors are not trusted. I don't NEED to trust you to be safe. I am happy!. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTBf21Jlgi5GaxT1NAQLPngP+NfLf7js3ni9FvoDjkrzOB0AmRIyfmDJm tm0wNEVIlTY+d3st76Gd62ET+VxtgNHfWyNQ82Zp0iAISoWlpDyflJlZ1r5oVjAR sWOSntdXXZAaaxOkumggi1cHKVCbWAe+62fGctTLWt4QtP4557yJDHZO1LKp1nWe qtHX5LyUD5k= =yGPk -----END PGP SIGNATURE----- From ziade.tarek at gmail.com Wed Jun 16 00:01:58 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Wed, 16 Jun 2010 00:01:58 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: <4C17EFA3.6050204@egenix.com> References: <4C1768AF.9040606@egenix.com> <4C17A419.4060602@egenix.com> <4C17B9B2.10006@egenix.com> <4C17DF0A.3090008@egenix.com> <4C17EFA3.6050204@egenix.com> Message-ID: On Tue, Jun 15, 2010 at 11:24 PM, M.-A. Lemburg wrote: [..] >> I am just arguing that I don't think it's the best solution, compared to what >> was started e.g. a community network of mirrors. > > I've heard you, but still disagree. I think we'll just have to > leave it at that. Sure. Although, I am pretty sure we will come up with a consensus here at some point :) [..] >> So why don't we implement the best solution ? Maybe we could use a wiki page >> and work on a synthetic overview of the pros and cons. > > Again: I don't want to compete against the PEP. 
I'm looking > for a solution that's easy to implement and doesn't get in the > way. That's all. Nothing more. > > If you can come up with a solution that's ready in a month or two, > I'll happily wait. If I understood Martin correctly, he did some work and things are looking good; so I'll let him answer. Adding the failover part in distribute/pip shouldn't take too long though; falling back to a mirror is a small change. What's important also, is to make sure z3c.pypimirror includes the server-side work, so existing mirrors can be upgraded. Regards Tarek -- Tarek Ziadé | http://ziade.org From jcea at jcea.es Wed Jun 16 00:07:16 2010 From: jcea at jcea.es (Jesus Cea) Date: Wed, 16 Jun 2010 00:07:16 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: <4C17DA10.8000508@v.loewis.de> References: <4C1768AF.9040606@egenix.com> <4C17DA10.8000508@v.loewis.de> Message-ID: <4C17F994.2010000@jcea.es> On 15/06/10 21:52, "Martin v. Löwis" wrote: > As for timeliness: it would be reasonable to set up the mirrors so that > they won't be behind more than one minute (by polling for changes every > minute). On the one hand, some people claim that this would be much too > frequent, and that 10 minutes or more would be frequent enough. Others > claim that changes should be propagated instantaneously. This would also > be possible (given that the master knows the list of all mirrors), > but would need to be implemented as well. WebHooks: -- Jesus Cea
From jcea at jcea.es Wed Jun 16 00:11:14 2010 From: jcea at jcea.es (Jesus Cea) Date: Wed, 16 Jun 2010 00:11:14 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: <4C17DCE7.6090802@v.loewis.de> References: <4C1768AF.9040606@egenix.com> <4C17B6CE.20209@jcea.es> <4C17BC38.6090208@egenix.com> <4C17C4B5.3000801@jcea.es> <4C17DCE7.6090802@v.loewis.de> Message-ID: <4C17FA82.1000104@jcea.es> On 15/06/10 22:04, "Martin v. Löwis" wrote: >> I read pep 381 long time ago and I don't remember how/when a mirror >> would update, but I do remember it doesn't mandate digital signatures >> (signed by pypi central node, verified by setuptools&friends). That is a >> big gap, in my opinion. > > The PEP doesn't explain the digital signing that is going on in > mirroring. See > > http://mail.python.org/pipermail/catalog-sig/2009-March/002018.html > > This is fully implemented (except that client would need to verify the > signatures, and except key rollover hasn't happened yet). Could I ask for PEP 381 to be updated? -- Jesus Cea
From martin at v.loewis.de Wed Jun 16 00:17:13 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 16 Jun 2010 00:17:13 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: References: <4C1768AF.9040606@egenix.com> <4C17A419.4060602@egenix.com> <4C17B9B2.10006@egenix.com> <4C17DF0A.3090008@egenix.com> <4C17EFA3.6050204@egenix.com> Message-ID: <4C17FBE9.8040400@v.loewis.de> > What's important also, is to make sure z3c.pypimirror includes the > server-side work, so existing mirrors can be upgraded. Not really. z3c.pypimirror has a completely different function. Operators providing one of the official PyPI mirrors should use pep381client instead. Of course, if people absolutely want to, they could also put PEP 381 support in z3c.pypimirror, but that may result in a significant rewrite.
Regards,
Martin

From martin at v.loewis.de Wed Jun 16 00:19:36 2010
From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 16 Jun 2010 00:19:36 +0200
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
In-Reply-To: <4C17FA82.1000104@jcea.es>
References: <4C1768AF.9040606@egenix.com> <4C17B6CE.20209@jcea.es> <4C17BC38.6090208@egenix.com> <4C17C4B5.3000801@jcea.es> <4C17DCE7.6090802@v.loewis.de> <4C17FA82.1000104@jcea.es>
Message-ID: <4C17FC78.2090700@v.loewis.de>

> Could I ask pep381 to be updated?.

Sure you can ask. So did I.

Regards,
Martin

From jcea at jcea.es Wed Jun 16 00:20:15 2010
From: jcea at jcea.es (Jesus Cea)
Date: Wed, 16 Jun 2010 00:20:15 +0200
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
In-Reply-To: <4C17E38B.7050103@egenix.com>
References: <4C1768AF.9040606@egenix.com> <4C17B6CE.20209@jcea.es> <4C17BC38.6090208@egenix.com> <4C17C4B5.3000801@jcea.es> <4C17DCE7.6090802@v.loewis.de> <4C17E38B.7050103@egenix.com>
Message-ID: <4C17FC9F.4070507@jcea.es>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 15/06/10 22:33, M.-A. Lemburg wrote:
> * How will clients be sure that they are getting the correct key ?

Err... Downloading it from an HTTPS server, with certificate verification in the client, would be nice :).

> * What would a client do if the PyPI server is down ?

I would keep using the old key if I can't refresh it. If the key is changed once per year, that would be painless most of the time.

> * How would clients protect their local cached copy of the
> server key against manipulation ?

Well, if you can alter the locally cached key, you can also alter the client code to skip the verification completely.

> * Without access to OpenSSL and M2Crypto, how would clients
> apply the check ?

Some time ago I proposed using "Elgamal" signatures. The check can be done in pure Python in maybe 5 lines of code. I use this in my own projects.
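
For illustration only, here is a minimal sketch of what a pure-Python check of a textbook ElGamal signature could look like. It is not the code referred to above, and it is not what PyPI deploys; the function names, the choice of SHA-256, and the parameter layout (prime p, generator g, public key y) are all assumptions made for the example.

```python
import hashlib


def message_digest_int(message: bytes) -> int:
    """Hash the message and read the digest as a big-endian integer.

    SHA-256 is an assumption for this sketch, not something the thread specifies.
    """
    return int.from_bytes(hashlib.sha256(message).digest(), "big")


def elgamal_verify_digest(p: int, g: int, y: int, h: int, r: int, s: int) -> bool:
    """Check a textbook ElGamal signature (r, s) over the integer digest h.

    p is the prime modulus, g the generator, and y = g**x mod p the public key.
    """
    # The range checks are part of the scheme; skipping them allows forgeries.
    if not (0 < r < p and 0 < s < p - 1):
        return False
    # The signature is valid iff g^h == y^r * r^s (mod p).
    return pow(g, h, p) == (pow(y, r, p) * pow(r, s, p)) % p


def elgamal_verify(p: int, g: int, y: int, message: bytes, r: int, s: int) -> bool:
    """Hash the message, then verify the signature over that digest."""
    return elgamal_verify_digest(p, g, y, message_digest_int(message), r, s)
```

The core check really is only a few lines, since Python's built-in three-argument `pow` does the modular exponentiation. Real use would still need large, well-chosen parameters and a trusted way to obtain the public key.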
> Also, please consider that access to crypto code is restricted > in some parts of the world. Users in those countries would have > to be able to turn off verification. Not for verification, I think. If the verification is 100% python, with no crypto library required, less legal risk. Personally I would ban mirrors deployed in no-crypto countries, if I can not "certify" the files they are serving. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTBf8n5lgi5GaxT1NAQJR6AP6A45T2KF7k6v60w8fa2oH5ZBK/7x3lOgI RQT69ftWwZT+ifPnhJlOMAJ+Xq7F18PL3uOwgsj1Ce12KjimkHPnrOy09+/TblOL Hy0hijddktcAdaaPwBOgE1sOL2ffPsXUk0afKJzPOzYIqFzdqzpb49DYH6vvwsuh I4jJT12x3Ps= =8SNq -----END PGP SIGNATURE----- From martin at v.loewis.de Wed Jun 16 00:23:10 2010 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 16 Jun 2010 00:23:10 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: <4C17F994.2010000@jcea.es> References: <4C1768AF.9040606@egenix.com> <4C17DA10.8000508@v.loewis.de> <4C17F994.2010000@jcea.es> Message-ID: <4C17FD4E.6030005@v.loewis.de> > WebHooks: Exactly so. Still, it requires a non-static web server. Also, with a push model, it's more difficult for the client to determine whether the server is current. In a pull model, the client can look at the last synchronization timestamp, and determine whether that's good enough. Of course, if you trust that the push actually works, you could fake the synchronization timestamp if no sync operation is going on. 
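
As a rough sketch of the pull-model check described above: a client reads the mirror's last synchronization timestamp and decides whether it is recent enough. The timestamp format (`YYYY-MM-DDTHH:MM:SS`) and the ten-minute default are assumptions for the example, and `mirror_is_fresh` is a hypothetical helper, not part of any existing client.

```python
from datetime import datetime, timedelta


def mirror_is_fresh(last_sync_text: str, now: datetime,
                    max_age: timedelta = timedelta(minutes=10)) -> bool:
    """Return True if the mirror's advertised sync time is recent enough.

    last_sync_text is the text the mirror publishes as its last sync time;
    the ISO-like format parsed here is an assumption of this sketch.
    """
    last_sync = datetime.strptime(last_sync_text.strip(), "%Y-%m-%dT%H:%M:%S")
    age = now - last_sync
    # A timestamp from the future is treated as suspicious rather than fresh.
    return timedelta(0) <= age <= max_age
```

A client would fetch the timestamp from the mirror, call this with the current UTC time, and fall back to another mirror or to the master when it returns False.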
Regards,
Martin

From ziade.tarek at gmail.com Wed Jun 16 00:27:57 2010
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Wed, 16 Jun 2010 00:27:57 +0200
Subject: [Catalog-sig] PyPI down again...
In-Reply-To:
References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C15F5F3.40501@egenix.com> <4C176BD4.3080909@egenix.com> <4C17CE55.5000601@v.loewis.de>
Message-ID:

On Tue, Jun 15, 2010 at 11:55 PM, Mathieu Leduc-Hamel wrote:
>>> Just be prepared to provide the code as separately-reviewable chunks
>>
>> of modifications.
>
> That's exactly the point. I may be wrong but me and people want to
> contribute and it's exactly what project like Bitbucket and code review
> tools allow.
> I worked with people of a very wide range of experience at our local python
> user group and one common complain is that it's alway difficult to
> contribute.
> Using a DVCS is exactly one good way to deal with merges and code review.
> I'm not asking to have a commiter access right away. I just want to be able
> to contribute cause I'm open to work on something that needed to be done.
>>
>> Alternatively, you could start submitting patches.
>>
>> I'm not quite sure why that would be. You still couldn't write to the
>> repository, could you? So what would be the difference?
>
> For sure, right now i worked on Tarek repos and he is responsible to merge
> on the main svn repos and the production server of pypi. Having complete
> mercurial workflow would be easier...

Note that Martin is doing the final step (checking the changes before they go in production and updating the production server).
From steve at pearwood.info Wed Jun 16 00:24:23 2010 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 16 Jun 2010 08:24:23 +1000 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: <4C17BBE5.4010901@jcea.es> References: <4C1768AF.9040606@egenix.com> <4C17BBE5.4010901@jcea.es> Message-ID: <201006160824.23449.steve@pearwood.info> On Wed, 16 Jun 2010 03:44:05 am Jesus Cea wrote: > 2. Packages MUST be digitally signed. Ideally by the owner -1 on requiring that by the package owner. While digitally signing packages is a good idea, the state of the art is not yet so simple that this will be anything but a barrier to entry to the average Python developer. Not to mention there are places in the world where effective encryption is illegal. > but at least by PYPI central node (current pypi server). Martin has said this is already planned, and linked here: http://mail.python.org/pipermail/catalog-sig/2009-March/002018.html Has anyone considered whether there are any legal implications of this? A digital signature is not an MD5 checksum, it may have actual legal meaning in many countries equivalent to a pen and paper signature. IANAL but I do not believe that it is a good idea to be signing arbitrary packages without knowing what they are (other than "a bunch of bytes uploaded from some arbitrary IP address") any more than I would put my physical signature on a parcel handed to me by some random person at the airport. I would not be digitally signing anything I didn't create unless I had good legal advice that it was safe to do so. 
-- Steven D'Aprano From justinc at cs.washington.edu Wed Jun 16 00:32:39 2010 From: justinc at cs.washington.edu (Justin Cappos) Date: Tue, 15 Jun 2010 15:32:39 -0700 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: <4C17F6D4.2050504@jcea.es> References: <4C1768AF.9040606@egenix.com> <4C17B6CE.20209@jcea.es> <4C17BC38.6090208@egenix.com> <4C17C4B5.3000801@jcea.es> <4C17F6D4.2050504@jcea.es> Message-ID: On Tue, Jun 15, 2010 at 2:55 PM, Jesus Cea wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 15/06/10 20:52, Tarek Ziad? wrote: >> Do you trust the package you are installing more than an "official" >> mirror ? if so, why ? > > If a package is signed by the author, I only need to "trust" the author. I think it might not be this simple. You're still trusting PYPI to provide you with the latest version of a package. Absent other mechanisms, you don't have a way to tell if the file you're being served is actually a version that is obsolete (possibly due to security flaws). Also, in practice many package managers perform dependency resolution based upon on metadata that isn't signed with the author's GPG key. http://www.cs.arizona.edu/stork/packagemanagersecurity/otherattacks.html#extradep Is the plan to use what is proposed in http://mail.python.org/pipermail/catalog-sig/2009-March/002018.html in practice? Is more information available about this? Does this protect against man-in-the-middle attacks? > If a package is not signed in PYPI, I must "trust" the author, PYPI > admins and pypi machines security. > > If I download from a mirror, with no digital signature, I must trust the > author, PYPI admins, pypi machines security, mirror admins, mirror > machine security and mirror replication protocol. And all network > connections and harddisks in between. > > It is just me, call me paranoid, but I pay close attention to where the > package being installed by "easy_install" is pulled from. 
I have
> documented where each package used to live and I check carefully when I
> see an unexpected URL. And I freak out when I package upgrade includes
> new dependencies I haven't seen before.
>
>> Anyone can upload a package at PyPI with
>>
>>   os.system('rm -rf /')
>>
>> in its setup.py...
>
> True. And SCARY. Fortunatelly I only install packages I am interested
> in, check signatures, etc. Of course, I can be hacked if the original
> autor put a trojan in the package, or he/she was hacked before. But my
> exposure is smaller that if I must trust too every link in a LONG chain
> of mirrors.
>
> Just check his link, for a recent example:
>
>
>
> The trojan was not in the original sourcecode, but in an altered mirror
> version.
>
> Asking for pypi central node to add signatures is a trivial way of
> avoiding this issue. The question is not to trust or not to trust
> mirrors, but that we have technology to be safe even if the mirrors are
> not trusted. I don't NEED to trust you to be safe. I am happy!.

I think there are other subtle issues here dealing with key revocation, mismatching of package versions, etc. A lot of these issues are pretty subtle and I'd be happy to talk in more detail about how one might address them. In fact, we have a project that is trying to do so:

https://www.updateframework.com/

Geremy do you want to chime in?

Thanks,
Justin

> - --
> Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
> jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
> jabber / xmpp:jcea at jabber.org         _/_/    _/_/          _/_/_/_/_/
> .                              _/_/  _/_/    _/_/          _/_/  _/_/
> "Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
> "My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
> "El amor es poner tu felicidad en la felicidad de otro" - Leibniz
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.10 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iQCVAwUBTBf21Jlgi5GaxT1NAQLPngP+NfLf7js3ni9FvoDjkrzOB0AmRIyfmDJm
> tm0wNEVIlTY+d3st76Gd62ET+VxtgNHfWyNQ82Zp0iAISoWlpDyflJlZ1r5oVjAR
> sWOSntdXXZAaaxOkumggi1cHKVCbWAe+62fGctTLWt4QtP4557yJDHZO1LKp1nWe
> qtHX5LyUD5k=
> =yGPk
> -----END PGP SIGNATURE-----
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig
>

From ziade.tarek at gmail.com Wed Jun 16 00:34:22 2010
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Wed, 16 Jun 2010 00:34:22 +0200
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
In-Reply-To: <4C17F6D4.2050504@jcea.es>
References: <4C1768AF.9040606@egenix.com> <4C17B6CE.20209@jcea.es> <4C17BC38.6090208@egenix.com> <4C17C4B5.3000801@jcea.es> <4C17F6D4.2050504@jcea.es>
Message-ID:

On Tue, Jun 15, 2010 at 11:55 PM, Jesus Cea wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 15/06/10 20:52, Tarek Ziadé wrote:
>> Do you trust the package you are installing more than an "official"
>> mirror ? if so, why ?
>
> If a package is signed by the author, I only need to "trust" the author.
>
> If a package is not signed in PYPI, I must "trust" the author, PYPI
> admins and pypi machines security.
>
> If I download from a mirror, with no digital signature, I must trust the
> author, PYPI admins, pypi machines security, mirror admins, mirror
> machine security and mirror replication protocol. And all network
> connections and harddisks in between.
>
> It is just me, call me paranoid, but I pay close attention to where the
> package being installed by "easy_install" is pulled from.
I have > documented where each package used to live and I check carefully when I > see an unexpected URL. And I freak out when I package upgrade includes > new dependencies I haven't seen before. Makes sense. > >> Anyone can upload a package at PyPI with >> >> ? os.system('rm -rf /') >> >> in its setup.py... > > True. And SCARY. Fortunatelly I only install packages I am interested > in, check signatures, etc. Of course, I can be hacked if the original > autor put a trojan in the package, or he/she was hacked before. But my > exposure is smaller that if I must trust too every link in a LONG chain > of mirrors. > > Just check his link, for a recent example: > > > > The trojan was not in the original sourcecode, but in an altered mirror > version. > > Asking for pypi central node to add signatures is a trivial way of > avoiding this issue. The question is not to trust or not to trust > mirrors, but that we have technology to be safe even if the mirrors are > not trusted. I don't NEED to trust you to be safe. I am happy!. Sure, the ultimate solution are signatures, and I have forgotten that Martin had work on this last year. My opinion is just that until it's available and used, all PyPI mirrors maintained by people that are known members of the community are of a limited risk. From fdrake at acm.org Wed Jun 16 00:37:11 2010 From: fdrake at acm.org (Fred Drake) Date: Tue, 15 Jun 2010 18:37:11 -0400 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: <201006160824.23449.steve@pearwood.info> References: <4C1768AF.9040606@egenix.com> <4C17BBE5.4010901@jcea.es> <201006160824.23449.steve@pearwood.info> Message-ID: On Tue, Jun 15, 2010 at 6:24 PM, Steven D'Aprano wrote: > A digital signature is not an MD5 checksum, it may have actual legal > meaning in many countries equivalent to a pen and paper signature. 
I would expect that verifying a package was signed by PyPI to mean no more than that the bits match what's available from PyPI for the same name. (Not sure if that's what's in the PEP, but that's what I'd be looking for.)

We'd have to disclaim anything more than that. But it would be useful to verify that a package from a mirror was accurately mirrored.

-Fred

--
Fred L. Drake, Jr.
"Chaos is the score upon which reality is written." --Henry Miller

From ziade.tarek at gmail.com Wed Jun 16 00:38:59 2010
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Wed, 16 Jun 2010 00:38:59 +0200
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
In-Reply-To: <4C17FBE9.8040400@v.loewis.de>
References: <4C1768AF.9040606@egenix.com> <4C17A419.4060602@egenix.com> <4C17B9B2.10006@egenix.com> <4C17DF0A.3090008@egenix.com> <4C17EFA3.6050204@egenix.com> <4C17FBE9.8040400@v.loewis.de>
Message-ID:

2010/6/16 "Martin v. Löwis" :
>> What's important also, is to make sure z3c.pypimirror includes the
>> server-side work, so existing mirrors can be upgraded.
>
> Not really. z3c.pypimirror has a completely different function.

It's a mirroring script for PyPI. Why do you say it has a completely different function ?

> Operators
> providing one of the official PyPI mirrors should use pep381client instead.
>
> Of course, if people absolutely want to, they could also put PEP 381 support
> in z3c.pypimirror, but that may result in a significant rewrite.

What I had in mind was using pep381client within z3c.pypimirror.
From jcea at jcea.es Wed Jun 16 00:39:22 2010 From: jcea at jcea.es (Jesus Cea) Date: Wed, 16 Jun 2010 00:39:22 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: <201006160824.23449.steve@pearwood.info> References: <4C1768AF.9040606@egenix.com> <4C17BBE5.4010901@jcea.es> <201006160824.23449.steve@pearwood.info> Message-ID: <4C18011A.3000202@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 16/06/10 00:24, Steven D'Aprano wrote: > I would not be digitally signing anything I didn't create unless I had > good legal advice that it was safe to do so. The pypi signature certifies that the package has not been tampered with. It DO NOT certify anything else. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTBgBGplgi5GaxT1NAQL/fAP/a1GAtmt9kkVzMBiKA7G1hYZ6BG7bOdt0 D3+q5ces91uk6lmzU+HZXCl4pfCljCMYsQjuKa1EP6aNGOT/beAr35s7K2+4S/FE FjBwchWe5YJaJY7gaMUoWakf0Dz9x4rgebd/Aa2a2Qi14fuA2JJyeOzrIcwgRfwQ wgNq65M3ke8= =IgAh -----END PGP SIGNATURE-----
From martin at v.loewis.de Wed Jun 16 00:45:26 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 16 Jun 2010 00:45:26 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: <201006160824.23449.steve@pearwood.info> References: <4C1768AF.9040606@egenix.com> <4C17BBE5.4010901@jcea.es> <201006160824.23449.steve@pearwood.info> Message-ID: <4C180286.1060807@v.loewis.de> > I would not be digitally signing anything I didn't create unless I had > good legal advice that it was safe to do so. I'm actually not worried about this. In my own country, a valid digital signature requires much more than invocation of the RSA algorithm. E.g. available of certain certified information about the key holder is necessary (including some identification of the key holder). The PyPI signatures don't include any identification information. Also, the only thing that *does* get signed are the simple index pages, and indeed, I not only sign them, I also generate them. Regards, Martin From ianb at colorstudy.com Wed Jun 16 00:47:57 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 15 Jun 2010 17:47:57 -0500 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: <4C1768AF.9040606@egenix.com> References: <4C1768AF.9040606@egenix.com> Message-ID: Hmm... long thread. Anyway: I'm +1 on using a CDN.
I think the overhead of managing a mirror network is considerably greater than the cost of the CDN, and more error-prone. With a CDN one developer can figure out how to implement this in PyPI, and any problems will be with PyPI, not some other mirror system that the person debugging the problem doesn't control. I think your cost only covers bandwidth, but there are also storage costs. What disk space are the PyPI packages using right now? That will only increase over time as PyPI generally keeps all releases. Possibly CDN space could be donated. As an implementation note, Google's new system copies S3's API (http://code.google.com/apis/storage/) -- I'm not sure if it covers the same territory as CloudFront though. Anyway, implementing to S3/CloudFront probably is a good bet even if the provider changes in the future. For generation /simple/ with a cronjob, I'm -0. I find these delays make testing difficult and unreliable; you can never be sure if the job is just slow, what you did didn't work, etc. I'd rather see PyPI shift to creating static pages on-demand, that is, anytime they need updating. Then if PyPI goes down the static pages still exist and work, but there's no delay. Another option might be a caching proxy configured to serve up cached copies when the underlying system is down... but I'm not sure if that's any less work ultimately, and is more ongoing administration. I don't see a benefit to moving further into the cloud, such as hosting on multiple machines. I suspect that PyPI is not anywhere near needing more power than a good sized server can provide, and I doubt that will change soon. It will be easier to manage the system with a single machine and database. There won't be network problems where app servers can't access the database, for instance. Or a need for replication, which is another big potential administration hassle. 
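
One way to read the "static pages on demand" idea is that the index page for a project is rewritten whenever that project changes, rather than by a periodic cronjob. The following is only a sketch under that assumption; `write_simple_page`, the directory layout, and the HTML shape are hypothetical and not taken from the PyPI code base.

```python
import html
import os
import tempfile


def write_simple_page(root: str, project: str, files: dict) -> str:
    """Atomically (re)write <root>/<project>/index.html and return its path.

    `files` maps each file name to its download URL. This is a hypothetical
    layout for illustration, not PyPI's actual one.
    """
    project_dir = os.path.join(root, project)
    os.makedirs(project_dir, exist_ok=True)
    links = "\n".join(
        '<a href="%s">%s</a><br/>' % (html.escape(url, quote=True), html.escape(name))
        for name, url in sorted(files.items())
    )
    body = (
        "<html><head><title>Links for %s</title></head><body>\n%s\n</body></html>\n"
        % (html.escape(project), links)
    )
    # Write to a temporary file in the same directory, then rename it over the
    # old page so that readers never observe a half-written index.
    fd, tmp_path = tempfile.mkstemp(dir=project_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(body)
    os.chmod(tmp_path, 0o644)  # mkstemp creates the file as owner-only
    target = os.path.join(project_dir, "index.html")
    os.replace(tmp_path, target)
    return target
```

Called from the upload handler, something like this would keep the static tree current without a polling delay, and the files would remain servable if the dynamic application went down.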
> * scalability > * 24/7 system administration management > * geo-localized fast and reliable access > > > Current Situation > ----------------- > > PyPI is currently run from a single server hosted in The Netherlands > (ximinez.python.org). This server is run by a very small team of sys > admin. > As far as I know, none of this changes how much administration load there is, does it? That is, cloud machines still need to be administered. The only way I see that you'd really decrease administration load is with a more radical move to a managed service, like App Engine. That's probably quite doable and would have substantial advantages, but it feels like a quite different approach than is proposed here and it involves lots more coding. Unless there really is a problem with the physical management of the server? Server side: upload cronjobs > ---------------------------- > > Since the /simple index tree is currently being created dynamically, > we'd need to create static copies of it at regular intervals in order > to upload the content to the S3 bucket. This can easily be done using > tools such as wget or curl. > > Both the static copy of the /simple tree and the static files uploaded > to /packages then need to be uploaded or updated in the S3 bucket by a > cronjob running every 10-20 minutes. > Is it easy to sync something with S3? It's easy to upload, delete, etc., but sync is rather different, no? Not a big deal, just that changes would have to be tracked if sync was not efficient. > Server side: redirection setup > ------------------------------ > > Since PyPI wasn't designed to be put on a CDN, it mixes static file > URL paths with dynamic access ones, e.g. 
> > dynamic: > > http://pypi.python.org/pypi > (and a few others) > > static: > > http://pypi.python.org/simple > http://pypi.python.org/packages > > To move part of the URL path tree to a CDN, which works based on > domains, we will need to provide a URL redirection setup that > redirects client side tools to the new location. > As far as I know /packages isn't accessed directly, but only from links from /simple -- so if those links are updated everything should work. Some packages already aren't on PyPI, so there's no particular expectation about hosting location. If /simple/ is a set of static files hosted on ximinez, will it be reliable enough? Then no redirects will be required. I don't know what exactly has caused failures. If it's networking then redirects would help. If it's services failing, then static files will solve it. If it's the entire machine getting wonky, e.g., if memory is exhausted... then quite possible static files will help avoid those situations but it's not a guarantee. -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin at v.loewis.de Wed Jun 16 00:55:41 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 16 Jun 2010 00:55:41 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: References: <4C1768AF.9040606@egenix.com> <4C17B6CE.20209@jcea.es> <4C17BC38.6090208@egenix.com> <4C17C4B5.3000801@jcea.es> <4C17F6D4.2050504@jcea.es> Message-ID: <4C1804ED.8030708@v.loewis.de> > Is the plan to use what is proposed in > http://mail.python.org/pipermail/catalog-sig/2009-March/002018.html in > practice? You mean, is it implemented and deployed? Sure - just try for yourself. > Is more information available about this? This is not a very specific question. The answer is certainly: yes, e.g. the source code of PyPI. > Does this protect against man-in-the-middle attacks? Hmm. 
This is also not very specific. Sometimes yes, sometimes no.

It protects against men sitting in the middle of a package download, and also against men sitting on a mirror (which are both in the middle between PyPI and the user).

It doesn't protect against men sitting in the middle of the serverkey download, or men sitting in the middle of a setuptools installation process, or men sitting on PyPI itself (which would be in the middle between the package author and the user).

Regards,
Martin

From martin at v.loewis.de Wed Jun 16 01:01:17 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 16 Jun 2010 01:01:17 +0200
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
In-Reply-To:
References: <4C1768AF.9040606@egenix.com> <4C17A419.4060602@egenix.com> <4C17B9B2.10006@egenix.com> <4C17DF0A.3090008@egenix.com> <4C17EFA3.6050204@egenix.com> <4C17FBE9.8040400@v.loewis.de>
Message-ID: <4C18063D.7000708@v.loewis.de>

Am 16.06.2010 00:38, schrieb Tarek Ziadé:
> 2010/6/16 "Martin v. Löwis":
>>> What's important also, is to make sure z3c.pypimirror includes the
>>> server-side work, so existing mirrors can be upgraded.
>>
>> Not really. z3c.pypimirror has a completely different function.
>
> It's a mirroring script for PyPI. Why do you say it has a completely
> different function ?

a) it's a selective mirror (IIUC); a PEP 381 mirror should be complete
b) it's also a superset-mirror, mirroring stuff that actually *isn't* on PyPI. This is not needed for PEP 381
c) it edits the simple index pages, thus breaking the page signature.

So it is really aimed at private mirrors (IIUC, that's also what it is used for), whereas PEP 381 is about public mirrors.

>> Operators
>> providing one of the official PyPI mirrors should use pep381client instead.
>>
>> Of course, if people absolutely want to, they could also put PEP 381 support
>> in z3c.pypimirror, but that may result in a significant rewrite.
> > What I had in mind was using pep381client within z3c.pypimirror. Not sure how this would work - but you can certainly feel free to copy any code that you find useful into z3c.pypimirror. Regards, Martin From martin at v.loewis.de Wed Jun 16 01:04:36 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 16 Jun 2010 01:04:36 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: References: <4C1768AF.9040606@egenix.com> <4C17BBE5.4010901@jcea.es> <201006160824.23449.steve@pearwood.info> Message-ID: <4C180704.9060008@v.loewis.de> Am 16.06.2010 00:37, schrieb Fred Drake: > On Tue, Jun 15, 2010 at 6:24 PM, Steven D'Aprano wrote: >> A digital signature is not an MD5 checksum, it may have actual legal >> meaning in many countries equivalent to a pen and paper signature. > > I would expect that verifying a package was signed by PyPI to mean no more than > that the bits match what's available from PyPI for the same name. (Not sure if > that's what's in the PEP, but that's what I'd be looking for.) It's indeed exactly that. > We'd have to disclaim anything more than that. But it would be useful to verify > that a package from a mirror was accurately mirrored. There are actually two layers here: one is to verify that the transmission was not faulty; for this, the md5sum that is already in the simple pages should be enough (and *please* don't tell me that md5 is broken). Of course, an adversary could then try to modify the simple pages, that's what the actual signatures are for. 
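
To make the first of the two layers concrete, here is a minimal sketch of checking a downloaded file against the md5 digest carried in a simple-page link. It assumes the link carries the digest as a `#md5=<hex>` URL fragment; the function names are hypothetical. The second layer, verifying the signature over the simple page itself, is deliberately not shown.

```python
import hashlib
from urllib.parse import urldefrag


def expected_md5(link: str):
    """Return the hex digest from a '#md5=...' fragment, or None if absent."""
    fragment = urldefrag(link).fragment
    if fragment.startswith("md5="):
        return fragment[len("md5="):].lower()
    return None


def download_matches_index(link: str, data: bytes) -> bool:
    """Return True if the downloaded bytes hash to the digest given in the link."""
    digest = expected_md5(link)
    if digest is None:
        # No digest published: the transfer cannot be confirmed either way.
        return False
    return hashlib.md5(data).hexdigest() == digest
```

This only detects a faulty or altered transfer relative to the index page. It says nothing about whether the index page itself is genuine, which is the job of the page signature.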
Regards,
Martin

From debatem1 at gmail.com Wed Jun 16 01:33:03 2010
From: debatem1 at gmail.com (geremy condra)
Date: Tue, 15 Jun 2010 16:33:03 -0700
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
In-Reply-To: <4C1804ED.8030708@v.loewis.de>
References: <4C1768AF.9040606@egenix.com> <4C17B6CE.20209@jcea.es> <4C17BC38.6090208@egenix.com> <4C17C4B5.3000801@jcea.es> <4C17F6D4.2050504@jcea.es> <4C1804ED.8030708@v.loewis.de>
Message-ID:

On Tue, Jun 15, 2010 at 3:55 PM, "Martin v. Löwis" wrote:
>> Is the plan to use what is proposed in
>> http://mail.python.org/pipermail/catalog-sig/2009-March/002018.html in
>> practice?
>
> You mean, is it implemented and deployed? Sure - just try for yourself.
>
>> Is more information available about this?
>
> This is not a very specific question. The answer is certainly: yes, e.g.
> the source code of PyPI.
>
>> Does this protect against man-in-the-middle attacks?
>
> Hmm. This is also not very specific. Sometimes yes, sometimes no.
>
> It protects against men sitting in the middle of a package download, and
> also against men sitting on a mirror (which are both in the middle between
> PyPI and the user).
>
> It doesn't protect against men sitting in the middle of the serverkey
> download, or men sitting in the middle of a setuptools installation
> process, or men sitting on PyPI itself (which would be in the middle between
> the package author and the user).

I'm not clear on this and the document is a little vague, so perhaps I should be perusing the source, but if you don't protect against a serverkey MITM and you are supposed to update the serverkey any time a signature doesn't match up, couldn't an attacker just MITM you, produce a known bad signature, and then wait for you to request a serverkey from them?
Geremy Condra From martin at v.loewis.de Wed Jun 16 08:09:58 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 16 Jun 2010 08:09:58 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: References: <4C1768AF.9040606@egenix.com> <4C17B6CE.20209@jcea.es> <4C17BC38.6090208@egenix.com> <4C17C4B5.3000801@jcea.es> <4C17F6D4.2050504@jcea.es> <4C1804ED.8030708@v.loewis.de> Message-ID: <4C186AB6.2030407@v.loewis.de> > I'm not clear on this and the document is a little vague, so perhaps > I should be perusing the source, but if you don't protect against a > serverkey MITM and you are supposed to update the serverkey any > time a signature doesn't match up, couldn't an attacker just MITM > you, produce a known bad signature, and then wait for you to > request a serverkey from them? That's true; transmission of the serverkey is not currently protected against MITM. How would you suggest to fix that? As for perusing the source: the client behavior is not implemented yet, so there isn't really any source to check, yet. Regards, Martin From martin at v.loewis.de Wed Jun 16 08:40:40 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 16 Jun 2010 08:40:40 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: <4C186AB6.2030407@v.loewis.de> References: <4C1768AF.9040606@egenix.com> <4C17B6CE.20209@jcea.es> <4C17BC38.6090208@egenix.com> <4C17C4B5.3000801@jcea.es> <4C17F6D4.2050504@jcea.es> <4C1804ED.8030708@v.loewis.de> <4C186AB6.2030407@v.loewis.de> Message-ID: <4C1871E8.9060503@v.loewis.de> > That's true; transmission of the serverkey is not currently protected > against MITM. How would you suggest to fix that? > > As for perusing the source: the client behavior is not implemented yet, > so there isn't really any source to check, yet. 
Following up to myself: The mirroring protocol doesn't really *need* to protect against MITM. Communication with PyPI (e.g. package download) currently isn't protected against MITM, either, so the mirroring adds no new threat here. The protocol primarily protects against malicious mirror operators, and hacked mirrors. With that, a simple solution might be to offer opt-out of serverkey updates. Users that worry about MITM should manually install the serverkey in their pypirc, then distribute could refuse to automatically update it. In the case of key rollover, users would need to download the server key again in a trusted manner. Regards, Martin From justinc at cs.washington.edu Wed Jun 16 08:41:45 2010 From: justinc at cs.washington.edu (Justin Cappos) Date: Tue, 15 Jun 2010 23:41:45 -0700 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: <4C186AB6.2030407@v.loewis.de> References: <4C1768AF.9040606@egenix.com> <4C17B6CE.20209@jcea.es> <4C17BC38.6090208@egenix.com> <4C17C4B5.3000801@jcea.es> <4C17F6D4.2050504@jcea.es> <4C1804ED.8030708@v.loewis.de> <4C186AB6.2030407@v.loewis.de> Message-ID: On Tue, Jun 15, 2010 at 11:09 PM, "Martin v. L?wis" wrote: >> I'm not clear on this and the document is a little vague, so perhaps >> I should be perusing the source, but if you don't protect against a >> serverkey MITM and you are supposed to update the serverkey any >> time a signature doesn't match up, couldn't an attacker just MITM >> you, produce a known bad signature, and then wait for you to >> request a serverkey from them? > > That's true; transmission of the serverkey is not currently protected > against MITM. How would you suggest to fix that? A simple way to protect against just the issue you mentioned is to have the clients retrieve the key over HTTPS or distribute the key with the client. In general, the problems are much, much trickier than just this. 
I won't bore you with all of the details (unless you'd like to know more), but we found and fixed a lot of problems with the security of Linux package managers. A quick pointer to some of the technical details can be found here: http://www.cs.arizona.edu/stork/packagemanagersecurity/papers.html > As for perusing the source: the client behavior is not implemented yet, so > there isn't really any source to check, yet. Okay. We'd be happy to work with you to get an easy solution put in place. As I was shamelessly plugging before, we've been working on a library called TUF that is supposed to make this as simple as possible for whoever maintains the repository and be completely transparent for the clients. TUF is fairly early stage (our first major deployment is ongoing), but might be worth consideration. I think we could probably put together a quick demo so that you and others could see how it might work with one of the existing client updaters. Thanks, Justin From marrakis at gmail.com Wed Jun 16 09:33:39 2010 From: marrakis at gmail.com (Mathieu Leduc-Hamel) Date: Wed, 16 Jun 2010 09:33:39 +0200 Subject: [Catalog-sig] PyPI down again... In-Reply-To: References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C15F5F3.40501@egenix.com> <4C176BD4.3080909@egenix.com> <4C17CE55.5000601@v.loewis.de> Message-ID: > > Note that Martin is doing the final step (checking the changes before > they go in production > and updating the production server). > For sure! I wasn't saying that I wanted to be able to push anything directly to PyPI or to the official repository. My point was more about making it easier for contributors to fork, modify and propose changes...
From solipsis at pitrou.net Wed Jun 16 13:44:56 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 16 Jun 2010 11:44:56 +0000 (UTC) Subject: [Catalog-sig] Mercurial References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C15F5F3.40501@egenix.com> <4C176BD4.3080909@egenix.com> <4C17CE55.5000601@v.loewis.de> <4C17F065.7070309@v.loewis.de> Message-ID: Martin v. Löwis writes: > > > As a maintainer of the PyPI project, it makes your workflow simpler, > > > > - contributors can clone the repo, change the code and ask you for a pull > > - you can pull changes by direct hg commands, and merge them > > After using Mercurial in one project, I'm skeptical that this really > makes things simpler. I find it very hard to find out what changes a > specific clone has that I still need to integrate. Also, when merging > with conflicts, I find it very difficult to determine whether I merged > all the conflicts correctly (since the diff will show all changes, not > just the conflicts). > > So I rather expect things to become more difficult when switching to hg. I think it would be fair to raise those points on the Mercurial mailing list. After all we'll be one of their "high-profile" users, so they'd probably like us to enjoy the experience. Regards Antoine. From solipsis at pitrou.net Wed Jun 16 13:53:00 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 16 Jun 2010 11:53:00 +0000 (UTC) Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability References: <4C1768AF.9040606@egenix.com> <4C17A419.4060602@egenix.com> <4C17BBC3.3050205@egenix.com> Message-ID: Tarek Ziadé writes: > > And we happen to have this network already: lots of people > will host a PyPI mirror as soon as it's easy to set one imho.
You must be careful that the mirrors are properly managed and administered, though. Having stale/dysfunctioning mirrors is worse than having no mirrors at all. It is likely that some people will setup a mirror and then "forget" to take care about it. Like our buildbots really. Regards Antoine. From solipsis at pitrou.net Wed Jun 16 14:03:30 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 16 Jun 2010 12:03:30 +0000 (UTC) Subject: [Catalog-sig] [OT] Nagios / Shinken References: <4C1768AF.9040606@egenix.com> <201006160033.46095.steve@pearwood.info> <4C17A272.9070808@egenix.com> Message-ID: M.-A. Lemburg egenix.com> writes: > > Setting up some Zenoss or Nagios monitoring system to take > care of monitoring the PyPI server (and our other servers) > would be a separate project. Just for the record, I would mention that someone started a rewrite of the Nagios software in Python: http://www.shinken-monitoring.org/ According to the author, the Python rewrite is also much faster than the original C software: http://www.shinken-monitoring.org/features/huge-performances/ Probably a good showcase of the "using a dynamic language allows you to focus on a better architecture" argument :-) Regards Antoine. From chrism at plope.com Wed Jun 16 14:09:14 2010 From: chrism at plope.com (Chris McDonough) Date: Wed, 16 Jun 2010 08:09:14 -0400 Subject: [Catalog-sig] [OT] Nagios / Shinken In-Reply-To: References: <4C1768AF.9040606@egenix.com> <201006160033.46095.steve@pearwood.info> <4C17A272.9070808@egenix.com> Message-ID: <1276690154.2688.26.camel@thinko> Even more OT, we might try setting up the PyPI server under supervisord (http://supervisord.org) plus superlance's HTTPOK and memmon event listeners. This would make sure that the process is restarted when it stops answering HTTP requests or if it begins to consume "too much" memory. It's slightly more reliable than other systems that do similar things, because it's the parent process of the processes being monitored. 
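As a rough illustration of the restart-on-unresponsive idea (this is not supervisord's or superlance's actual code, just a generic Python sketch), the logic boils down to probing a URL and triggering a restart after several consecutive failures. The function names, the threshold of three failures, and the restart callback are all assumptions made for this example.

```python
import http.client
import urllib.error
import urllib.request


def is_responding(url, timeout=10.0):
    """Return True if `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException,
            OSError, ValueError):
        # Connection refused, timeout, HTTP error status, malformed URL, ...
        return False


def check_once(probe, restart, failures, max_failures=3):
    """Run one probe and return the updated consecutive-failure count.

    `probe` is a callable returning True when the service is healthy;
    `restart` is called once `max_failures` probes in a row have failed.
    """
    if probe():
        return 0
    failures += 1
    if failures >= max_failures:
        restart()
        return 0
    return failures


if __name__ == "__main__":
    restarts = []
    failures = 0
    # Simulate a service that never answers; a real probe would be
    # lambda: is_responding("http://localhost:8080/") or similar.
    for _ in range(3):
        failures = check_once(lambda: False, lambda: restarts.append("restart"), failures)
    print(restarts)
```

A process supervisor has the advantage described above: as the parent of the monitored process it learns about exits directly, whereas an external poller like this sketch only sees symptoms.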
On Wed, 2010-06-16 at 12:03 +0000, Antoine Pitrou wrote: > M.-A. Lemburg writes: > > > > Setting up some Zenoss or Nagios monitoring system to take > > care of monitoring the PyPI server (and our other servers) > > would be a separate project. > > Just for the record, I would mention that someone started a rewrite of the > Nagios software in Python: > http://www.shinken-monitoring.org/ > > According to the author, the Python rewrite is also much faster than the > original C software: > http://www.shinken-monitoring.org/features/huge-performances/ > > Probably a good showcase of the "using a dynamic language allows you to focus on > a better architecture" argument :-) > > Regards > > Antoine. From mal at egenix.com Wed Jun 16 14:20:00 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 16 Jun 2010 14:20:00 +0200 Subject: [Catalog-sig] [OT] Nagios / Shinken In-Reply-To: References: <4C1768AF.9040606@egenix.com> <201006160033.46095.steve@pearwood.info> <4C17A272.9070808@egenix.com> Message-ID: <4C18C170.7060102@egenix.com> Antoine Pitrou wrote: > M.-A. Lemburg writes: >> >> Setting up some Zenoss or Nagios monitoring system to take >> care of monitoring the PyPI server (and our other servers) >> would be a separate project. > > Just for the record, I would mention that someone started a rewrite of the > Nagios software in Python: > http://www.shinken-monitoring.org/ > According to the author, the Python rewrite is also much faster than the > original C software: > http://www.shinken-monitoring.org/features/huge-performances/ > > Probably a good showcase of the "using a dynamic language allows you to focus on > a better architecture" argument :-) Zenoss is written in Python and uses Zope for the web GUI.
It has a large community around it and provides all the enterprise features you'd need from such a system. http://www.zenoss.com/ and Zenoss can use Nagios plugins as well. I'd see a chance for such a new tool, though: Zenoss can be very complicated to setup, esp. if you're not using SNMP on all your machines. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 16 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 32 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Wed Jun 16 14:20:09 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 16 Jun 2010 14:20:09 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: References: <4C1768AF.9040606@egenix.com> <4C17A419.4060602@egenix.com> <4C17BBC3.3050205@egenix.com> Message-ID: <4C18C179.4080709@egenix.com> Antoine Pitrou wrote: > Tarek Ziad? gmail.com> writes: >> >> And we happen to have this network already: lots of people >> will host a PyPI mirror as soon as it's easy to set one imho. > > You must be careful that the mirrors are properly managed and administered, > though. Having stale/dysfunctioning mirrors is worse than having no mirrors at > all. > It is likely that some people will setup a mirror and then "forget" to take care > about it. Like our buildbots really. Right, it's that administration overhead I was referring to. 
Perhaps we should just let the users decide: a) they use the default PyPI access (which we then enhance by caching the content in the cloud) b) they set up their easy_install or zc.buildout to pull data from a mirror network by enabling a configuration option Since implementing option b) will require updating existing package tools on the client side anyway, the extra configuration shouldn't be a problem. Option a) requires no changes whatsoever on the client side. -- Marc-Andre Lemburg eGenix.com From jacob at jacobian.org Wed Jun 16 18:39:32 2010 From: jacob at jacobian.org (Jacob Kaplan-Moss) Date: Wed, 16 Jun 2010 11:39:32 -0500 Subject: [Catalog-sig] Renaming packages Message-ID: Howdy folks -- I've received a request from the Debian and Ubuntu maintainers to rename one of my packages [1] so that it'd comply better with the Debian/Ubuntu naming standards. I'd like to help them out, and ideally I'd like to rename my package on PyPI to match the name that APT will use. However, as far as I can tell there's no real mechanism for renaming packages on PyPI: if I change the name, everyone's pip/buildout dependencies will just fail until they, too, update the name. Ideally, I'd expect PyPI to give me a renaming mechanism that'd issue the proper redirects from the old name to the new.
Apologies if I'm just not seeing a feature that's already there; if it's not, though, are there any plans for this in the future? Or any other bright ideas? Thanks! Jacob [1] http://pypi.python.org/pypi/python-cloudservers From sridharr at activestate.com Wed Jun 16 19:06:58 2010 From: sridharr at activestate.com (Sridhar Ratnakumar) Date: Wed, 16 Jun 2010 10:06:58 -0700 Subject: [Catalog-sig] PyPI down again... In-Reply-To: <4C12A2E4.2090305@v.loewis.de> References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> Message-ID: <362E7782-303B-4ED1-803A-EA82762F6365@activestate.com> On 2010-06-11, at 1:56 PM, Martin v. L?wis wrote: > If you are willing to invest *a lot* of time, then it seems that rewriting PyPI in Django would make a lot of people happy, because > they claim they can't contribute to the current code base because > they don't understand that. I don't want to do such a rewrite on > my own because I *do* understand the code base (despite not having written it in the first place, so I think that if you really want > to contribute, you can learn how it works); it also violates Joel > Spolsky's principle of never ever doing rewrites. FYI: I just happened to stumble upon what claims to be a "re-implementation of PyPI" in Django: http://pypi.python.org/pypi/djangopypi/0.4 -srid From debatem1 at gmail.com Wed Jun 16 19:42:25 2010 From: debatem1 at gmail.com (geremy condra) Date: Wed, 16 Jun 2010 13:42:25 -0400 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: References: <4C1768AF.9040606@egenix.com> <4C17B6CE.20209@jcea.es> <4C17BC38.6090208@egenix.com> <4C17C4B5.3000801@jcea.es> <4C17F6D4.2050504@jcea.es> <4C1804ED.8030708@v.loewis.de> <4C186AB6.2030407@v.loewis.de> Message-ID: On Wed, Jun 16, 2010 at 2:41 AM, Justin Cappos wrote: > On Tue, Jun 15, 2010 at 11:09 PM, "Martin v. 
Löwis" wrote: >>> I'm not clear on this and the document is a little vague, so perhaps >>> I should be perusing the source, but if you don't protect against a >>> serverkey MITM and you are supposed to update the serverkey any >>> time a signature doesn't match up, couldn't an attacker just MITM >>> you, produce a known bad signature, and then wait for you to >>> request a serverkey from them? >> >> That's true; transmission of the serverkey is not currently protected >> against MITM. How would you suggest to fix that? > > A simple way to protect against just the issue you mentioned is to > have the clients retrieve the key over HTTPS or distribute the key > with the client. I'd just add that this is not currently as simple as it should be in Python; by default Python does not check certs for HTTPS connections, so you can't just feed the correct URL into urllib and be sure you're getting the right answer. http://bugs.python.org/issue1589 Geremy Condra From martin at v.loewis.de Wed Jun 16 20:37:37 2010 From: martin at v.loewis.de ("Martin v. Löwis") Date: Wed, 16 Jun 2010 20:37:37 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: References: <4C1768AF.9040606@egenix.com> <4C17A419.4060602@egenix.com> <4C17BBC3.3050205@egenix.com> Message-ID: <4C1919F1.9080506@v.loewis.de> On 16.06.2010 13:53, Antoine Pitrou wrote: > Tarek Ziadé writes: >> >> And we happen to have this network already: lots of people >> will host a PyPI mirror as soon as it's easy to set one imho. > > You must be careful that the mirrors are properly managed and administered, > though. Having stale/dysfunctioning mirrors is worse than having no mirrors at > all. That's not true. The client software can check whether a mirror is up-to-date, and proceed to the next mirror if one is outdated. > It is likely that some people will setup a mirror and then "forget" to take care > about it.
Like our buildbots really. The same can happen to any infrastructure, though. Amazon may decide to change the setup, and then the automated update procedure would break. Of course, they would give advance notice - but then somebody would have to react to that advance notice. With the proposed default redirection of all PyPI downloads to Amazon, such breakage would affect the entire installation, not just a single mirror. Regards, Martin From martin at v.loewis.de Wed Jun 16 20:40:18 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Wed, 16 Jun 2010 20:40:18 +0200 Subject: [Catalog-sig] Mercurial In-Reply-To: References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C15F5F3.40501@egenix.com> <4C176BD4.3080909@egenix.com> <4C17CE55.5000601@v.loewis.de> <4C17F065.7070309@v.loewis.de> Message-ID: <4C191A92.9030404@v.loewis.de> Am 16.06.2010 13:44, schrieb Antoine Pitrou: > Martin v. L?wis v.loewis.de> writes: >> >>> As a maintainer of the PyPI project, it makes your workflow simpler, >>> >>> - contributors can clone the repo, change the code and ask you for a pull >>> - you can pull changes by direct hg commands, and merge them >> >> After using Mercurial in one project, I'm skeptical that this really >> makes things simpler. I find it very hard to find out what changes a >> specific clone has that I still need to integrate. Also, when merging >> with conflicts, I find it very difficult to determine whether I merged >> all the conflicts correctly (since the diff will show all changes, not >> just the conflicts). >> >> So I rather expect things to become more difficult when switching to hg. > > I think it would be fair to bring those points on the mercurial mailing-list. > After all we'll be one of their "high-profile" users, so they'd probably like us > to enjoy the experience. 
I'm just a hg beginner, so it's probably all my fault, and I'm not using it correctly. However, I admit that switching from RCS to CVS was easy, and so was switching from CVS to SVN. Switching to hg is the most difficult change for me. I'm probably getting old. Regards, Martin From martin at v.loewis.de Wed Jun 16 20:56:06 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 16 Jun 2010 20:56:06 +0200 Subject: [Catalog-sig] Renaming packages In-Reply-To: References: Message-ID: <4C191E46.1060602@v.loewis.de> Am 16.06.2010 18:39, schrieb Jacob Kaplan-Moss: > Howdy folks -- > > I've received a request from the Debian and Ubuntu maintainers to > rename one of my packages [1] so that it'd comply better with the > Debian/Ubuntu naming standards. I'd like to help them out, and ideally > I'd like to rename my package on PyPI to match the name that APT will > use. However, as far as I can tell there's no real mechanism for > renaming packages on PyPI: if I change the name, everyone's > pip/buildout dependencies will just fail until they, too, update the > name. > > Ideally, I'd expect PyPI to give me a renaming mechanism that'd issue > the proper redirects from the old name to the new. Apologies if I'm > just not seeing a feature that's already there; if it's not, though, > are there any plans for this in the future? Or any other bright ideas? There is a renaming mechanism, but it does just that: rename the package, and all releases. Also, it's available only to the admin, so you have to request it through the bug tracker. It turns out that this actually causes problems (beyond the dependencies): the files are *not* renamed, and that is, at least, confusing (because they stop matching the project name). Renaming the files is no option, either, because they then stop matching the embedded setup.py. I think your proposed mechanism wouldn't work too well, either: if you issue redirects, then setuptools will follow the redirects, too. 
Depending on the package name you originally requested, it will then fail to see either the old files or the new files, since they don't match the project name. So I think the best you can hope for is this: - you have the old releases, and they are easy_installable only with the old name. - you have the new releases, and they are easy_installable only with the new name. If that's all you can get, I suggest just to create the new package, and release under the new package name. For human users of the package index, create a single release of the old package, with a description that has a link to the new name. Regards, Martin From fdrake at acm.org Wed Jun 16 21:14:32 2010 From: fdrake at acm.org (Fred Drake) Date: Wed, 16 Jun 2010 15:14:32 -0400 Subject: [Catalog-sig] Mercurial In-Reply-To: <4C191A92.9030404@v.loewis.de> References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C15F5F3.40501@egenix.com> <4C176BD4.3080909@egenix.com> <4C17CE55.5000601@v.loewis.de> <4C17F065.7070309@v.loewis.de> <4C191A92.9030404@v.loewis.de> Message-ID: On Wed, Jun 16, 2010 at 2:40 PM, "Martin v. L?wis" wrote: > However, I admit that switching from RCS to CVS was easy, and so was > switching from CVS to SVN. Switching to hg is the most difficult change for > me. I'm probably getting old. Pretty much the same here; DVCS systems have some highly desirable features (better merging), but there's a lot of other changes to learn before they can be used effectively. -Fred -- Fred L. Drake, Jr. "A storm broke loose in my mind." 
--Albert Einstein From tjreedy at udel.edu Wed Jun 16 21:27:09 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 16 Jun 2010 15:27:09 -0400 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: <4C18C179.4080709@egenix.com> References: <4C1768AF.9040606@egenix.com> <4C17A419.4060602@egenix.com> <4C17BBC3.3050205@egenix.com> <4C18C179.4080709@egenix.com> Message-ID: On 6/16/2010 8:20 AM, M.-A. Lemburg wrote: > Antoine Pitrou wrote: >> Tarek Ziadé writes: >>> >>> And we happen to have this network already: lots of people >>> will host a PyPI mirror as soon as it's easy to set one imho. >> >> You must be careful that the mirrors are properly managed and administered, >> though. Having stale/dysfunctioning mirrors is worse than having no mirrors at >> all. >> It is likely that some people will setup a mirror and then "forget" to take care >> about it. Like our buildbots really. > > Right, it's that administration overhead I was referring to. > > Perhaps we should just let the users decide: > > a) they use the default PyPI access (which we then enhance by > caching the content in the cloud) > > b) they setup their easy_install or zc.buildout to pull data > from a mirror network by enabling a configuration option > > Since implementing option b) will require updating existing > package tools on the client side anyway, the extra configuration > shouldn't be a problem. > > Option a) requires no changes whatsoever on the client side. It seems to me that: If the problem of availability with pypi is anything like the problems with bugs... and extending 'pypi...' with cloud service could be done relatively quickly (within a month), then that seems reasonable. If 'free to PSF' mirrors are feasible and needed, then they will still be useful, especially in high-download regions.
Since Amazon's cloud service is metered on a region-by-region basis, any off-loading of demand to regional mirrors will reduce PSF charges. Based on what I have read in the thread, I would not be surprised if full mirror deployment takes a year. After that, the cloud service could remain to pick up slack in a region should the mirror in a region go down. Any move to incremental update from time-based replacement will benefit either system. Terry Jan Reedy From solipsis at pitrou.net Wed Jun 16 20:41:55 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 16 Jun 2010 20:41:55 +0200 Subject: [Catalog-sig] Mercurial In-Reply-To: <4C191A92.9030404@v.loewis.de> References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C15F5F3.40501@egenix.com> <4C176BD4.3080909@egenix.com> <4C17CE55.5000601@v.loewis.de> <4C17F065.7070309@v.loewis.de> <4C191A92.9030404@v.loewis.de> Message-ID: <1276713715.3174.0.camel@localhost.localdomain> On Wednesday 16 June 2010 at 20:40 +0200, "Martin v. Löwis" wrote: > On 16.06.2010 13:44, Antoine Pitrou wrote: > > Martin v. Löwis writes: > >> > >>> As a maintainer of the PyPI project, it makes your workflow simpler, > >>> > >>> - contributors can clone the repo, change the code and ask you for a pull > >>> - you can pull changes by direct hg commands, and merge them > >> > >> After using Mercurial in one project, I'm skeptical that this really > >> makes things simpler. I find it very hard to find out what changes a > >> specific clone has that I still need to integrate. Also, when merging > >> with conflicts, I find it very difficult to determine whether I merged > >> all the conflicts correctly (since the diff will show all changes, not > >> just the conflicts). > >> > >> So I rather expect things to become more difficult when switching to hg.
> > > > I think it would be fair to bring those points on the mercurial mailing-list. > > After all we'll be one of their "high-profile" users, so they'd probably like us > > to enjoy the experience. > > I'm just a hg beginner, so it's probably all my fault, and I'm not using > it correctly. There's no problem in asking beginner questions :) Regards Antoine. From merwok at netwok.org Wed Jun 16 21:53:11 2010 From: merwok at netwok.org (Éric Araujo) Date: Wed, 16 Jun 2010 21:53:11 +0200 Subject: [Catalog-sig] Mercurial In-Reply-To: <4C17F065.7070309@v.loewis.de> References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C15F5F3.40501@egenix.com> <4C176BD4.3080909@egenix.com> <4C17CE55.5000601@v.loewis.de> <4C17F065.7070309@v.loewis.de> Message-ID: <4C192BA7.8010202@netwok.org> > After using Mercurial in one project, I'm skeptical that this really > makes things simpler. I find it very hard to find out what changes a > specific clone has that I still need to integrate. There are commands to compare repositories: incoming and outgoing (read "hg help incoming"). > Also, when merging with conflicts, I find it very difficult to determine > whether I merged all the conflicts correctly (since the diff will show > all changes, not just the conflicts). I believe that's a known bug. David Wolever is writing an extension to show only the diff against the automated merge, which would be more helpful: http://mercurial.selenic.com/wiki/MergediffExtension Bitbucket uses a similar algorithm to display merge diffs, I think. With the command-line tool or TortoiseHg, you can check the diff against the second parent of the merge, which can be more meaningful than the default diff against the first parent. We will definitely need tutorials to make the transition smooth.
More advanced users are on python-dev and #python-dev and bugs.python.org to help newcomers, so don't hesitate to complain about Mercurial. BTW, the willingness to learn a new tool in such a fundamental area as version control shows you're not so old. hginit.com is a really short tutorial that starts with version control reeducation for Subversion users. Regards From simon at ikanobori.jp Wed Jun 16 23:15:56 2010 From: simon at ikanobori.jp (Simon de Vlieger) Date: Wed, 16 Jun 2010 23:15:56 +0200 Subject: [Catalog-sig] PyPI template improvements Message-ID: Hey all, the recent activity on this mailing list has kickstarted my contributing sense. As long as the mirroring debate is still ongoing I will focus my efforts somewhere else. Namely: the HTML/JavaScript/CSS. This email has also been submitted to the distutils-sig list as a lot of power users of PyPI are on there. In this regard I have a few questions before I really dig into these templates: - - Is there a list of improvements, maybe a nice TODO of points which people want to see improved? - - How are design changes handled, is there a committee to run them through? People who decide on what gets in and what not? (I'll outline some of my first thoughts lower in this mail) - - What are the supported browser versions by PyPI, I reckon it's IE6/7/8+, Fx 2+, Opera 9+ Safari 4+? The changes I have on my personal 'todo list' are: - - Add labels to all forms. - - Make tables consistent width (see for example the table in the top of the "Browse packages" page and compare with the table when you actually select one of the classifiers). - - Restyle the metadata display on package pages and move it up in the page. - - Have downloads readily available on the right side of the screen (at least the latest release). - - Look sternly at the top right floating account information page. - - Look at the "your details" page where the form does not align with the right floating profile box.
- Make one consistent styling for all forms. Include help texts in all
  forms.

There are more things I want to do, but this is the start. I have
already cloned Tarek's PyPI clone on Bitbucket and I'll add my changes
there.

Is there anything you guys (and the users) would really like to see
improved?

Regards,

Simon de Vlieger

From martin at v.loewis.de Wed Jun 16 23:51:17 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 16 Jun 2010 23:51:17 +0200
Subject: [Catalog-sig] PyPI template improvements
In-Reply-To:
References:
Message-ID: <4C194755.2060704@v.loewis.de>

> - Is there a list of improvements, maybe a nice TODO of points which
> people want to see improved?

The bug tracker: sf.net/projects/pypi

> - How are design changes handled, is there a committee to run them
> through? People who decide on what gets in and what not? (I'll outline
> some of my first thoughts lower in this mail)

No. There are virtually no design changes being proposed that actually
come with a patch, so nothing needs to be decided on.

> - What are the supported browser versions by PyPI, I reckon it's
> IE6/7/8+, Fx 2+, Opera 9+ Safari 4+?

What do you mean by "supported"? Officially supported, so that you can
make a help desk call if it won't work? None. Or do you mean that the
browser should be able to use the site? All of them, plus any other
browser you can think of, including Lynx and wget.

> The changes I have on my personal 'todo list' are:
> - Add labels to all forms.

Please submit a patch. I have no clue what a label of a form is.

> - Make tables consistent width (see for example the table in the top
> of the "Browse packages" page and compare with the table when you
> actually select one of the classifiers).

Again, please submit a patch.

> - Restyle the metadata display on package pages and move it up in the
> page.

Please submit a patch; this would probably need the support of this
list.
> - Have downloads readily available on the right side of the screen (at
> least the latest release).

Not sure what that means; please submit a patch.

> - Look sternly at the top right floating account information page.

Hmm. Whom do you want to look sternly?

> There are more things I want to do, but this is the start.

The key here really is "I ... do". This sounds good.

Regards,
Martin

From lists at zopyx.com Thu Jun 17 06:22:32 2010
From: lists at zopyx.com (Andreas Jung)
Date: Thu, 17 Jun 2010 06:22:32 +0200
Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI
Message-ID: <4C19A308.5040806@zopyx.com>

Hi there,

I propose a policy change for packages registered with PyPI:

- packages registered on PyPI have at least one release

- one release of a registered package on PyPI _must_ contain
  a valid source code distribution (sdist)

- packages registered on PyPI without releases or without
  a source code release are subject to removal N days
  after the day of registration

Why?

Any package registered on PyPI is possibly crucial to any kind of
development and deployment.

Packages hosted on external servers (referenced through a download_url)
are subject to come and go - packages once released should be available
at any time from a well-known location (PyPI). Dependencies on the
availability of external download servers other than PyPI are hardly
acceptable for real-world development and deployments.

As an example: the Plone CMS buildouts depend on python-openid.
This package is registered with PyPI

http://pypi.python.org/pypi/python-openid

but refers to

http://openidenabled.com/files/python-openid/packages/python-openid-2.2.4.tar.gz

For whatever reason the download URL is no longer working. In fact,
openidenabled.com now points to http://www.janrain.com.

Other reasons for disappearing packages in the past:

- network or server outages of external servers
- users changed their organization and the organization removed
  content of their former employees

PyPI is a valuable and crucial resource for Python development.
It must be kept up-to-date and consistent.

I don't care about the arguments that were made in the past against
stronger rules ("openness" etc.).

There are a lot of Python programmers around who are not Python geeks
as most of us are, and they just become pissed off when packages come
and go or are not in the place where one would expect them.

PyPI is a community resource - but community does not mean anarchy where
everyone should be able to upload their package crap without looking
left and right and having the community and its needs in mind.

PyPI must become a stable package index. Everything registered with PyPI
must be available at any time (mirrors, distributing PyPI in the cloud...).

Andreas

--
ZOPYX Limited           | zopyx group
Charlottenstr. 37/1     | The full-service network for Zope & Plone
D-72070 Tübingen        | Produce & Publish
www.zopyx.com           | www.produce-and-publish.com
------------------------------------------------------------------------
E-Publishing, Python, Zope & Plone development, Consulting

From ianb at colorstudy.com Thu Jun 17 06:30:46 2010
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed, 16 Jun 2010 23:30:46 -0500
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
In-Reply-To: <4C1919F1.9080506@v.loewis.de>
References: <4C1768AF.9040606@egenix.com> <4C17A419.4060602@egenix.com> <4C17BBC3.3050205@egenix.com> <4C1919F1.9080506@v.loewis.de>
Message-ID:

On Wed, Jun 16, 2010 at 1:37 PM, "Martin v. Löwis" wrote:

>> It is likely that some people will set up a mirror and then "forget"
>> to take care of it. Like our buildbots really.
>
> The same can happen to any infrastructure, though. Amazon may decide to
> change the setup, and then the automated update procedure would break.
> Of course, they would give advance notice - but then somebody would
> have to react to that advance notice.

That's not very likely, and if something does change it will be
extremely well announced and documented. Amazon is providing a
commercial service lots of people rely on; their process is formalized
and professionalized. And if Amazon makes mistakes they'll figure out
how to avoid them next time, while mirror providers are a rotating crew
that is unlikely to easily or reliably learn from past mistakes. If we
actually understood each time PyPI broke and fixed it, none of this
would be a problem; I'm not blaming anyone for that, but it's also not
going to change, and adding lots of mirror systems just adds more
systems with exactly the same management problems that our current
system has.

--
Ian Bicking | http://blog.ianbicking.org

From aclark at aclark.net Thu Jun 17 07:11:41 2010
From: aclark at aclark.net (Alex Clark)
Date: Thu, 17 Jun 2010 01:11:41 -0400
Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI
In-Reply-To: <4C19A308.5040806@zopyx.com>
References: <4C19A308.5040806@zopyx.com>
Message-ID:

Hi,

Andreas Jung wrote:
> Hi there,
>
> I propose a policy change for packages registered with PyPI:
>
> - packages registered on PyPI have at least one release
>
> - one release of a registered package on PyPI _must_ contain
>   a valid source code distribution (sdist)
>
> - packages registered on PyPI without releases or without
>   a source code release are subject to removal N days
>   after the day of registration
>
> Why?
>
> Any package registered on PyPI is possibly crucial to any kind of
> development and deployment.
>
> Packages hosted on external servers (referenced through a download_url)
> are subject to come and go - packages once released should be available
> at any time from a well-known location (PyPI). Dependencies on the
> availability of external download servers other than PyPI are hardly
> acceptable for real-world development and deployments.
>
> As an example: the Plone CMS buildouts depend on python-openid.
> This package is registered with PyPI
>
> http://pypi.python.org/pypi/python-openid
>
> but refers to
>
> http://openidenabled.com/files/python-openid/packages/python-openid-2.2.4.tar.gz
>
> For whatever reason the download URL is no longer working. In fact,
> openidenabled.com now points to http://www.janrain.com.

FWIW, I have uploaded a local copy of that file to:

http://dist.plone.org/thirdparty/python-openid-2.2.4.tar.gz

>
> Other reasons for disappearing packages in the past:
>
> - network or server outages of external servers
> - users changed their organization and the organization removed
>   content of their former employees
>
> PyPI is a valuable and crucial resource for Python development.
> It must be kept up-to-date and consistent.
>
> I don't care about the arguments that were made in the past against
> stronger rules ("openness" etc.).
>
> There are a lot of Python programmers around who are not Python geeks
> as most of us are, and they just become pissed off when packages come
> and go or are not in the place where one would expect them.
>
> PyPI is a community resource - but community does not mean anarchy where
> everyone should be able to upload their package crap without looking
> left and right and having the community and its needs in mind.
>
> PyPI must become a stable package index. Everything registered with PyPI
> must be available at any time (mirrors, distributing PyPI in the cloud...).
>
> Andreas

--
Alex Clark · http://aclark.net
Author · Plone 3.3 Site Administration ·
http://aclark.net/admin

From sridharr at activestate.com Thu Jun 17 08:01:08 2010
From: sridharr at activestate.com (Sridhar)
Date: Wed, 16 Jun 2010 23:01:08 -0700
Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI
In-Reply-To: <4C19A308.5040806@zopyx.com>
References: <4C19A308.5040806@zopyx.com>
Message-ID: <4C19BA24.7020709@activestate.com>

On 6/16/2010 9:22 PM, Andreas Jung wrote:
> As an example: the Plone CMS buildouts depend on python-openid.
> This package is registered with PyPI
>
> http://pypi.python.org/pypi/python-openid
>
> but refers to
>
> http://openidenabled.com/files/python-openid/packages/python-openid-2.2.4.tar.gz
>
> For whatever reason the download URL is no longer working. In fact,
> openidenabled.com now points to http://www.janrain.com.
>

This is one of the limitations of z3c.pypimirror that prompted me to
write my own "mirroring" solution. I have a configuration file which
allows me to "override" package metadata for such "crap" data in PyPI:
things like a PyPI entry for a package pointing to an older version of
the tarball, no tarball at all, or a broken link such as the one you
mentioned here.

> PyPI is a valuable and crucial resource for Python development.
> It must be kept up-to-date and consistent.
>
> I don't care about the arguments that were made in the past against
> stronger rules ("openness" etc.).
>
> There are a lot of Python programmers around who are not Python geeks
> as most of us are, and they just become pissed off when packages come
> and go or are not in the place where one would expect them.
>
> PyPI is a community resource - but community does not mean anarchy where
> everyone should be able to upload their package crap without looking
> left and right and having the community and its needs in mind.
>
> PyPI must become a stable package index. Everything registered with PyPI
> must be available at any time (mirrors, distributing PyPI in the cloud...).
>

BTW, I posted a similar proposal on distutils-sig@ before, and it led
nowhere. I have no hope for this one either. :-/

So much for participating in a community.

-srid

From cz at gocept.com Thu Jun 17 08:11:19 2010
From: cz at gocept.com (Christian Zagrodnick)
Date: Thu, 17 Jun 2010 08:11:19 +0200
Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI
References: <4C19A308.5040806@zopyx.com>
Message-ID:

On 2010-06-17 06:22:32 +0200, Andreas Jung said:

> Hi there,
>
> I propose a policy change for packages registered with PyPI:
>
> - packages registered on PyPI have at least one release
>
> - one release of a registered package on PyPI _must_ contain
>   a valid source code distribution (sdist)
>
> - packages registered on PyPI without releases or without
>   a source code release are subject to removal N days
>   after the day of registration
>
> Why?
>
> Any package registered on PyPI is possibly crucial to any kind of
> development and deployment.
>
> Packages hosted on external servers (referenced through a download_url)
> are subject to come and go - packages once released should be available
> at any time from a well-known location (PyPI). Dependencies on the
> availability of external download servers other than PyPI are hardly
> acceptable for real-world development and deployments.

I second that. External download URLs are really a pain.

I don't think that removing packages that way would really solve the
problem, though. I think the core is:

* Require the package to have a source dist *on* PyPI
* Forbid removing any source package.

[...]

> PyPI must become a stable package index. Everything registered with PyPI
> must be available at any time (mirrors, distributing PyPI in the cloud...).

ack.

--
Christian Zagrodnick · cz at gocept.com
gocept gmbh & co. kg · forsterstraße 29 · 06112 halle (saale) · germany
http://gocept.com · tel +49 345 1229889 4 · fax +49 345 1229889 1
Zope and Plone consulting and development

From martin at v.loewis.de Thu Jun 17 08:58:40 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 17 Jun 2010 08:58:40 +0200
Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI
In-Reply-To: <4C19A308.5040806@zopyx.com>
References: <4C19A308.5040806@zopyx.com>
Message-ID: <4C19C7A0.9080800@v.loewis.de>

> I propose a policy change for packages registered with PyPI:
>
> - packages registered on PyPI have at least one release
>
> - one release of a registered package on PyPI _must_ contain
>   a valid source code distribution (sdist)
>
> - packages registered on PyPI without releases or without
>   a source code release are subject to removal N days
>   after the day of registration

So how would you implement that policy change? Please propose a phased
approach that gives affected people plenty of options to intervene if
they disagree with the policy.

Regards,
Martin

From lists at zopyx.com Thu Jun 17 09:09:55 2010
From: lists at zopyx.com (Andreas Jung)
Date: Thu, 17 Jun 2010 09:09:55 +0200
Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI
In-Reply-To: <4C19C7A0.9080800@v.loewis.de>
References: <4C19A308.5040806@zopyx.com> <4C19C7A0.9080800@v.loewis.de>
Message-ID: <4C19CA43.9000509@zopyx.com>

Martin v. Löwis wrote:
>> I propose a policy change for packages registered with PyPI:
>>
>> - packages registered on PyPI have at least one release
>>
>> - one release of a registered package on PyPI _must_ contain
>>   a valid source code distribution (sdist)
>>
>> - packages registered on PyPI without releases or without
>>   a source code release are subject to removal N days
>>   after the day of registration
>
> So how would you implement that policy change? Please propose a phased
> approach that gives affected people plenty of options to intervene if
> they disagree with the policy.
>

It should be fairly easy to figure out the affected packages through
some DB query (in fact a similar functionality is already implemented
on top of the XML-RPC API in my zopyx.trashfinder package).

For such packages: send out an email to the package maintainer informing
him about the problem and instructing him to fix the problem within N
days.

After N days: recheck the package state and unregister the package if
necessary.

Or perhaps a less rude approach: introduce a status field for each
package (ACTIVE/INACTIVE) and set the state to INACTIVE when the package
does not comply with this policy. Inactive packages won't be listed on
PyPI and won't be searchable on PyPI. The inactive status should be
visible to the author (in logged-in state) with some warning such as
"Package is inactive, please upload your sdist...".

Andreas

From marrakis at gmail.com Thu Jun 17 09:27:27 2010
From: marrakis at gmail.com (Mathieu Leduc-Hamel)
Date: Thu, 17 Jun 2010 09:27:27 +0200
Subject: [Catalog-sig] PyPI down again...
In-Reply-To: <362E7782-303B-4ED1-803A-EA82762F6365@activestate.com>
References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <362E7782-303B-4ED1-803A-EA82762F6365@activestate.com>
Message-ID:

Yeah, for sure there are different implementations of PyPI in Django or
with other frameworks.

You can check this one too: http://pypi.python.org/pypi/chishop/0.2.0

But the question was not necessarily how difficult it would be to do,
but whether it would be acceptable to the community; we are on the right
list to discuss that.

Have you tried it? Is it working properly?

On Wed, Jun 16, 2010 at 7:06 PM, Sridhar Ratnakumar <sridharr at activestate.com> wrote:

>
> On 2010-06-11, at 1:56 PM, Martin v. Löwis wrote:
>
> > If you are willing to invest *a lot* of time, then it seems that
> > rewriting PyPI in Django would make a lot of people happy, because
> > they claim they can't contribute to the current code base because
> > they don't understand that. I don't want to do such a rewrite on
> > my own because I *do* understand the code base (despite not having
> > written it in the first place, so I think that if you really want
> > to contribute, you can learn how it works); it also violates Joel
> > Spolsky's principle of never ever doing rewrites.
>
> FYI: I just happened to stumble upon what claims to be a
> "re-implementation of PyPI" in Django:
> http://pypi.python.org/pypi/djangopypi/0.4
>
> -srid

From martin at v.loewis.de Thu Jun 17 09:36:01 2010
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 17 Jun 2010 09:36:01 +0200
Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI
In-Reply-To: <4C19CA43.9000509@zopyx.com>
References: <4C19A308.5040806@zopyx.com> <4C19C7A0.9080800@v.loewis.de> <4C19CA43.9000509@zopyx.com>
Message-ID: <4C19D061.5020303@v.loewis.de>

> For such packages: send out an email to the package maintainer informing
> him about the problem and instructing him to fix the problem within N days.
>
> After N days: recheck the package state and unregister the package if
> necessary.
>
> Or perhaps a less rude approach: introduce a status field for each
> package (ACTIVE/INACTIVE) and set the state to INACTIVE when the package
> does not comply with this policy. Inactive packages won't be listed on
> PyPI and won't be searchable on PyPI. The inactive status should be
> visible to the author (in logged-in state) with some warning such as
> "Package is inactive, please upload your sdist...".

Ok. If nobody opposes this right now, it's fine with me as well.
However, I won't be able to work on this for several months to come.

IMO, it's a waste of energy: if a package is useless, just don't use it,
and be done. There are many packages on PyPI that are useless to me
despite having a source release.

Regards,
Martin

From lists at zopyx.com Thu Jun 17 09:39:52 2010
From: lists at zopyx.com (Andreas Jung)
Date: Thu, 17 Jun 2010 09:39:52 +0200
Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI
In-Reply-To: <4C19D061.5020303@v.loewis.de>
References: <4C19A308.5040806@zopyx.com> <4C19C7A0.9080800@v.loewis.de> <4C19CA43.9000509@zopyx.com> <4C19D061.5020303@v.loewis.de>
Message-ID: <4C19D148.4000308@zopyx.com>

Martin v. Löwis wrote:

> IMO, it's a waste of energy: if a package is useless, just don't use it,
> and be done. There are many packages on PyPI that are useless to me
> despite having a source release.
>

"Useless" is not the point. The "availability" matters - the
availability of a package must not depend on external servers other
than an official PyPI server.
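
To make the availability rule concrete, the ACTIVE/INACTIVE check
proposed earlier in this thread could look roughly like the sketch
below. This is only an illustration under stated assumptions: the
ReleaseFile and Package records are invented for the example and are
not PyPI's database schema or its XML-RPC API, and "hosted_on_pypi" is
a made-up flag for "the file was uploaded to PyPI itself rather than
referenced through a download_url".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReleaseFile:
    # Hypothetical record for one file attached to a release.
    filename: str
    packagetype: str       # e.g. "sdist" or "bdist_egg"
    hosted_on_pypi: bool   # False if only reachable via download_url


@dataclass
class Package:
    # Hypothetical record for one registered package.
    name: str
    files: List[ReleaseFile] = field(default_factory=list)


def package_status(package):
    """Return "ACTIVE" if any file is an sdist hosted on PyPI, else "INACTIVE"."""
    for release_file in package.files:
        if release_file.packagetype == "sdist" and release_file.hosted_on_pypi:
            return "ACTIVE"
    return "INACTIVE"


def packages_to_notify(packages):
    """Names of packages whose maintainers would receive the warning mail."""
    return [p.name for p in packages if package_status(p) == "INACTIVE"]
```

A package registered with nothing but an external download URL, as in
the python-openid case, has an empty file list here and would come out
as INACTIVE.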

Andreas

From mal at egenix.com Thu Jun 17 09:54:50 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 17 Jun 2010 09:54:50 +0200
Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI
In-Reply-To: <4C19A308.5040806@zopyx.com>
References: <4C19A308.5040806@zopyx.com>
Message-ID: <4C19D4CA.1090304@egenix.com>

Andreas Jung wrote:
> Hi there,
>
> I propose a policy change for packages registered with PyPI:
>
> - packages registered on PyPI have at least one release

I'm not sure what you mean by "release". Every package on
PyPI is a release, since it comes with a version number.

> - one release of a registered package on PyPI _must_ contain
>   a valid source code distribution (sdist)

-100

You'd rule out commercial packages that don't come with a
source distribution. PyPI is for everyone, not only for
open source packages.

Furthermore, not all package authors want to upload their
packages to PyPI.

And lastly, uploading packages to PyPI (still) has a serious
problem: setuptools doesn't know the distinction between UCS2
and UCS4, so uploading eggs for Unix platforms doesn't work out
in practice. setuptools also doesn't know that e.g. a Mac OS X fat
release may still contain the right binaries for a non-fat build
of Python.

There are other issues as well, e.g. eGenix produces around 50
release files for every package release, amounting to around
150 MB in some cases. It's currently just not feasible to use
PyPI for that.

> - packages registered on PyPI without releases or without
>   a source code release are subject to removal N days
>   after the day of registration

Same as above.

> Why?
>
> Any package registered on PyPI is possibly crucial to any kind of
> development and deployment.
>
> Packages hosted on external servers (referenced through a download_url)
> are subject to come and go - packages once released should be available
> at any time from a well-known location (PyPI). Dependencies on the
> availability of external download servers other than PyPI are hardly
> acceptable for real-world development and deployments.

I think it's for the package users to decide whether they trust
a package author to maintain his or her package. That's not
something PyPI can change.

> As an example: the Plone CMS buildouts depend on python-openid.
> This package is registered with PyPI
>
> http://pypi.python.org/pypi/python-openid
>
> but refers to
>
> http://openidenabled.com/files/python-openid/packages/python-openid-2.2.4.tar.gz
>
> For whatever reason the download URL is no longer working. In fact,
> openidenabled.com now points to http://www.janrain.com.

That's a problem with that particular package, so you should
contact the package author.

Just because one URL goes away doesn't mean that *all* PyPI
package authors who host their software elsewhere are in poor
standing.

> Other reasons for disappearing packages in the past:
>
> - network or server outages of external servers
> - users changed their organization and the organization removed
>   content of their former employees

I'd say you open a support request for PyPI and then let a
sys admin add a note to the package or remove the broken
download URL.

> PyPI is a valuable and crucial resource for Python development.
> It must be kept up-to-date and consistent.
>
> I don't care about the arguments that were made in the past against
> stronger rules ("openness" etc.).

If that's so, why should we then care about your arguments?

> There are a lot of Python programmers around who are not Python geeks
> as most of us are, and they just become pissed off when packages come
> and go or are not in the place where one would expect them.

That's the nature of the Internet.

Besides, would you really want to use a package that's not being
maintained anymore?

Even if you do have a source or binary distribution for a package
on PyPI, would you really continue to use it if you don't know
the author and it hadn't had any release for 3 years?

You can't just blindly rely on things that were uploaded to PyPI,
and the proposed policy change won't make a difference in that
respect.

> PyPI is a community resource - but community does not mean anarchy where
> everyone should be able to upload their package crap without looking
> left and right and having the community and its needs in mind.

I think that's asking a bit too much of the package authors.

PyPI is just a resource to announce and catalog Python packages,
nothing more.

> PyPI must become a stable package index. Everything registered with PyPI
> must be available at any time (mirrors, distributing PyPI in the cloud...).

I agree that everything uploaded to PyPI should be available
anytime, but not that everything registered with PyPI also has
to be uploaded to PyPI.

Making PyPI more reliable will likely increase the number
of package authors who trust PyPI to host their packages.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jun 17 2010)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2010-07-19: EuroPython 2010, Birmingham, UK                31 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From mal at egenix.com Thu Jun 17 09:57:52 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 17 Jun 2010 09:57:52 +0200
Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI
In-Reply-To: <4C19D061.5020303@v.loewis.de>
References: <4C19A308.5040806@zopyx.com> <4C19C7A0.9080800@v.loewis.de> <4C19CA43.9000509@zopyx.com> <4C19D061.5020303@v.loewis.de>
Message-ID: <4C19D580.7080909@egenix.com>

"Martin v. Löwis" wrote:
>> For such packages: send out an email to the package maintainer informing
>> him about the problem and instructing him to fix the problem within N
>> days.
>>
>> After N days: recheck the package state and unregister the package if
>> necessary.
>>
>> Or perhaps a less rude approach: introduce a status field for each
>> package (ACTIVE/INACTIVE) and set the state to INACTIVE when the package
>> does not comply with this policy. Inactive packages won't be listed on
>> PyPI and won't be searchable on PyPI. The inactive status should be
>> visible to the author (in logged-in state) with some warning such as
>> "Package is inactive, please upload your sdist...".
>
> Ok. If nobody opposes this right now, it's fine with me as well.
> However, I won't be able to work on this for several months to come.
>
> IMO, it's a waste of energy: if a package is useless, just don't use it,
> and be done. There are many packages on PyPI that are useless to me
> despite having a source release.

Agreed. PyPI can't replace the due diligence that every package
user has to apply before choosing to invest time in a package.

--
Marc-Andre Lemburg
eGenix.com

From lists at zopyx.com Thu Jun 17 10:05:25 2010
From: lists at zopyx.com (Andreas Jung)
Date: Thu, 17 Jun 2010 10:05:25 +0200
Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI
In-Reply-To: <4C19D4CA.1090304@egenix.com>
References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com>
Message-ID: <4C19D745.3050900@zopyx.com>

M.-A. Lemburg wrote:
> Andreas Jung wrote:
>> Hi there,
>>
>> I propose a policy change for packages registered with PyPI:
>>
>> - packages registered on PyPI have at least one release
>
> I'm not sure what you mean by "release". Every package on
> PyPI is a release, since it comes with a version number.

This is a package without a release:

http://pypi.python.org/pypi/python-openid

>
>> - one release of a registered package on PyPI _must_ contain
>>   a valid source code distribution (sdist)
>
> -100
>
> You'd rule out commercial packages that don't come with a
> source distribution. PyPI is for everyone, not only for
> open source packages.

Commercial packages are a special case - I agree. The majority
of all PyPI packages are non-commercial. In addition you could also
upload a binary release in addition to your own download server.

>
> Furthermore, not all package authors want to upload their
> packages to PyPI.

And this is _exactly_ the problem.
If you are a package author and want to make your packages available
to the public through PyPI, you should be obligated to publish
the related distribution files on PyPI: for the sake of availability
and in order to be independent of your own infrastructure. Otherwise
I have the (arrogant) opinion: go away - if you are a package author
and want to use PyPI, ensure that your software is available to
everyone at any time. PyPI is not a kindergarten - PyPI is an important
resource for professional Python development. CPAN has been better
organized and more reliable for more than ten years than PyPI ever was.

Andreas

From jannis at leidel.info Thu Jun 17 10:14:31 2010
From: jannis at leidel.info (Jannis Leidel)
Date: Thu, 17 Jun 2010 10:14:31 +0200
Subject: [Catalog-sig] PyPI down again...
In-Reply-To:
References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <362E7782-303B-4ED1-803A-EA82762F6365@activestate.com>
Message-ID: <60277BDC-FB55-4901-A7BE-AD67ED6D35E3@leidel.info>

On 17.06.2010 at 09:27, Mathieu Leduc-Hamel wrote:

> Yeah, for sure there are different implementations of PyPI in Django or
> with other frameworks.
>
> You can check this one too: http://pypi.python.org/pypi/chishop/0.2.0

FYI, djangopypi is a fork of chishop to separate the reusable and example
server parts better. I already contributed a few patches lately and will
keep working on it over the summer.

> But the question was not necessarily how difficult it was to do it but if it would be acceptable to the community, but we are on the right list to discuss that.
>
> have you tried it, is it working properly ?

It worked in my manual tests but needs more testing with easy_install, et al.
If anyone is interested, there is a buildout config included in the
repository [1] which should get you up and running quickly.

Best,
Jannis

1: http://github.com/benliles/chishop

> On Wed, Jun 16, 2010 at 7:06 PM, Sridhar Ratnakumar wrote:
>
> > On 2010-06-11, at 1:56 PM, Martin v. Löwis wrote:
> >
> > > If you are willing to invest *a lot* of time, then it seems that
> > > rewriting PyPI in Django would make a lot of people happy, because
> > > they claim they can't contribute to the current code base because
> > > they don't understand that. I don't want to do such a rewrite on
> > > my own because I *do* understand the code base (despite not having
> > > written it in the first place, so I think that if you really want
> > > to contribute, you can learn how it works); it also violates Joel
> > > Spolsky's principle of never ever doing rewrites.
> >
> > FYI: I just happened to stumble upon what claims to be a "re-implementation of PyPI" in Django:
> > http://pypi.python.org/pypi/djangopypi/0.4
> >
> > -srid

From mal at egenix.com Thu Jun 17 10:28:25 2010
From: mal at egenix.com (M.-A.
 Lemburg)
Date: Thu, 17 Jun 2010 10:28:25 +0200
Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI
In-Reply-To: <4C19D745.3050900@zopyx.com>
References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com>
Message-ID: <4C19DCA9.5010308@egenix.com>

Andreas Jung wrote:
> M.-A. Lemburg wrote:
>> Andreas Jung wrote:
>>> Hi there,
>>>
>>> I propose a policy change for packages registered with PyPI:
>>>
>>> - packages registered on PyPI have at least one release
>
>> I'm not sure what you mean with "release". Every package on
>> PyPI is a release, since it comes with a version number.
>
> This is a package without a release:
>
> http://pypi.python.org/pypi/python-openid

It has a name and a version number, so it's a release. It may
be an unavailable release, just like say, Windows 98, is not
available anymore - and that didn't have a source release
file to download either :-)

And I can see that you've added a comment to the package
that the download URL is not working - that's good, since
it will warn users to double-check.

>>> - one release of registered package on PyPI _must_ contain
>>> a valid source code distribution (sdist)
>
>> -100
>
>> You'd outrule commercial packages that don't come with a
>> source distribution. PyPI is for everyone, not only for
>> open source packages.
>
> Commercial packages are a special case - I agree. The majority
> of all PyPI packages are non-commercial. In addition you could also
> upload binary releases in addition to your own download server.

See my other comments: we might want to do that in the future,
but at the moment, uploading 50 release files with around
150MB every time we do a release is not within range.

>> Furthermore, not all package authors want to upload their
>> packages to PyPI.
>
> And this is _exactly_ the problem.
> If you are a package author
> and want to make your packages available to the public through PyPI,
> you should be obliged to publish the related distribution
> files on PyPI: for the sake of availability and in order to be
> independent of your own infrastructure. Otherwise I have the (arrogant)
> opinion: go away - if you are a package author and want to use PyPI:
> ensure that your software is available to everyone at any time.

What about those package authors who host their package
elsewhere for various reasons and *do* make sure that their
infrastructure is available - even if PyPI is down ?

I have the feeling that you had a problem with that one
package you mentioned and the proposal was just a reaction
to the associated anger with that.

It's not fair to start policing all packages on PyPI just
because of that one incident you had.

> PyPI is not a kindergarten - PyPI is an important resource for
> professional Python development. CPAN has been better organized and more
> reliable for more than ten years than PyPI ever was.

To be fair, CPAN has been around a lot longer than PyPI.

Regarding reliability of PyPI: as you've probably seen, I'm
taking that seriously and want to enhance the reliability of
PyPI.

Regarding PyPI being used as resource for professional
development: the zc.buildout approach has taken that idea
a bit far, IMHO. PyPI wasn't designed to be used by automated
download and installation tools that install hundreds of packages
as opposed to the few packages that users request manually via
easy_install.

It's good to see that PyPI can still cope with that approach,
and pushing the data to the cloud and/or mirror servers will
enhance that performance even more.

From lists at zopyx.com Thu Jun 17 10:40:15 2010
From: lists at zopyx.com (Andreas Jung)
Date: Thu, 17 Jun 2010 10:40:15 +0200
Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI
In-Reply-To: <4C19DCA9.5010308@egenix.com>
References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> <4C19DCA9.5010308@egenix.com>
Message-ID: <4C19DF6F.9050106@zopyx.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

M.-A. Lemburg wrote:
> Andreas Jung wrote:
>> M.-A. Lemburg wrote:
>>> Andreas Jung wrote:
>>>> Hi there,
>>>>
>>>> I propose a policy change for packages registered with PyPI:
>>>>
>>>> - packages registered on PyPI have at least one release
>>> I'm not sure what you mean with "release". Every package on
>>> PyPI is a release, since it comes with a version number.
>> This is a package without a release:
>>
>> http://pypi.python.org/pypi/python-openid
>
> It has a name and a version number, so it's a release. It may
> be an unavailable release, just like say, Windows 98, is not
> available anymore - and that didn't have a source release
> file to download either :-)

I don't care if it has a name and a version number. I was not able to work
on my project - other co-workers also complained...this is not an acceptable
situation...as a Python geek I can likely deal with that, others can't :)

>
> And I can see that you've added a comment to the package
> that the download URL is not working - that's good, since
> it will warn users to double-check.
>
>>>> - one release of registered package on PyPI _must_ contain
>>>> a valid source code distribution (sdist)
>>> -100
>>> You'd outrule commercial packages that don't come with a
>>> source distribution. PyPI is for everyone, not only for
>>> open source packages.
>> Commercial packages are a special case - I agree. The majority
>> of all PyPI packages are non-commercial. In addition you could also
>> upload binary releases in addition to your own download server.
>
> See my other comments: we might want to do that in the future,
> but at the moment, uploading 50 release files with around
> 150MB every time we do a release is not within range.

Point taken - but as said: your case is likely different. When we do
releases in the Zope world we also have to deal with lots of
packages...so doable somehow :)

>>> Furthermore, not all package authors want to upload their
>>> packages to PyPI.
>> And this is _exactly_ the problem. If you are a package author
>> and want to make your packages available to the public through PyPI,
>> you should be obliged to publish the related distribution
>> files on PyPI: for the sake of availability and in order to be
>> independent of your own infrastructure. Otherwise I have the (arrogant)
>> opinion: go away - if you are a package author and want to use PyPI:
>> ensure that your software is available to everyone at any time.
>
> What about those package authors who host their package
> elsewhere for various reasons and *do* make sure that their
> infrastructure is available - even if PyPI is down ?
>
> I have the feeling that you had a problem with that one
> package you mentioned and the proposal was just a reaction
> to the associated anger with that.
>
> It's not fair to start policing all packages on PyPI just
> because of that one incident you had.

We had such issues over and over again over the last years.
A typical Zope/Plone installation requires over a hundred different
packages and we have seen such failures with external servers
various times. The workaround was creating PyPI mirrors, project related
mirrors or download caches....just workarounds but not really a reliable
and working infrastructure..

Andreas

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkwZ328ACgkQCJIWIbr9KYxyrACdESkhtKnlZmyBFc6SMnuY+1an
E70AoKrzyzcrCsLMrftXKAfz9UPtbcD5
=QFQd
-----END PGP SIGNATURE-----

From mal at egenix.com Thu Jun 17 10:59:53 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 17 Jun 2010 10:59:53 +0200
Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI
In-Reply-To: <4C19DF6F.9050106@zopyx.com>
References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> <4C19DCA9.5010308@egenix.com> <4C19DF6F.9050106@zopyx.com>
Message-ID: <4C19E409.8060603@egenix.com>

Andreas Jung wrote:
> M.-A. Lemburg wrote:
>>>> Furthermore, not all package authors want to upload their
>>>> packages to PyPI.
>>> And this is _exactly_ the problem. If you are a package author
>>> and want to make your packages available to the public through PyPI,
>>> you should be obliged to publish the related distribution
>>> files on PyPI: for the sake of availability and in order to be
>>> independent of your own infrastructure.
>>> Otherwise I have the (arrogant)
>>> opinion: go away - if you are a package author and want to use PyPI:
>>> ensure that your software is available to everyone at any time.
>
>> What about those package authors who host their package
>> elsewhere for various reasons and *do* make sure that their
>> infrastructure is available - even if PyPI is down ?
>
>> I have the feeling that you had a problem with that one
>> package you mentioned and the proposal was just a reaction
>> to the associated anger with that.
>
>> It's not fair to start policing all packages on PyPI just
>> because of that one incident you had.
>
> We had such issues over and over again over the last years.
> A typical Zope/Plone installation requires over a hundred different
> packages and we have seen such failures with external servers
> various times. The workaround was creating PyPI mirrors, project related
> mirrors or download caches....just workarounds but not really a reliable
> and working infrastructure..

I guess it's better to tell the package authors about your
use of their packages and offer them help in hosting their
packages on more reliable infrastructures.

If that doesn't solve your problem, it's likely better
to either setup your own index to override the PyPI one
(should be easy to do in zc.buildout and AFAIK at least
Plone is already doing that), or you stop using
the package and look for alternatives.

From lists at zopyx.com Thu Jun 17 11:05:19 2010
From: lists at zopyx.com (Andreas Jung)
Date: Thu, 17 Jun 2010 11:05:19 +0200
Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI
In-Reply-To: <4C19E409.8060603@egenix.com>
References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> <4C19DCA9.5010308@egenix.com> <4C19DF6F.9050106@zopyx.com> <4C19E409.8060603@egenix.com>
Message-ID: <4C19E54F.6030203@zopyx.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

M.-A. Lemburg wrote:
>
> I guess it's better to tell the package authors about your
> use of their packages and offer them help in hosting their
> packages on more reliable infrastructures.
>
> If that doesn't solve your problem, it's likely better
> to either setup your own index to override the PyPI one
> (should be easy to do in zc.buildout and AFAIK at least
> Plone is already doing that), or you stop using
> the package and look for alternatives.

Sorry - I disagree completely. As a developer I am into developing
software and not into building private infrastructure to get around the
deficiencies of PyPI and the ignorance of some package maintainers
caring about the needs of the developers using their packages.

Andreas

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkwZ5U8ACgkQCJIWIbr9KYxrVACdH6G8zDI/6RMjAywRSvUhri8M
F08Anins1oOc3abEMSc4FZggol0cQjXl
=5fgV
-----END PGP SIGNATURE-----

From mal at egenix.com Thu Jun 17 11:51:13 2010
From: mal at egenix.com (M.-A.
 Lemburg)
Date: Thu, 17 Jun 2010 11:51:13 +0200
Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI
In-Reply-To: <4C19E54F.6030203@zopyx.com>
References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> <4C19DCA9.5010308@egenix.com> <4C19DF6F.9050106@zopyx.com> <4C19E409.8060603@egenix.com> <4C19E54F.6030203@zopyx.com>
Message-ID: <4C19F011.6010501@egenix.com>

Andreas Jung wrote:
> M.-A. Lemburg wrote:
>
>> I guess it's better to tell the package authors about your
>> use of their packages and offer them help in hosting their
>> packages on more reliable infrastructures.
>
>> If that doesn't solve your problem, it's likely better
>> to either setup your own index to override the PyPI one
>> (should be easy to do in zc.buildout and AFAIK at least
>> Plone is already doing that), or you stop using
>> the package and look for alternatives.
>
> Sorry - I disagree completely. As a developer I am into developing
> software and not into building private infrastructure to get around the
> deficiencies of PyPI

Well, we're trying to change those ...

> and the ignorance of some package maintainers
> caring about the needs of the developers using their packages.

... can't help with this, though.

Package authors typically have a wide range of motivations
to write and share software for others to use. They don't
necessarily share your views or see a need to fulfill your
particular requirements.

If you do have a business requirement to rely on their packages,
I'd suggest you ask those package authors for a support
contract. That would likely help them adapt to your needs ;-)

Back to your proposal: In your particular case, I don't see
how the proposal would have helped you - under the proposal,
the package would have been removed from the PyPI index,
so either way, there would have been no working automatic
access to the package download links.
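
The distinction running through this thread, between a package whose
files are uploaded to PyPI and one that is only registered with a link
to an external download page, can be checked mechanically. What follows
is only a minimal sketch, not part of setuptools, zc.buildout or PyPI:
it assumes a /simple index page is plain HTML whose <a> links point
either at distribution files on the index host or at external pages.
The function name, the suffix list and the two sample pages are
hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# File suffixes treated as source distributions (an assumption of this sketch).
SDIST_SUFFIXES = (".tar.gz", ".tar.bz2", ".tgz", ".zip")


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def hosted_sdists(page_html, page_url):
    """Return the sdist links that live on the same host as the index page."""
    collector = LinkCollector()
    collector.feed(page_html)
    index_host = urlparse(page_url).netloc
    found = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parts = urlparse(absolute)          # .path excludes any #md5=... fragment
        if parts.netloc == index_host and parts.path.endswith(SDIST_SUFFIXES):
            found.append(absolute)
    return found


# Hypothetical sample pages: one with an uploaded file, one with only an external link.
HOSTED_PAGE = (
    '<a href="../../packages/source/e/example/example-1.0.tar.gz#md5=0">'
    "example-1.0.tar.gz</a>"
)
EXTERNAL_PAGE = '<a href="http://example.com/download/" rel="download">1.0 download_url</a>'
PAGE_URL = "http://pypi.python.org/simple/example/"

print(hosted_sdists(HOSTED_PAGE, PAGE_URL))
print(hosted_sdists(EXTERNAL_PAGE, PAGE_URL))
```

A tool that wanted to avoid externally hosted packages could refuse any
package for which such a check returns an empty list. Whether that is
desirable is exactly what the thread is debating.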

From kai.diefenbach at iqpp.de Thu Jun 17 12:27:41 2010
From: kai.diefenbach at iqpp.de (Kai Diefenbach)
Date: Thu, 17 Jun 2010 12:27:41 +0200
Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI
References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> <4C19DCA9.5010308@egenix.com> <4C19DF6F.9050106@zopyx.com> <4C19E409.8060603@egenix.com> <4C19E54F.6030203@zopyx.com> <4C19F011.6010501@egenix.com>
Message-ID:

Hi,

On 2010-06-17 11:51:13 +0200, M.-A. Lemburg said:

> Back to your proposal: In your particular case, I don't see
> how the proposal would have helped you - under the proposal,
> the package would have been removed from the PyPI index,
> so either way, there would have been no working automatic
> access to the package download links.

Why?

Crap without source code distribution will never be published so no one
can ever build a dependency on that.

AJ: "packages once released should be available at any time from a
well-known location (PyPI)"

Problem solved.

Kai

From mal at egenix.com Thu Jun 17 12:47:06 2010
From: mal at egenix.com (M.-A.
 Lemburg)
Date: Thu, 17 Jun 2010 12:47:06 +0200
Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI
In-Reply-To:
References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> <4C19DCA9.5010308@egenix.com> <4C19DF6F.9050106@zopyx.com> <4C19E409.8060603@egenix.com> <4C19E54F.6030203@zopyx.com> <4C19F011.6010501@egenix.com>
Message-ID: <4C19FD2A.3050801@egenix.com>

Kai Diefenbach wrote:
> Hi,
>
> On 2010-06-17 11:51:13 +0200, M.-A. Lemburg said:
>
>> Back to your proposal: In your particular case, I don't see
>> how the proposal would have helped you - under the proposal,
>> the package would have been removed from the PyPI index,
>> so either way, there would have been no working automatic
>> access to the package download links.
>
> Why?
>
> Crap without source code distribution will never be published so no one
> can ever build a dependency on that.
>
> AJ: "packages once released should be available at any time from a
> well-known location (PyPI)"
>
> Problem solved.

Please have a look at the package in question. The only problem
with it is that the download URL registered on PyPI no longer works.
It redirects to the download page where you can find the source
distribution.

Not much of a problem for a user searching for the archives.

Only a problem for setuptools and zc.buildout that don't ship
with enough AI to figure out :-)

To get back to your argument:

Crap *with* source code distribution would still get published,
so people would still build dependencies on it.

How does this solve the problem ?

Note that Andreas wasn't talking about crappy software, he was
only complaining about the fact that automatic downloads via
setuptools sometimes don't work for some packages on PyPI.

From do3ccqrv at googlemail.com Thu Jun 17 13:20:55 2010
From: do3ccqrv at googlemail.com (Patrick Gerken)
Date: Thu, 17 Jun 2010 13:20:55 +0200
Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI
In-Reply-To: <4C19FD2A.3050801@egenix.com>
References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> <4C19DCA9.5010308@egenix.com> <4C19DF6F.9050106@zopyx.com> <4C19E409.8060603@egenix.com> <4C19E54F.6030203@zopyx.com> <4C19F011.6010501@egenix.com> <4C19FD2A.3050801@egenix.com>
Message-ID:

On Thu, Jun 17, 2010 at 12:47, M.-A. Lemburg wrote:

> Kai Diefenbach wrote:
> > Hi,
> >
> > On 2010-06-17 11:51:13 +0200, M.-A. Lemburg said:
> >
> >> Back to your proposal: In your particular case, I don't see
> >> how the proposal would have helped you - under the proposal,
> >> the package would have been removed from the PyPI index,
> >> so either way, there would have been no working automatic
> >> access to the package download links.
> >
> > Why?
> >
> > Crap without source code distribution will never be published so no one
> > can ever build a dependency on that.
> >
> > AJ: "packages once released should be available at any time from a
> > well-known location (PyPI)"
> >
> > Problem solved.
>
> Please have a look at the package in question.
> The only problem
> with it is that the download URL registered on PyPI no longer works.
> It redirects to the download page where you can find the source
> distribution.
>

And that's exactly what Andreas' argument is targeting.

> Not much of a problem for a user searching for the archives.
>
> Only a problem for setuptools and zc.buildout that don't ship
> with enough AI to figure out :-)
>
> To get back to your argument:
>
> Crap *with* source code distribution would still get published,
> so people would still build dependencies on it.
>
> How does this solve the problem ?
>

Not putting the source release on pypi is just one indicator of crappy
software.
I agree that this is not a crap indicator for commercial software.

There is a big number of users using tools that download tools in an
automated fashion from pypi, and it is a reasonable request that source
once being published to be available forever.

If I understand it correctly, you are against this proposal, that would have
protected users of setuptools/distribute/zc.buildouts from problems due to
python-openid, because it would disallow the publication of information
about commercial packages on pypi?

I see a point in that, but what is more important, having a catalog to
browse or having a reliable repository of software to download?

As a plone user who uses zc.buildout I very much prefer reliable downloads.
It's not fun to search for the reason a supposedly repeatable buildout
suddenly fails because a company decided to rename itself.

How about only listing packages with provided source code on the simple
interface? afaik buildout always uses that, so a package python-openid is
visible in the end-user view, but not installable via buildout. That way
nobody would ever have had created a dependency on it in the first place.

Best regards,

Patrick

From mal at egenix.com Thu Jun 17 13:40:02 2010
From: mal at egenix.com (M.-A.
 Lemburg)
Date: Thu, 17 Jun 2010 13:40:02 +0200
Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI
In-Reply-To:
References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> <4C19DCA9.5010308@egenix.com> <4C19DF6F.9050106@zopyx.com> <4C19E409.8060603@egenix.com> <4C19E54F.6030203@zopyx.com> <4C19F011.6010501@egenix.com> <4C19FD2A.3050801@egenix.com>
Message-ID: <4C1A0992.7070507@egenix.com>

Patrick Gerken wrote:
> On Thu, Jun 17, 2010 at 12:47, M.-A. Lemburg wrote:
>
>> Kai Diefenbach wrote:
>>> Hi,
>>>
>>> On 2010-06-17 11:51:13 +0200, M.-A. Lemburg said:
>>>
>>>> Back to your proposal: In your particular case, I don't see
>>>> how the proposal would have helped you - under the proposal,
>>>> the package would have been removed from the PyPI index,
>>>> so either way, there would have been no working automatic
>>>> access to the package download links.
>>>
>>> Why?
>>>
>>> Crap without source code distribution will never be published so no one
>>> can ever build a dependency on that.
>>>
>>> AJ: "packages once released should be available at any time from a
>>> well-known location (PyPI)"
>>>
>>> Problem solved.
>>
>> Please have a look at the package in question. The only problem
>> with it is that the download URL registered on PyPI no longer works.
>> It redirects to the download page where you can find the source
>> distribution.
>>
>
> And that's exactly what Andreas' argument is targeting.
>
>
>> Not much of a problem for a user searching for the archives.
>>
>> Only a problem for setuptools and zc.buildout that don't ship
>> with enough AI to figure out :-)
>>
>
>> To get back to your argument:
>>
>> Crap *with* source code distribution would still get published,
>> so people would still build dependencies on it.
>>
>> How does this solve the problem ?
>>
>
> Not putting the source release on pypi is just one indicator of crappy
> software.
> I agree that this is not a crap indicator for commercial software.
>
> There is a big number of users using tools that download tools in an
> automated fashion from pypi, and it is a reasonable request that source
> once being published to be available forever.
>
> If I understand it correctly, you are against this proposal, that would have
> protected users of setuptools/distribute/zc.buildouts from problems due to
> python-openid, because it would disallow the publication of information
> about commercial packages on pypi?

What I'm saying is that it's better to contact the package
authors whose entries cause problems than to force some
policy on all PyPI package entries which carelessly puts
packages that are not hosted on PyPI into the same category
as crappy software.

> I see a point in that, but what is more important, having a catalog to
> browse or having a reliable repository of software to download?
>
> As a plone user who uses zc.buildout I very much prefer reliable downloads.
> It's not fun to search for the reason a supposedly repeatable buildout
> suddenly fails because a company decided to rename itself.

It is well possible to delete package listings on PyPI. Wouldn't
you rather be informed about this by way of an error report in
zc.buildout than by finding that the package name has changed a
few years later ?

> How about only listing packages with provided source code on the simple
> interface? afaik buildout always uses that, so a package python-openid is
> visible in the end-user view, but not installable via buildout. That way
> nobody would ever have had created a dependency on it in the first place.

If such external links are a problem for zc.buildout, why don't
you add an option to zc.buildout that prevents using such
packages ?

This is well possible by checking the /simple index entry for
links to package download files:

http://pypi.python.org/simple/python-openid/

vs.

http://pypi.python.org/simple/zc.buildout/

BTW: what are all those bug links doing on the zc.buildout
index page ? They look a lot like a good possibility for
injecting trojans.

From lists at zopyx.com Thu Jun 17 13:55:40 2010
From: lists at zopyx.com (Andreas Jung)
Date: Thu, 17 Jun 2010 13:55:40 +0200
Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI
In-Reply-To: <4C1A0992.7070507@egenix.com>
References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> <4C19DCA9.5010308@egenix.com> <4C19DF6F.9050106@zopyx.com> <4C19E409.8060603@egenix.com> <4C19E54F.6030203@zopyx.com> <4C19F011.6010501@egenix.com> <4C19FD2A.3050801@egenix.com> <4C1A0992.7070507@egenix.com>
Message-ID: <4C1A0D3C.4050402@zopyx.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

M.-A. Lemburg wrote:
> What I'm saying is that it's better to contact the package
> authors whose entries cause problems than to force some
> policy on all PyPI package entries which carelessly puts
> packages that are not hosted on PyPI into the same category
> as crappy software.

In theory yes, in real life no - I approached several package maintainers
in the past for several reasons..some agree with the complaints, others
just don't care. Some consider PyPI as their own private repository with
their own rules and no need to care about the community e.g. by providing
proper metadata (I call this anti-social and PyPI-misuse).

Andreas

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkwaDTwACgkQCJIWIbr9KYzFdQCdEGXCwjb/2qsEfzhzRNUK1Dpy
Dn8AoNyVoO6F3nMcacmCxeWTOC8muYYO
=UkLD
-----END PGP SIGNATURE-----

From tseaver at palladion.com Thu Jun 17 14:14:45 2010
From: tseaver at palladion.com (Tres Seaver)
Date: Thu, 17 Jun 2010 08:14:45 -0400
Subject: [Catalog-sig] PyPI template improvements
In-Reply-To: <4C194755.2060704@v.loewis.de>
References: <4C194755.2060704@v.loewis.de>
Message-ID:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin v. Löwis wrote:
>> - - What are the supported browser versions by PyPI, I reckon it's
>> IE6/7/8+, Fx 2+, Opera 9+ Safari 4+?
>
> What do you mean by "supported"? Officially supported, so that you can
> make a help desk call if it won't work? None.
>
> Or do you mean that the browser should be able to use the site? All of
> them, plus any other browser you can think of, including Lynx and wget.
In web app land, "supported browsers" usually means the ones the designer targets: e.g., including "IE >= 7" in the list means that the designer doesn't have to include workarounds for stupid glitches in earlier IEs (or even test the design against those versions). For CSS, this means that the site's appearance will be sometimes wonky when running with an older-than-supported browser version. Features which depend on Javascript may not work at all, or only in degraded mode. Tres. - -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkwaEbUACgkQ+gerLs4ltQ5gSACeJwvouqmyCfKDZxDQzD27EBfk CFkAnAlSDA63Gaw79ag4hZA4G7hwjXLU =So/m -----END PGP SIGNATURE----- From tseaver at palladion.com Thu Jun 17 14:22:54 2010 From: tseaver at palladion.com (Tres Seaver) Date: Thu, 17 Jun 2010 08:22:54 -0400 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <4C19D4CA.1090304@egenix.com> References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 M.-A. Lemburg wrote: > And lastly, uploading packages to PyPI (still) has a serious > problem: setuptools doesn't know the distinction between > UCS2 and UCS4, so uploading eggs for Unix platforms doesn't > work out in practice. setuptools also doesn't know that > e.g. a Mac OS X fat release may still contain the right binaries > for a non-fat build of Python. Uploading any 'bdist_egg' build is basically a losing proposition. Windows may be the exception, except that at least a vocal segment of Windows PyPI users prefer 'bdist_wininst' distributions, which can also be consumed by setuptools / distribute. 
Note however that Andreas' proposal was to require that 'sdists' be uploaded. I personally won't use binary-only packages, but it has historically been true that PyPI was intended to support them, as well as to support registration of packages hosted offsite. Andreas' proposal doesn't address either of those cases. Tres. - -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkwaE54ACgkQ+gerLs4ltQ7uBQCbBdAlRDxaiyWZNN3esR5GG/An ZfsAnR83RqzGIx6hO+Ni+eZs2e1U0xkr =Z1kG -----END PGP SIGNATURE----- From lists at zopyx.com Thu Jun 17 14:26:33 2010 From: lists at zopyx.com (Andreas Jung) Date: Thu, 17 Jun 2010 14:26:33 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> Message-ID: <4C1A1479.80909@zopyx.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Tres Seaver wrote: > > Note however that Andreas' proposal was to require that 'sdists' be > uploaded. I personally won't use binary-only packages, but it has > historically been true that PyPI was intended to support them, as well > as to support registration of packages hosted offsite. Andreas' > proposal doesn't address either of those cases. A more precise requirement would be: - upload the sdist if your package is open-source - upload the official distribution package if you are package is commercial Basically...upload everything that you would also keep on your own server as official distribution. 
Andreas -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (Darwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkwaFHkACgkQCJIWIbr9KYwvJgCfW+Ar1vTYyNlDwXfuS31Jvl4M fAsAnR9exynFltTLE0hVwTy7QH8rxvYC =ldIp -----END PGP SIGNATURE----- -------------- next part -------------- A non-text attachment was scrubbed... Name: lists.vcf Type: text/x-vcard Size: 316 bytes Desc: not available URL: From benji at benjiyork.com Thu Jun 17 14:29:49 2010 From: benji at benjiyork.com (Benji York) Date: Thu, 17 Jun 2010 08:29:49 -0400 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <4C1A0992.7070507@egenix.com> References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> <4C19DCA9.5010308@egenix.com> <4C19DF6F.9050106@zopyx.com> <4C19E409.8060603@egenix.com> <4C19E54F.6030203@zopyx.com> <4C19F011.6010501@egenix.com> <4C19FD2A.3050801@egenix.com> <4C1A0992.7070507@egenix.com> Message-ID: On Thu, Jun 17, 2010 at 7:40 AM, M.-A. Lemburg wrote: > http://pypi.python.org/simple/zc.buildout/ > > BTW: what are all those bug links doing on the zc.buildout index page ? PyPI scrapes all the links from the long description; for many projects that includes a change log with links to fixed bugs. -- Benji York From ronaldoussoren at mac.com Thu Jun 17 14:38:01 2010 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Thu, 17 Jun 2010 14:38:01 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> <4C19DCA9.5010308@egenix.com> <4C19DF6F.9050106@zopyx.com> <4C19E409.8060603@egenix.com> <4C19E54F.6030203@zopyx.com> <4C19F011.6010501@egenix.com> <4C19FD2A.3050801@egenix.com> Message-ID: On 17 Jun, 2010, at 13:20, Patrick Gerken wrote: > > Please have a look at the package in question. 
The only problem > with it is that the download URL registered on PyPI no longer works. > It redirects to the download page where you can find the source > distribution. > > And thats exactly what Andreas' argument is targeting. > Note that even a requirement to upload a package to PyPI won't reliably solve Andreas' problem, the package owner could remove a release or even the entire package. In an ideal world there would be no reasons for removing a package, but as we don't live in such a world there are valid reasons for wanting to remove a package. One example is being sued by some organization that claims you're using their IP without a license. Ronald -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3567 bytes Desc: not available URL: From mal at egenix.com Thu Jun 17 14:46:28 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 17 Jun 2010 14:46:28 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <4C1A1479.80909@zopyx.com> References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C1A1479.80909@zopyx.com> Message-ID: <4C1A1924.7060302@egenix.com> Andreas Jung wrote: > Tres Seaver wrote: > >> Note however that Andreas' proposal was to require that 'sdists' be >> uploaded. I personally won't use binary-only packages, but it has >> historically been true that PyPI was intended to support them, as well >> as to support registration of packages hosted offsite. Andreas' >> proposal doesn't address either of those cases. > > A more precise requirement would be: > > - upload the sdist if your package is open-source > - upload the official distribution package if you are package > is commercial > > Basically...upload everything that you would also keep on your own > server as official distribution. 
We cannot force authors to do this. There may be other reasons why they can't upload such things to PyPI, e.g. crypto, trademark and copyright laws, or even corporate rules if the author is maintaining the package as part of his or her job. What we can do, is make it more attractive to upload distribution files to PyPI and also to make the whole "find the right file to download and install" story easy enough for automatic tools to not just give up. For that to work, we'd need to rethink the infrastructure a bit more, though: If more package authors start shipping egg files for the various Unix platforms as both UCS2 and UCS4 and for 3 or 4 different Python versions and keep those files around for several releases, we'll run into problems with having to mirror all those download files. We've been doing this for several years now and it's probably an extreme example, but just as reference: we have almost 6GB of Python archives up on our servers and that's just for ~10 packages. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 17 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 31 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From do3ccqrv at googlemail.com Thu Jun 17 14:54:35 2010 From: do3ccqrv at googlemail.com (Patrick Gerken) Date: Thu, 17 Jun 2010 14:54:35 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> <4C19DCA9.5010308@egenix.com> <4C19DF6F.9050106@zopyx.com> <4C19E409.8060603@egenix.com> <4C19E54F.6030203@zopyx.com> <4C19F011.6010501@egenix.com> <4C19FD2A.3050801@egenix.com> <4C1A0992.7070507@egenix.com> Message-ID: On Thu, Jun 17, 2010 at 13:40, M.-A. Lemburg wrote: Patrick Gerken wrote: > > > As a plone user who uses zc.buildout I very much prefer reliable > downloads. > > Its not fun > > to search for the reason a supposedly repeatable buildout suddenly fails > > because > > a company decided to rename itself. > > It is well possible to delete package listings on PyPI. Wouldn't > you rather be informed about this by way of an error report in > zc.buildout than by finding that the package name has changed > a few years later ? > I would prefer to have my buildout to be working. I do not always need the newest versions, and we have cases where customers are working with a specific version of plone where some additional packages made backward incompatible changes that prohibit us from using them for these clients. So yes, I prefer working on a potentially outdated version. During development we check regulary for new versions. We have tools for that. > How about only listing packages with provided source code on the simple > interface? > afaik buildout always uses that, so a package python-openid is visible in > the > end-user view, but not installable via buildout. That way nobody would ever > have had > created a dependency on it in the first place. 
If such external links are a problem for zc.buildout, why don't > you add an option to zc.buildout that prevents using such > packages ? > Because I consider pypi the root cause of the problem. Not the tools. pip also allows repeatable package sets be defining specific version requirements. Should this then be patched too? This is well possible by checking the /simple index entry > for links to package download files: > > http://pypi.python.org/simple/python-openid/ > > vs. > > http://pypi.python.org/simple/zc.buildout/ > > BTW: what are all those bug links doing on the zc.buildout index page ? > They look a lot like a good possibility for injecting trojans. > I don't know. What about the suggestion to show all packages on pypi but not all on the simple view? I can imagine that having your packages advertised on pypi generates reasonable revenue and I am absolutely not against that. But I am against a pypi index that can not promise to keep its advertised packages available. the simple index view is meant for machines, and I'd perfectly happy if constraints suggested by Andreas would only be applied to that simple index. Best regards, Patrick -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Thu Jun 17 14:55:48 2010 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 17 Jun 2010 22:55:48 +1000 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: References: <4C19A308.5040806@zopyx.com> <4C19FD2A.3050801@egenix.com> Message-ID: <201006172255.49175.steve@pearwood.info> On Thu, 17 Jun 2010 09:20:55 pm Patrick Gerken wrote: > Not putting the source release on pypi is just one indicator of > crappy software. Yeah, like that infamous example of crappy software, Numpy. 
http://pypi.python.org/pypi/numpy/1.4.1 -- Steven D'Aprano From do3ccqrv at googlemail.com Thu Jun 17 15:10:37 2010 From: do3ccqrv at googlemail.com (Patrick Gerken) Date: Thu, 17 Jun 2010 15:10:37 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <201006172255.49175.steve@pearwood.info> References: <4C19A308.5040806@zopyx.com> <4C19FD2A.3050801@egenix.com> <201006172255.49175.steve@pearwood.info> Message-ID: On Thu, Jun 17, 2010 at 14:55, Steven D'Aprano wrote: > On Thu, 17 Jun 2010 09:20:55 pm Patrick Gerken wrote: > > Not putting the source release on pypi is just one indicator of > > crappy software. > > Yeah, like that infamous example of crappy software, Numpy. > > http://pypi.python.org/pypi/numpy/1.4.1 > I am sorry if I offended you. I do not call every piece of software that does not release sources on PyPI crappy. I also don't call numpy crappy. Now, please tell me what you would do if SourceForge changes its URL and returns a 404 on the old download page. Would you update all release information? If not, the next time I run a buildout where the configuration requires numpy in an old version and the download link is broken, my buildout breaks too. And there might be reasons why I stick to a specific older version. That's what I would like to avoid. Best regards, Patrick From mal at egenix.com Thu Jun 17 15:16:15 2010 From: mal at egenix.com (M.-A.
Lemburg) Date: Thu, 17 Jun 2010 15:16:15 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> <4C19DCA9.5010308@egenix.com> <4C19DF6F.9050106@zopyx.com> <4C19E409.8060603@egenix.com> <4C19E54F.6030203@zopyx.com> <4C19F011.6010501@egenix.com> <4C19FD2A.3050801@egenix.com> <4C1A0992.7070507@egenix.com> Message-ID: <4C1A201F.6080609@egenix.com> Benji York wrote: > On Thu, Jun 17, 2010 at 7:40 AM, M.-A. Lemburg wrote: >> http://pypi.python.org/simple/zc.buildout/ >> >> BTW: what are all those bug links doing on the zc.buildout index page ? > > PyPI scrapes all the links from the long description; for many projects > that includes a change log with links to fixed bugs. Isn't that dangerous ? AFAIK, setuptools would start opening all those URLs and might find download files which are not necessarily under full control of the author, e.g. anyone could add a comment to a bug report or wiki page with a link to an egg file on some rogue server. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 17 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 31 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ben+python at benfinney.id.au Thu Jun 17 16:00:11 2010 From: ben+python at benfinney.id.au (Ben Finney) Date: Fri, 18 Jun 2010 00:00:11 +1000 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> Message-ID: <874oh1rc7o.fsf@benfinney.id.au> Andreas Jung writes: > M.-A. Lemburg wrote: > > You'd rule out commercial packages that don't come with a source > > distribution. PyPI is for everyone, not only for open source > > packages. > > Commercial packages are a special case - I agree. The majority of all > PyPI packages are non-commercial. That's irrelevant to whether an sdist is uploaded. Rather, the majority of PyPI packages are free software; whether they are commercial or not is a separate dimension. Commercial is not the opposite of free; proprietary is the opposite of free. Commercial and proprietary are not at all the same thing. -- "Facts are meaningless. You could use facts to prove anything that's even remotely true!" -- Homer, _The Simpsons_ Ben Finney From mark at geek.net Thu Jun 17 16:15:06 2010 From: mark at geek.net (Mark Ramm) Date: Thu, 17 Jun 2010 10:15:06 -0400 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <4C19C7A0.9080800@v.loewis.de> References: <4C19A308.5040806@zopyx.com> <4C19C7A0.9080800@v.loewis.de> Message-ID: This would also impact projects like TurboGears (perhaps we're the only one, I don't know) that point to our own PyPI-compatible index with the download URL.
We do this because then we can fix things like packages with no Windows eggs, packages that are broken on PyPI or whatever. It also helps control which versions of which packages get installed by setuptools/distribute when you easy_install tg. I'm fine with putting sdists up on PyPI, but still want people to be downloading files from our controlled index by default where possible. --Mark Ramm On Thu, Jun 17, 2010 at 2:58 AM, "Martin v. Löwis" wrote: >> I propose a policy change for packages registered with PyPI: >> >> - packages registered on PyPI have at least one release >> >> - one release of registered package on PyPI _must_ contain >> a valid source code distribution (sdist) >> >> - packages registered on PyPI without releases or without >> source code release are subject to be removed after N days >> after the day of registration > > So how would you implement that policy change? Please propose a phased > approach, that gives affected people plenty of options to intervene if > they disagree with the policy. > > Regards, > Martin > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig > From tseaver at palladion.com Thu Jun 17 16:59:37 2010 From: tseaver at palladion.com (Tres Seaver) Date: Thu, 17 Jun 2010 10:59:37 -0400 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: References: <4C19A308.5040806@zopyx.com> <4C19C7A0.9080800@v.loewis.de> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Mark Ramm wrote: > This would also impact projects like turbogears (perhaps we're the > only one, I don't know) that point to our own pypi compatable index > with the download URL. Your *index* is the download URL, or the tarball in the index? > We do this because then we can fix things > like packages with no windows eggs, packages that are broken on PyPi > or whatever.
And to help control which versions of which packages > get installed by settuptools/distribute when you easy_install tg. > > I'm fine with putting sdists up on pypi, but still want people to be > downloading files from our controlled index by default where possible. Exactly. Anybody who says "repeatable deployment" and "install from PyPI" in the same breath is fooling themselves already. - - People rename projects on PyPI. - - People remove distributions from PyPI. - - People *replace* distributions on PyPI. All of which make it impossible to reliably and repeatably deploy arbitrary software configurations (directly) from PyPI. Managing your own project-specific index is the only real solution. Gonna-shoot-the-next-programmer-who-tells-me-don't-make-me-think'ly Tres. - -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkwaOFQACgkQ+gerLs4ltQ7m4gCeMm5iCTBsZnLIFAY92ivjSs+f uXcAn0NCff1qBu2HscoJzmfB/kQ7v7sA =d2HM -----END PGP SIGNATURE----- From steve at pearwood.info Thu Jun 17 17:11:01 2010 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 18 Jun 2010 01:11:01 +1000 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: References: <4C19A308.5040806@zopyx.com> Message-ID: <201006180111.02363.steve@pearwood.info> On Thu, 17 Jun 2010 04:11:19 pm Christian Zagrodnick wrote: > On 2010-06-17 06:22:32 +0200, Andreas Jung said: > > -----BEGIN PGP SIGNED MESSAGE----- > > Hash: SHA1 > > > > Hi there, > > > > I propose a policy change for packages registered with PyPI: > > > > - packages registered on PyPI have at least one release > > > > - one release of registered package on PyPI _must_ contain > > a valid source code distribution 
(sdist) -1000 Please take your religious wars elsewhere. Python might be open source software, but there is no requirement that only open source software can be written in Python, and PyPI is for all Python developers, not just FOSS developers. > > - packages registered on PyPI without releases or without > > source code release are subject to be removed after N days > > after the day of registration > > > > Why? > > > > Any package registered on PyPI is possibly crucial to any kind of > > development and deployment. Just because it's crucial to you doesn't mean you own it and can dictate what the package owner does with it. The important question here is, who controls the package? Is it the package owner, or PyPI? Your proposal is to give control over the package to PyPI rather than the owner and strip the developer of control in return for indexing the package on PyPI. Not only is that in my opinion rude and unethical, but I expect it will lead to a lot of authors abandoning PyPI. Instead of being the one obvious place to index Python packages, this proposal will fragment the package space. Not where the packages are hosted, but where they are indexed. > > Packages hosted on external servers (referenced through a > > download_url) are subject to come and go - packages once released > > should be available at any time from a well-known location (PyPI). And packages that are crucial to development should be bug-free, so perhaps we should ban packages that contain bugs too? > > Dependencies on the availability of external downloads servers > > other than PyPI are hardly acceptable for real-world development > > and deployments. > > I second that. External download URLs are really a pain. Then don't use them. Problem solved. > I don't think that removing packages that way would really solve the > problem. I think the core is: > > * Require the package to have a source dist *on* PyPI > * Forbid removing any source package. 
You would FORBID the package author from removing his or her own package? Whiskey-Tango-Foxtrot. There are all sorts of reasons, some good, some bad, why an author might decide to remove his package from public distribution. What gives you the right to decide that he should be prohibited from doing so? -- Steven D'Aprano From l at lrowe.co.uk Thu Jun 17 18:19:03 2010 From: l at lrowe.co.uk (Laurence Rowe) Date: Thu, 17 Jun 2010 09:19:03 -0700 (PDT) Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <4C1A0992.7070507@egenix.com> References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> <4C19DCA9.5010308@egenix.com> <4C19DF6F.9050106@zopyx.com> <4C19E409.8060603@egenix.com> <4C19E54F.6030203@zopyx.com> <4C19F011.6010501@egenix.com> <4C19FD2A.3050801@egenix.com> <4C1A0992.7070507@egenix.com> Message-ID: <28916555.post@talk.nabble.com> M.-A. Lemburg wrote: > > If such external links are a problem for zc.buildout, why don't > you add an option to zc.buildout that prevents using such > packages ? > > This is well possible by checking the /simple index entry > for links to package download files: > > http://pypi.python.org/simple/python-openid/ > > vs. > > http://pypi.python.org/simple/zc.buildout/ > > BTW: what are all those bug links doing on the zc.buildout index page ? > They look a lot like a good possibility for injecting trojans. > That's an artefact of setuptools looking for downloadable packages from the download_url or any url linked from the description. If all packages were uploaded to pypi, the simple index would be much simpler. Laurence -- View this message in context: http://old.nabble.com/-Proposal--Registered-packages-must-provide-the-source-code-distribution-on-PyPI-tp28910327p28916555.html Sent from the Python - catalog-sig mailing list archive at Nabble.com. 
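The check suggested earlier in this thread (inspecting a project's /simple page for links to download files) could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names (LinkCollector, classify_links), the list of archive suffixes, and the rule "same host as the index page plus an archive suffix means hosted on the index" are all hypothetical and are not how setuptools, distribute or zc.buildout actually decide what to download.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed set of distribution file suffixes; not an official list.
ARCHIVE_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".zip", ".egg")


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag found on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            self.hrefs.append(href)


def classify_links(page_html, page_url):
    """Split the links on an index page into (hosted, external).

    A link counts as "hosted" when it resolves to the same host as the
    index page and its path ends in a known archive suffix. Everything
    else (home pages, bug trackers, off-site downloads) is "external".
    """
    collector = LinkCollector()
    collector.feed(page_html)
    index_host = urlparse(page_url).netloc
    hosted, external = [], []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)
        parsed = urlparse(absolute)
        if parsed.netloc == index_host and parsed.path.endswith(ARCHIVE_SUFFIXES):
            hosted.append(absolute)
        else:
            external.append(absolute)
    return hosted, external
```

A tool that wanted to refuse off-site downloads could call classify_links() on the fetched page and only consider the first list; a project whose page yields an empty "hosted" list would then be reported as not installable from the index alone.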
From l at lrowe.co.uk Thu Jun 17 18:37:22 2010 From: l at lrowe.co.uk (Laurence Rowe) Date: Thu, 17 Jun 2010 09:37:22 -0700 (PDT) Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <4C19A308.5040806@zopyx.com> References: <4C19A308.5040806@zopyx.com> Message-ID: <28916768.post@talk.nabble.com> Andreas Jung-5 wrote: > > Hi there, > > I propose a policy change for packages registered with PyPI: > > - packages registered on PyPI have at least one release > > - one release of registered package on PyPI _must_ contain > a valid source code distribution (sdist) > > - packages registered on PyPI without releases or without > source code release are subject to be removed after N days > after the day of registration > > Why? > > Any package registered on PyPI is possibly crucial to any kind of > development and deployment. > > Packages hosted on external servers (referenced through a download_url) > are subject to come and go - packages once released should be available > at any time from a well-known location (PyPI). Dependencies on the > availability of external downloads servers other than PyPI are hardly > acceptable for real-world development and deployments. > > As an example: the Plone CMS buildouts depend on python-openid. > This package is registered with PyPI > > http://pypi.python.org/pypi/python-openid > > but references to > > http://openidenabled.com/files/python-openid/packages/python-openid-2.2.4.tar.gz > > For whatever reason the download URL is no longer working. In fact: > openidenabled.com now points to http://www.janrain.com. > > Other reasons for disappearing package in the past: > > - network or server outages of external servers > - users changed their organization and the organization removed > content of their former employees > > PyPI is a valuable and crucial resource for Python development. > It must be kept up-to-date and consistent. 
> > I don't care about the arguments that were made in the past against > stronger rules ("openness" etc.). > > There are a lot of Python programmers around that are not Python geeks > as most of us are and they just become pissed of when packages come and > go or are not in the place where one would expect them. > > PyPI is a community resource - but community does not mean anarchy where > everyone should be able to upload its package crap without looking left > and right and having the community and its needs in mind. > > PyPI must become a stable package index. Everything registered with PyPI > must be available at any time (mirrors, distributing PyPI in the > cloud...). > While I agree it would be great if we could enforce source packages being uploaded to pypi (at least for open source packages), agreement on this is looking unlikely. What us buildout users really want is for the simple index to contain a copy of the uploaded files (or at least the source packages). Instead of creating links to other referenced urls in the simple index, setuptools / distribute could be used to fetch the package and store a copy. A flag could be set on indexed proprietary packages to exclude them from the simple index. There would seem to be a great benefit to doing this centrally and mirroring out the result rather than multiple companies maintaining their own individual pypi mirrors. Laurence -- View this message in context: http://old.nabble.com/-Proposal--Registered-packages-must-provide-the-source-code-distribution-on-PyPI-tp28910327p28916768.html Sent from the Python - catalog-sig mailing list archive at Nabble.com. 
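The idea above (a simple index that links only to stored copies and leaves out flagged proprietary projects) might look something like the following sketch. Everything here is hypothetical: the in-memory "projects" mapping, the "proprietary" flag, the function names and the "#sha256=" fragment are illustration only and do not describe PyPI's real data model or page format.

```python
import hashlib
from html import escape


def render_simple_page(project, files):
    """Render a minimal index page that links only to locally stored files.

    `files` maps a file name to the stored bytes of that distribution.
    """
    rows = []
    for filename in sorted(files):
        digest = hashlib.sha256(files[filename]).hexdigest()
        href = "%s#sha256=%s" % (escape(filename, quote=True), digest)
        rows.append('<a href="%s">%s</a><br/>' % (href, escape(filename)))
    title = "Links for %s" % escape(project)
    return "<html><head><title>%s</title></head><body><h1>%s</h1>\n%s\n</body></html>" % (
        title,
        title,
        "\n".join(rows),
    )


def build_simple_index(projects):
    """Return {project name: page HTML} for every installable project.

    Projects flagged as proprietary, or with no stored files at all,
    are skipped, so they never appear in the machine-readable index.
    """
    pages = {}
    for name, info in projects.items():
        if info.get("proprietary") or not info.get("files"):
            continue
        pages[name] = render_simple_page(name, info["files"])
    return pages
```

With this shape, no external URL from a long description or download_url ever reaches the generated page, which is the property buildout users in this thread are asking for; whether the index should fetch and store such copies at all is the policy question still being argued.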
From lists at zopyx.com Thu Jun 17 18:53:41 2010 From: lists at zopyx.com (Andreas Jung) Date: Thu, 17 Jun 2010 18:53:41 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> <4C19DCA9.5010308@egenix.com> <4C19DF6F.9050106@zopyx.com> <4C19E409.8060603@egenix.com> <4C19E54F.6030203@zopyx.com> <4C19F011.6010501@egenix.com> <4C19FD2A.3050801@egenix.com> Message-ID: <4C1A5315.6000501@zopyx.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Ronald Oussoren wrote: > > On 17 Jun, 2010, at 13:20, Patrick Gerken wrote: >> >> >> Please have a look at the package in question. The only problem >> with it is that the download URL registered on PyPI no longer works. >> It redirects to the download page where you can find the source >> distribution. >> >> >> And thats exactly what Andreas' argument is targeting. >> > > Note that even a requirement to upload a package to PyPI won't reliably > solve Andreas' problem, the package owner could remove a release or even > the entire package. Released is released. There are only very few cases where one should be allowed to remove packages (e.g. containing viruses, malware etc.). Otherwise released stuff must not be touched. - -aj -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (Darwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkwaUxUACgkQCJIWIbr9KYxmnACaAwDSSRLdU4wViW+Bql6sKMmt XXkAoLSsgw7A5BIizfZcEqM9WxqnT2+C =j+F8 -----END PGP SIGNATURE----- -------------- next part -------------- A non-text attachment was scrubbed... 
Name: lists.vcf Type: text/x-vcard Size: 316 bytes Desc: not available URL: From lists at zopyx.com Thu Jun 17 18:57:18 2010 From: lists at zopyx.com (Andreas Jung) Date: Thu, 17 Jun 2010 18:57:18 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <4C1A1924.7060302@egenix.com> References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C1A1479.80909@zopyx.com> <4C1A1924.7060302@egenix.com> Message-ID: <4C1A53EE.6030806@zopyx.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 M.-A. Lemburg wrote: > Andreas Jung wrote: >> Tres Seaver wrote: >> >>> Note however that Andreas' proposal was to require that 'sdists' be >>> uploaded. I personally won't use binary-only packages, but it has >>> historically been true that PyPI was intended to support them, as well >>> as to support registration of packages hosted offsite. Andreas' >>> proposal doesn't address either of those cases. >> A more precise requirement would be: >> >> - upload the sdist if your package is open-source >> - upload the official distribution package if you are package >> is commercial >> >> Basically...upload everything that you would also keep on your own >> server as official distribution. > > We cannot force authors to do this. There may be other reasons > why they can't upload such things to PyPI, e.g. crypto, trademark > and copyright laws, or even corporate rules if the author is > maintaining the package as part of his or her job. You are once again talking about edge cases. In general the majority of all externally hosted packages are not affected by such issues and should be hosted on PyPI. 
-aj Everything that is currently available on external > > If more package authors start shipping egg files for > the various Unix platforms as both UCS2 and UCS4 and for 3 or > 4 different Python versions and keep those files around for > several releases, we'll run into problems with having > to mirror all those download files. There is in general zero need for uploading eggs for various Python versions if the module is Python only. I have seen packages with uploads for Python 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 3.0, 3.1 for Python-only packages. This is really nonsense...a single sdist is usually good enough...I bring it to the point: a bunch of Python developers have no idea about package hygiene and use PyPI as a package toilet. -aj -- ZOPYX Limited | zopyx group Charlottenstr. 37/1 | The full-service network for Zope & Plone D-72070 Tübingen | Produce & Publish www.zopyx.com | www.produce-and-publish.com ------------------------------------------------------------------------ E-Publishing, Python, Zope & Plone development, Consulting
From lists at zopyx.com Thu Jun 17 18:58:31 2010 From: lists at zopyx.com (Andreas Jung) Date: Thu, 17 Jun 2010 18:58:31 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <201006172255.49175.steve@pearwood.info> References: <4C19A308.5040806@zopyx.com> <4C19FD2A.3050801@egenix.com> <201006172255.49175.steve@pearwood.info> Message-ID: <4C1A5437.4090804@zopyx.com> Steven D'Aprano wrote: > On Thu, 17 Jun 2010 09:20:55 pm Patrick Gerken wrote: >> Not putting the source release on pypi is just one indicator of >> crappy software. > > Yeah, like that infamous example of crappy software, Numpy. > > http://pypi.python.org/pypi/numpy/1.4.1 > What's wrong with this package? It seems properly packaged, has proper metadata....? -aj
From mal at egenix.com Thu Jun 17 19:21:45 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 17 Jun 2010 19:21:45 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <4C1A53EE.6030806@zopyx.com> References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C1A1479.80909@zopyx.com> <4C1A1924.7060302@egenix.com> <4C1A53EE.6030806@zopyx.com> Message-ID: <4C1A59A9.7030204@egenix.com> Andreas Jung wrote: > M.-A. Lemburg wrote: >> Andreas Jung wrote: >>> Tres Seaver wrote: >>> >>>> Note however that Andreas' proposal was to require that 'sdists' be >>>> uploaded.
I personally won't use binary-only packages, but it has >>>> historically been true that PyPI was intended to support them, as well >>>> as to support registration of packages hosted offsite. Andreas' >>>> proposal doesn't address either of those cases. >>> A more precise requirement would be: >>> >>> - upload the sdist if your package is open-source >>> - upload the official distribution package if your package >>> is commercial >>> >>> Basically...upload everything that you would also keep on your own >>> server as official distribution. > >> We cannot force authors to do this. There may be other reasons >> why they can't upload such things to PyPI, e.g. crypto, trademark >> and copyright laws, or even corporate rules if the author is >> maintaining the package as part of his or her job. > > You are once again talking about edge cases. In general the majority of > all externally hosted packages are not affected by such issues and > should be hosted on PyPI. Well, there's certainly some reason why the authors chose not to host on PyPI. I can only list a few. >> If more package authors start shipping egg files for >> the various Unix platforms as both UCS2 and UCS4 and for 3 or >> 4 different Python versions and keep those files around for >> several releases, we'll run into problems with having >> to mirror all those download files. > > There is in general zero need for uploading eggs for various > Python versions if the module is Python only. I have seen packages > with uploads for Python 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 3.0, 3.1 for > Python-only packages. This is really nonsense...a single sdist > is usually good enough...I bring it to the point: a bunch of Python > developers have no idea about package hygiene and use PyPI as a package toilet. If you ship Python-only packages with precompiled .pyc/.pyo files, you do need to upload one version per Python version. The marshal format and pyc magic often change between releases.
Some developers probably don't know that if they switch off the pyc compilation step, they'd get a single .egg file for all Python versions they support. In that case, we'd need to educate them, not call them names. If you want more people to upload and host their packages on PyPI, you have to: * make PyPI itself more robust and stable (we're working on that) * improve the tools to make both uploads and downloads easier (perhaps you could help with this) * convince people that their code is in good hands on PyPI (we'd need to get the PyPI terms straightened to help with this part) Suggesting that they can never remove a release from PyPI or are not allowed to rename a package is not going to attract more developers to PyPI. Calling them names, suggesting that their software is crap or that they use PyPI as dump, isn't going to attract anyone either. Anyway, I think I've said everything I wanted to say about this. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 17 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 31 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From lists at zopyx.com Thu Jun 17 19:40:29 2010 From: lists at zopyx.com (Andreas Jung) Date: Thu, 17 Jun 2010 19:40:29 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <4C1A59A9.7030204@egenix.com> References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C1A1479.80909@zopyx.com> <4C1A1924.7060302@egenix.com> <4C1A53EE.6030806@zopyx.com> <4C1A59A9.7030204@egenix.com> Message-ID: <4C1A5E0D.7060102@zopyx.com> M.-A. Lemburg wrote: > If you ship Python-only packages with precompiled .pyc/.pyo > files, you do need to upload one version per Python version. > The marshal format and pyc magic often change between releases. Once again: I am talking about the majority of packages that are neither commercial nor shipping without the Python source code. > > * make PyPI itself more robust and stable (we're working on that) PyPI is pretty robust and this has nothing to do with packages hosted externally. > > * improve the tools to make both uploads and downloads > easier (perhaps you could help with this) What can be easier than python setup.py register upload ? Uploading a package to your own server is likely more complicated than an upload to PyPI. > > Suggesting that they can never remove a release from PyPI > or are not allowed to rename a package is not going to > attract more developers to PyPI. I would not care about such developers. Someone renaming or removing a release and (intentionally) breaking the setup of other people acts irresponsibly. The basic question is: do we want PyPI to be a reliable and valuable community resource or a partly unflushed package toilet? Andreas -- ZOPYX Limited | zopyx group Charlottenstr. 37/1 | The full-service network for Zope & Plone D-72070 Tübingen | Produce & Publish www.zopyx.com | www.produce-and-publish.com ------------------------------------------------------------------------ E-Publishing, Python, Zope & Plone development, Consulting
From mark at geek.net Thu Jun 17 19:44:52 2010 From: mark at geek.net (Mark Ramm) Date: Thu, 17 Jun 2010 13:44:52 -0400 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: References: <4C19A308.5040806@zopyx.com> <4C19C7A0.9080800@v.loewis.de> Message-ID: > Your *index* is the download URL, or the tarball in the index? We don't have a tarball on pypi but the Download URL points to our index: http://pypi.python.org/pypi/TurboGears2/2.0.3 Which contains just: Download URL: http://www.turbogears.org/2.0/downloads/2.0.3/ and easy_install TG gets tg and all its dependencies from our specific index. I don't care if it works in just exactly this way, but maintaining the ability to create a controlled index is critical to making the turbogears install process repeatable and reliable. Also note, we have a new index url for each release -- so you'll always be able to do a tg install for a specific version with known working results.
--Mark Ramm From lists at zopyx.com Thu Jun 17 19:50:54 2010 From: lists at zopyx.com (Andreas Jung) Date: Thu, 17 Jun 2010 19:50:54 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: References: <4C19A308.5040806@zopyx.com> <4C19C7A0.9080800@v.loewis.de> Message-ID: <4C1A607E.2030904@zopyx.com> Mark Ramm wrote: >> Your *index* is the download URL, or the tarball in the index? > > We don't have a tarball on pypi but the Download URL points to our index: > > http://pypi.python.org/pypi/TurboGears2/2.0.3 > > Which contains just: > > Download URL: http://www.turbogears.org/2.0/downloads/2.0.3/ > How do you ensure the availability of the index and the packages at any time? -aj From mark at geek.net Thu Jun 17 19:56:37 2010 From: mark at geek.net (Mark Ramm) Date: Thu, 17 Jun 2010 13:56:37 -0400 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <4C1A607E.2030904@zopyx.com> References: <4C19A308.5040806@zopyx.com> <4C19C7A0.9080800@v.loewis.de> <4C1A607E.2030904@zopyx.com> Message-ID: > How do you ensure the availability of the index and the packages at > any time? By keeping our server up, and not depending on pypi. If our server goes down, packages will become unavailable, but if you want a mirror for a particular revision of tg and all its dependencies you can just grab a copy of http://www.turbogears.org/2.0/downloads/2.0.3/ and host it on your own servers at your company.
You can always use the -i http://www.turbogears.org/2.0/downloads/2.0.3/ option to skip past pypi completely and just use our (or if you made your own copy, your very own) index. We use http://pypi.python.org/pypi/basketweaver/ to make the index once we've got a pile of eggs and tarballs in a local directory. Which anybody with enough time can do. --Mark Ramm From lists at zopyx.com Thu Jun 17 20:03:47 2010 From: lists at zopyx.com (Andreas Jung) Date: Thu, 17 Jun 2010 20:03:47 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: References: <4C19A308.5040806@zopyx.com> <4C19C7A0.9080800@v.loewis.de> <4C1A607E.2030904@zopyx.com> Message-ID: <4C1A6383.80105@zopyx.com> Mark Ramm wrote: >> How do you ensure the availability of the index and the packages at >> any time? > > By keeping our server up, and not depending on pypi. If our server > goes down, packages will become unavailable, but if you want a mirror > for a particular revision of tg and all its dependencies you can just > grab a copy of Would you use PyPI as download server or as primary location if it were more reliable or had a usable mirroring infrastructure? The point: of course I can create my own internal mirror - but do we really want or need that? My business is building software - not mirrors or workarounds for a missing or unreliable package infrastructure. Side note: just checked CPAN - CPAN has 228 official mirrors, PyPI has no official mirrors (only four or five unofficial mirrors as part of the PyPI mirroring project). Andreas
From mark at geek.net Thu Jun 17 20:15:34 2010 From: mark at geek.net (Mark Ramm) Date: Thu, 17 Jun 2010 14:15:34 -0400 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <4C1A6383.80105@zopyx.com> References: <4C19A308.5040806@zopyx.com> <4C19C7A0.9080800@v.loewis.de> <4C1A607E.2030904@zopyx.com> <4C1A6383.80105@zopyx.com> Message-ID: > Would you use PyPI as download server or as primary location if it were > more reliable or had a usable mirroring infrastructure? No. Because it would still drop old packages, allow people to upload new packages and otherwise make the repeatable builds difficult. I'm most frustrated by the dropping of old packages, but unless I lock down things super tightly in setup.py new versions turn out to break the tg install process often enough that we need more control than pypi provides. > The point: of course I can create my own internal mirror - but do we really > want or need that? My business is building software - not mirrors or > workarounds for a missing or unreliable package infrastructure. Well, I need it. I've spent work implementing it, and I want it to continue to be supported, and for my use of this feature to continue to work. If you think it's bad and don't want that, then fine. But I'm more interested in making the tools I have now work now for the users we have now. And making pypi more available doesn't solve my whole problem, and the proposal at the start of the thread makes it worse for me. > Side note: just checked CPAN - CPAN has 228 official mirrors, PyPI has > no official mirrors (only four or five unofficial mirrors as part of > the PyPI mirroring project). Yea, more mirrors would be better. No doubt.
--Mark Ramm From ianb at colorstudy.com Thu Jun 17 20:33:58 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Thu, 17 Jun 2010 13:33:58 -0500 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: References: <4C19A308.5040806@zopyx.com> <4C19C7A0.9080800@v.loewis.de> <4C1A607E.2030904@zopyx.com> <4C1A6383.80105@zopyx.com> Message-ID: On Thu, Jun 17, 2010 at 1:15 PM, Mark Ramm wrote: > > Would you use PyPI as download server or as primary location if it were > > more reliable or had a usable mirroring infrastructure? > > No. Because it would still drop old packages, allow people to upload > new packages and otherwise make the repeatable builds difficult. > It does? I thought PyPI kept everything around (but hidden) unless the author went in and manually deleted old stuff. You just need to go to a deep link, e.g., http://pypi.python.org/pypi/SomePackage/0.1 -- Ian Bicking | http://blog.ianbicking.org From jess.austin at gmail.com Thu Jun 17 21:58:17 2010 From: jess.austin at gmail.com (Jess Austin) Date: Thu, 17 Jun 2010 14:58:17 -0500 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI Message-ID: On Thu, Jun 17, 2010 at 12:40 PM, Andreas Jung wrote: > Once again: I am talking about the majority of packages that are neither > commercial nor shipping without the Python source code. This seems to say either that you don't care about the supposed minority of packages that are "justified" in not releasing or in removing sources, or that it will be easy to differentiate between such packages and the remainder of the packages that are to suffer your procrustean rules. I don't accept, and you certainly haven't made any arguments to support, either of those propositions.
>> Suggesting that they can never remove a release from PyPI >> or are not allowed to rename a package is not going to >> attract more developers to PyPI. > > I would not care about such developers. Someone renaming or removing a > release and (intentionally) breaking the setup of other people acts > irresponsibly. > > The basic question is: do we want PyPI to be a reliable and valuable > community resource or a partly unflushed package toilet? Stipulated, you are unabashed in your lack of care for the needs of other PyPI users, for whom PyPI is already a valuable resource. In response, a question: is there anyone who supports this radical policy change who is NOT a zc.buildout user? Previously in this thread, there have been several plausible suggestions for modifying (improving?) zc.buildout to cope with the issues you've identified. Have you relayed these suggestions to the zc.buildout developers and administrators? Do you know for a fact that zc.buildout can't be fixed? If so, perhaps it should be removed from PyPI; I certainly wouldn't want to rely on it. cheers, Jess From kevin at bud.ca Thu Jun 17 22:18:52 2010 From: kevin at bud.ca (Kevin Teague) Date: Thu, 17 Jun 2010 13:18:52 -0700 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: References: Message-ID: > Previously in this thread, there have been several plausible > suggestions for modifying (improving?) zc.buildout to cope with the > issues you've identified. Have you relayed these suggestions to the > zc.buildout developers and administrators? Do you know for a fact > that zc.buildout can't be fixed? If so, perhaps it should be removed > from PyPI; I certainly wouldn't want to rely on it. > > Didn't Setuptools/easy_install begin this policy of following the download_url in PyPI's early days, when it wasn't even possible to upload to PyPI (or at least during the transition when a majority of packages only provided download_urls)?
easy_install has been repeatedly critiqued for this behaviour. Can anyone say why pip and buildout follow this policy? Has there been any thought to changing the install tools themselves? I know that relying on PyPI doesn't give 100% repeatability, but it does tend much more towards repeatability than following download_urls. I know I'd much prefer that these tools require a flag to use this behaviour, since many initially assume that these tools only download from an index and find it quite unexpected that they'll follow links to other servers. From martin at v.loewis.de Thu Jun 17 22:44:24 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 17 Jun 2010 22:44:24 +0200 Subject: [Catalog-sig] PyPI template improvements In-Reply-To: References: <4C194755.2060704@v.loewis.de> Message-ID: <4C1A8928.8090709@v.loewis.de> > In web app land, "supported browsers" usually means the ones the > designer targets: e.g., including "IE>= 7" in the list means that the > designer doesn't have to include workarounds for stupid glitches in > earlier IEs (or even test the design against those versions). > > For CSS, this means that the site's appearance will be sometimes wonky > when running with an older-than-supported browser version. Features > which depend on Javascript may not work at all, or only in degraded mode. I have a really hard time answering that question then: there was no web designer involved in creating PyPI (*). The browsers that the *authors* of the service target are really the ones I mentioned: all of them. There is one browser that gets special attention, and flaws relating to it get fixed faster than for any other browser: setuptools. Regards, Martin (*) of course, it uses the layout of python.org, which did have a web designer; for this design, I don't know the answer.
From ianb at colorstudy.com Thu Jun 17 22:54:29 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Thu, 17 Jun 2010 15:54:29 -0500 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: References: Message-ID: On Thu, Jun 17, 2010 at 3:18 PM, Kevin Teague wrote: > > Previously in this thread, there have been several plausible >> suggestions for modifying (improving?) zc.buildout to cope with the >> issues you've identified. Have you relayed these suggestions to the >> zc.buildout developers and administrators? Do you know for a fact >> that zc.buildout can't be fixed? If so, perhaps it should be removed >> from PyPI; I certainly wouldn't want to rely on it. >> >> > Didn't Setuptools/easy_install begin this policy of following the > download_url in PyPI's early days, when it wasn't even possible to upload > to PyPI (or at least during the transition when a majority of packages only > provided download_urls)? easy_install has been repeatedly critiqued for this > behaviour. > > Can anyone say why pip and buildout follow this policy? Has there been any > thought to changing the install tools themselves? > To the degree people have tested their installation procedures, they've usually tested that it works with easy_install. easy_install in turn was written to install stuff when there was some sane way to figure out what to install. So the tools are largely reactive. Putting in a hard warning (e.g., one that requires hitting enter) might be okay for some class of problematic behavior. Deeper searching of links could be handled this way, though for now we'd have to actually look in those pages and only warn if something was found... so there'd be many of the same problems but at least a path to removing the behavior completely. -- Ian Bicking | http://blog.ianbicking.org
From martin at v.loewis.de Thu Jun 17 23:17:26 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 17 Jun 2010 23:17:26 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <4C19D148.4000308@zopyx.com> References: <4C19A308.5040806@zopyx.com> <4C19C7A0.9080800@v.loewis.de> <4C19CA43.9000509@zopyx.com> <4C19D061.5020303@v.loewis.de> <4C19D148.4000308@zopyx.com> Message-ID: <4C1A90E6.8010304@v.loewis.de> On 17.06.2010 09:39, Andreas Jung wrote: > Martin v. Löwis wrote: > >> IMO, it's a waste of energy: if a package is useless, just don't use it, >> and be done. There are many packages on PyPI that are useless to me >> despite having a source release. >> > > "useless" is not the point. The "availability" matters - the > availability of a package must not depend on external servers other than an > official PyPI server. Why is that? You are talking about the Python Package *INDEX*. File hosting is an optional feature of the service. Regards, Martin From martin at v.loewis.de Thu Jun 17 23:21:53 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 17 Jun 2010 23:21:53 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <4C19DF6F.9050106@zopyx.com> References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> <4C19DCA9.5010308@egenix.com> <4C19DF6F.9050106@zopyx.com> Message-ID: <4C1A91F1.3040907@v.loewis.de> > I don't care if it has a name and a version number. I was not able > to work on my project - other co-workers also complained...this > is not an acceptable situation...as a Python geek I can likely deal with > that, others can't :) Then complain to the python-openid authors. It's their fault that the package is unavailable, not PyPI's.
> We had such issues over and over again over the last years. > A typical Zope/Plone installation requires over hundred different > packages and we have seen such failures with external servers > various times. The workaround was creating PyPI mirrors, project related > mirrors or download caches....just workarounds but not really a reliable > and working infrastructure.. So go through and ask the authors of all these packages to upload to PyPI. Some may comply, others may not. But first and foremost: reduce the set of dependencies. I see a ridiculous growth in dependencies. Consider rewriting small pieces of code instead of depending on a huge library just for a little function. Regards, Martin From martin at v.loewis.de Thu Jun 17 23:27:49 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 17 Jun 2010 23:27:49 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> <4C19DCA9.5010308@egenix.com> <4C19DF6F.9050106@zopyx.com> <4C19E409.8060603@egenix.com> <4C19E54F.6030203@zopyx.com> <4C19F011.6010501@egenix.com> <4C19FD2A.3050801@egenix.com> Message-ID: <4C1A9355.2030807@v.loewis.de> > I see a point in that, but what is more important, having a catalog to > browse or having a reliable repository of software to download? It's the Python Package Index, so clearly, the catalog function is more important than the reliable repository function. People use PyPI to find out whether a Python module for a certain problem exists. Only some of the users use it to automatically download from it in a regular manner. > How about only listing packages with provided source code on the simple > interface? If you, as a user, have a policy to not use packages which you can't download from PyPI, can't you just ignore those packages when browsing? 
> afaik buildout always uses that, so a package python-openid is visible > in the > end-user view, but not installable via buildout. That way nobody would > ever have had > created a dependency on it in the first place. Apparently, whoever created the dependency to python-openid didn't worry about this specific issue. FWIW, I evaluated python-openid, and found that it's better to rewrite it than to reuse it (regardless of where it's hosted). Regards, Martin From martin at v.loewis.de Thu Jun 17 23:30:19 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 17 Jun 2010 23:30:19 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <4C1A0D3C.4050402@zopyx.com> References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> <4C19DCA9.5010308@egenix.com> <4C19DF6F.9050106@zopyx.com> <4C19E409.8060603@egenix.com> <4C19E54F.6030203@zopyx.com> <4C19F011.6010501@egenix.com> <4C19FD2A.3050801@egenix.com> <4C1A0992.7070507@egenix.com> <4C1A0D3C.4050402@zopyx.com> Message-ID: <4C1A93EB.9020308@v.loewis.de> > In theory yes, in real life no - I approached several package > maintainers in the past due to several reasons..some agree with the > complaints, others just don't care. Some consider PyPI as their own > private repository with their own rules and no need to care about the > community e.g. by providing proper metadata (I call this anti-social and > PyPI-misuse). As the PyPI maintainer, I assure you that it is no misuse. Whether it's anti-social, I don't know. So given that discussion, I'm now opposed to enforcing a policy here. It's not a policy that all users can agree to. 
Regards, Martin From martin at v.loewis.de Thu Jun 17 23:32:55 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 17 Jun 2010 23:32:55 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <4C1A201F.6080609@egenix.com> References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> <4C19DCA9.5010308@egenix.com> <4C19DF6F.9050106@zopyx.com> <4C19E409.8060603@egenix.com> <4C19E54F.6030203@zopyx.com> <4C19F011.6010501@egenix.com> <4C19FD2A.3050801@egenix.com> <4C1A0992.7070507@egenix.com> <4C1A201F.6080609@egenix.com> Message-ID: <4C1A9487.5070108@v.loewis.de> On 17.06.2010 15:16, M.-A. Lemburg wrote: > Benji York wrote: >> On Thu, Jun 17, 2010 at 7:40 AM, M.-A. Lemburg wrote: >>> http://pypi.python.org/simple/zc.buildout/ >>> >>> BTW: what are all those bug links doing on the zc.buildout index page ? >> >> PyPI scrapes all the links from the long description; for many projects >> that includes a change log with links to fixed bugs. > > Isn't that dangerous ? > > AFAIK, setuptools would start opening all those URLs and might > find download files which are not necessarily under full control of > the author, e.g. anyone could add a comment to a bug report or > wiki page with a link to an egg file on some rogue server. I think you misunderstand. Links originate *only* from the long description. The package owner has full control over that. If you think the package owner is opening up a security threat by including the links in the first place - yes, that's indeed a risk.
Regards, Martin From martin at v.loewis.de Thu Jun 17 23:35:03 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 17 Jun 2010 23:35:03 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <4C1A5315.6000501@zopyx.com> References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> <4C19DCA9.5010308@egenix.com> <4C19DF6F.9050106@zopyx.com> <4C19E409.8060603@egenix.com> <4C19E54F.6030203@zopyx.com> <4C19F011.6010501@egenix.com> <4C19FD2A.3050801@egenix.com> <4C1A5315.6000501@zopyx.com> Message-ID: <4C1A9507.5090302@v.loewis.de> >> Note that even a requirement to upload a package to PyPI won't reliably >> solve Andreas' problem, the package owner could remove a release or even >> the entire package. > > Released is released. There are only very few cases where one should be > allowed to remove packages (e.g. containing viruses, malware etc.). > Otherwise released stuff must not be touched. Not at all. If a package owner decides to delete a package, the package is completely erased from PyPI. This is how it is, and how it should be. PyPI has no right to keep the file against the author's will. Regards, Martin From martin at v.loewis.de Thu Jun 17 23:36:39 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 17 Jun 2010 23:36:39 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: References: <4C19A308.5040806@zopyx.com> <4C19FD2A.3050801@egenix.com> <201006172255.49175.steve@pearwood.info> Message-ID: <4C1A9567.1010703@v.loewis.de> > Now, please tell me what you would do if sourceforge changes its url and > returns a > 404 on the old download page. Would you update all release informations? 
> If not, the next time I run a buildout where the configuration requires > numpy in an old version > and the download link is broken, my buildout breaks too. And there might > be reasons why > I stick to a specific older version. > Thats what I would like to avoid. Maybe you should stop using buildout then, and switch to Debian packages. They typically get the dependencies right, and available. Regards, Martin From martin at v.loewis.de Thu Jun 17 23:41:58 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 17 Jun 2010 23:41:58 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: References: <4C19A308.5040806@zopyx.com> <4C19C7A0.9080800@v.loewis.de> <4C1A607E.2030904@zopyx.com> <4C1A6383.80105@zopyx.com> Message-ID: <4C1A96A6.3050101@v.loewis.de> > It does? I thought PyPI kept everything around (but hidden) unless the > author went in and manually deleted old stuff. You just need to go to a > deep link, e.g., http://pypi.python.org/pypi/SomePackage/0.1 Sure, but owners *do* manually delete old stuff. Regards, Martin From martin at v.loewis.de Thu Jun 17 23:45:08 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 17 Jun 2010 23:45:08 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <28916768.post@talk.nabble.com> References: <4C19A308.5040806@zopyx.com> <28916768.post@talk.nabble.com> Message-ID: <4C1A9764.9080408@v.loewis.de> > What us buildout users really want is for the simple index to contain a copy > of the uploaded files (or at least the source packages). Instead of creating > links to other referenced urls in the simple index, setuptools / distribute > could be used to fetch the package and store a copy. A flag could be set on > indexed proprietary packages to exclude them from the simple index. 
> > There would seem to be a great benefit to doing this centrally and mirroring > out the result rather than multiple companies maintaining their own > individual pypi mirrors. I can understand the need, but I would propose an entirely different solution: Have buildout, by default, reject downloads from a different server. Then, when you create the dependency, you already notice the problem, and may choose to drop the dependency. I don't think any policy change will force users to upload if they really don't want to. Instead, the major effect of the policy (apparently) would be that they stop registering with PyPI. Regards, Martin From martin at v.loewis.de Thu Jun 17 23:51:21 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 17 Jun 2010 23:51:21 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: References: Message-ID: <4C1A98D9.6070707@v.loewis.de> > Didn't Setuptools/easy_install begin this policy of following the > download_url from PyPI's early days when it wasn't even possible to > upload to PyPI (or at least during the transition when a majority of > packages only provided download_urls). Not sure whether this was rhetorical: yes, that's how it all started. PyPI/the cheeseshop was originally *just* a package index, and designed as such. Automated downloads weren't even considered, but the objective was to give people a way of registering and finding Python software (because the manually-maintained lists of Python software started to rot). I added file upload at some point, primarily because people asked for it who didn't have any web hosting elsewhere. It was assumed that most packages would release somewhere to the net, and only few packages would use the file upload. FWIW, the documentation upload started with the very same assumption, and it's probably still the case that people host documentation at PyPI only if they have nothing better.
Regards, Martin From ronaldoussoren at mac.com Thu Jun 17 23:40:13 2010 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Thu, 17 Jun 2010 23:40:13 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <4C1A5315.6000501@zopyx.com> References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> <4C19DCA9.5010308@egenix.com> <4C19DF6F.9050106@zopyx.com> <4C19E409.8060603@egenix.com> <4C19E54F.6030203@zopyx.com> <4C19F011.6010501@egenix.com> <4C19FD2A.3050801@egenix.com> <4C1A5315.6000501@zopyx.com> Message-ID: On Jun 17, 2010, at 18:53, Andreas Jung wrote: > Ronald Oussoren wrote: >> >> On 17 Jun, 2010, at 13:20, Patrick Gerken wrote: >>> >>> >>> Please have a look at the package in question. The only problem >>> with it is that the download URL registered on PyPI no longer works. >>> It redirects to the download page where you can find the source >>> distribution. >>> >>> >>> And that's exactly what Andreas' argument is targeting. >>> >> >> Note that even a requirement to upload a package to PyPI won't reliably >> solve Andreas' problem, the package owner could remove a release or even >> the entire package. > > Released is released. There are only very few cases where one should be > allowed to remove packages (e.g. containing viruses, malware etc.). > Otherwise released stuff must not be touched. I agree that it would in most cases be better to keep releases around, but a developer might not have the option to do so for legal reasons. And, as someone else noted, uploading to PyPI might not be possible either for legal reasons, such as for cryptographic software.
Ronald > > -aj > From ben+python at benfinney.id.au Fri Jun 18 01:20:35 2010 From: ben+python at benfinney.id.au (Ben Finney) Date: Fri, 18 Jun 2010 09:20:35 +1000 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI References: <4C19A308.5040806@zopyx.com> <201006180111.02363.steve@pearwood.info> Message-ID: <87vd9hp7p8.fsf@benfinney.id.au> Steven D'Aprano writes: > On Thu, 17 Jun 2010 04:11:19 pm Christian Zagrodnick wrote: > > On 2010-06-17 06:22:32 +0200, Andreas Jung said: > > > - one release of registered package on PyPI _must_ contain > > > a valid source code distribution (sdist) > > -1000 > > Please take your religious wars elsewhere. Please address the substance of the proposal. It's neither religious, nor anything to do with war. > Python might be open source software, but there is no requirement that > only open source software can be written in Python, and PyPI is for > all Python developers, not just FOSS developers. True enough. It could be otherwise, though, so the proposal is hardly deserving of the slurs you hurled in your first paragraphs. -- \ "We now have access to so much information that we can find | `\ support for any prejudice or opinion." -- David Suzuki, 2008-06-27 | _o__) | Ben Finney From domen at dev.si Fri Jun 18 01:29:09 2010 From: domen at dev.si (Domen =?UTF-8?Q?Ko=C5=BEar?=) Date: Fri, 18 Jun 2010 01:29:09 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI Message-ID: <1276817349.5093.19.camel@oblak> I'm looking at PyPi as infrastructure and upstream source for Linux distributions. * Renaming packages I would strongly say NO to this one.
Once you make a release, don't change it. If a mistake was made in the metadata or packaging, make a new release like 1.0 -> 1.0-r1 * Source code requirement This one really depends on the main purpose of PyPi. If it's only there as a provider of metadata garbage, then no rules should be applied. If its main goal is to provide downloadable packages accompanied by metadata, then source could be a requirement. Companies using PyPi as an index of metadata, that's nonsense. They can set up their own pypi mirror and that would even be a more proper way. My 2 cents, Domen From steve at pearwood.info Fri Jun 18 04:21:16 2010 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 18 Jun 2010 12:21:16 +1000 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <4C1A5437.4090804@zopyx.com> References: <4C19A308.5040806@zopyx.com> <201006172255.49175.steve@pearwood.info> <4C1A5437.4090804@zopyx.com> Message-ID: <201006181221.16797.steve@pearwood.info> On Fri, 18 Jun 2010 02:58:31 am Andreas Jung wrote: > Steven D'Aprano wrote: > > On Thu, 17 Jun 2010 09:20:55 pm Patrick Gerken wrote: > >> Not putting the source release on pypi is just one indicator of > >> crappy software. > > > > Yeah, like that infamous example of crappy software, Numpy. > > > > http://pypi.python.org/pypi/numpy/1.4.1 > > What's wrong with this package? It seems properly packaged, has > proper metadata....? And it is hosted on Sourceforge. But you're right, there is nothing wrong with the package. That includes the fact that it's hosted external to PyPI. Why should we force numpy to change? Whatever their reasons for hosting on Sourceforge, it is their package and their choice and we should respect that and not try to dictate where they host it.
-- Steven D'Aprano From steve at pearwood.info Fri Jun 18 04:35:04 2010 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 18 Jun 2010 12:35:04 +1000 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <4C1A5E0D.7060102@zopyx.com> References: <4C19A308.5040806@zopyx.com> <4C1A59A9.7030204@egenix.com> <4C1A5E0D.7060102@zopyx.com> Message-ID: <201006181235.04598.steve@pearwood.info> On Fri, 18 Jun 2010 03:40:29 am Andreas Jung wrote: > M.-A. Lemburg wrote: > > If you ship Python-only packages with precompiled .pyc/.pyo > > files, you do need to upload one version per Python version. > > The marshal format and pyc magic often changes between releases. > > Once again: I am talking about the majority of packages that are > neither commercial nor shipping without the Python source code. Firstly, commercial is not the opposite of source-code provided. Why do so many FOSS advocates insist on giving the message that it is? *Closed source* is the opposite of open source. You earlier said that PyPI should force all packages to include source code. Are you now saying that PyPI should only force packages to include source code if they include source code, that is, that package owners can opt-out of this rule "you must provide source code" by simply not providing source code? If not, then what exactly are you saying? > > * make PyPI itself more robust and stable (we're working on that) > > PyPI is pretty robust and this has nothing to do with packages hosted > externally. "Pretty robust" isn't robust enough, which is why there are proposals to shift PyPI to a commercial high-availability hosting service *and* to mirror it extensively. As for the second part of your statement, of course externally hosted packages don't increase the stability of PyPI itself, but they limit the harm from any single outage and distribute the load over the entire internet rather than one single site. 
External hosting is "don't put all your eggs in one basket", as well as "competition between hosting providers" and "freedom of choice". After all, PyPI is intended to be an *index* of Python software, not a hosting service. The hosting is an optional bonus. Don't think I'm not grateful for that, but I object strongly to your suggestion that I should be *forced* to host my packages on PyPI if I want to register the package there. > > Suggesting that they can never remove a release from PyPI > > or are not allowed to rename a package is not going to > > attract more developers to PyPI. > > I would not care about such developers. Then don't use their packages, but don't stop other people from using them. > The basic question is: do we want PyPI being a reliable and valuable > community resource or a partly unflushed package toilet? The basic question is, who has the right to control the packages indexed on PyPI? Is it the package author, or you? -- Steven D'Aprano From ben+python at benfinney.id.au Fri Jun 18 04:57:33 2010 From: ben+python at benfinney.id.au (Ben Finney) Date: Fri, 18 Jun 2010 12:57:33 +1000 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI References: <4C19A308.5040806@zopyx.com> <4C1A59A9.7030204@egenix.com> <4C1A5E0D.7060102@zopyx.com> <201006181235.04598.steve@pearwood.info> Message-ID: <87mxutoxnm.fsf@benfinney.id.au> Steven D'Aprano writes: > On Fri, 18 Jun 2010 03:40:29 am Andreas Jung wrote: > > The basic question is: do we want PyPI being a reliable and valuable > > community resource or a partly unflushed package toilet? > > The basic question is, who has the right to control the packages indexed > on PyPI? Is it the package author, or you? That doesn't seem to be a question that addresses Andreas's argument (as I understand it). I don't see Andreas arguing for anyone but the copyright holder to have control of the *package*. 
Rather, a more germane question would be: Who has the right to control *which* packages get indexed at PyPI (of all those that might be submitted to the index)? My understanding is that Andreas is arguing that PyPI does, and should, have that control; and that control can be exercised in different ways. -- \ "Often, the surest way to convey misinformation is to tell the | `\ strict truth." -- Mark Twain, _Following the Equator_ | _o__) | Ben Finney From lists at zopyx.com Fri Jun 18 05:35:44 2010 From: lists at zopyx.com (Andreas Jung) Date: Fri, 18 Jun 2010 05:35:44 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <4C1A93EB.9020308@v.loewis.de> References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> <4C19DCA9.5010308@egenix.com> <4C19DF6F.9050106@zopyx.com> <4C19E409.8060603@egenix.com> <4C19E54F.6030203@zopyx.com> <4C19F011.6010501@egenix.com> <4C19FD2A.3050801@egenix.com> <4C1A0992.7070507@egenix.com> <4C1A0D3C.4050402@zopyx.com> <4C1A93EB.9020308@v.loewis.de> Message-ID: <4C1AE990.8030901@zopyx.com> Martin v. Löwis wrote: >> In theory yes, in real life no - I approached several package >> maintainers in the past due to several reasons..some agree with the >> complaints, others just don't care. Some consider PyPI as their own >> private repository with their own rules and no need to care about the >> community e.g. by providing proper metadata (I call this anti-social and >> PyPI-misuse). > > As the PyPI maintainer, I assure you that it is no misuse. Whether it's > anti-social, I don't know. ok - so you claim that it should be allowed for everyone to unload their unlabeled garbage in public places? > > So given that discussion, I'm now opposed to enforcing a policy here. Ok - so we have to live with PyPI as a package dumpster.
Andreas From lists at zopyx.com Fri Jun 18 05:49:46 2010 From: lists at zopyx.com (Andreas Jung) Date: Fri, 18 Jun 2010 05:49:46 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <4C19A308.5040806@zopyx.com> References: <4C19A308.5040806@zopyx.com> Message-ID: <4C1AECDA.2000009@zopyx.com> I retract this proposal and accept the fact that obviously nobody outside the Zope/Plone world is really interested in bringing PyPI forward, and that the freedom to register and upload packages in whatever state to PyPI is being put above the needs of a well-maintained and reliable package index. After almost 20 years I am still under the impression that we are still in the kindergarten. Deeply frustrated, Andreas Andreas Jung wrote: > Hi there, > > I propose a policy change for packages registered with PyPI: > > - packages registered on PyPI have at least one release > > - one release of registered package on PyPI _must_ contain > a valid source code distribution (sdist) > > - packages registered on PyPI without releases or without > source code release are subject to be removed after N days > after the day of registration > > Why? > > Any package registered on PyPI is possibly crucial to any kind of > development and deployment. > > Packages hosted on external servers (referenced through a download_url) > are subject to come and go - packages once released should be available > at any time from a well-known location (PyPI).
Dependencies on the > availability of external download servers other than PyPI are hardly > acceptable for real-world development and deployments. > > As an example: the Plone CMS buildouts depend on python-openid. > This package is registered with PyPI > > http://pypi.python.org/pypi/python-openid > > but references to > > http://openidenabled.com/files/python-openid/packages/python-openid-2.2.4.tar.gz > > For whatever reason the download URL is no longer working. In fact: > openidenabled.com now points to http://www.janrain.com. > > Other reasons for disappearing packages in the past: > > - network or server outages of external servers > - users changed their organization and the organization removed > content of their former employees > > PyPI is a valuable and crucial resource for Python development. > It must be kept up-to-date and consistent. > > I don't care about the arguments that were made in the past against > stronger rules ("openness" etc.). > > There are a lot of Python programmers around that are not Python geeks > as most of us are and they just become pissed off when packages come and > go or are not in the place where one would expect them. > > PyPI is a community resource - but community does not mean anarchy where > everyone should be able to upload their package crap without looking left > and right and having the community and its needs in mind. > > PyPI must become a stable package index. Everything registered with PyPI > must be available at any time (mirrors, distributing PyPI in the cloud...). > > Andreas -- ZOPYX Limited | zopyx group Charlottenstr.
37/1 | The full-service network for Zope & Plone D-72070 Tübingen | Produce & Publish www.zopyx.com | www.produce-and-publish.com ------------------------------------------------------------------------ E-Publishing, Python, Zope & Plone development, Consulting From fdrake at acm.org Fri Jun 18 06:02:28 2010 From: fdrake at acm.org (Fred Drake) Date: Fri, 18 Jun 2010 00:02:28 -0400 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: References: Message-ID: On Thu, Jun 17, 2010 at 3:58 PM, Jess Austin wrote: > In response, a question: is there anyone who supports this radical policy > change who is NOT a zc.buildout user? I'm a zc.buildout user, and I *don't* support this policy change. This change is entirely unnecessary. -Fred -- Fred L. Drake, Jr. "A storm broke loose in my mind."
--Albert Einstein From martin at v.loewis.de Fri Jun 18 07:53:18 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 18 Jun 2010 07:53:18 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <4C1AE990.8030901@zopyx.com> References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> <4C19DCA9.5010308@egenix.com> <4C19DF6F.9050106@zopyx.com> <4C19E409.8060603@egenix.com> <4C19E54F.6030203@zopyx.com> <4C19F011.6010501@egenix.com> <4C19FD2A.3050801@egenix.com> <4C1A0992.7070507@egenix.com> <4C1A0D3C.4050402@zopyx.com> <4C1A93EB.9020308@v.loewis.de> <4C1AE990.8030901@zopyx.com> Message-ID: <4C1B09CE.5080608@v.loewis.de> >> As the PyPI maintainer, I assure you that it is no misuse. Whether it's >> anti-social, I don't know. > > ok - so you claim that it should be allowed for everyone to unload its > unlabeled garbage on public places? Correct - as long as it's a Python package. Mere spam will be deleted. >> >> So given that discussion, I'm now opposed to enforcing a policy here. > > Ok - so we have live with PyPI as a package dumpster. Indeed. I don't think that requiring a source upload would change that. Regards, Martin From simon at ikanobori.jp Fri Jun 18 10:24:32 2010 From: simon at ikanobori.jp (Simon de Vlieger) Date: Fri, 18 Jun 2010 10:24:32 +0200 Subject: [Catalog-sig] PyPI template improvements In-Reply-To: <4C1A8928.8090709@v.loewis.de> References: <4C194755.2060704@v.loewis.de> <4C1A8928.8090709@v.loewis.de> Message-ID: On 17 jun 2010, at 22:44, Martin v. Löwis wrote: >> In web app land, "supported browsers" usually means the ones the >> designer targets: e.g., including "IE>= 7" in the list means that >> the >> designer doesn't have to include workarounds for stupid glitches in >> earlier IEs (or even test the design against those versions).
>> >> For CSS, this means that the site's appearance will be sometimes >> wonky >> when running with an older-than-supported browser version. Features >> which depend on Javascript may not work at all, or only in degraded >> mode. > > I have a really hard time answering that question then: there was no > web designer involved in creating PyPI (*). The browser that the > *authors* of the service target are really the ones I mentioned: all > of them. > > There is one browser that gets special attention, and flaws relating > to it get fixed faster than for any other browser: setuptools. > > Regards, > Martin > > (*) of course, it uses the layout of python.org, which did have a > web designer; for this design, I don't know the answer. Martin, a question from me. Does setuptools browse the main pypi pages or does it use the simple version? Another question: if there is a need for JavaScript on the page (don't worry about making it inaccessible, I'll make everything degrade nicely), am I allowed to include a JavaScript framework? Right now I'm looking at jQuery (http://jquery.com/), or would there be something against this? I have already done a few items from my list and a few of the items which were proposed by the distutils-sig mailing list. Over the weekend I'm looking at doing a nice chunk of work. Regards, Simon de Vlieger From mal at egenix.com Fri Jun 18 10:33:42 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 18 Jun 2010 10:33:42 +0200 Subject: [Catalog-sig] PyPI template improvements In-Reply-To: References: <4C194755.2060704@v.loewis.de> <4C1A8928.8090709@v.loewis.de> Message-ID: <4C1B2F66.1050502@egenix.com> Simon de Vlieger wrote: > On 17 jun 2010, at 22:44, Martin v.
Löwis wrote: > >>> In web app land, "supported browsers" usually means the ones the >>> designer targets: e.g., including "IE>= 7" in the list means that the >>> designer doesn't have to include workarounds for stupid glitches in >>> earlier IEs (or even test the design against those versions). >>> >>> For CSS, this means that the site's appearance will be sometimes wonky >>> when running with an older-than-supported browser version. Features >>> which depend on Javascript may not work at all, or only in degraded >>> mode. >> >> I have a really hard time answering that question then: there was no >> web designer involved in creating PyPI (*). The browser that the >> *authors* of the service target are really the ones I mentioned: all >> of them. >> >> There is one browser that gets special attention, and flaws relating >> to it get fixed faster than for any other browser: setuptools. >> >> Regards, >> Martin >> >> (*) of course, it uses the layout of python.org, which did have a web >> designer; for this design, I don't know the answer. > > Martin, > > a question from me. Does setuptools browse the main pypi pages or does > it use the simple version? setuptools used to parse the main web pages of PyPI. This was then changed and the /simple index invented. All recent versions of setuptools default to using the /simple index. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 18 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 30 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From do3ccqrv at googlemail.com Fri Jun 18 10:49:04 2010 From: do3ccqrv at googlemail.com (Patrick Gerken) Date: Fri, 18 Jun 2010 10:49:04 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: References: <4C19A308.5040806@zopyx.com> <4C19C7A0.9080800@v.loewis.de> Message-ID: On Thu, Jun 17, 2010 at 16:59, Tres Seaver wrote: > All of which make it impossible to reliably and repeatably deploy > arbitrary software configurations (directly) from PyPI. Managing your > own project-specific index is the only real solution. > When I provide buildout configurations for open source packages I don't like to provide a custom index for them. It increases the effort if they would like to test the same package with newer versions, want to update the known good set or would like to extend the package. For customer projects it's a very good solution, I agree. Best regards, Patrick From mal at egenix.com Fri Jun 18 11:10:43 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 18 Jun 2010 11:10:43 +0200 Subject: [Catalog-sig] Extra links on the PyPI /simple index package pages In-Reply-To: <4C1A9487.5070108@v.loewis.de> References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> <4C19DCA9.5010308@egenix.com> <4C19DF6F.9050106@zopyx.com> <4C19E409.8060603@egenix.com> <4C19E54F.6030203@zopyx.com> <4C19F011.6010501@egenix.com> <4C19FD2A.3050801@egenix.com> <4C1A0992.7070507@egenix.com> <4C1A201F.6080609@egenix.com> <4C1A9487.5070108@v.loewis.de> Message-ID: <4C1B3813.3010102@egenix.com> "Martin v. Löwis" wrote: > Am 17.06.2010 15:16, schrieb M.-A. Lemburg: >> Benji York wrote: >>> On Thu, Jun 17, 2010 at 7:40 AM, M.-A.
Lemburg wrote: >>>> http://pypi.python.org/simple/zc.buildout/ >>>> >>>> BTW: what are all those bug links doing on the zc.buildout index page ? >>> >>> PyPI scrapes all the links from the long description; for many projects >>> that includes a change log with links to fixed bugs. >> >> Isn't that dangerous ? >> >> AFAIK, setuptools would start opening all those URLs and might >> find download files which are not necessarily under full control of >> the author, e.g. anyone could add a comment to a bug report or >> wiki page with a link to an egg file on some rogue server. > > I think you misunderstand. Links originate *only* from the long > description. The package owner has full control over that. I was referring to the linked assets that the package owner may not have full control over, e.g. in the above case, you have links pointing to launchpad and one to "file://". Such links (except the file:// one) can be useful in the package description, e.g. to point to a bug tracking system, documentation or other resources, but they are not really needed to point setuptools to download locations. > If you think the package owner is opening up a security threat by > including the links in the first place - yes, that's indeed a risk. Is this feature still needed for setuptools ? We have download URLs and homepage URLs which should be enough for setuptools to search and find the links to package download files. If it's no longer needed, then it'd be safer not to include the long description links on the /simple index pages anymore. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 18 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 30 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ianb at colorstudy.com Fri Jun 18 17:57:10 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Fri, 18 Jun 2010 10:57:10 -0500 Subject: [Catalog-sig] Extra links on the PyPI /simple index package pages In-Reply-To: <4C1B3813.3010102@egenix.com> References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> <4C19DCA9.5010308@egenix.com> <4C19DF6F.9050106@zopyx.com> <4C19E409.8060603@egenix.com> <4C19E54F.6030203@zopyx.com> <4C19F011.6010501@egenix.com> <4C19FD2A.3050801@egenix.com> <4C1A0992.7070507@egenix.com> <4C1A201F.6080609@egenix.com> <4C1A9487.5070108@v.loewis.de> <4C1B3813.3010102@egenix.com> Message-ID: On Fri, Jun 18, 2010 at 4:10 AM, M.-A. Lemburg wrote: > > If you think the package owner is opening up a security threat by > > including the links in the first place - yes, that's indeed a risk. > > Is this feature still needed for setuptools ? > It's fairly regularly used to link to repositories, e.g., I might put this text in a description: To install `the tip tarball < http://bitbucket.org/ianb/webob/get/tip.gz#egg=webob-dev>`_ use ``pip install webob==dev`` -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ianb at colorstudy.com Fri Jun 18 18:01:44 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Fri, 18 Jun 2010 11:01:44 -0500 Subject: [Catalog-sig] Extra links on the PyPI /simple index package pages In-Reply-To: References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> <4C19DCA9.5010308@egenix.com> <4C19DF6F.9050106@zopyx.com> <4C19E409.8060603@egenix.com> <4C19E54F.6030203@zopyx.com> <4C19F011.6010501@egenix.com> <4C19FD2A.3050801@egenix.com> <4C1A0992.7070507@egenix.com> <4C1A201F.6080609@egenix.com> <4C1A9487.5070108@v.loewis.de> <4C1B3813.3010102@egenix.com> Message-ID: On Fri, Jun 18, 2010 at 10:57 AM, Ian Bicking wrote: > On Fri, Jun 18, 2010 at 4:10 AM, M.-A. Lemburg wrote: > >> > If you think the package owner is opening up a security threat by >> > including the links in the first place - yes, that's indeed a risk. >> >> Is this feature still needed for setuptools ? >> > > It's fairly regularly used to link to repositories, e.g., I might put this > text in a description: > > To install `the tip tarball < > http://bitbucket.org/ianb/webob/get/tip.gz#egg=webob-dev>`_ use ``pip > install webob==dev`` > It should be noted, though, that these links must be self-describing, with #egg in this case, or with a URL that is more obviously self describing like http://example.com/nightlies/webob-nightly.tar.gz -- the problems people are describing here are with fetching other pages and scanning them for links. If I remember correctly homepage and download_url are fetched and scanned for links, and those cause all the problems (especially homepage, as download_url tends to point to something simpler and more reliable). A simple security hole would be having a homepage that is a wiki -- anyone could edit the wiki and put up a link to a trojan package and it could get found and installed. -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ianb at colorstudy.com Fri Jun 18 18:44:25 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Fri, 18 Jun 2010 11:44:25 -0500 Subject: [Catalog-sig] Rewrite PyPI for App Engine? Message-ID: With all the reliability discussion, I thought I'd offer a kind of counterproposal, that we rewrite PyPI to use App Engine. Of course, this means writing code, etc., but I believe this is a reasonable goal. I think if "we" (Catalog-SIG? PyPI maintainers?) committed to using such an implementation (assuming it was of good quality) that we could find people (probably not on this list) to write and maintain the code. People have already rewritten PyPI a couple times, but no one knows what exactly to *do* with the rewrite so they haven't gone anywhere. And PyPI is not a particularly complicated application. I think we can set the bar high on the implementation quality and that people will meet it, so long as they know their effort won't be in vain. Why App Engine? The primary reason I'm proposing it is because it will be much easier to manage. If it runs out of memory it won't bring down a machine. If new people maintain the system it's easy to describe how to do deployments, for instance. It's easy for people to install their own PyPI instances for development and to generate patches. Hosted services can have downtimes of course, but unlike currently there are other people (the App Engine maintainers) who will resolve those problems. There's still a class of bugs like badly indexed tables or weird locking issues that could bring PyPI down and "we" would have to fix it, and with a rewrite there's more of a risk of that, but... it'll just take some testing to make sure things are okay. In terms of cost, I expect we can get free hosting, and packages can be stored directly in the data store. That doesn't preclude using a CDN like CloudFront, but that can be handled separately. Also since the index just links to packages, packages can be incrementally uploaded to a CDN. 
Besides a commitment to using the code (which I think is really important to motivate people), a scrubbed dump of the database would be really helpful for development. I know we've passed around complete dumps to people, but they contain private information so we can't put them up publicly, which creates a speed bump for developers. Linkage... A buzz post where I asked about it: http://www.google.com/buzz/ianbicking/BRWDjsMCGWQ/I-like-the-original-proposal-move-PyPI-stuff-into A PyPI *mirror* written for App Engine: http://pypi.appspot.com/ A PyPI implementation in Django (one is a fork of the other?), database-backed (would take some work to get it on App Engine): http://pypi.python.org/pypi/djangopypi/ http://github.com/benliles/chishop -- Ian Bicking | http://blog.ianbicking.org From mark at geek.net Fri Jun 18 18:47:21 2010 From: mark at geek.net (Mark Ramm) Date: Fri, 18 Jun 2010 12:47:21 -0400 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <4C1A96A6.3050101@v.loewis.de> References: <4C19A308.5040806@zopyx.com> <4C19C7A0.9080800@v.loewis.de> <4C1A607E.2030904@zopyx.com> <4C1A6383.80105@zopyx.com> <4C1A96A6.3050101@v.loewis.de> Message-ID: On Thu, Jun 17, 2010 at 5:41 PM, "Martin v. Löwis" wrote: >> It does? I thought PyPI kept everything around (but hidden) unless the >> author went in and manually deleted old stuff. You just need to go to a >> deep link, e.g., http://pypi.python.org/pypi/SomePackage/0.1 > > Sure, but owners *do* manually delete old stuff. Am I wrong in remembering that old packages get dropped from the simple index? I'm not saying they get deleted from the server, but they are made unavailable to easy_install without special knowledge of how to get them. So old packages can have requirements in setup.py which become unavailable for simple install.
--Mark From mark at geek.net Fri Jun 18 18:49:39 2010 From: mark at geek.net (Mark Ramm) Date: Fri, 18 Jun 2010 12:49:39 -0400 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <4C1A9567.1010703@v.loewis.de> References: <4C19A308.5040806@zopyx.com> <4C19FD2A.3050801@egenix.com> <201006172255.49175.steve@pearwood.info> <4C1A9567.1010703@v.loewis.de> Message-ID: On Thu, Jun 17, 2010 at 5:36 PM, "Martin v. Löwis" wrote: >> Now, please tell me what you would do if sourceforge changes its url and >> returns a >> 404 on the old download page. Would you update all release informations? Well, at this point if sourceforge 404'ed on an old download page (as opposed to redirecting) I'd get pretty mad, and either fix it or make somebody fix it. We depend on easy_install/pip as much as anybody -- so we'd be shooting ourselves in the foot too. --Mark Ramm From ianb at colorstudy.com Fri Jun 18 19:01:48 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Fri, 18 Jun 2010 12:01:48 -0500 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: References: <4C19A308.5040806@zopyx.com> <4C19C7A0.9080800@v.loewis.de> <4C1A607E.2030904@zopyx.com> <4C1A6383.80105@zopyx.com> <4C1A96A6.3050101@v.loewis.de> Message-ID: On Fri, Jun 18, 2010 at 11:47 AM, Mark Ramm wrote: > On Thu, Jun 17, 2010 at 5:41 PM, "Martin v. Löwis" > wrote: > >> It does? I thought PyPI kept everything around (but hidden) unless the > >> author went in and manually deleted old stuff. You just need to go to a > >> deep link, e.g., http://pypi.python.org/pypi/SomePackage/0.1 > > > > Sure, but owners *do* manually delete old stuff. > > Am I wrong in remembering that old packages get dropped from the > simple index?
> > I'm not saying they get deleted from the server, but they are made > unavailable to easy_install without special knowledge of how to get > them, So old packages can have requirements in setup.py which become > unavailable for simple install. > If you give pip or easy_install (or I assume buildout) a requirement like Foo==0.1, then they will look at http://pypi.python.org/simple/Foo/0.1, and if the release is hidden that will still return the links for that version of the package. If you give a version like Foo<=0.1 then it won't work (assuming 0.1 is hidden), as there's no deep link that either installer will look at. A weird case is that links in long_description in old releases will show up regardless, so if you actually want to purge a link (e.g., to a non-existent repository) then it requires editing all versions of the package. This might be unintentional. -- Ian Bicking | http://blog.ianbicking.org From mcrute at gmail.com Fri Jun 18 21:11:29 2010 From: mcrute at gmail.com (Michael Crute) Date: Fri, 18 Jun 2010 15:11:29 -0400 Subject: [Catalog-sig] Rewrite PyPI for App Engine? In-Reply-To: References: Message-ID: On Fri, Jun 18, 2010 at 12:44 PM, Ian Bicking wrote: > With all the reliability discussion, I thought I'd offer a kind of > counterproposal, that we rewrite PyPI to use App Engine. > > Of course, this means writing code, etc., but I believe this is a reasonable > goal. I think if "we" (Catalog-SIG? PyPI maintainers?) committed to using > such an implementation (assuming it was of good quality) that we could find > people (probably not on this list) to write and maintain the code. People > have already rewritten PyPI a couple times, but no one knows what exactly to > *do* with the rewrite so they haven't gone anywhere. And PyPI is not a > particularly complicated application.
I think we can set the bar high on > the implementation quality and that people will meet it, so long as they > know their effort won't be in vain. > > Why App Engine? The primary reason I'm proposing it is because it will be > much easier to manage. If it runs out of memory it won't bring down a > machine. If new people maintain the system it's easy to describe how to do > deployments, for instance. It's easy for people to install their own PyPI > instances for development and to generate patches. Hosted services can have > downtimes of course, but unlike currently there are other people (the App > Engine maintainers) who will resolve those problems. There's still a class > of bugs like badly indexed tables or weird locking issues that could bring > PyPI down and "we" would have to fix it, and with a rewrite there's more of > a risk of that, but... it'll just take some testing to make sure things are > okay. > > In terms of cost, I expect we can get free hosting, and packages can be > stored directly in the data store. That doesn't preclude using a CDN like > CloudFront, but that can be handled separately. Also since the index just > links to packages, packages can be incrementally uploaded to a CDN. > > Besides a commitment to using the code (which I think is really important to > motivate people), a scrubbed dump of the database would be really helpful > for development. I know we've passed around complete dumps to people, but > it contains private information so we can't put it up publicly which creates > a speed bump for developers. I would very much like to see PyPI start using chishop. I've been working to implement the complete set of features that PyPI supports (including the mirroring PEP) for use inside of the company I work for. The code is in reasonably good shape and I would love to see that become the official implementation of PyPI. Though I haven't tested it, I don't see any reason that it wouldn't run on App Engine with no additional work.
-- Michael E. Crute http://mike.crute.org It is a mistake to think you can solve any major problem just with potatoes. --Douglas Adams From pje at telecommunity.com Fri Jun 18 23:13:40 2010 From: pje at telecommunity.com (P.J. Eby) Date: Fri, 18 Jun 2010 17:13:40 -0400 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI Message-ID: <20100618211350.41F903A414B@sparrow.telecommunity.com> At 12:01 PM 6/18/2010 -0500, Ian Bicking wrote: >On Fri, Jun 18, 2010 at 11:47 AM, Mark Ramm ><mark at geek.net> wrote: >On Thu, Jun 17, 2010 at 5:41 PM, "Martin v. Löwis" ><martin at v.loewis.de> wrote: > >> It does? I thought PyPI kept everything around (but hidden) unless the > >> author went in and manually deleted old stuff. You just need to go to a > >> deep link, e.g., > http://pypi.python.org/pypi/SomePackage/0.1 > > > > > Sure, but owners *do* manually delete old stuff. >Am I wrong in remembering that old packages get dropped from the >simple index? >I'm not saying they get deleted from the server, but they are made >unavailable to easy_install without special knowledge of how to get >them, So old packages can have requirements in setup.py which become >unavailable for simple install. > > >If you give pip or easy_install (or I assume buildout) a requirement >like Foo==0.1, then they will look at >http://pypi.python.org/simple/Foo/0.1, easy_install doesn't do that, unless you explicitly add that URL via -f or --find-links. Is that a feature you added in pip? >and if the release is hidden that will still return the links for >that version of the package. If you give a version like Foo<=0.1 >then it won't work (assuming 0.1 is hidden), as there's no deep link >that either installer will look at. > >A weird case is that links in long_description in old releases will >show up regardless, so if you actually want to purge a link (e.g., >to a non-existent repository) then it require editing all versions >of the package.
This might be unintentional. It's at least consistent -- all URLs for all versions (whether hidden or not) show up when you access the packagewide page. There was some discussion in the past about whether this was appropriate; IMO it's not, as it was an effective API change from the pre-/simple days. Before, if a release was hidden, there was no way for easy_install to find it except via explicit -f usage. Now, there is no way for an author to hide a release from automatic installation and still allow for manual installation. From pje at telecommunity.com Fri Jun 18 23:14:31 2010 From: pje at telecommunity.com (P.J. Eby) Date: Fri, 18 Jun 2010 17:14:31 -0400 Subject: [Catalog-sig] Extra links on the PyPI /simple index package pages Message-ID: <20100618211441.6EEE93A414B@sparrow.telecommunity.com> At 11:01 AM 6/18/2010 -0500, Ian Bicking wrote: >A simple security hole would be having a homepage that is a wiki -- >anyone could edit the wiki and put up a link to a trojan package and >it could get found and installed. Of course, that's also a security hole even if you're *not* using an automated installation. From pje at telecommunity.com Fri Jun 18 23:14:45 2010 From: pje at telecommunity.com (P.J. Eby) Date: Fri, 18 Jun 2010 17:14:45 -0400 Subject: [Catalog-sig] Extra links on the PyPI /simple index package pages Message-ID: <20100618211455.2BDE73A414B@sparrow.telecommunity.com> At 11:10 AM 6/18/2010 +0200, M.-A. Lemburg wrote: >"Martin v. L?wis" wrote: > > Am 17.06.2010 15:16, schrieb M.-A. Lemburg: > >> Benji York wrote: > >>> On Thu, Jun 17, 2010 at 7:40 AM, M.-A. Lemburg wrote: > >>>> http://pypi.python.org/simple/zc.buildout/ > >>>> > >>>> BTW: what are all those bug links doing on the zc.buildout index page ? > >>> > >>> PyPI scrapes all the links from the long description; for many projects > >>> that includes a change log with links to fixed bugs. > >> > >> Isn't that dangerous ? 
> >> > >> AFAIK, setuptools would start opening all those URLs and might > >> find download files which are not necessarily under full control of > >> the author, e.g. anyone could add a comment to a bug report or > >> wiki page with a link to an egg file on some rogue server. > > > > I think you misunderstand. Links originate *only* from the long > > description. The package owner has full control over that. > >I was referring to the linked assets that the package owner >may not have full control over, e.g. in the above case, >you have links pointing to launchpad and one to "file://". > >Such links (except the file:// one) can be useful in the >package description, e.g. to point to a bug tracking >system, documentation or other resources, but they are >not really needed to point setuptools to download locations. This is a misunderstanding of what setuptools does. Setuptools only retrieves URLs that are *specifically designated* as a "home page" or "download" link (using the "rel" field of the A tag on the PyPI /simple page), or which are a recognizable download URL supplied by way of the long_description. So, the risk you are describing does not actually exist. > > If you think the package owner is opening up a security threat by > > including the links in the first place - yes, that's indeed a risk. > >Is this feature still needed for setuptools ? Yes. >We have download URLs and homepage URLs which should be enough for >setuptools to search and find the links to package download files. No. This would only be the case if the project's author had some other form of hosting. For example, if you had a subversion repository for your development trunk, but didn't have any place to host an HTML page to link to it, the long_description would be the only way (AFAIK at present) for you to securely provide a link to that repository for setuptools (or humans) to use. 
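
The link handling described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not setuptools' actual code: the names SimplePageLinks, classify and classify_page are hypothetical, the rel values "homepage" and "download" and the list of archive suffixes are assumptions, and the example.com URLs are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed archive suffixes; the real installer's list may differ.
DOWNLOAD_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".zip", ".egg")
# Assumed rel values marking links whose target pages get scanned.
SCANNED_RELS = {"homepage", "download"}


class SimplePageLinks(HTMLParser):
    """Collect (href, rel) pairs from the A tags of an index page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            if attrs.get("href"):
                self.links.append((attrs["href"], attrs.get("rel") or ""))


def classify(href, rel):
    """Return 'download', 'scan' or 'ignore' for a single link."""
    parts = urlparse(href)
    looks_like_archive = parts.path.lower().endswith(DOWNLOAD_SUFFIXES)
    self_describing = parts.fragment.startswith("egg=")
    if looks_like_archive or self_describing:
        return "download"   # recognizable download URL
    if rel in SCANNED_RELS:
        return "scan"       # designated page, fetched and searched for links
    return "ignore"         # e.g. a bug tracker link in long_description


def classify_page(html):
    """Map every link on the page to its classification."""
    parser = SimplePageLinks()
    parser.feed(html)
    return {href: classify(href, rel) for href, rel in parser.links}


page = (
    '<a href="http://example.com/home" rel="homepage">home</a>'
    '<a href="https://bugs.example.com/1234">bug 1234</a>'
    '<a href="http://example.com/dist/myproject-1.2.tgz">tarball</a>'
    '<a href="http://bitbucket.org/ianb/webob/get/tip.gz#egg=webob-dev">tip</a>'
)
for href, kind in classify_page(page).items():
    print(kind, href)
```

In this sketch a bug tracker link is ignored unless its URL itself looks like a package file, which is the point being made: only designated pages are searched, and everything else must visibly point at something downloadable.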
See also: http://peak.telecommunity.com/DevCenter/setuptools#making-your-package-available-for-easyinstall and: http://peak.telecommunity.com/DevCenter/PackageIndexAPI for more information on how the link parsing and retrieval works. It is a common misconception that setuptools spiders pages for links; the truth is, it only reads the "home" and "download" URLs provided via the PyPI metadata, and those only if they're not obviously links to a package tarball (or zip, egg, etc.). All other links must visibly point to something downloadable, or else they're ignored. That means unless your bug tracking system's URL ends with "/myproject-1.2.tgz", it ain't gonna get downloaded. And unless you used it as your "home page" link, it won't be searched for links, either. ;-) From ziade.tarek at gmail.com Fri Jun 18 23:39:32 2010 From: ziade.tarek at gmail.com (Tarek Ziadé) Date: Fri, 18 Jun 2010 23:39:32 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: References: <4C1768AF.9040606@egenix.com> <4C17A419.4060602@egenix.com> <4C17BBC3.3050205@egenix.com> <4C1919F1.9080506@v.loewis.de> Message-ID: On Thu, Jun 17, 2010 at 6:30 AM, Ian Bicking wrote: > On Wed, Jun 16, 2010 at 1:37 PM, "Martin v. Löwis" > wrote: >>> >>> It is likely that some people will setup a mirror and then "forget" to >>> take care >>> about it. Like our buildbots really. >> >> >> The same can happen to any infrastructure, though. Amazon may decide to >> change the setup, and then the automated update procedure would break. >> Of course, they would give advance notice - but then somebody would >> have to react to that advance notice. > > That's not very likely, and if something does change it will be extremely > well announced and documented. Amazon is providing a commercial service > lots of people rely on, their process is formalized and professionalized.
> And if Amazon makes mistakes they'll figure out how to avoid them next time, > while mirror providers are a rotating crew that is unlikely to easily or > reliably learn from past mistakes. If a mirror manager doesn't do a good job, he'll just be taken out of the ring after a while. If we depend 100% on Amazon, and if there's a problem, the mirroring will be down for the time being and we won't be able to do anything about it. > If we actually understood each time PyPI > broke and fixed it none of this would be a problem; I'm not blaming anyone > for that, but it's also not going to change and adding lots of mirror > systems just adds more systems with exactly the same management problems > that our current system has. Yes, but the difference is that you don't put all your eggs in the same basket: it's very unlikely that ALL community mirrors will be down at the same time, thus a fall-back mechanism on the client side will raise the availability automatically. About Amazon: what will happen in 5 years with their offer? Will our Cloud-PyPI infrastructure still work? What will be the workload to maintain it? You can't be 100% sure the Python community will be able to dedicate that time. PyPI works today because it is not forced by a third party to evolve; it can evolve at its own pace. On the contrary, once the mirrors system is set, it will be dead easy to add/remove a mirror in the ring, and each node won't act as a SPOF. IMHO it's a bad idea to make this piece of our infrastructure depend on one third-party commercial entity, where we can provide a community answer. Now, a mirror could use Amazon, that would make more sense to me. Regards Tarek -- Tarek Ziadé
| http://ziade.org From exarkun at twistedmatrix.com Fri Jun 18 23:47:00 2010 From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com) Date: Fri, 18 Jun 2010 21:47:00 -0000 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: References: <4C1768AF.9040606@egenix.com> <4C17A419.4060602@egenix.com> <4C17BBC3.3050205@egenix.com> <4C1919F1.9080506@v.loewis.de> Message-ID: <20100618214700.2412.1860572271.divmod.xquotient.104@localhost.localdomain> On 09:39 pm, ziade.tarek at gmail.com wrote: >On Thu, Jun 17, 2010 at 6:30 AM, Ian Bicking >wrote: >>On Wed, Jun 16, 2010 at 1:37 PM, "Martin v. L?wis" >> >>wrote: >>>> >>>>It is likely that some people will setup a mirror and then "forget" >>>>to >>>>take care >>>>about it. Like our buildbots really. >>> >>> >>>The same can happen to any infrastructure, though. Amazon may decide >>>to >>>change the setup, and then the automated update procedure would >>>break. >>>Of course, they would give advance notice - but then somebody would >>>have to react to that advance notice. >> >>That's not very likely, and if something does change it will be >>extremely >>well announced and documented.? Amazon is providing a commercial >>service >>lots of people rely on, their process is formalized and >>professionalized. >>And if Amazon makes mistakes they'll figure out how to avoid them next >>time, >>while mirror providers are a rotating crew that is unlikely to easily >>or >>reliably learn from past mistakes. > >if a mirror manager don't do a good job, he'll just be taken out of >the ring after a while. >If we depend 100% on Amazon, and if there's a problem, the mirroring >will be down for the time being and we won't be able to do nothing >about it. 
>>If we actually understood each time PyPI >>broke and fixed it none of this would be a problem; I'm not blaming >>anyone >>for that, but it's also not going to change and adding lots of mirror >>systems just adds more systems with exactly the same management >>problems >>that our current system has. > >Yes but the difference is that you don't put all your eggs in the same >basket: >it's very unlikely that ALL community mirrors will be down at the same >time, thus >a fall-back mechanism on the client side will raise the availability >automatically. > >About Amazon: what will happen in 5 years with their offer ? will our >Cloud-PyPI infrastructure will still work ? what will be the workload >to maintain it ? You can't >be 100% sure the Python community will be able to dedicate that time. >PyPI works today because it is not forced by a third party to evolve, >it can evolve as its own pace. > >On the contrary, once the mirrors system is set, it will be dead easy >to add/remove a mirror in the ring, and each node won't act as a SPOF > >IMHO it's a bad idea to make this piece of our infrastructure depend >on one third party commercial entity, where we can provide a community >answer. There are (multiple!) open source implementations of the Amazon API. If Amazon decides to discontinue their cloud services (something I doubt should really be one of the top ten concerns here), then anyone else can set up their own cloud with the same interface. If I were going to run a PyPI mirroring service, I'd probably want to do it this way *anyway* because managing virtual machines is far easier than managing actual hardware. So there are probably many other much more significant issues to be worrying about. 
Jean-Paul From ianb at colorstudy.com Sat Jun 19 00:05:29 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Fri, 18 Jun 2010 17:05:29 -0500 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <20100618211350.41F903A414B@sparrow.telecommunity.com> References: <20100618211350.41F903A414B@sparrow.telecommunity.com> Message-ID: On Fri, Jun 18, 2010 at 4:13 PM, P.J. Eby wrote: > If you give pip or easy_install (or I assume buildout) a requirement like >> Foo==0.1, then they will look at >> http://pypi.python.org/simple/Foo/0.1, >> > > > easy_install doesn't do that, unless you explicitly add that URL via -f or > --find-links. Is that a feature you added in pip? > Hmm... somehow I imagined I was copying easy_install functionality when I added that, but I guess not. But yes, pip does look at a version-specific link as a special case. -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin at v.loewis.de Sat Jun 19 00:33:19 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 19 Jun 2010 00:33:19 +0200 Subject: [Catalog-sig] PyPI template improvements In-Reply-To: References: <4C194755.2060704@v.loewis.de> <4C1A8928.8090709@v.loewis.de> Message-ID: <4C1BF42F.9050300@v.loewis.de> > a question from me. Does setuptools browse the main pypi pages or does > it use the simple version? Both. Old versions (which still need to be supported) go to the main pages; new versions to the simple index. IOW, you need to maintain all links on the main pages that also exist on the simple pages. > Another question is, if there is a need for Javascript on the page > (don't worry about making it unaccessible, I'll make everything degrade > nicely) am I allowed to include JavaScript framework. Right now I'm > looking at jQuery (http://jquery.com/) or would there be something > against this? 
Not sure how this can be deployed, but if you come up with a solution, that's fine with me. Regards, Martin From martin at v.loewis.de Sat Jun 19 00:34:00 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 19 Jun 2010 00:34:00 +0200 Subject: [Catalog-sig] PyPI template improvements In-Reply-To: <4C1B2F66.1050502@egenix.com> References: <4C194755.2060704@v.loewis.de> <4C1A8928.8090709@v.loewis.de> <4C1B2F66.1050502@egenix.com> Message-ID: <4C1BF458.8030503@v.loewis.de> > setuptools used to parse the main web pages of PyPI. This was > then changed and the /simple index invented. All recent versions > of setuptools default to using the /simple index. See my response, though. The old versions still need to be supported. Regards, Martin From martin at v.loewis.de Sat Jun 19 00:47:24 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 19 Jun 2010 00:47:24 +0200 Subject: [Catalog-sig] Rewrite PyPI for App Engine? In-Reply-To: References: Message-ID: <4C1BF77C.10306@v.loewis.de> > I would very much like to see pypi start using chishop. I've been > working to implement the complete set of features that pypi supports > (including the mirroring PEP) for use inside of the company I work > for. The code is in reasonably good shape and I would love to see that > become the official implementation of PyPi. Though I haven't tested it > I don't see any reason that it wouldn't run on AppEngine with no > additional work. AFAICT, it is still way off being a replacement for PyPI. Where are the rendered web pages? Where is the account management? Where is file upload, documentation upload? Browsing for classifiers? and so on. This looks just like the simple index to me. Regards, Martin From mcrute at gmail.com Sat Jun 19 01:05:21 2010 From: mcrute at gmail.com (Michael Crute) Date: Fri, 18 Jun 2010 19:05:21 -0400 Subject: [Catalog-sig] Rewrite PyPI for App Engine? 
In-Reply-To: <4C1BF77C.10306@v.loewis.de> References: <4C1BF77C.10306@v.loewis.de> Message-ID: On Fri, Jun 18, 2010 at 6:47 PM, "Martin v. Löwis" wrote: >> I would very much like to see pypi start using chishop. I've been >> working to implement the complete set of features that pypi supports >> (including the mirroring PEP) for use inside of the company I work >> for. The code is in reasonably good shape and I would love to see that >> become the official implementation of PyPi. Though I haven't tested it >> I don't see any reason that it wouldn't run on AppEngine with no >> additional work. > > AFAICT, it is still way off being a replacement for PyPI. Where are the > rendered web pages? Where is the account management? Where is file > upload, documentation upload? Browsing for classifiers? and so on. > > This looks just like the simple index to me. Yes, in its current state it's pretty basic. We are working on rolling out an internal version of PyPI at work to assist with distribution of our applications, so I'm working on full compatibility with the official PyPI. We're still a little ways out but are moving in the right direction. I'm maintaining a todo list within my fork at http://github.com/mcrute/chishop/blob/master/TODO and would very much appreciate any input you might have as to which features are most important for official compatibility and what is missing from that list. -- Michael E. Crute http://mike.crute.org It is a mistake to think you can solve any major problem just with potatoes.
--Douglas Adams From martin at v.loewis.de Sat Jun 19 01:07:51 2010 From: martin at v.loewis.de ("Martin v. Löwis") Date: Sat, 19 Jun 2010 01:07:51 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: References: <4C19A308.5040806@zopyx.com> <4C19C7A0.9080800@v.loewis.de> <4C1A607E.2030904@zopyx.com> <4C1A6383.80105@zopyx.com> <4C1A96A6.3050101@v.loewis.de> Message-ID: <4C1BFC47.8060301@v.loewis.de> On 18.06.2010 18:47, Mark Ramm wrote: > On Thu, Jun 17, 2010 at 5:41 PM, "Martin v. Löwis" wrote: >>> It does? I thought PyPI kept everything around (but hidden) unless the >>> author went in and manually deleted old stuff. You just need to go to a >>> deep link, e.g., http://pypi.python.org/pypi/SomePackage/0.1 >> >> Sure, but owners *do* manually delete old stuff. > > Am I wrong in remembering that old packages get dropped from the > simple index? You are indeed misremembering. They used to, but no longer do, at users' request. Regards, Martin From ziade.tarek at gmail.com Sat Jun 19 01:08:28 2010 From: ziade.tarek at gmail.com (Tarek Ziadé) Date: Sat, 19 Jun 2010 01:08:28 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: <20100618214700.2412.1860572271.divmod.xquotient.104@localhost.localdomain> References: <4C1768AF.9040606@egenix.com> <4C17A419.4060602@egenix.com> <4C17BBC3.3050205@egenix.com> <4C1919F1.9080506@v.loewis.de> <20100618214700.2412.1860572271.divmod.xquotient.104@localhost.localdomain> Message-ID: On Fri, Jun 18, 2010 at 11:47 PM, wrote: [..] > > There are (multiple!) open source implementations of the Amazon API. If > Amazon decides to discontinue their cloud services (something I doubt should > really be one of the top ten concerns here), then anyone else can set up > their own cloud with the same interface.
> > If I were going to run a PyPI mirroring service, I'd probably want to do it > this way *anyway* because managing virtual machines is far easier than > managing actual hardware. I am not arguing in particular against Amazon, or any other service. This is an implementation detail. My point is that having a ring of mirrors (whatever technology each one of these mirrors uses) is better than setting up an infrastructure at Amazon ourselves (which we will have to maintain) to solve our availability issues. Exactly because "anyone else can set up their own cloud (or whatever) with the same interface". In other words, the mirroring protocol is the interface that will give us this availability, by switching to a server that is available when another one is down, be it the main PyPI itself. > So there are probably many other much more significant issues to be worrying > about. Not sure what you mean here. If it's in general, I completely agree. I have a very long list :) Regards Tarek -- Tarek Ziadé | http://ziade.org From ziade.tarek at gmail.com Sat Jun 19 01:27:39 2010 From: ziade.tarek at gmail.com (Tarek Ziadé) Date: Sat, 19 Jun 2010 01:27:39 +0200 Subject: [Catalog-sig] Rewrite PyPI for App Engine? In-Reply-To: References: Message-ID: On Fri, Jun 18, 2010 at 6:44 PM, Ian Bicking wrote: > With all the reliability discussion, I thought I'd offer a kind of > counterproposal, that we rewrite PyPI to use App Engine. > > Of course, this means writing code, etc., but I believe this is a reasonable > goal. I think if "we" (Catalog-SIG? PyPI maintainers?) committed to using > such an implementation (assuming it was of good quality) that we could find > people (probably not on this list) to write and maintain the code. People > have already rewritten PyPI a couple times, but no one knows what exactly to > *do* with the rewrite so they haven't gone anywhere. And PyPI is not a > particularly complicated application.
I think we can set the bar high on > the implementation quality and that people will meet it, so long as they > know their effort won't be in vain. Out of curiosity : have you ever worked with the current implementation ? I have hard time to understand why some people say it's hard to work with it, I don't think its a valid argument. > > Why App Engine?? The primary reason I'm proposing it is because it will be > much easier to manage.? If it runs out of memory it won't bring down a > machine.? If new people maintain the system it's easy to describe how to do > deployments, for instance.? It's easy for people to install their own PyPI > instances for development and to generate patches.? Hosted services can have > downtimes of course, but unlike currently there are other people (the App > Engine maintainers) who will resolve those problems.? There's still a class > of bugs like badly indexed tables or weird locking issues that could bring > PyPI down and "we" would have to fix it, and with a rewrite there's more of > a risk of that, but... it'll just take some testing to make sure things are > okay. > > In terms of cost, I expect we can get free hosting, and packages can be > stored directly in the data store.? That doesn't preclude using a CDN like > CloudFront, but that can be handled separately.? Also since the index just > links to packages, packages can be incrementally uploaded to a CDN. Even if I don't think its a priority in our concerns (community mirrors come first), I wouldn't mind having the main PyPI UI in GAE. Although, if PyPI was to be ported to GAE, couldn't we reuse the existing code instead of rewriting from scratch ? we would just have to rewrite the DB layer. > Besides a commitment to using the code (which I think is really important to > motivate people), a scrubbed dump of the database would be really helpful > for development.? 
I know we've passed around complete dumps to people, but > it contains private information so we can't put it up publicly which creates > a speed bump for developers. Private information could be easily removed from those dumps; But I don't think it's so helpful since you have all the .sql scripts to create your own DB. But we could add a script to create some sample data on the top of those scripts. > > > Linkage... > A buzz post where I asked about it: > http://www.google.com/buzz/ianbicking/BRWDjsMCGWQ/I-like-the-original-proposal-move-PyPI-stuff-into > > A PyPI *mirror* written for App Engine: > http://pypi.appspot.com/ > > A PyPI implementation in Django (one is a fork of the other?), > database-backed (would take some work to get it on App Engine): > http://pypi.python.org/pypi/djangopypi/ > http://github.com/benliles/chishop > > > -- > Ian Bicking ?| ?http://blog.ianbicking.org > > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig > > -- Tarek Ziad? | http://ziade.org From martin at v.loewis.de Sat Jun 19 01:51:38 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 19 Jun 2010 01:51:38 +0200 Subject: [Catalog-sig] Rewrite PyPI for App Engine? In-Reply-To: References: <4C1BF77C.10306@v.loewis.de> Message-ID: <4C1C068A.4000607@v.loewis.de> > I'm maintaining a todo list within my fork at > http://github.com/mcrute/chishop/blob/master/TODO and would very much > appreciate any input you might have as to which features are most > important for official compatibility and what is missing from that > list. The absolute requirement is that any URLs that PyPI provides must work exactly the same way. 
Primarily, this means
- package pages
- browse interface
- RSS

In addition, a number of features aren't listed yet:
- web registration of users
- web password reset
- OpenID support

Regards,
Martin

From pje at telecommunity.com Sat Jun 19 01:57:45 2010
From: pje at telecommunity.com (P.J. Eby)
Date: Fri, 18 Jun 2010 19:57:45 -0400
Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI
In-Reply-To: <4C1BFC47.8060301@v.loewis.de>
References: <4C19A308.5040806@zopyx.com> <4C19C7A0.9080800@v.loewis.de> <4C1A607E.2030904@zopyx.com> <4C1A6383.80105@zopyx.com> <4C1A96A6.3050101@v.loewis.de> <4C1BFC47.8060301@v.loewis.de>
Message-ID: <20100618235804.0B9DA3A40A5@sparrow.telecommunity.com>

At 01:07 AM 6/19/2010 +0200, Martin v. Löwis wrote:
>On 18.06.2010 18:47, Mark Ramm wrote:
>>On Thu, Jun 17, 2010 at 5:41 PM, "Martin v. Löwis" wrote:
>>>>It does? I thought PyPI kept everything around (but hidden) unless the
>>>>author went in and manually deleted old stuff. You just need to go to a
>>>>deep link, e.g., http://pypi.python.org/pypi/SomePackage/0.1
>>>
>>>Sure, but owners *do* manually delete old stuff.
>>
>>Am I wrong in remembering that old packages get dropped from the
>>simple index?
>
>You are indeed misremembering. They used to, but don't, any longer,
>on user request.

How many users?

I'm thinking it might be better to meet this use case the way pip does -- i.e., look up the specific version when a specific hidden version is requested, but otherwise only show active versions. The current behavior makes it harder for package authors to control what versions are automatically installable by default.

From ianb at colorstudy.com Sat Jun 19 01:58:00 2010
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri, 18 Jun 2010 18:58:00 -0500
Subject: [Catalog-sig] Rewrite PyPI for App Engine?
In-Reply-To:
References:
Message-ID:

On Fri, Jun 18, 2010 at 6:27 PM, Tarek Ziadé
wrote: > On Fri, Jun 18, 2010 at 6:44 PM, Ian Bicking wrote: > > With all the reliability discussion, I thought I'd offer a kind of > > counterproposal, that we rewrite PyPI to use App Engine. > > > > Of course, this means writing code, etc., but I believe this is a > reasonable > > goal. I think if "we" (Catalog-SIG? PyPI maintainers?) committed to > using > > such an implementation (assuming it was of good quality) that we could > find > > people (probably not on this list) to write and maintain the code. > People > > have already rewritten PyPI a couple times, but no one knows what exactly > to > > *do* with the rewrite so they haven't gone anywhere. And PyPI is not a > > particularly complicated application. I think we can set the bar high on > > the implementation quality and that people will meet it, so long as they > > know their effort won't be in vain. > > Out of curiosity : have you ever worked with the current implementation ? > > I have hard time to understand why some people say it's hard to work with > it, > I don't think its a valid argument. > I haven't looked at it in years, but I've poked around it some. I found it difficult, yes. > > Why App Engine? The primary reason I'm proposing it is because it will > be > > much easier to manage. If it runs out of memory it won't bring down a > > machine. If new people maintain the system it's easy to describe how to > do > > deployments, for instance. It's easy for people to install their own > PyPI > > instances for development and to generate patches. Hosted services can > have > > downtimes of course, but unlike currently there are other people (the App > > Engine maintainers) who will resolve those problems. There's still a > class > > of bugs like badly indexed tables or weird locking issues that could > bring > > PyPI down and "we" would have to fix it, and with a rewrite there's more > of > > a risk of that, but... it'll just take some testing to make sure things > are > > okay. 
> > > > In terms of cost, I expect we can get free hosting, and packages can be > > stored directly in the data store. That doesn't preclude using a CDN > like > > CloudFront, but that can be handled separately. Also since the index > just > > links to packages, packages can be incrementally uploaded to a CDN. > > Even if I don't think its a priority in our concerns (community > mirrors come first), I wouldn't mind having the main PyPI UI in GAE. > The priorities that motivate me are: 1. Make installation more reliable with respect to PyPI 2. Decrease overall maintenance burden 3. Decrease code liability Community mirrors only address 1 while App Engine addresses 2 and a rewrite addresses 3. And I think App Engine would be significantly more reliable than PyPI with mirrors. It's less moving parts, and it's built on infrastructure that is highly automated. Also because it requires less maintenance, if someone drops out of communication for a while or goes on vacation or something, it's not something that needs active tending. There's a significant number of failure conditions that a mirror network doesn't protect you from. Connection refused, connection timed out, and 500 errors are the only really obvious errors that will make a tool look to the next mirror. Because of potential synchronization problems there's a lot of new problems a mirror network could introduce. Although, if PyPI was to be ported to GAE, couldn't we reuse the > existing code instead of rewriting from scratch ? we would just have > to rewrite the DB layer. > I don't think it's worth reusing that code. > Besides a commitment to using the code (which I think is really important > to > > motivate people), a scrubbed dump of the database would be really helpful > > for development. I know we've passed around complete dumps to people, > but > > it contains private information so we can't put it up publicly which > creates > > a speed bump for developers. 
> > Private information could be easily removed from those dumps; > > But I don't think it's so helpful since you have all the .sql scripts to > create > your own DB. But we could add a script to create some sample data on > the top of those scripts. > It's useful to have a representative data set to test with, especially for stuff like performance testing. -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin at v.loewis.de Sat Jun 19 02:18:06 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sat, 19 Jun 2010 02:18:06 +0200 Subject: [Catalog-sig] Rewrite PyPI for App Engine? In-Reply-To: References: Message-ID: <4C1C0CBE.3010305@v.loewis.de> > It's useful to have a representative data set to test with, especially > for stuff like performance testing. Couldn't that be obtained through one of the many mirroring libraries? If it's going to be a complete rewrite, anyway, I doubt that a dump according to the current db schema would help. Regards, Martin From martin at v.loewis.de Sat Jun 19 02:20:34 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 19 Jun 2010 02:20:34 +0200 Subject: [Catalog-sig] [Proposal] Registered packages must provide the source code distribution on PyPI In-Reply-To: <20100618235804.0B9DA3A40A5@sparrow.telecommunity.com> References: <4C19A308.5040806@zopyx.com> <4C19C7A0.9080800@v.loewis.de> <4C1A607E.2030904@zopyx.com> <4C1A6383.80105@zopyx.com> <4C1A96A6.3050101@v.loewis.de> <4C1BFC47.8060301@v.loewis.de> <20100618235804.0B9DA3A40A5@sparrow.telecommunity.com> Message-ID: <4C1C0D52.6020404@v.loewis.de> Am 19.06.2010 01:57, schrieb P.J. Eby: > At 01:07 AM 6/19/2010 +0200, Martin v. L?wis wrote: >> Am 18.06.2010 18:47, schrieb Mark Ramm: >>> On Thu, Jun 17, 2010 at 5:41 PM, "Martin v. >>> L?wis" wrote: >>>>> It does? 
I thought PyPI kept everything around (but hidden) unless the >>>>> author went in and manually deleted old stuff. You just need to go >>>>> to a >>>>> deep link, e.g., http://pypi.python.org/pypi/SomePackage/0.1 >>>> >>>> Sure, but owners *do* manually delete old stuff. >>> >>> Am I wrong in remembering that old packages get dropped from the >>> simple index? >> >> You are indeed misremembering. They used to, but don't, any longer, on >> user request. > > How many users? It's been a long time; all I remember is that the users were massively (i.e. strongly, forcefully) demanding it, and there was no objection. I guess you can find the discussion in the archives. Regards, Martin From ziade.tarek at gmail.com Sat Jun 19 02:55:29 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Sat, 19 Jun 2010 02:55:29 +0200 Subject: [Catalog-sig] Rewrite PyPI for App Engine? In-Reply-To: References: Message-ID: On Sat, Jun 19, 2010 at 1:58 AM, Ian Bicking wrote: .. >> Out of curiosity : have you ever worked with the current implementation ? >> >> I have hard time to understand why some people say it's hard to work with >> it, >> I don't think its a valid argument. > > I haven't looked at it in years, but I've poked around it some.? I found it > difficult, yes. Having worked with both code bases, it's much more simple that Pip, but suffers from the same syndromes : some modules just grew too big, and there are not enough tests ;) PyPI has for instance a huge webui.py file, which should be cut in pieces. .. >> Even if I don't think its a priority in our concerns (community >> mirrors come first), I wouldn't mind having the main PyPI UI in GAE. > > The priorities that motivate me are: > > 1. Make installation more reliable with respect to PyPI > 2. Decrease overall maintenance burden > 3. Decrease code liability > > Community mirrors only address 1 while App Engine addresses 2 and a rewrite > addresses 3. I agree with 2. but I don't understand 3. 
why AppEngine would decrease code liability ? a code can be liable in any environment. > There's a significant number of failure conditions that a mirror network > doesn't protect you from.? Connection refused, connection timed out, and 500 > errors are the only really obvious errors that will make a tool look to the > next mirror.? Because of potential synchronization problems there's a lot of > new problems a mirror network could introduce. a mirror network is not the silver bullet, but I don't think the number of failure conditions is more significant than another solution. As a matter of fact, potential synchronization problems should be addressed by the mirroring protocol itself, if you think of any use case now, or if we meet one later. but, the main use case from the client PoV : "the sever is down" is fixed by falling back to another server. >> Although, if PyPI was to be ported to GAE, couldn't we reuse the >> existing code instead of rewriting from scratch ? we would just have >> to rewrite the DB layer. > > I don't think it's worth reusing that code. Why that ? As a contributor to the project, I know this will take some time to be rewritten, even if the application is not big. Features are still added in it. So rewriting something from scratch strikes me as a bad idea. Here are the priority I think we should take to solve the issues we had with PyPI: 1. investigate deeper in why the PyPI server was down for some hours 2. make sure PyPI has more sysadmins in several timezones 3. make the existing mirrors, "official" mirrors, via PEP 381 1. 2. can be done right now. Regards Tarek From mcrute at gmail.com Sat Jun 19 03:23:54 2010 From: mcrute at gmail.com (Michael Crute) Date: Fri, 18 Jun 2010 21:23:54 -0400 Subject: [Catalog-sig] Rewrite PyPI for App Engine? In-Reply-To: References: Message-ID: On Jun 18, 2010, at 7:27 PM, Tarek Ziad? 
wrote: > On Fri, Jun 18, 2010 at 6:44 PM, Ian Bicking wrote: >> Of course, this means writing code, etc., but I believe this is a reasonable >> goal. I think if "we" (Catalog-SIG? PyPI maintainers?) committed to using >> such an implementation (assuming it was of good quality) that we could find >> people (probably not on this list) to write and maintain the code. People >> have already rewritten PyPI a couple times, but no one knows what exactly to >> *do* with the rewrite so they haven't gone anywhere. And PyPI is not a >> particularly complicated application. I think we can set the bar high on >> the implementation quality and that people will meet it, so long as they >> know their effort won't be in vain. > > Out of curiosity : have you ever worked with the current implementation ? > > I have hard time to understand why some people say it's hard to work with it, > I don't think its a valid argument. I briefly played with the current implementation and found it somewhat difficult to work with. Part of the problem is that the code is dated and not well tested. The other part of the problem is that there are too many dependencies and replicating the environment required to run the official code is somewhat painful. For my uses I really don't want to run postgres just to serve a version of the cheeseshop. A project like chishop eliminates many of these problems as it's main dependency is Django which is designed to make setting up the application simple and allows you to chose what kind of database you want from something very simple like sqlite all the way up to something more robust like postgres. From mcrute at gmail.com Sat Jun 19 03:26:44 2010 From: mcrute at gmail.com (Michael Crute) Date: Fri, 18 Jun 2010 21:26:44 -0400 Subject: [Catalog-sig] Rewrite PyPI for App Engine? In-Reply-To: <4C1C068A.4000607@v.loewis.de> References: <4C1BF77C.10306@v.loewis.de> <4C1C068A.4000607@v.loewis.de> Message-ID: On Jun 18, 2010, at 7:51 PM, "Martin v. 
L?wis" wrote: > >> I'm maintaining a todo list within my fork at >> http://github.com/mcrute/chishop/blob/master/TODO and would very much >> appreciate any input you might have as to which features are most >> important for official compatibility and what is missing from that >> list. > > The absolute requirement is that any URLs that PyPI provides must work exactly the same way. Primarily, this means > - package pages > - browse interface > - RSS > > In addition, a number of features aren't listed yet: > - web registration of users > - web password reset > - OpenID support Thanks, I'll update my notes accordingly. Are there specs for any of these protocols? A few things are codified in PEPs but the rest seem to just require research into the code that currently implements the functionality. From ziade.tarek at gmail.com Sat Jun 19 03:31:26 2010 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Sat, 19 Jun 2010 03:31:26 +0200 Subject: [Catalog-sig] Rewrite PyPI for App Engine? In-Reply-To: References: Message-ID: On Sat, Jun 19, 2010 at 3:23 AM, Michael Crute wrote: > On Jun 18, 2010, at 7:27 PM, Tarek Ziad? wrote: >> On Fri, Jun 18, 2010 at 6:44 PM, Ian Bicking wrote: >>> Of course, this means writing code, etc., but I believe this is a reasonable >>> goal. ?I think if "we" (Catalog-SIG? ?PyPI maintainers?) committed to using >>> such an implementation (assuming it was of good quality) that we could find >>> people (probably not on this list) to write and maintain the code. ?People >>> have already rewritten PyPI a couple times, but no one knows what exactly to >>> *do* with the rewrite so they haven't gone anywhere. ?And PyPI is not a >>> particularly complicated application. ?I think we can set the bar high on >>> the implementation quality and that people will meet it, so long as they >>> know their effort won't be in vain. >> >> Out of curiosity : have you ever worked with the current implementation ? 
>> >> I have hard time to understand why some people say it's hard to work with it, >> I don't think its a valid argument. > > I briefly played with the current implementation and found it somewhat difficult to work with. Part of the problem is that the code is dated and not well tested. The other part of the problem is that there are too many dependencies and replicating the environment required to run the official code is somewhat painful. For my uses I really don't want to run postgres just to serve a version of the cheeseshop. A project like chishop eliminates many of these problems as it's main dependency is Django which is designed to make setting up the application simple and allows you to chose what kind of database you want from something very simple like sqlite all the way up to something more robust like postgres. Right, switching to something like SQLAlchemy would be better -- Tarek Ziad? | http://ziade.org From martin at v.loewis.de Sat Jun 19 10:10:30 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sat, 19 Jun 2010 10:10:30 +0200 Subject: [Catalog-sig] Rewrite PyPI for App Engine? In-Reply-To: References: Message-ID: <4C1C7B76.7020409@v.loewis.de> > I briefly played with the current implementation and found it > somewhat difficult to work with. Part of the problem is that the code > is dated Can you please explain what that means? What is "dated code", how do you recognize it, and why does it make it difficult to work with? > The other part of the problem is that > there are too many dependencies and replicating the environment > required to run the official code is somewhat painful. For my uses I > really don't want to run postgres just to serve a version of the > cheeseshop. Hmm. If setting up postgres is already considered a burden, I guess I understand the problem. However, dependency-wise, I'd argue that PyPI fares much better than many of the packages on PyPI. It's list of dependencies is really short. 
Regards,
Martin

From martin at v.loewis.de Sat Jun 19 10:12:28 2010
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sat, 19 Jun 2010 10:12:28 +0200
Subject: [Catalog-sig] Rewrite PyPI for App Engine?
In-Reply-To:
References: <4C1BF77C.10306@v.loewis.de> <4C1C068A.4000607@v.loewis.de>
Message-ID: <4C1C7BEC.4090506@v.loewis.de>

> Thanks, I'll update my notes accordingly. Are there specs for any of
> these protocols?

No. If you ask for a specific spec, I can write one in an email message, though.

> A few things are codified in PEPs but the rest seem
> to just require research into the code that currently implements the
> functionality.

No. I'd rather recommend using PyPI, and locating these features. It's straightforward to derive a spec for most of them in a blackbox fashion.

Regards,
Martin

From g.brandl at gmx.net Sat Jun 19 12:01:35 2010
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 19 Jun 2010 12:01:35 +0200
Subject: [Catalog-sig] Mercurial
In-Reply-To: <4C192BA7.8010202@netwok.org>
References: <4C121377.4000008@simplistix.co.uk> <4C127DD4.5010801@v.loewis.de> <4C12A2E4.2090305@v.loewis.de> <4C12A54D.1070406@egenix.com> <4C14D8E8.4010903@egenix.com> <4C15F5F3.40501@egenix.com> <4C176BD4.3080909@egenix.com> <4C17CE55.5000601@v.loewis.de> <4C17F065.7070309@v.loewis.de> <4C192BA7.8010202@netwok.org>
Message-ID:

On 16.06.2010 21:53, Éric Araujo wrote:
>> After using Mercurial in one project, I'm skeptical that this really
>> makes things simpler. I find it very hard to find out what changes a
>> specific clone has that I still need to integrate.
>
> There are commands to compare repositories: incoming and outgoing (read
> "hg help incoming").
>
>> Also, when merging with conflicts, I find it very difficult to determine
>> whether I merged all the conflicts correctly (since the diff will show
>> all changes, not just the conflicts).
>
> I believe that's a known bug.
David Wolever is writing an extension to
> show only the diff against the automated merge, which would be more
> helpful: http://mercurial.selenic.com/wiki/MergediffExtension
> Bitbucket uses a similar algo to display merge diffs, I think.

I can understand that the behavior of mergediff is useful sometimes, but the "default" one is what I want most of the time, for example when pulling changes from a fork of Sphinx. I need to make sure that all "new code" coming from the other branch integrates well with whatever may have changed between the fork and the merge, and that includes locations without conflict.

Of course, one gets huge diffs soon, but there is always the possibility of "splitting" merges, i.e. for a changeset graph (after pulling) looking like this

        A1  A2  M
   /----o---o---O
 -             /
   \--o--o--o-/
      B1 B2 B3

where A are your changes and B the other branch's, and M the merge commit, you can merge a number of earlier versions so that it looks like this:

        A1  A2  M1   M2
   /----o---o---O----O
 -             /    /
   \--o--o----+--o-/
      B1 B2      B3

If these merge points are chosen suitably (after logical units), this can make merging much less painful.

Just a data point,
Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From marrakis at gmail.com Sat Jun 19 12:19:50 2010
From: marrakis at gmail.com (Mathieu Leduc-Hamel)
Date: Sat, 19 Jun 2010 12:19:50 +0200
Subject: [Catalog-sig] Rewrite PyPI for App Engine?
In-Reply-To: <4C1C7BEC.4090506@v.loewis.de>
References: <4C1BF77C.10306@v.loewis.de> <4C1C068A.4000607@v.loewis.de> <4C1C7BEC.4090506@v.loewis.de>
Message-ID:

The list of dependencies of PyPI is not that big; I would like to add some new ones in the future, such as SQLAlchemy, as Tarek said.
As for the various problems of the current code base, we have already started to work on them:

- For unit testing, the test coverage is now around 40% and it grows each week.
- We are starting to split the two big modules, store.py and webui.py, into multiple logical modules.
- Once the tests are implemented, we are planning to switch to SQLAlchemy, which will make things easier for contributors and testers and will simplify the code in the store.py module.

I don't see any justification to ditch the current code base, since it works! We need to make sure everything is working properly, simplify it and clean it up, but switching to something else will take time and it will be the eternal debate: which framework, which database, which server... we've already started it.

Oh, and by the way, apart from the SQL dump, PyPI is now pretty easy to install: we put a buildout in it, and you launch it using paster. That's all.

On Sat, Jun 19, 2010 at 10:12 AM, "Martin v. Löwis" wrote:

>> Thanks, I'll update my notes accordingly. Are there specs for any of
>> these protocols?
>
> No. If you ask for a specific spec, I can write one in an email message,
> though.
>
>> A few things are codified in PEPs but the rest seem
>> to just require research into the code that currently implements the
>> functionality.
>
> No. I'd rather recommend using PyPI, and locating these features. It's
> straightforward to derive a spec for most of them in a blackbox fashion.
>
> Regards,
> Martin

From solipsis at pitrou.net Sat Jun 19 14:32:55 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 19 Jun 2010 12:32:55 +0000 (UTC)
Subject: [Catalog-sig] Rewrite PyPI for App Engine?
References: Message-ID: Ian Bicking colorstudy.com> writes: > > With all the reliability discussion, I thought I'd offer a kind of > counterproposal, that we rewrite PyPI to use App Engine. How reasonable is it to base PyPI on a third-party proprietary platform, infrastructure and API? Shouldn't this kind of decision at least require something such as PSF approval? Thanks Antoine. From martin at v.loewis.de Sat Jun 19 17:58:54 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 19 Jun 2010 17:58:54 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: References: <4C1768AF.9040606@egenix.com> <4C17B6CE.20209@jcea.es> <4C17BC38.6090208@egenix.com> <4C17C4B5.3000801@jcea.es> <4C17F6D4.2050504@jcea.es> <4C1804ED.8030708@v.loewis.de> <4C186AB6.2030407@v.loewis.de> Message-ID: <4C1CE93E.8070402@v.loewis.de> > A simple way to protect against just the issue you mentioned is to > have the clients retrieve the key over HTTPS or distribute the key > with the client. Ok. I have now enabled https for PyPI (https://pypi.python.org/pypi) > Okay. We'd be happy to work with you to get an easy solution put in > place. Thanks for the offer. Notice that this project is primarily about mirroring; other issues (should they exist) preferably should be dealt with separately. > TUF is fairly early stage (our first major deployment is on going), > but might be worth consideration. I think we could probably put > together a quick demo so that you and others could see how it might > work with one of the existing client updaters. I don't think adding another dependency to the clients is really acceptable. Instead, it must all be self-contained. Regards, Martin From ianb at colorstudy.com Sat Jun 19 18:24:12 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Sat, 19 Jun 2010 11:24:12 -0500 Subject: [Catalog-sig] Rewrite PyPI for App Engine? 
In-Reply-To: References: Message-ID: On Sat, Jun 19, 2010 at 7:32 AM, Antoine Pitrou wrote: > Ian Bicking colorstudy.com> writes: > > > > With all the reliability discussion, I thought I'd offer a kind of > > counterproposal, that we rewrite PyPI to use App Engine. > > How reasonable is it to base PyPI on a third-party proprietary platform, > infrastructure and API? > Shouldn't this kind of decision at least require something such as PSF > approval? > Yes, that seems reasonable, though this SIG would be the first step regardless. -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From justinc at cs.washington.edu Sat Jun 19 20:24:00 2010 From: justinc at cs.washington.edu (Justin Cappos) Date: Sat, 19 Jun 2010 11:24:00 -0700 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability In-Reply-To: <4C1CE93E.8070402@v.loewis.de> References: <4C1768AF.9040606@egenix.com> <4C17B6CE.20209@jcea.es> <4C17BC38.6090208@egenix.com> <4C17C4B5.3000801@jcea.es> <4C17F6D4.2050504@jcea.es> <4C1804ED.8030708@v.loewis.de> <4C186AB6.2030407@v.loewis.de> <4C1CE93E.8070402@v.loewis.de> Message-ID: On Sat, Jun 19, 2010 at 8:58 AM, "Martin v. L?wis" wrote: >> A simple way to protect against just the issue you mentioned is to >> have the clients retrieve the key over HTTPS or distribute the key >> with the client. > > Ok. I have now enabled https for PyPI (https://pypi.python.org/pypi) Great. Assuming cert checking is implemented properly for the clients who retrieve your server's key, this will protect against many simple attacks. > I don't think adding another dependency to the clients is really acceptable. > Instead, it must all be self-contained. Okay, sounds good. We'll look elsewhere! Thanks, Justin From mal at egenix.com Mon Jun 21 12:57:42 2010 From: mal at egenix.com (M.-A. 
Lemburg) Date: Mon, 21 Jun 2010 12:57:42 +0200 Subject: [Catalog-sig] Extra links on the PyPI /simple index package pages In-Reply-To: <20100618210449.8621B3A414B@sparrow.telecommunity.com> References: <4C19A308.5040806@zopyx.com> <4C19D4CA.1090304@egenix.com> <4C19D745.3050900@zopyx.com> <4C19DCA9.5010308@egenix.com> <4C19DF6F.9050106@zopyx.com> <4C19E409.8060603@egenix.com> <4C19E54F.6030203@zopyx.com> <4C19F011.6010501@egenix.com> <4C19FD2A.3050801@egenix.com> <4C1A0992.7070507@egenix.com> <4C1A201F.6080609@egenix.com> <4C1A9487.5070108@v.loewis.de> <4C1B3813.3010102@egenix.com> <20100618210449.8621B3A414B@sparrow.telecommunity.com> Message-ID: <4C1F45A6.6070503@egenix.com> P.J. Eby wrote: > At 11:10 AM 6/18/2010 +0200, M.-A. Lemburg wrote: >> "Martin v. L?wis" wrote: >> > Am 17.06.2010 15:16, schrieb M.-A. Lemburg: >> >> Benji York wrote: >> >>> On Thu, Jun 17, 2010 at 7:40 AM, M.-A. Lemburg >> wrote: >> >>>> http://pypi.python.org/simple/zc.buildout/ >> >>>> >> >>>> BTW: what are all those bug links doing on the zc.buildout index >> page ? >> >>> >> >>> PyPI scrapes all the links from the long description; for many >> projects >> >>> that includes a change log with links to fixed bugs. >> >> >> >> Isn't that dangerous ? >> >> >> >> AFAIK, setuptools would start opening all those URLs and might >> >> find download files which are not necessarily under full control of >> >> the author, e.g. anyone could add a comment to a bug report or >> >> wiki page with a link to an egg file on some rogue server. >> > >> > I think you misunderstand. Links originate *only* from the long >> > description. The package owner has full control over that. >> >> I was referring to the linked assets that the package owner >> may not have full control over, e.g. in the above case, >> you have links pointing to launchpad and one to "file://". >> >> Such links (except the file:// one) can be useful in the >> package description, e.g. 
to point to a bug tracking >> system, documentation or other resources, but they are >> not really needed to point setuptools to download locations. > > This is a misunderstanding of what setuptools does. Setuptools only > retrieves URLs that are *specifically designated* as a "home page" or > "download" link (using the "rel" field of the A tag on the PyPI /simple > page), or which are a recognizable download URL supplied by way of the > long_description. > > So, the risk you are describing does not actually exist. > > >> > If you think the package owner is opening up a security threat by >> > including the links in the first place - yes, that's indeed a risk. >> >> Is this feature still needed for setuptools ? > > Yes. > > >> We have download URLs and homepage URLs which should be enough for >> setuptools to search and find the links to package download files. > > No. This would only be the case if the project's author had some other > form of hosting. For example, if you had a subversion repository for > your development trunk, but didn't have any place to host an HTML page > to link to it, the long_description would be the only way (AFAIK at > present) for you to securely provide a link to that repository for > setuptools (or humans) to use. The author could setup the home page or download URL to point to that repository (SVN makes the repos available as HTML pages as well). > See also: > > > http://peak.telecommunity.com/DevCenter/setuptools#making-your-package-available-for-easyinstall > > > and: > > http://peak.telecommunity.com/DevCenter/PackageIndexAPI > > for more information on how the link parsing and retrieval works. > > It is a common misconception that setuptools spiders pages for links; > the truth is, it only reads the "home" and "download" URLs provided via > the PyPI metadata, and those only if they're not obviously links to a > package tarball (or zip, egg, etc.). 
All other links must visibly point > to something downloadable, or else they're ignored. So in summary, the /simple index page doesn't need to include any URLs from the long_description that do not have a rel attribute set, or end with one of the fixed set of archive extensions or with "#egg=...". -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 21 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 27 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From pje at telecommunity.com Mon Jun 21 16:52:06 2010 From: pje at telecommunity.com (P.J. Eby) Date: Mon, 21 Jun 2010 10:52:06 -0400 Subject: [Catalog-sig] Extra links on the PyPI /simple index package pages Message-ID: <20100621145209.76A903A404D@sparrow.telecommunity.com> At 12:57 PM 6/21/2010 +0200, M.-A. Lemburg wrote: >So in summary, the /simple index page doesn't need to include >any URLs from the long_description that do not have a rel >attribute set, or end with one of the fixed set of archive extensions >or with "#egg=...". Such links are ignored, yes. (The 'rel' links are only generated by PyPI, btw, not from the long_description.) OTOH, I'm not sure what benefit there is to adding code that would specifically filter things down to just those URLs, since adding code always adds the potential for bugs, and the presence of those links is currently harmless. 
(Unless of course you're so bandwidth starved that an extra few hundred bytes of link text is a problem... in which case, you could likely save even *more* bytes by stripping off the ' i would like to help out with the move. is anyone actually opposed to moving to GAE (either moving the current code base or re-write, whichever seems more appropriate)? -- python/django hacker & sys admin http://almirkaric.com & http://twitter.com/redduck666 From mal at egenix.com Fri Jun 25 00:16:15 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 25 Jun 2010 00:16:15 +0200 Subject: [Catalog-sig] Rewrite PyPI for App Engine? In-Reply-To: References: Message-ID: <4C23D92F.5060105@egenix.com> Almir Karic wrote: > i would like to help out with the move. > > is anyone actually opposed to moving to GAE (either moving the current > code base or re-write, whichever seems more appropriate)? I don't think people are opposed to having a PyPI clone on GAE, but moving the existing installation to GAE is something we would have to discuss separately. I for one would not welcome such a change, since we then completely lose control over service availability. Someone would also have to do some math to calculate the monthly costs for the PSF: http://code.google.com/appengine/docs/quotas.html http://code.google.com/appengine/docs/billing.html http://code.google.com/appengine/business/ Please do consider helping on the already proposed PyPI enhancements and changes. Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 25 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 23 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! 
:::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From noah at coderanger.net Fri Jun 25 00:14:36 2010 From: noah at coderanger.net (Noah Kantrowitz) Date: Thu, 24 Jun 2010 15:14:36 -0700 Subject: [Catalog-sig] Rewrite PyPI for App Engine? In-Reply-To: References: Message-ID: <233601cb13ea$a2407780$e6c16680$@net> Moving the current codebase wouldn't be possible given the direct usage of Postgres for the database. I think you will find strong resistance to anything involving a rewrite given recent discussions. --Noah > -----Original Message----- > From: catalog-sig-bounces+noah=coderanger.net at python.org > [mailto:catalog-sig-bounces+noah=coderanger.net at python.org] On Behalf > Of Almir Karic > Sent: Thursday, June 24, 2010 2:24 PM > To: catalog-sig at python.org > Subject: [Catalog-sig] Rewrite PyPI for App Engine? > > i would like to help out with the move. > > is anyone actually opposed to moving to GAE (either moving the current > code base or re-write, whichever seems more appropriate)? > > -- > python/django hacker & sys admin > http://almirkaric.com & http://twitter.com/redduck666 From ianb at colorstudy.com Fri Jun 25 01:37:41 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Thu, 24 Jun 2010 18:37:41 -0500 Subject: [Catalog-sig] Rewrite PyPI for App Engine? In-Reply-To: <233601cb13ea$a2407780$e6c16680$@net> References: <233601cb13ea$a2407780$e6c16680$@net> Message-ID: On Thu, Jun 24, 2010 at 5:14 PM, Noah Kantrowitz wrote: > Moving the current codebase wouldn't be possible given the direct usage of > Postgres for the database. I think you will find strong resistance to > anything involving a rewrite given recent discussions. 
> My memory of the ORM used in PyPI is that it is relatively non-relational (I think it's based on Roundup's, which maybe supported non-relational backends?) -- Ian Bicking | http://blog.ianbicking.org From noah at coderanger.net Fri Jun 25 01:40:28 2010 From: noah at coderanger.net (Noah Kantrowitz) Date: Thu, 24 Jun 2010 16:40:28 -0700 Subject: [Catalog-sig] Rewrite PyPI for App Engine? In-Reply-To: References: <233601cb13ea$a2407780$e6c16680$@net> Message-ID: <233c01cb13f6$a12f8640$e38e92c0$@net> PyPI uses an ORM? As far as I know it is just running SQL via psycopg2. --Noah From: ianbicking at gmail.com [mailto:ianbicking at gmail.com] On Behalf Of Ian Bicking Sent: Thursday, June 24, 2010 4:38 PM To: Noah Kantrowitz Cc: Almir Karic; catalog-sig at python.org Subject: Re: [Catalog-sig] Rewrite PyPI for App Engine? On Thu, Jun 24, 2010 at 5:14 PM, Noah Kantrowitz wrote: Moving the current codebase wouldn't be possible given the direct usage of Postgres for the database. I think you will find strong resistance to anything involving a rewrite given recent discussions. My memory of the ORM used in PyPI is that it is relatively non-relational (I think it's based on Roundup's, which maybe supported non-relational backends?) -- Ian Bicking | http://blog.ianbicking.org From ianb at colorstudy.com Fri Jun 25 01:49:04 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Thu, 24 Jun 2010 18:49:04 -0500 Subject: [Catalog-sig] Rewrite PyPI for App Engine? In-Reply-To: <4C23D92F.5060105@egenix.com> References: <4C23D92F.5060105@egenix.com> Message-ID: On Thu, Jun 24, 2010 at 5:16 PM, M.-A. Lemburg wrote: > Almir Karic wrote: > > i would like to help out with the move. > > > > is anyone actually opposed to moving to GAE (either moving the current > > code base or re-write, whichever seems more appropriate)? 
> > I don't think people are opposed to having a PyPI clone on GAE, > but moving the existing installation to GAE is something we would > have to discuss separately. > > I for one would not welcome such a change, since we then completely > lose control over service availability. > I don't really understand what this means. Services become unavailable sometimes. A computer breaks, a company shuts down, an agreement ends. We don't necessarily have "control" over these situations, but we can respond to them. If App Engine goes down and the App Engine team is all like "whatever, we'll get around to fixing stuff sometime" then sure it's a problem. But it's not a plausible problem. The plausible problem is that App Engine goes down, as it has from time to time, and we have to wait for them to figure out what's wrong and fix it. *We* don't have to fix it, we only have to *wait for someone else to do it*. I don't see any reason why *we* are any better at fixing issues than the App Engine team would be. Also presumably when there is a failure we want for the failure to be understood and avoided in the future. The App Engine team does that. And they do that *for us*. In some catastrophic case we could move the site to another server, use TyphoonAE to move the code over (or simply require that there is a sufficient abstraction layer to allow for a more normal environment) and bring the site up. We control the domain, we can ultimately control where it is hosted. This kind of failure seems like it would be far more likely given our current situation than on App Engine, but moving to App Engine would not somehow make this kind of move impossible. Someone would also have to do some math to calculate the monthly > costs for the PSF: > > http://code.google.com/appengine/docs/quotas.html > http://code.google.com/appengine/docs/billing.html > http://code.google.com/appengine/business/ > It seems unlikely we'd have to pay for the service. 
-- Ian Bicking | http://blog.ianbicking.org From solipsis at pitrou.net Fri Jun 25 09:21:29 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 25 Jun 2010 07:21:29 +0000 (UTC) Subject: [Catalog-sig] Rewrite PyPI for App Engine? References: Message-ID: Almir Karic almirkaric.com> writes: > > i would like to help out with the move. > > is anyone actually opposed to moving to GAE (either moving the current > code base or re-write, whichever seems more appropriate)? As I already said, I don't think it's reasonable to do it without first getting the community's (and the PSF's) agreement that a vital Python infrastructure can be managed under a proprietary API, platform and datastore. I would myself be strongly opposed to such a move. (and I don't even get the point, technically, of wanting to use GAE, which seems to provide a crippled version of Python) Regards Antoine. From noah at coderanger.net Fri Jun 25 09:39:21 2010 From: noah at coderanger.net (Noah Kantrowitz) Date: Fri, 25 Jun 2010 00:39:21 -0700 Subject: [Catalog-sig] Rewrite PyPI for App Engine? In-Reply-To: References: Message-ID: On Jun 25, 2010, at 12:21 AM, Antoine Pitrou wrote: > Almir Karic almirkaric.com> writes: >> >> i would like to help out with the move. >> >> is anyone actually opposed to moving to GAE (either moving the current >> code base or re-write, whichever seems more appropriate)? > > As I already said, I don't think it's reasonable to do it without first getting > the community's (and the PSF's) agreement that a vital Python infrastructure can > be managed under a proprietary API, platform and datastore. > > I would myself be strongly opposed to such a move. 
> > (and I don't even get the point, technically, of wanting to use GAE, which seems > to provide a crippled version of Python) GAE provides a professionally managed, "infinitely" scalable (or at least a heck of a lot more scalable than any other single server is likely to be, still not a substitute for mirrors), battle tested platform. There are already implementations of the GAE APIs that can be run independently, so I don't think it is quite as proprietary as you might think (though you do lose most of the benefits without having their services available, you just end up with yet another not-so-amazing web framework). I'm not saying that I think GAE is 100% the best path forward, but it certainly has a lot going for it. Also, while Google is real company and has its own business to attend to, they have almost always been an ally and partner to the Python community and would likely be willing to work with us moreso than, say, Amazon Web Services (Rackspace is also a big Python proponent though, and has cloud offerings similar to AWS). Similar arguments can be made for things like using S3/Cloudfront for content hosting, it isn't a replacement for mirroring, but it would allow the main server (or possibly a "primary mirror") to take advantage of these powerful services towards better uptime, responsiveness, management, etc. This isn't a slight against the current system, just pointing out that while we have a few volunteers taking care of the PyPI server, Google and GAE have dozens of people who keep GAE running smoothly as their full-time job. --Noah From solipsis at pitrou.net Fri Jun 25 09:53:57 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 25 Jun 2010 07:53:57 +0000 (UTC) Subject: [Catalog-sig] Rewrite PyPI for App Engine? 
References: Message-ID: Noah Kantrowitz coderanger.net> writes: > > GAE provides a professionally managed, "infinitely" scalable (or at least a > heck of a lot more scalable > than any other single server is likely to be, still not a substitute for > mirrors), battle tested platform. Infinite scalability is the new fashionable thing. But most websites can run on a single server fine, and PyPI seems to be one of those. As for "battle tested", the most popular frameworks are, as is SQLAlchemy, as is Apache, as is PostgreSQL... I don't get what GAE buys in this area. > There are already implementations of the GAE APIs that can be run > independently, so I don't think it is > quite as proprietary as you might think Isn't it like chasing a moving target, though? For an analogy, there are independent implementations of the Win32 APIs, but I'm not sure anyone would trust Wine for running production services. > Also, while Google is real > company and has its own business to attend to, they have almost always been > an ally and partner to the Python > community and would likely be willing to work with us more > so than, say, Amazon Web Services (Rackspace is also a big Python proponent > though, and has cloud offerings > similar to AWS). But the point in this discussion is not to try to pit the various service providers one against another. It's to choose whether we want to rely on a proprietary platform (modulo alternate implementations, see above), or on a similarly battle-tested "standard" FLOSS-based stack. And, assuming Google would like to provide servers and hosting, why wouldn't they simply provide Linux servers on which to run Apache and anything else we need to? Regards Antoine. From noah at coderanger.net Fri Jun 25 10:03:23 2010 From: noah at coderanger.net (Noah Kantrowitz) Date: Fri, 25 Jun 2010 01:03:23 -0700 Subject: [Catalog-sig] Rewrite PyPI for App Engine? 
In-Reply-To: References: Message-ID: On Jun 25, 2010, at 12:53 AM, Antoine Pitrou wrote: > Noah Kantrowitz coderanger.net> writes: >> >> GAE provides a professionally managed, "infinitely" scalable (or at least a >> heck of a lot more scalable >> than any other single server is likely to be, still not a substitute for >> mirrors), battle tested platform. > > Infinite scalability is the new fashionable thing. But most websites can run on > a single server fine, and PyPI seems to be one of those. > > As for "battle tested", the most popular frameworks are, as is SQLAlchemy, as is > Apache, as is PostgreSQL... I don't get what GAE buys in this area. > >> There are already implementations of the GAE APIs that can be run >> independently, so I don't think it is >> quite as proprietary as you might think > > Isn't it like chasing a moving target, though? > For an analogy, there are independent implementations of the Win32 APIs, but I'm > not sure anyone would trust Wine for running production services. > >> Also, while Google is real >> company and has its own business to attend to, they have almost always been >> an ally and partner to the Python >> community and would likely be willing to work with us more >> so than, say, Amazon Web Services (Rackspace is also a big Python proponent > > though, and has cloud offerings >> similar to AWS). > > But the point in this discussion is not to try to pit the various service > providers one against another. It's to choose whether we want to rely on a > proprietary platform (modulo alternate implementations, see above), or on a > similarly battle-tested "standard" FLOSS-based stack. > > And, assuming Google would like to provide servers and hosting, why wouldn't > they simply provide Linux servers on which to run Apache and anything else we > need to? Its mostly a question of ongoing management. 
Apache+Linux+$SQLSERVER+etc can certainly handle our needs (which, lets face it, aren't really that complex), but we don't have a full-time management staff for our server. By leaning on Google (or Amazon, Rackspace, etc) we don't have to worry about the day-to-day details of running the site. How many of the recent PyPI downtimes have just required bouncing Apache? Wouldn't it have been nice if a site engineer got paged within 60 seconds and had it dealt with soon after instead of having to wait for one of the PyPI volunteers to notice and get to a computer? It isn't a question of capability, it is just where are our man-hours best spent: simple maintenance or actually improving the site? --Noah From mal at egenix.com Fri Jun 25 10:39:45 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 25 Jun 2010 10:39:45 +0200 Subject: [Catalog-sig] Rewrite PyPI for App Engine? In-Reply-To: References: <4C23D92F.5060105@egenix.com> Message-ID: <4C246B51.9010700@egenix.com> Ian Bicking wrote: > On Thu, Jun 24, 2010 at 5:16 PM, M.-A. Lemburg wrote: > >> Almir Karic wrote: >>> i would like to help out with the move. >>> >>> is anyone actually opposed to moving to GAE (either moving the current >>> code base or re-write, whichever seems more appropriate)? >> >> I don't think people are opposed to having a PyPI clone on GAE, >> but moving the existing installation to GAE is something we would >> have to discuss separately. >> >> I for one would not welcome such a change, since we then completely >> lose control over service availability. >> > > I don't really understand what this means. Services become unavailable > sometimes. A computer breaks, a company shuts down, an agreement ends. We > don't necessarily have "control" over these situations, but we can respond > to them. If App Engine goes down and the App Engine team is all like > "whatever, we'll get around to fixing stuff sometime" then sure it's a > problem. But it's not a plausible problem. 
The plausible problem is that > App Engine goes down, as it has from time to time, and we have to wait for > them to figure out what's wrong and fix it. *We* don't have to fix it, we > only have to *wait for someone else to do it*. I don't see any reason why > *we* are any better at fixing issues than the App Engine team would be. > Also presumably when there is a failure we want for the failure to be > understood and avoided in the future. The App Engine team does that. And > they do that *for us*. I hear you, but don't agree that putting the runtime into the hands of the GAE would get us an overall better service :-) The point is that with GAE you only have control over the code that you post there. Everything else is under control of the GAE team (and their automatic administration systems), i.e. whether your data is available and whether there are proper backups, whether the site is reachable or not, whether the performance is available and meets your requirements, whether the service is accessible, fast enough and has low latency, etc. So if something breaks, you can only fix it, if the problem is caused by a bug in the code. For all other situations, you have to wait for the GAE team to go in and do whatever is needed. I'm not saying that the GAE team would be doing a poor job, but just sitting there waiting for them to fix it in any of the typical problem situations (apart from a bug in the code), is asking a bit much, IMHO. We have to find a middle ground, where we can still apply the necessary hand holding ourselves, if we like to, while leaving most of the day-to-day tasks to automatic tools or other service providers to deal with. Since PyPI is becoming a central piece of Python community infrastructure, we need to make sure that we can provide a very good uptime of the service and fast access to the data, esp. for the automatic download tools. 
Fortunately, those tools only use static data, so focusing on making that highly available will get us a much better service uptime with little extra effort. > In some catastrophic case we could move the site to another server, use > TyphoonAE to move the code over (or simply require that there is a > sufficient abstraction layer to allow for a more normal environment) and > bring the site up. We control the domain, we can ultimately control where > it is hosted. This kind of failure seems like it would be far more likely > given our current situation than on App Engine, but moving to App Engine > would not somehow make this kind of move impossible. True, but do you really want to go through all that trouble just because GAE is down or too slow to be usable again ? If we were to go for a cloud service to deploy the PyPI runtime, I'd much rather like to see a standard virtualized server approach being used. With that approach, moving (virtual) servers would take at most 5 minutes, if needed at all - you can rather easily setup virtual servers as high availability cluster and then have them manage the failover all by themselves. BTW: Here's a nice blog on the subject of downtimes: http://www.transparentuptime.com/ >> Someone would also have to do some math to calculate the monthly >> costs for the PSF: >> >> http://code.google.com/appengine/docs/quotas.html >> http://code.google.com/appengine/docs/billing.html >> http://code.google.com/appengine/business/ >> > > It seems unlikely we'd have to pay for the service. Perhaps, but then someone will have to get that information as well. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 25 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 23 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From marrakis at gmail.com Fri Jun 25 10:44:39 2010 From: marrakis at gmail.com (Mathieu Leduc-Hamel) Date: Fri, 25 Jun 2010 10:44:39 +0200 Subject: [Catalog-sig] Rewrite PyPI for App Engine? In-Reply-To: References: Message-ID: > > Its mostly a question of ongoing management. Apache+Linux+$SQLSERVER+etc > can certainly handle our needs (which, lets face it, aren't really that > complex), but we don't have a full-time management staff for our server. By > leaning on Google (or Amazon, Rackspace, etc) we don't have to worry about > the day-to-day details of running the site. How many of the recent PyPI > downtimes have just required bouncing Apache? Wouldn't it have been nice if > a site engineer got paged within 60 seconds and had it dealt with soon after > instead of having to wait for one of the PyPI volunteers to notice and get > to a computer? It isn't a question of capability, it is just where are our > man-hours best spent: simple maintenance or actually improving the site? > > True, GAE will allow us to have a good cloud implementation, but: - Right now the problem we face with PyPI is not necessarily related to the server or the type of deployment. We concentrated the discussion on the type of server or platform, but we completely forgot to think about whether the code is working! - Maybe switching to something else will just make PyPI restart more frequently. It's the first law of optimization: we need to find where the problem comes from! 
From ziade.tarek at gmail.com Fri Jun 25 11:21:56 2010 From: ziade.tarek at gmail.com (Tarek Ziadé) Date: Fri, 25 Jun 2010 11:21:56 +0200 Subject: [Catalog-sig] Rewrite PyPI for App Engine? In-Reply-To: References: Message-ID: On Thu, Jun 24, 2010 at 11:24 PM, Almir Karic wrote: > i would like to help out with the move. > > is anyone actually opposed to moving to GAE (either moving the current > code base or re-write, whichever seems more appropriate)? Could you summarize the motivations for such a move? ISTM that the problem is more about the management of PyPI, rather than its code. Here's my summary: 1 - PyPI was down less than 2 days in 365 days IIRC. PyPI lacks sysadmins; we need more, in several time zones. A sysadmin just relaunches the service, like MvL or Jannis did. 2 - Some people in the community are frustrated with the current process of getting a feature into PyPI. I don't have a strong opinion on this but I think having the code in a DVCS would be better. hg.python.org would open the codebase to all Python core committers, and people would be able to request pulls. Sure, writing something in GSOC would be fun, but if we want to address the real problems, it's not in the code field imo. Regards Tarek -- Tarek Ziadé | http://ziade.org From jcea at jcea.es Fri Jun 25 12:36:37 2010 From: jcea at jcea.es (Jesus Cea) Date: Fri, 25 Jun 2010 12:36:37 +0200 Subject: [Catalog-sig] Rewrite PyPI for App Engine? In-Reply-To: References: Message-ID: <4C2486B5.8090904@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 25/06/10 11:21, Tarek Ziadé wrote: > 1 - PyPI was down less than 2 days in 365 days IIRC. PyPI lacks > sysadmins; we need more, in several time zones. A sysadmin just relaunches > the service, like MvL or Jannis did. We need to deploy the mirroring PEP, and the impact of PyPI central server downtime would be far less. The only missing point in the PEP, AFAIK, is the crypto stuff to prevent mirror misbehaviour. 
> 2 - Some people in the community are frustrated with the current > process of getting a feature in PyPI. I don't have a strong opinion in > this but I think having the code in a DVCS would be better. > hg.python.org would open the codebase to all python core comitters, > and people would be able to request pulls. +1. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTCSGtZlgi5GaxT1NAQLlUQP7BFVYQwcmic3Zu93yg5S1TD8YSsGm7YmM X8RbKTt/1rE9cc1h53goFjx8r75PsjqpF16f2jARQjipEi+2066wS2pqflERVMKO XnPi5UOz9M5oOe0MfvZKytMx+aMowcjVXhhE8tka9WE3qVZ0feZNEcqAE60nwx3h j8hI2gpBKfE= =uTHa -----END PGP SIGNATURE----- From ianb at colorstudy.com Fri Jun 25 18:49:16 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Fri, 25 Jun 2010 11:49:16 -0500 Subject: [Catalog-sig] Rewrite PyPI for App Engine? In-Reply-To: <4C246B51.9010700@egenix.com> References: <4C23D92F.5060105@egenix.com> <4C246B51.9010700@egenix.com> Message-ID: On Fri, Jun 25, 2010 at 3:39 AM, M.-A. Lemburg wrote: > Ian Bicking wrote: > > On Thu, Jun 24, 2010 at 5:16 PM, M.-A. Lemburg wrote: > > > >> Almir Karic wrote: > >>> i would like to help out with the move. > >>> > >>> is anyone actually opposed to moving to GAE (either moving the current > >>> code base or re-write, whichever seems more appropriate)? > >> > >> I don't think people are opposed to having a PyPI clone on GAE, > >> but moving the existing installation to GAE is something we would > >> have to discuss separately. 
> >> > >> I for one would not welcome such a change, since we then completely > >> lose control over service availability. > >> > > > > I don't really understand what this means. Services become unavailable > > sometimes. A computer breaks, a company shuts down, an agreement ends. > We > > don't necessarily have "control" over these situations, but we can > respond > > to them. If App Engine goes down and the App Engine team is all like > > "whatever, we'll get around to fixing stuff sometime" then sure it's a > > problem. But it's not a plausible problem. The plausible problem is > that > > App Engine goes down, as it has from time to time, and we have to wait > for > > them to figure out what's wrong and fix it. *We* don't have to fix it, > we > > only have to *wait for someone else to do it*. I don't see any reason > why > > *we* are any better at fixing issues than the App Engine team would be. > > Also presumably when there is a failure we want for the failure to be > > understood and avoided in the future. The App Engine team does that. > And > > they do that *for us*. > > I hear you, but don't agree that putting the runtime into the > hands of the GAE would get us an overall better service :-) > > The point is that with GAE you only have control over the code > that you post there. Everything else is under control of the GAE > team (and their automatic administration systems), i.e. whether > your data is available and whether there are > proper backups, whether the site is reachable or not, whether > the performance is available and meets your requirements, whether > the service is accessible, fast enough and has low latency, etc. > > So if something breaks, you can only fix it, if the problem > is caused by a bug in the code. For all other situations, you > have to wait for the GAE team to go in and do whatever is needed. 
>
> I'm not saying that the GAE team would be doing a poor job,
> but just sitting there waiting for them to fix it in any
> of the typical problem situations (apart from a bug in the
> code), is asking a bit much, IMHO.
>

If GAE was just another hosting system, then sure -- but it's not. For
instance, Noah mentioned if Apache went down (or the equivalent) there's
someone with a pager who will respond to it. Except GAE isn't actually
like that; application instances can be automatically killed, machines
are monitored automatically and brought out of the pool as necessary.
We're not replacing our diligence with Google employees, it would be
replaced with machines. Of course there might be network problems or
Google's own problems growing the service. But a substantial class of
problems (problems that I believe have actually caused downtime) are
simply eliminated from the system. GAE has fewer serviceable parts; that
appears like losing control but it's really the normal progression away
from manual interactions. I would really like if there was an open source
alternative that provided that kind of infrastructure, but there isn't.

Another advantage to GAE is that if there are application errors, it
would be much easier for anyone to work on them -- anyone can sign up and
receive a free GAE account and deploy the code with almost no effort, and
they will have hosting that is completely equivalent to anyone else's
hosting. The only difference would be the data set, and it is possible
(maybe even likely) that some class of problems will only be noticeable
with a full dataset. That's true now as well, like for some UI problems
where pages have become unwieldy, and I think it would be really helpful
(regardless of GAE) if PyPI had a cleaned-up-export built into it.

Other cloud service providers provide something very different from GAE,
and I don't think they would give a lot of benefit.
The one advantage I see is that we (well, anyone) could spin up a new instance in a consistent state. Everything else is basically the same, including all the same management issues -- there's no one to kick Apache except us, for instance. Honestly if I have any skin in the game it's actually for a system like this, as I've been working on this sort of infrastructure (http://cloudsilverlining.org) -- I only propose GAE because I genuinely think it will work best for a volunteer-run piece of infrastructure like PyPI. We have to find a middle ground, where we can still apply the > necessary hand holding ourselves, if we like to, while leaving > most of the day-to-day tasks to automatic tools or other service > providers to deal with. > > Since PyPI is becoming a central piece of Python community > infrastructure, we need to make sure that we can provide a very > good uptime of the service and fast access to the data, > esp. for the automatic download tools. > > Fortunately, those tools only use static data, so focusing on > making that highly available will get us a much better service > uptime with little extra effort. > > > In some catastrophic case we could move the site to another server, use > > TyphoonAE to move the code over (or simply require that there is a > > sufficient abstraction layer to allow for a more normal environment) and > > bring the site up. We control the domain, we can ultimately control > where > > it is hosted. This kind of failure seems like it would be far more > likely > > given our current situation than on App Engine, but moving to App Engine > > would not somehow make this kind of move impossible. > > True, but do you really want to go through all that trouble > just because GAE is down or too slow to be usable again ? > That's the catastrophic case, where Google decides they don't care about App Engine or something like that. Right now we'd have to do the same thing if the server's hard disk dies, which is obviously far more likely. 
> If we were to go for a cloud service to deploy the PyPI runtime, I'd
> much rather like to see a standard virtualized server approach
> being used.
>
> With that approach, moving (virtual) servers would take
> at most 5 minutes, if needed at all - you can rather easily setup
> virtual servers as high availability cluster and then have
> them manage the failover all by themselves.
>

Setting up infrastructure for fail-overs is hard, and it would be easy
for us to set it up for the wrong pieces (the ones that aren't breaking).
In some sense this is why I'm not excited about mirroring, because I
don't think it's fail-over for the pieces likely to break.

I do like the static file proposal, also. I think just putting more
content into static files could potentially fix most of our problems,
along with maybe a bit of server tweaking (to make sure even if PyPI goes
down, it doesn't take Apache and the static files with it). I think using
a CDN would be a nice step for speed, but is less important for
reliability; I think generating things with a cron job will reduce
reliability because it's exactly the kind of behind-the-scenes machinery
that could break without someone noticing, and we don't have a dedicated
staff paying attention to things like that. If a new package registration
breaks, I'd far rather it be rejected immediately (e.g., from setup.py
register) than for a broken cron job to keep it from getting in the
simple index.

--
Ian Bicking | http://blog.ianbicking.org

From tjreedy at udel.edu Fri Jun 25 18:54:41 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 25 Jun 2010 12:54:41 -0400
Subject: [Catalog-sig] Rewrite PyPI for App Engine?
In-Reply-To:
References:
Message-ID:

There are obviously objections, valid or not, to moving PyPI lock, stock,
and barrel to GAE, and perhaps any 1 proprietary setting without any
direct experience with GAE.
There should be no objection to a GAE mirror provided by one or more people with GAE knowledge and enthusiasm. It could be run up to the free limits. If it hit them, the operators could try to get a special case upgrade. "We should be able to get ..." does not cut it. Operators of a GAE could make it a mirror-plus if they wanted. Perhaps try an alternate (competitive) search page that would make it easier to find packages that run on a specific Python version or versions. After a year of GAE experience, making it a prime locus might -- or might not -- be more sensible. -- Terry Jan Reedy From martin at v.loewis.de Fri Jun 25 22:19:37 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 25 Jun 2010 22:19:37 +0200 Subject: [Catalog-sig] Rewrite PyPI for App Engine? In-Reply-To: References: Message-ID: <4C250F59.9050109@v.loewis.de> > Its mostly a question of ongoing management. > Apache+Linux+$SQLSERVER+etc can certainly handle our needs (which, > lets face it, aren't really that complex), but we don't have a > full-time management staff for our server. By leaning on Google (or > Amazon, Rackspace, etc) we don't have to worry about the day-to-day > details of running the site. How many of the recent PyPI downtimes > have just required bouncing Apache? Wouldn't it have been nice if a > site engineer got paged within 60 seconds and had it dealt with soon > after instead of having to wait for one of the PyPI volunteers to > notice and get to a computer? It isn't a question of capability, it > is just where are our man-hours best spent: simple maintenance or > actually improving the site? Still, there is significant, fundamental opposition to binding PyPI to any vendor tightly in terms of implementation. This applies to GAE, and (probably less strongly) to S3. I believe that Antoine just voices a wide-spread concern (rather than him representing a singular opinion). 
Therefore, I will personally refrain from endorsing any port of PyPI to
GAE. If people think it would be worthwhile, they could still start a
port; if they wanted that port to become pypi.python.org eventually,
they'd have to convince Richard Jones, me, or the PSF board. I know that
without an advanced prototype, I won't be convinced.

Regards,
Martin

From martin at v.loewis.de Fri Jun 25 22:24:08 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Fri, 25 Jun 2010 22:24:08 +0200
Subject: [Catalog-sig] Rewrite PyPI for App Engine?
In-Reply-To:
References: <4C23D92F.5060105@egenix.com> <4C246B51.9010700@egenix.com>
Message-ID: <4C251068.5030804@v.loewis.de>

On 25.06.2010 18:49, Ian Bicking wrote:
> That's true now as well, like for some
> UI problems where pages have become unwieldy, and I think it would be
> really helpful (regardless of GAE) if PyPI had a cleaned-up-export built
> into it.

Not sure what you mean by that. pg_dump seems to work fine.

Regards,
Martin

From mal at egenix.com Tue Jun 29 16:39:54 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 29 Jun 2010 16:39:54 +0200
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability (version 2)
Message-ID: <4C2A05BA.5050808@egenix.com>

After the discussions we've had on the catalog sig, I have updated the
proposal to include comments and clarifications regarding the setup and
its relationship to the mirror PEP (see the end of the proposal).

While I don't think that the proposal has an influence on whether or
when PEP 381 gets rolled out or not, I will delay the PSF board vote on
the proposal until the August board meeting. Perhaps that will even
encourage developers to put more time into PEP 381.

Regarding the costs of the cloud idea, I think this would actually be a
good way of getting more donations for the PSF - possibly even with a
net win. It's one of the few visible and tangible things the PSF has to
offer to the community.
Overall, I think this is a net win for everybody: users, developers
and the PSF.

"""
PSF-Proposal: 100
Title: Move PyPI static data to the cloud for better availability
Version: 2
Last-Modified: 2010-06-29
Author: mal at lemburg.com (Marc-André Lemburg)
Discussions-To: catalog-sig at python.org
Status: Draft
Type: Informational
Created: 2010-06-14
Post-History:


Proposal: Move PyPI static data to the cloud for better availability
========================================================================

Motivation
----------

PyPI has in recent months seen several outages, with the index being
unavailable both to users of the web GUI and to package administration
tools such as easy_install from setuptools.

As more and more Python applications rely on tools such as
easy_install for direct installation, or zc.buildout to manage the
complete software configuration cycle, the PyPI infrastructure
receives more and more attention from the Python community.

While we don't have hard numbers available (there doesn't appear to be
any monitoring in place), the number of discussions about PyPI outages
in the mailing lists has increased to a point where we cannot simply
ignore those complaints anymore.

In order to maintain its credibility as software repository, to
support the many different projects relying on the PyPI infrastructure
and the many users who rely on the simplified installation process
enabled by PyPI, the PSF needs to take action and move the essential
parts of PyPI to a more robust infrastructure that provides:

 * scalability
 * 24/7 outsourced system administration management
 * redundant storage
 * geo-localized fast and reliable access


Current Situation
-----------------

PyPI is currently run from a single server hosted in The Netherlands
(ximinez.python.org). This server is run by a very small team of
sysadmins.

PyPI itself has in recent months been mostly maintained by one
developer: Martin von Loewis.
Projects are underway to enhance PyPI in various ways, including a
proposal to add external mirroring (PEP 381), but these are still a
long way from being finalized and implemented in the existing client
tools.

According to Martin, the server side features of PEP 381, including a
few undocumented extensions to provide package signatures, are already
implemented. However, without client tools to make use of them, this
is not going to change the current situation for existing PyPI users.

Furthermore, those client tool enhancements would first have to get
adopted by PyPI users by either replacing their client tools with
updated versions or switching to new client tools, which is likely
going to take months to years. Existing client tool users won't see an
immediate improvement.


Usage
-----

PyPI provides four different mechanisms for accessing the stored
information:

 * a web GUI that is meant for use by humans
 * an RPC interface which is mostly used for uploading new content
 * a semi-static /simple package listing, used by setuptools
 * a static area /packages for package download files and
   documentation, used by both the web GUI and setuptools

The /simple package listing is a dump of all packages in PyPI using a
simple HTML page with links to sub-pages for each package. These
sub-pages provide links to download files and external references.

External tools like easy_install only use the /simple package
listing together with the hosted package download files.

While the /simple package listing is currently dynamically created
from the database in real-time, this is not really needed for normal
operation. A static copy created every 10-20 minutes would provide the
same level of service in much the same way.


Moving static data to a CDN
---------------------------

Under the proposal the static information stored in PyPI
(meta-information as well as package download files and documentation)
is moved to a content delivery network (CDN).
For this purpose, the /simple package listing is replaced with a
static copy that is recreated every 10-20 minutes using a cronjob on
the PyPI server.

At the same intervals, another script will scan the package and
documentation files under /packages for updates and upload any changes
to the CDN for near-real-time availability.

By using a CDN the PSF will gain:

 * high availability of the static PyPI content
 * management offloaded to the CDN
 * geo-localized downloads, i.e. the files are hosted on a nearby
   server
 * faster downloads
 * more reliability and scalability
 * a move away from a single-point-of-failure setup

Note that the proposal does not cover distribution of the dynamic
parts of PyPI. As a result uploads to PyPI may still fail if the PyPI
server goes down. However, these dynamic parts are currently not being
used by the existing package installation tools.


Choice of CDN: Amazon Cloudfront
--------------------------------

To keep the costs low for the PSF, Amazon Cloudfront appears to be the
best choice for CDN.

Cloudfront is supported by a set of Python libraries (e.g. Amazon S3
lib and boto); upload scripts are readily available and can easily be
customized.

 http://www.saltycrane.com/blog/2008/12/card-store-project-4-notes-using-amazons-cloudfront/

Other CDNs, such as Akamai, are either more expensive or require
custom integration. Python-based tools are not always available for
them; in fact, even finding that information is difficult for most of
the proprietary CDNs.


Cloudfront: quality of service
------------------------------

Amazon Cloudfront uses S3 as basis for the service. S3 has been around
for years and has a very stable uptime:

 http://www.readwriteweb.com/archives/amazon_s3_exceeds_9999_percent_uptime.php

Cloudfront itself has been around since Nov 2008. Amazon still uses
the web 2.0 "beta" marketing term on it.
You can check their current online status using this panel:

 http://status.aws.amazon.com/

Apart from the gained availability and outsourced management, we'd
also get faster downloads in most parts of the world, due to the local
caching Cloudfront is applying.

This caching can be used to further increase the availability, since
we can control the expiry time of those local copies.

So in summary, we are replacing a single point of failure with an N
server fail-over system (with N being the number of edge caching
servers they use).


How Cloudfront works
--------------------

Cloudfront uses Amazon's S3 storage system which is based on
"buckets". These can store any number of files in a directory-like
structure. The only limit is a 5GB per file limit - more than enough
for any PyPI package file.

Cloudfront provides a domain for each registered S3 bucket via a
"distribution" which is then made available through local cache
servers in various locations around the world. The management of which
server to use for an incoming request is transparently handled by
Amazon.

Once uploaded to the S3 bucket, the files will be distributed to the
cache servers on demand and as necessary.

Each edge server maintains a cache of requested files and refetches
the files after an expiry time which can be defined when uploading the
file to the bucket.

To simplify things on our side, we'll set up a CNAME DNS alias for the
Cloudfront domain issued by Amazon to our bucket:

 pypi-static.python.org. IN CNAME d32z1yuk7jeryy.cloudfront.net.

In the unlikely event of a longer downtime of the whole Amazon
Cloudfront system, our system administrators could then easily change
the DNS alias pypi-static.python.org to point back to the PyPI server
until the Cloudfront problem is rectified.
For more details, please see the Cloudfront documentation and FAQ:

 http://aws.amazon.com/documentation/cloudfront/
 http://aws.amazon.com/cloudfront/faqs/


Integration
-----------

In order to keep the number of changes to existing client side tools
and PyPI itself to a minimum, the installation will try to be as
transparent to both the server and the client side as possible.

This requires on the server side:

 * few, if any changes to the PyPI code base
 * simple scripts, driven by cronjobs
 * a simple distributed redirection setup to avoid having to change
   client side tools

On the client side:

 * no need to change the existing URL http://pypi.python.org/simple
   to access PyPI
 * redirects are already supported by setuptools via urllib2

Note that we are avoiding creating a lock-in situation by moving the
data to a CDN, since the needed configuration changes on the server
side can easily be rolled back to the current setup, without affecting
the client side.


Server side: upload cronjobs
----------------------------

Since the /simple index tree is currently being created dynamically,
we'd need to create static copies of it at regular intervals in order
to upload the content to the S3 bucket. This can easily be done using
tools such as wget or curl or using a custom Python script that hooks
directly into the PyPI database (and reuses the code for generating
the /simple tree).

Both the static copy of the /simple tree and the static files uploaded
to /packages then need to be uploaded or updated in the S3 bucket by a
cronjob running every 10-20 minutes.

In a second phase of the project, we could extend PyPI to
automatically push updates to Cloudfront whenever a new file is
uploaded or the package data changes.
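
The change-detection part of such a cronjob could look roughly like the
sketch below. It is only an illustration under stated assumptions, not
part of the proposal: the function names, the (size, mtime) comparison
and the caller-supplied `upload` callable are all hypothetical, and the
actual S3 upload call (for example via boto) is deliberately left out.
Deleted files are not handled.

```python
import os


def snapshot(root):
    """Map each file under root (as a '/'-separated relative path) to (size, mtime_ns)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            state[rel] = (st.st_size, st.st_mtime_ns)
    return state


def changed_files(previous, current):
    """Return the sorted paths that are new or whose signature differs from the last run."""
    return sorted(path for path, sig in current.items() if previous.get(path) != sig)


def sync(root, previous, upload):
    """Call upload(rel_path, full_path) for every changed file; return the new snapshot.

    `upload` is a hypothetical hook: in a real job it would push the file
    to the S3 bucket. `previous` is the snapshot returned by the last run.
    """
    current = snapshot(root)
    for rel in changed_files(previous, current):
        upload(rel, os.path.join(root, rel))
    return current
```

A cronjob would keep the returned snapshot between runs (for instance
in a small state file) and pass it back in as `previous`, so that only
new or modified files are sent to the bucket every 10-20 minutes.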


Server side: downloads statistics
---------------------------------

The next step would then be to configure access logs:

 http://docs.amazonwebservices.com/AmazonCloudFront/latest/DeveloperGuide/index.html?AccessLogs.html

and add a cronjob to download them to the PyPI server. Since the
format is a bit different than the Apache log format used by the PyPI
software, we'd have two options:

 1. convert the Cloudfront format to Apache format and simply append
    the converted logs to the local log files

 2. write a Cloudfront log file reader and add it to the
    apache_count_dist.py script that updates the download counts on
    the web GUI

Both options require no more than a few hours to implement and test.


Server side: redirection setup
------------------------------

Since PyPI wasn't designed to be put on a CDN, it mixes static file
URL paths with dynamic access ones, e.g.

dynamic:

 http://pypi.python.org/pypi
 (and a few others)

static:

 http://pypi.python.org/simple
 http://pypi.python.org/packages

To move part of the URL path tree to a CDN, which works based on
domains, we will need to provide a URL redirection setup that
redirects client side tools to the new location.

As Martin von Loewis mentioned, this will require distributing the
redirection setup to more than just one server as well.
Fortunately, this is not difficult to do: it requires a preconfigured
lighttpd (*) setup running on N different servers which then all
provide the necessary redirections (and nothing more):

dynamic:

 http://pypi.python.org/ -> http://ximinez.python.org/pypi
 http://pypi.python.org/pypi -> http://ximinez.python.org/pypi
 (and possibly a few others)

static:

 http://pypi.python.org/simple -> http://pypi-static.python.org/simple
 http://pypi.python.org/packages -> http://pypi-static.python.org/packages

(note: pypi-static.python.org is a CNAME alias for the Cloudfront
domain issued to the S3 bucket where we upload the data)

The pypi.python.org domain would then have to be set up to map to
multiple IP addresses via DNS round-robin, one entry for each
redirection server, e.g.

 pypi.python.org. IN A 123.123.123.1
 pypi.python.org. IN A 123.123.123.2
 pypi.python.org. IN A 123.123.123.3
 pypi.python.org. IN A 123.123.123.4

Redirection servers could be run on all PSF server machines, and, to
increase availability, on PSF partner servers as well.

It should be noted that current client side PyPI tools do not support
automatic retry, so there still is a chance that the redirection
server they pick on first try will fail. The user would then just have
to retry the download to get a new server address. Automatic retry
would, of course, create a better user experience, but this requires a
few small changes in the existing PyPI client tools.

(*) lighttpd is a lightweight and fast HTTP server. It's easy to set
up, doesn't require a lot of resources on the server machine and runs
stably.


Long-term changes
-----------------

While enabling the above redirection setup, we should also start
working on changing PyPI and the client tools to use two new domains
which then cleanly separate the static CDN file access from the
dynamic PyPI server access:

 pypi.python.org
 pypi-static.python.org

Such a transition on the client side is expected to take at least a
few years.
After that, the redirection service can be shut down or used to
distribute and scale the dynamic PyPI service parts.


Future improvements
-------------------

We could replace the cronjob system with a trigger based system that
uploads changes as soon as the PyPI server receives them.


Side-effects
------------

Restarts of the PyPI server, network outages, or hardware failures
would not affect the static copies of the PyPI on the CDN. setuptools,
easy_install, pip, zc.buildout, etc. would continue to work.

The S3 bucket would serve as additional backup for the files on PyPI.

Later integration with Amazon EC2 (their virtual server offering)
would easily be possible for more scalability and reduced system
administration load.

We don't have to worry about issues such as mirror servers having
out-of-date data. The risk of package manipulation, e.g. to introduce
trojans, is also reduced, since the Cloudfront edge servers get their
data straight from the S3 bucket.


Costs
-----

Amazon charges for S3 and Cloudfront storage, transfer and access. The
costs vary depending on location.

 http://aws.amazon.com/cloudfront/#pricing
 http://aws.amazon.com/s3/#pricing

To get an idea of the costs, we'd have to take a closer look at the
PyPI web stats:

 http://pypi.python.org/webstats/usage_201005.html

In May 2010, PyPI transferred 819GB of data and had to handle 22
million requests.

Using the AWS monthly calculator this gives roughly (I used 37KB as
average object size and 35% US, 35% EU, 10% HK, 10% JP as basis): USD
132 per month, or about USD 1,584 per year for Cloudfront.

For the S3 storage, the costs amount to roughly USD 30 per month, or
USD 360 per year (100GB storage, 50GB traffic in, 100GB traffic out,
1000 PUT requests, 1 million GET requests).

Total costs are an estimated USD 1,944 per year.
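
As a quick cross-check of the figures in the Costs section, the short
sketch below redoes the arithmetic. The monthly amounts, the transfer
volume and the request count are the ones quoted above; reading "GB"
and "KB" as decimal units is an assumption made here, and the variable
names are purely illustrative.

```python
# Monthly estimates quoted in the Costs section (USD).
CLOUDFRONT_PER_MONTH = 132
S3_PER_MONTH = 30

# May 2010 web stats quoted in the Costs section.
TRANSFER_BYTES = 819 * 10**9   # 819GB, assuming decimal gigabytes
REQUESTS = 22 * 10**6          # 22 million requests


def yearly(monthly_usd):
    """Scale a monthly cost to twelve months."""
    return monthly_usd * 12


cloudfront_year = yearly(CLOUDFRONT_PER_MONTH)
s3_year = yearly(S3_PER_MONTH)
total_year = cloudfront_year + s3_year

# Average object size in (decimal) kilobytes, to compare with the 37KB used above.
avg_object_kb = TRANSFER_BYTES / REQUESTS / 1000

print(cloudfront_year, s3_year, total_year)  # 1584 360 1944
print(round(avg_object_kb))                  # 37
```

The yearly totals match the proposal, and the 37KB average object size
follows directly from the transfer volume divided by the request count.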


Refinancing the costs
---------------------

Since PyPI is being used as an essential resource by many important
Python projects (Zope, Plone, Django, etc.), it's fair to ask the
respective foundations and the general Python community for donations
to help refinance the administration costs.

A prominent donation button should go on the PyPI page with a text
explaining how PyPI is being hosted and why donations are necessary.

We may also be able to directly ask for donations from the above
foundations. Details of this are currently being evaluated by the PSF
board (there are some issues related to our non-profit status that
make this more complicated than it appears at first).

Unlike other less visible PSF activities, providing and running PyPI
is a real tangible service to the community, creating more incentive
for Python users, including companies relying on the PyPI service, to
donate to the PSF.

Overall, we should be able to refinance the costs of this improved
service level, perhaps even generate more donations than needed to
fund other PSF activities.


Effort
------

Given that most of the tools are readily available, setting up the
servers shouldn't take more than 2-3 developer days for developers
who've worked with Amazon S3 and Cloudfront before, including testing.

It is expected that we'll find volunteers to implement the necessary
changes.


Competing with PEP 381
----------------------

A few PEP 381 developers have stated that this proposal would limit
the interest in PEP 381 implementations and argue that the proposal
would compete with their proposed strategy.

Just to clarify, this proposal does not try to compete with the mirror
proposal outlined in PEP 381. Instead it focuses on a readily
available solution that can be implemented in a few days and requires
only little additional system administration.
In order to further underline this, the proposal will be presented to
the board for approval in their August board meeting (currently
scheduled for August 16), giving the PEP 381 developers more time to
work and improve their PEP 381 client implementations.

If the PEP 381 infrastructure gets rolled out, both the external
mirrors and the cloud mirrors can work side-by-side, so there is no
conflict.
"""

--
Marc-Andre Lemburg
eGenix.com

From ziade.tarek at gmail.com Tue Jun 29 17:05:53 2010
From: ziade.tarek at gmail.com (Tarek Ziadé)
Date: Tue, 29 Jun 2010 17:05:53 +0200
Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability (version 2)
In-Reply-To: <4C2A05BA.5050808@egenix.com>
References: <4C2A05BA.5050808@egenix.com>
Message-ID:

On Tue, Jun 29, 2010 at 4:39 PM, M.-A. Lemburg wrote:
[..]
> Competing with PEP 381
> ----------------------
>
> A few PEP 381 developers have stated that this proposal would limit
> the interest in PEP 381 implementations and argue that the proposal
> would compete with their proposed strategy.

You can replace a "few" with the "PEP 381 authors" here.
>
> Just to clarify, this proposal does not try to compete with the mirror
> proposal outlined in PEP 381. Instead it focuses on a readily
> available solution that can be implemented in a few days and only
> requires little additional system administration.

I still disagree with this statement; it fully competes with PEP 381.
Your proposal and PEP 381 are both trying to solve the same issue.

In fact, I could copy-paste your "Motivation" section and put it in
PEP 381 :)

> In order to further underline this, the proposal will be presented to
> the board for approval in their August board meeting (currently
> scheduled for August 16), giving the PEP 381 developers more time to
> work and improve their PEP 381 client implementations.

As I said earlier, the mirroring work was not finished because of a
lack of resources and time. Giving us a deadline before you make your
proposal is not really helping. That's like saying: "if you can finish
your PEP 381 thing before August, great. If not we will implement the
other proposal, but with the help and resources provided by the PSF."

So you are just underlining that your solution is faster to implement
here. If you really want to compare both solutions, this section
should compare pros and cons instead, and think of the best long term
solution for the community.

I still think that setting up a cloud doesn't solve anything, you will
still have to have a sysadmin behind a computer if something goes
wrong. And this will happen in the cloud as well, as I don't think
it's the silver bullet.

Furthermore, the outages were not as bad as you describe in your PEP.
You should give the real numbers in your document, and calculate the
availability percentage, instead of "several outages".

So far, PyPI is more reliable than Twitter I think :) (a few days /
years)

Regards,
Tarek

--
Tarek Ziadé
| http://ziade.org From ianb at colorstudy.com Tue Jun 29 18:54:03 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 29 Jun 2010 11:54:03 -0500 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability (version 2) In-Reply-To: <4C2A05BA.5050808@egenix.com> References: <4C2A05BA.5050808@egenix.com> Message-ID: A few notes: On Tue, Jun 29, 2010 at 9:39 AM, M.-A. Lemburg wrote: > In order to maintain its credibility as software repository, to > support the many different projects relying on the PyPI infrastructure > and the many users who rely on the simplified installation process > enabled by PyPI, the PSF needs to take action and move the essential > parts of PyPI to a more robust infrastructur that provides: > > * scalability > * 24/7 outsourced system administration management > In a sense a CDN offers outsourced system administration -- if you upload content, they are responsible for making sure it gets served up. But that's all they do. Other "cloud" systems only provide system administration for infrastructure issues, like a network routing issue. They do not provide anything on your machine itself. It is possible to get hosting with system administration included, Rackspace Managed Servers are an example, but these are quite expensive -- basically you are paying an overhead on hosting to have a competent sysadmin on hand. Usage > ----- > > PyPI provides four different mechanisms for accessing the stored > information: > > * a web GUI that is meant for use by humans > * an RPC interface which is mostly used for uploading new > content > * a semi-static /simple package listing, used by setuptools > * a static area /packages for package download files and > documentation, used by both the web GUI and setuptools > The static packages are used by the RPC (setup.py upload) and automatically linked in. 
There is no privileged aspect to them, Setuptools (easy_install/pip) just reads the links provided, and if they happen to point to pypi packages then that's what is fetched. I mention this because changing those URLs on the server side will be easy as a result. > The /simple package listing is dump of all packages in PyPI using a > simple HTML page with links to sub-pages for each package. These > sub-pages provide links to download files and external references. > > External tools like easy_install only use the /simple package > listing together with the hosted package download files. > > While the /simple package listing is currently dynamically created > from the database in real-time, this is not really needed for normal > operation. A static copy created every 10-20 minutes would provide the > same level of service in much the same way. > > > Moving static data to a CDN > --------------------------- > > Under the proposal the static information stored in PyPI > (meta-information as well as package download files and documentation) > is moved to a content delivery network (CDN). > > For this purpose, the /simple package listing is replaced with a > static copy that is recreated every 10-20 minutes using a cronjob on > the PyPI server. > > At the same intervals, another script will scan the package and > documentation files under /packages for updates and upload any changes > to the CDN for neartime availability. > I disagree with this part of the proposal, because I think a 10-20 minute delay introduces the possibility of invisible errors (an infinite delay), and represents a real degradation of service as new versions of packages will not be installable until after regeneration. Also I think the RPC code (what is invoked with setup.py register/upload) can regenerate these static pages immediately. 
Uploading to a CDN may have to be asynchronous, but to keep the data robust we should really be storing the package locally and adding a new field to point to the mirrored location (i.e., the CDN URL). When the cron job runs that field can be updated. If the CDN upload fails (which is not unlikely) then PyPI can keep using the local location. The cron job would then also be triggering another regeneration of the static file in /static, but so long as you are only regenerating on changes this isn't much overhead. Also, making upload/register a synchronous operation will slow down the speed of RPC commands, but I don't think this is a problem -- I would much rather have an upload be slow to finish than fast but not know when the result will be available. I don't know what kind of latency to expect, really. Also, I'd like to offer a counterproposal that does not use a CDN: * Have PyPI write out static files *locally* * Use rewrite rules so those files get served without touching PyPI. * Move the PyPI installation to mod_wsgi (I believe it is using FCGI now?), with conservative settings for things like MaxRequests. I believe this will significantly improve the problem of PyPI taking down Apache, which means the static files will still be available even if PyPI itself is down. This is largely work that would have to happen to move to a CDN, but it's simpler (given how PyPI works now) and I believe will relieve most of the problems we've seen. PyPI right now is really quite reliable, these small changes would I think be low-risk and less likely to introduce new problems while addressing what I suspect is the source of problems. -- Ian Bicking | http://blog.ianbicking.org
From martin at v.loewis.de Tue Jun 29 22:50:38 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Tue, 29 Jun 2010 22:50:38 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability (version 2) In-Reply-To: References: <4C2A05BA.5050808@egenix.com> Message-ID: <4C2A5C9E.6000701@v.loewis.de> > * Move the PyPI installation to mod_wsgi (I believe it is using FCGI > now?) For the latter: correct. For the former (use mod_wsgi): I had actually implemented it, but needed to revert to FCGI, because mod_wsgi would cause too many hanging servers. > This is largely work that would have to happen to move to a CDN, but > it's simpler (given how PyPI works now) and I believe will relieve most > of the problems we've seen. As for the switch to WSGI: it will *introduce* new problems. > PyPI right now is really quite reliable, > these small changes would I think be low-risk and less likely to > introduce new problems while addressing what I suspect is the source of > problems. I disagree that these are small and low-risk. The WSGI switch will risk stability; the others (generate static pages) will not be small, and risk correctness. Regards, Martin From ianb at colorstudy.com Tue Jun 29 22:59:35 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 29 Jun 2010 15:59:35 -0500 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability (version 2) In-Reply-To: <4C2A5C9E.6000701@v.loewis.de> References: <4C2A05BA.5050808@egenix.com> <4C2A5C9E.6000701@v.loewis.de> Message-ID: On Tue, Jun 29, 2010 at 3:50 PM, "Martin v. Löwis" wrote: > > * Move the PyPI installation to mod_wsgi (I believe it is using FCGI > > now?) > > For the latter: correct. > > For the former (use mod_wsgi): I had actually implemented it, but needed > to revert to FCGI, because mod_wsgi would cause too many hanging servers. > I'm surprised, what specific mod_wsgi configuration did you try?
I've had good luck using a daemon process and making sure no process lives too long. There's another configuration of mod_wsgi that runs Python in the Apache process, which I've never used and which doesn't seem like a good idea to me. > > This is largely work that would have to happen to move to a CDN, but > > it's simpler (given how PyPI works now) and I believe will relieve most > > of the problems we've seen. > > As for the switch to WSGI: it will *introduce* new problems. > > > PyPI right now is really quite reliable, > > these small changes would I think be low-risk and less likely to > > introduce new problems while addressing what I suspect is the source of > > problems. > > I disagree that these are small and low-risk. The WSGI switch will risk > stability; the others (generate static pages) will not be small, and > risk correctness. > I don't really know how to describe "small" or "low-risk"... maybe I should say "smaller" and "lesser-risk" than the full CDN proposal. -- Ian Bicking | http://blog.ianbicking.org From martin at v.loewis.de Tue Jun 29 23:22:55 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Tue, 29 Jun 2010 23:22:55 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability (version 2) In-Reply-To: References: <4C2A05BA.5050808@egenix.com> <4C2A5C9E.6000701@v.loewis.de> Message-ID: <4C2A642F.8070605@v.loewis.de> > I'm surprised, what specific mod_wsgi configuration did you try? Not sure I understand the question:
    WSGIDaemonProcess pypi display-name=wsgi-pypi processes=10 threads=1 maximum-requests=2000
    WSGIProcessGroup pypi
    WSGIPassAuthorization On
    WSGIScriptAlias /pypi /data/pypi/src/pypi/pypi.wsgi
    WSGIScriptAlias /simple /data/pypi/src/pypi/pypi.wsgi
According to the bzr log, I reverted that because Python would crash (with a core dump).
> > PyPI right now is really quite reliable, > > these small changes would I think be low-risk and less likely to > > introduce new problems while addressing what I suspect is the > source of > > problems. > > I disagree that these are small and low-risk. The WSGI switch will risk > stability; the others (generate static pages) will not be small, and > risk correctness. > > > I don't really know how to describe "small" or "low-risk"... maybe I > should say "smaller" and "lesser-risk" than the full CDN proposal. Ah, ok - relatively speaking. That is certainly true: the CDN proposal has more risk to not work correctly. Regards, Martin From ianb at colorstudy.com Tue Jun 29 23:39:11 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 29 Jun 2010 16:39:11 -0500 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability (version 2) In-Reply-To: <4C2A642F.8070605@v.loewis.de> References: <4C2A05BA.5050808@egenix.com> <4C2A5C9E.6000701@v.loewis.de> <4C2A642F.8070605@v.loewis.de> Message-ID: On Tue, Jun 29, 2010 at 4:22 PM, "Martin v. Löwis" wrote: > > I'm surprised, what specific mod_wsgi configuration did you try? > > Not sure I understand the question: > > > WSGIDaemonProcess pypi display-name=wsgi-pypi processes=10 threads=1 > maximum-requests=2000 > WSGIProcessGroup pypi > WSGIPassAuthorization On > WSGIScriptAlias /pypi /data/pypi/src/pypi/pypi.wsgi > WSGIScriptAlias /simple /data/pypi/src/pypi/pypi.wsgi > > According to the bzr log, I reverted that because Python would crash > (with a core dump). > OK, that's how I would configure it too. A core dump implies some installation problem (e.g., mod_wsgi was compiled against one version of Python, but is being bound to a different version -- or psycopg or some other extension).
Graham Dumpleton is very responsive about these kinds of issues with mod_wsgi if you mail the mod_wsgi list; I've always stuck to debs and that's saved me from version mismatches, so I haven't actually debugged issues like this myself. -- Ian Bicking | http://blog.ianbicking.org From martin at v.loewis.de Wed Jun 30 06:50:13 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Wed, 30 Jun 2010 06:50:13 +0200 Subject: [Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability (version 2) In-Reply-To: References: <4C2A05BA.5050808@egenix.com> <4C2A5C9E.6000701@v.loewis.de> <4C2A642F.8070605@v.loewis.de> Message-ID: <4C2ACD05.8070402@v.loewis.de> > OK, that's how I would configure it too. A core dump implies some > installation problem (e.g., mod_wsgi was compiled against one version of > Python, but is being bound to a different version -- or psycopg or some > other extension). Graham Dumpleton is very responsive about these kinds > of issues with mod_wsgi if you mail the mod_wsgi list; I've always stuck > to debs and that's saved me from version mismatches, so I haven't > actually debugged issues like this myself. Same here: I was only using Debian packages for everything, expecting that this ought to work. I don't feel like retrying at this point, though, when all I expect is a loss of stability. If anybody absolutely thinks that FCGI is unacceptable and really wants to have this work with WSGI, please let me know. Regards, Martin
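
For anyone who does want to retry mod_wsgi, one inexpensive first check for the version-mismatch theory raised in this thread is a throwaway WSGI script that reports which interpreter is actually serving requests. The sketch below is an assumption-laden illustration, not part of PyPI: it uses only the standard WSGI calling convention plus the `mod_wsgi.version` and `mod_wsgi.process_group` keys that mod_wsgi places in the request environ, and it is written for a current Python 3. It only shows the runtime side; the Python that mod_wsgi was compiled against still has to be checked separately, and the script cannot help if the interpreter crashes before a request is handled.

```python
import sys


def application(environ, start_response):
    """Report which interpreter and mod_wsgi build are serving this request."""
    lines = [
        "python version: " + sys.version.replace("\n", " "),
        "python prefix: " + sys.prefix,
        # mod_wsgi adds these keys to the WSGI environ; under any other
        # server they are simply missing and show up as None.
        "mod_wsgi.version: " + repr(environ.get("mod_wsgi.version")),
        "mod_wsgi.process_group: "
        + repr(environ.get("mod_wsgi.process_group")),
    ]
    body = "\n".join(lines).encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

Mapping a test URL to a file containing this function (with a `WSGIScriptAlias` line like the ones quoted above) and comparing the reported version with the one the distribution's mod_wsgi package expects would confirm or rule out the simplest form of mismatch.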