From jim at zope.com Wed Aug 1 16:05:29 2007 From: jim at zope.com (Jim Fulton) Date: Wed, 1 Aug 2007 10:05:29 -0400 Subject: [Catalog-sig] static files, and testing pypi In-Reply-To: <64ddb72c0707271722w3da8dfa2x4668f097df6a2c9b@mail.gmail.com> References: <64ddb72c0707271722w3da8dfa2x4668f097df6a2c9b@mail.gmail.com> Message-ID: <4BD6BCC6-C462-4E0C-9D83-2AB7901502F1@zope.com> On Jul 27, 2007, at 8:22 PM, René Dudfield wrote: > Hello, > > I've got a bit of spare time again after catching up on work after > attending europython - so was wondering if I should still finish the > static file stuff? I think static generation is a good idea, however, I think it is far less urgent or important than it was. I think people are going to run static mirrors of the simple site that Martin put together. I plan to release a distribution to do that sometime in the next few days. With dynamic pypi and static mirrors, I think both the dynamic and static camps can be happy. Jim -- Jim Fulton mailto:jim at zope.com Python Powered! CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From jim at zope.com Wed Aug 1 21:23:44 2007 From: jim at zope.com (Jim Fulton) Date: Wed, 1 Aug 2007 15:23:44 -0400 Subject: [Catalog-sig] static files, and testing pypi In-Reply-To: <46AB46B5.6020806@v.loewis.de> References: <64ddb72c0707271722w3da8dfa2x4668f097df6a2c9b@mail.gmail.com> <46AB3E74.6090303@benjiyork.com> <46AB46B5.6020806@v.loewis.de> Message-ID: <2E1C760A-0AA5-4327-9393-3BF51C340415@zope.com> On Jul 28, 2007, at 9:37 AM, Martin v. Löwis wrote: >> I like the idea, if only from a stability standpoint. (Granted, >> stability has been improved greatly of late, but static files will >> always trump dynamic page generation). > > Depends on how you define stability, Being up and available. Unlike simple ATM. I'm sorry, that was low. It was just sooooo irresistible. :) Jim -- Jim Fulton mailto:jim at zope.com Python Powered! 
CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From benji at benjiyork.com Wed Aug 1 21:06:05 2007 From: benji at benjiyork.com (Benji York) Date: Wed, 01 Aug 2007 15:06:05 -0400 Subject: [Catalog-sig] PyPI down Message-ID: <46B0D99D.4000605@benjiyork.com> As is my sworn duty, I have the sad news to relay that PyPI is down. I tried both http://pypi.python.org/pypi and http://cheeseshop.python.org/pypi. -- Benji York http://benjiyork.com From thomas at python.org Wed Aug 1 22:03:14 2007 From: thomas at python.org (Thomas Wouters) Date: Wed, 1 Aug 2007 22:03:14 +0200 Subject: [Catalog-sig] PyPI down In-Reply-To: <46B0D99D.4000605@benjiyork.com> References: <46B0D99D.4000605@benjiyork.com> Message-ID: <9e804ac0708011303n220fd2ceyd591403436ddf478@mail.gmail.com> Fixed, I think. The problem was actually a screen-scraper searching the wiki (a third of the last 100k hits on the wiki were from the same host, requesting every possible link on every possible page.) Someone with more wiki knowledge may want to make sure no weird pages were made. (The offending host was somewhere in dynamic.dsl.as9105.com; I have the actual address but don't want to mail it around.) On 8/1/07, Benji York wrote: > > As is my sworn duty, I have the sad news to relay that PyPI is down. > > I tried both http://pypi.python.org/pypi and > http://cheeseshop.python.org/pypi. > -- > Benji York > http://benjiyork.com > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig > -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/catalog-sig/attachments/20070801/253e7ad4/attachment.html From martin at v.loewis.de Wed Aug 1 22:16:18 2007 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Wed, 01 Aug 2007 22:16:18 +0200 Subject: [Catalog-sig] PyPI down In-Reply-To: <9e804ac0708011303n220fd2ceyd591403436ddf478@mail.gmail.com> References: <46B0D99D.4000605@benjiyork.com> <9e804ac0708011303n220fd2ceyd591403436ddf478@mail.gmail.com> Message-ID: <46B0EA12.9060807@v.loewis.de> > Fixed, I think. The problem was actually a screen-scraper searching the > wiki (a third of the last 100k hits on the wiki were from the same host, > requesting every possible link on every possible page.) Someone with > more wiki knowledge may want to make sure no weird pages were made. (The > offending host was somewhere in dynamic.dsl.as9105.com > ; I have the actual address but don't > want to mail it around.) Strange. According to the logs, it also happened (simultaneously, but independently?) that pypi.fcgi would always terminate immediately. This happened before, and I could never figure out why, so I added a mechanism to restart Apache. This time, the restart happened, and pypi.fcgi would again stop right away, to be restarted again, and so on. If this is indeed related to the load on MoinMoin, this is quite puzzling: they are separate processes, and separate virtual hosts. So they shouldn't "see" each other. 
Regards, Martin From martin at v.loewis.de Wed Aug 1 22:22:36 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 01 Aug 2007 22:22:36 +0200 Subject: [Catalog-sig] static files, and testing pypi In-Reply-To: <2E1C760A-0AA5-4327-9393-3BF51C340415@zope.com> References: <64ddb72c0707271722w3da8dfa2x4668f097df6a2c9b@mail.gmail.com> <46AB3E74.6090303@benjiyork.com> <46AB46B5.6020806@v.loewis.de> <2E1C760A-0AA5-4327-9393-3BF51C340415@zope.com> Message-ID: <46B0EB8C.1020905@v.loewis.de> Jim Fulton schrieb: > On Jul 28, 2007, at 9:37 AM, Martin v. Löwis wrote: > >>> I like the idea, if only from a stability standpoint. (Granted, >>> stability has been improved greatly of late, but static files will >>> always trump dynamic page generation). >> Depends on how you define stability, > > Being up and available. Unlike simple ATM. Ok. You didn't include "being correct" also, so by that definition, static pages always trump dynamic ones. > I'm sorry, that was low. It was just sooooo irresistible. :) :-) If anybody can offer suggestions on how to fix this problem, that would be appreciated. Regards, Martin From amk at amk.ca Wed Aug 1 22:39:04 2007 From: amk at amk.ca (A.M. Kuchling) Date: Wed, 1 Aug 2007 16:39:04 -0400 Subject: [Catalog-sig] [Pydotorg] PyPI down In-Reply-To: <46B0EA12.9060807@v.loewis.de> References: <46B0D99D.4000605@benjiyork.com> <9e804ac0708011303n220fd2ceyd591403436ddf478@mail.gmail.com> <46B0EA12.9060807@v.loewis.de> Message-ID: <20070801203904.GA14833@amk-desktop.matrixgroup.net> On Wed, Aug 01, 2007 at 10:16:18PM +0200, "Martin v. Löwis" wrote: > If this is indeed related to the load on MoinMoin, this is quite > puzzling: they are separate processes, and separate virtual hosts. > So they shouldn't "see" each other. Perhaps the CPU load was so high that the PyPI FCGI took a long time to open its socket, such a long time that Apache concluded that it hadn't started. 
Was the Wiki crawler using a consistent user agent that can be banned (e.g. nutch, wget, etc.)? --amk From thomas at python.org Wed Aug 1 22:45:30 2007 From: thomas at python.org (Thomas Wouters) Date: Wed, 1 Aug 2007 22:45:30 +0200 Subject: [Catalog-sig] PyPI down In-Reply-To: <46B0EA12.9060807@v.loewis.de> References: <46B0D99D.4000605@benjiyork.com> <9e804ac0708011303n220fd2ceyd591403436ddf478@mail.gmail.com> <46B0EA12.9060807@v.loewis.de> Message-ID: <9e804ac0708011345oefd1f98rba1870ff8f493720@mail.gmail.com> On 8/1/07, "Martin v. Löwis" wrote: > > > Fixed, I think. The problem was actually a screen-scraper searching the > > wiki (a third of the last 100k hits on the wiki were from the same host, > > requesting every possible link on every possible page.) Someone with > > more wiki knowledge may want to make sure no weird pages were made. (The > > offending host was somewhere in dynamic.dsl.as9105.com > > ; I have the actual address but don't > > want to mail it around.) > > Strange. According to the logs, it also happened (simultaneously, > but independently?) that pypi.fcgi would always terminate > immediately. This happened before, and I could never figure out > why, so I added a mechanism to restart Apache. This time, the > restart happened, and pypi.fcgi would again stop right away, > to be restarted again, and so on. > > If this is indeed related to the load on MoinMoin, this is quite > puzzling: they are separate processes, and separate virtual hosts. > So they shouldn't "see" each other. The load is machine load, which is of course shared across all processes. It was about 15, with a slow response to match. Nullrouting that particular IP address fixed the problem instantly, so I'm pretty sure that was it. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/catalog-sig/attachments/20070801/efe99e6c/attachment.html From thomas at python.org Wed Aug 1 22:47:56 2007 From: thomas at python.org (Thomas Wouters) Date: Wed, 1 Aug 2007 22:47:56 +0200 Subject: [Catalog-sig] [Pydotorg] PyPI down In-Reply-To: <20070801203904.GA14833@amk-desktop.matrixgroup.net> References: <46B0D99D.4000605@benjiyork.com> <9e804ac0708011303n220fd2ceyd591403436ddf478@mail.gmail.com> <46B0EA12.9060807@v.loewis.de> <20070801203904.GA14833@amk-desktop.matrixgroup.net> Message-ID: <9e804ac0708011347r4aca257ayd58c3d83e262f899@mail.gmail.com> On 8/1/07, A.M. Kuchling wrote: > Was the Wiki crawler using a consistent user agent that can be banned > (e.g. nutch, wget, etc.)? Nope, the user agent claimed to be MSIE 5: "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/catalog-sig/attachments/20070801/48047d12/attachment.html From martin at v.loewis.de Wed Aug 1 23:09:53 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 01 Aug 2007 23:09:53 +0200 Subject: [Catalog-sig] [Pydotorg] PyPI down In-Reply-To: <20070801203904.GA14833@amk-desktop.matrixgroup.net> References: <46B0D99D.4000605@benjiyork.com> <9e804ac0708011303n220fd2ceyd591403436ddf478@mail.gmail.com> <46B0EA12.9060807@v.loewis.de> <20070801203904.GA14833@amk-desktop.matrixgroup.net> Message-ID: <46B0F6A1.7010600@v.loewis.de> > Perhaps the CPU load was so high that the PyPI FCGI took a long time > to open its socket, such a long time that Apache concluded that it > hadn't started. I think in this case, mod_fcgi would log a message "failed to respond" or some such. The actual log message was like "pypi.fcgi exited with status code 0" - so it wasn't killed (IIUC). 
I added syslog messages to pypi.fcgi; it looks like something raises SystemExit, so pypi.fcgi terminates "voluntarily". I'm not quite sure where the exit comes from (but I since added logs to the two SystemExit occurrences in thfcgi.py). What is puzzling is that it will immediately do the same thing after being started fresh. Regards, Martin From martin at v.loewis.de Thu Aug 2 22:38:54 2007 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 02 Aug 2007 22:38:54 +0200 Subject: [Catalog-sig] PyPI outage Message-ID: <46B240DE.3090402@v.loewis.de> I think I now understand what happened with the outage of PyPI yesterday and today. As Thomas found, somebody was crawling the wiki, with multiple requests per second, all links (e.g. in a series such as /moin/PyConFrancescAlted?action=AttachFile /moin/PyConFrancescAlted?action=diff /moin/PyConFrancescAlted?action=info /moin/PyConFrancescAlted?action=edit /moin/PyConFrancescAlted?action=LocalSiteMap /moin/PyConFrancescAlted?action=print /moin/PyConFrancescAlted?action=refresh and so on, for every page. That caused considerable load on the machine (load average 17). In turn, PyPI began to respond more slowly; in some cases, it would not respond within the 60s that I configured for FastCGI. As a result, mod_fastcgi would close the connection for the request (and log an error). thfcgi.py found that it can't write to the pipe anymore (EPIPE), and therefore decided to terminate the FCGI server. In turn, mod_fastcgi attempted to restart the server for some time, and eventually would start throttling the restarts, making all PyPI servers go away (i.e. they would quit, and then not get restarted for some time). At that point, my maintenance script would detect that all PyPI instances went away, and initiate a graceful restart of Apache. The crawler comes from the same ISP, but today with a different IP address. I blocked that address as well. 
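The failure step in the middle of that cascade (a worker treating a lost FastCGI connection as a reason to exit) can be sketched as follows. This is an illustration of the behaviour described above, not the actual thfcgi.py code; the function name and socket handling are hypothetical.

```python
import errno


def send_output(sock, data):
    """Write a response back to the web server over the FastCGI socket.

    If the web server has already given up on the request and closed
    the connection, the write fails with EPIPE. Exiting the whole
    process here, rather than abandoning just the one request, matches
    the observed "pypi.fcgi exited with status code 0" log entries:
    each timed-out request kills a worker, and mod_fastcgi eventually
    throttles the restarts.
    """
    try:
        sock.sendall(data)
    except OSError as e:
        if e.errno == errno.EPIPE:
            raise SystemExit(0)  # voluntary exit, status 0
        raise
```

A more forgiving handler would catch the EPIPE, discard the single dead request, and go back to accepting connections, so that one slow response cannot take a whole worker down.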
Can anybody suggest a more reliable way to prevent crawlers from hitting the wiki so hard? Regards, Martin From renesd at gmail.com Sat Aug 4 04:22:15 2007 From: renesd at gmail.com (=?ISO-8859-1?Q?Ren=E9_Dudfield?=) Date: Sat, 4 Aug 2007 12:22:15 +1000 Subject: [Catalog-sig] PyPI outage In-Reply-To: <46B240DE.3090402@v.loewis.de> References: <46B240DE.3090402@v.loewis.de> Message-ID: <64ddb72c0708031922j41ea1ffascc940c4b3712da20@mail.gmail.com> Hello, I have had good luck with different throttling solutions in the past. As well as using apache mod_cache, and ulimit for each app. In summary: - throttling, with mod_cband - caching, with mod_cache - limiting resources of each app, with ulimit. - protecting from bots, with mod_security The idea with throttling is you limit the amount of bandwidth, and the amount of connections each ip/combination of ips has. However there are problems with this... the main one being that some ip addresses can have many people behind them. Think of proxies for AOL etc. Also some clients have legitimate uses for the many connections. Like eg, some build processes at biggish companies, ie the zope people etc, or conferences where 300+ people will connect from the same ip etc etc. The other problem is that some robots use many separate ip addresses - but that isn't the common case. I think mod_cband enabled on the wiki as well as enabling caching with mod_cache for moinmoin would help quite a lot. Or implementing just one of caching or bandwidth limiting would help. I can't think of that many legitimate uses where people would want to download heaps of wiki pages like the spamming robots are. Also as you say it appears to be the wiki causing all the load at the moment - probably generic moinmoin spamming robots. So it might be best to enable mod_cband on the wiki first, rather than on pypi. mod_cband can be enabled separately on each vhost. 
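The ulimit idea in the summary above can also be applied from inside a Python start-up script via the standard-library resource module, so the limits travel with the application rather than the shell. A minimal sketch; the function name and the limit values are illustrative, not tuned for python.org.

```python
import resource


def limit_resources(max_mem_bytes, max_open_files):
    """Apply ulimit-style caps to the current process and its children.

    Called early in an FCGI wrapper (e.g. moin.fcgi), this bounds the
    damage one abused application can do to the rest of the machine:
    it can still be overloaded, but it cannot exhaust all memory or
    file descriptors.
    """
    resource.setrlimit(resource.RLIMIT_AS,
                       (max_mem_bytes, max_mem_bytes))
    resource.setrlimit(resource.RLIMIT_NOFILE,
                       (max_open_files, max_open_files))


# For example: cap at 512 MB of address space and 1024 open files.
# limit_resources(512 * 1024 ** 2, 1024)
```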
Here are some good urls you can read to start research on bandwidth limiting (there are many links off these pages to tutorials, howtos, articles etc). http://mod-cband.com/ http://gentoo-wiki.com/HOWTO_Apache_2_bandwidth_limiting mod_security http://www.modsecurity.org/ is another option that can help with many types of attacks. However it can be more complex to configure. Another thing to do is to use ulimit to limit the resources that each application can use. This way if the wiki is being abused, it can cause less damage to the rest of the machine. Type ulimit -a to see what you can do. Just put some ulimit lines in the application start up script. Using ulimit will not fix the problem, just limit the possible damage. eg. you can limit the amount of memory used, and the amount of open files etc. For moinmoin, you could probably ask on the moinmoin mailing list for solutions to this problem, since it is probably quite common. Cheers, From martin at v.loewis.de Sat Aug 4 08:13:41 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 04 Aug 2007 08:13:41 +0200 Subject: [Catalog-sig] PyPI outage In-Reply-To: <64ddb72c0708031922j41ea1ffascc940c4b3712da20@mail.gmail.com> References: <46B240DE.3090402@v.loewis.de> <64ddb72c0708031922j41ea1ffascc940c4b3712da20@mail.gmail.com> Message-ID: <46B41915.4040702@v.loewis.de> > I think mod_cband enabled on the wiki as well as enabling caching with > mod_cache for moinmoin would help quite a lot. Or implementing just > one of caching or bandwidth limiting would help. I don't think caching will help - the caching surely shouldn't cache pages with query parameters, and these are the ones that cause the load. As for mod_cband - I haven't tried it, but I don't think a bandwidth limit is what I want to specify. I'm rather after a requests rate (mod_bw has it, but I don't know whether to trust it). AFAICT, mod_cband does not support limiting the number of requests per time period. 
Regards, Martin From lac at openend.se Sat Aug 4 08:59:29 2007 From: lac at openend.se (Laura Creighton) Date: Sat, 04 Aug 2007 08:59:29 +0200 Subject: [Catalog-sig] PyPI outage In-Reply-To: Message from =?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?= of "Thu, 02 Aug 2007 22:38:54 +0200." <46B240DE.3090402@v.loewis.de> References: <46B240DE.3090402@v.loewis.de> Message-ID: <200708040659.l746xT9U008738@theraft.openend.se> In a message of Thu, 02 Aug 2007 22:38:54 +0200, =?ISO-8859-15?Q?=22Martin_v=2 E_L=F6wis=22?= writes: >Can anybody suggest a more reliable way to prevent crawlers >from hitting the wiki so hard? > >Regards, >Martin I assume that this particular spider isn't named. But in case I am wrong: http://www.fleiner.com/bots/#banning has an example of how to ban the inktomi spider named Slurp. Laura From lac at openend.se Sat Aug 4 09:24:15 2007 From: lac at openend.se (Laura Creighton) Date: Sat, 4 Aug 2007 09:24:15 +0200 Subject: [Catalog-sig] why is the wiki being hit so hard? Message-ID: <200708040724.l747OFAB012962@theraft.openend.se> One possibility is that we are being scraped. Some jerk comes along and copies all your web content, and runs his own mirror so that he can get revenue from AdWords. One thing to check is whether the spider respects robots.txt. If they do not respect them, then you can use this program: http://danielwebb.us/software/bot-trap/ to catch them. If you are doing this, Martin, use the German version instead: http://www.spider-trap.de/ because it has a few useful additions. I forget what now. Most scrapers, these days, respect robots.txt which will make this program useless for catching them. But some days you can get lucky. I think the only real fix for this is for Google and other searchers to set up a service where people who produce web content that is scraped and rehosted can report the rehosting sites and make google rank them as the millionth site or so. I.e. this is a political and economic problem, not a technical one. 
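The bot-trap approach boils down to a few lines of request handling: robots.txt disallows a decoy path, and any client that fetches the decoy anyway has proven it ignores robots.txt. A minimal sketch; the path and function names are hypothetical, not taken from either linked project.

```python
# robots.txt would contain:
#   User-agent: *
#   Disallow: /bot-trap/
# Well-behaved crawlers never fetch /bot-trap/; anything that does is,
# by definition, ignoring robots.txt, and its address gets blocked.

banned_ips = set()


def handle_request(path, remote_ip):
    """Return an HTTP status code for the request (hypothetical hook)."""
    if remote_ip in banned_ips:
        return 403
    if path.startswith("/bot-trap/"):
        banned_ips.add(remote_ip)
        return 403
    return 200
```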
Laura From martin at v.loewis.de Sat Aug 4 09:32:55 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 04 Aug 2007 09:32:55 +0200 Subject: [Catalog-sig] PyPI outage In-Reply-To: <46B41915.4040702@v.loewis.de> References: <46B240DE.3090402@v.loewis.de> <64ddb72c0708031922j41ea1ffascc940c4b3712da20@mail.gmail.com> <46B41915.4040702@v.loewis.de> Message-ID: <46B42BA7.5020400@v.loewis.de> > As for mod_cband - I haven't tried it, but I don't think a bandwidth > limit is what I want to specify. I'm rather after a requests rate > (mod_bw has it, but I don't know whether to trust it). AFAICT, mod_cband > does not support limiting the number of requests per time period. I have now added request throttling to MoinMoin (FCGI) itself; if you issue more than one request every two seconds (on average), you get locked out for 30s (you are allowed spikes of 30 requests, after which you need to be idle for 60 seconds). Let's see whether this helps. Regards, Martin From martin at v.loewis.de Sat Aug 4 09:36:31 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 04 Aug 2007 09:36:31 +0200 Subject: [Catalog-sig] PyPI outage In-Reply-To: <200708040659.l746xT9U008738@theraft.openend.se> References: <46B240DE.3090402@v.loewis.de> <200708040659.l746xT9U008738@theraft.openend.se> Message-ID: <46B42C7F.4050006@v.loewis.de> > I assume that this particular spider isn't named. But in case I > am wrong: http://www.fleiner.com/bots/#banning has an example of > how to ban the inktomi spider named Slurp. No, it identifies itself as "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)". Regards, Martin From martin at v.loewis.de Sat Aug 4 09:42:45 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 04 Aug 2007 09:42:45 +0200 Subject: [Catalog-sig] why is the wiki being hit so hard? 
In-Reply-To: <200708040724.l747OFAB012962@theraft.openend.se> References: <200708040724.l747OFAB012962@theraft.openend.se> Message-ID: <46B42DF5.2010404@v.loewis.de> > If they do not respect them, then you can use this program: > http://danielwebb.us/software/bot-trap/ to catch them. > If you are doing this, Martin, use the German version instead: > http://www.spider-trap.de/ > because it has a few useful additions. I forget what now. > > Most scrapers, these days, respect robots.txt which will make this > program useless for catching them. But some days you can get lucky. That would also be an idea. I'll see how the throttling works out; if it fails (either because it still gets overloaded - which shouldn't happen - or because legitimate users complain), I'll try that one. Regards, Martin From renesd at gmail.com Sat Aug 4 09:56:04 2007 From: renesd at gmail.com (=?ISO-8859-1?Q?Ren=E9_Dudfield?=) Date: Sat, 4 Aug 2007 17:56:04 +1000 Subject: [Catalog-sig] PyPI outage In-Reply-To: <46B42BA7.5020400@v.loewis.de> References: <46B240DE.3090402@v.loewis.de> <64ddb72c0708031922j41ea1ffascc940c4b3712da20@mail.gmail.com> <46B41915.4040702@v.loewis.de> <46B42BA7.5020400@v.loewis.de> Message-ID: <64ddb72c0708040056l31b3c7dbnbb232c9eb3cc93a8@mail.gmail.com> Nice one. I tried clicking around on the wiki as quickly as I could, and it didn't seem to block me :) It feels more responsive, compared to when I was using it yesterday. Saving pages seems to be especially faster than before at the moment. Do you consider the cases where multiple people use one ip? Like at conferences, companies, and from large isps that use proxies (eg AOL)? It sounds like you have. The robot trap Laura mentioned sounds like a good idea too - but maybe not needed now that you've done this. On 8/4/07, "Martin v. Löwis" wrote: > > As for mod_cband - I haven't tried it, but I don't think a bandwidth > > limit is what I want to specify. 
I'm rather after a requests rate > > (mod_bw has it, but I don't know whether to trust it). AFAICT, mod_cband > > does not support limiting the number of requests per time period. > > I have now added request throttling to MoinMoin (FCGI) itself; if you > issue more than one request every two seconds (on average), you get > locked out for 30s (you are allowed spikes of 30 requests, after which > you need to be idle for 60 seconds). > > Let's see whether this helps. > > Regards, > Martin > From martin at v.loewis.de Sat Aug 4 10:35:09 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 04 Aug 2007 10:35:09 +0200 Subject: [Catalog-sig] PyPI outage In-Reply-To: <64ddb72c0708040056l31b3c7dbnbb232c9eb3cc93a8@mail.gmail.com> References: <46B240DE.3090402@v.loewis.de> <64ddb72c0708031922j41ea1ffascc940c4b3712da20@mail.gmail.com> <46B41915.4040702@v.loewis.de> <46B42BA7.5020400@v.loewis.de> <64ddb72c0708040056l31b3c7dbnbb232c9eb3cc93a8@mail.gmail.com> Message-ID: <46B43A3D.70609@v.loewis.de> > Nice one. I tried clicking around on the wiki as quickly as I could, > and it didn't seem to block me :) > > It feels more responsive, compared to when I was using it yesterday. > Saving pages seems to be especially faster than before at the moment. That might depend on the time of the day - load is low at the moment. So far, nobody got locked out but myself, in testing. > Do you consider the cases where multiple people use one ip? Like at > conferences, companies, and from large isps that use proxies (eg AOL)? > It sounds like you have. Not really - it's only that the formula allows for quite many simultaneous access, as long as they don't run for a long period of time. E.g if "normal" people read 20 pages per hour (which they don't sustain over several hours), we can have 90 such users simultaneously. 
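The scheme described above (an average of one request every two seconds, a burst allowance of 30, and a 30-second lockout) is essentially a leaky bucket kept per client address. A minimal sketch of the idea; the class and parameter names are invented here, and this is not the actual MoinMoin patch.

```python
import time


class Throttle:
    """Leaky-bucket request throttle, one bucket per client IP.

    Each request adds one unit to the client's bucket; the bucket
    drains at one unit per `period` seconds. A client whose bucket
    overflows `burst` is locked out for `lockout` seconds, and a full
    bucket takes burst * period seconds of idleness to decay.
    """

    def __init__(self, period=2.0, burst=30, lockout=30.0):
        self.period = period
        self.burst = burst
        self.lockout = lockout
        self.clients = {}  # ip -> (bucket_level, last_seen, locked_until)

    def allow(self, ip, now=None):
        if now is None:
            now = time.time()
        level, last, locked_until = self.clients.get(ip, (0.0, now, 0.0))
        if now < locked_until:
            return False
        # Drain the bucket for the time elapsed since the last request.
        level = max(0.0, level - (now - last) / self.period)
        level += 1.0
        if level > self.burst:
            self.clients[ip] = (level, now, now + self.lockout)
            return False
        self.clients[ip] = (level, now, 0.0)
        return True
```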
The busy hour is 19:00..20:00 GMT, the wiki gets roughly 3600 requests in that hour total (on average in July) - so allowing one request every two seconds from a bot is fairly permissive. At a conference, if people are told simultaneously to look at the same page, we can only accommodate 30 people doing so. The next 30 people will have to wait 15s. So if the entire conference of 200 people access the page within 30s, some will see the overload page. If this turns out to be a problem, the limit of 30 can be raised (without raising the allowed request rate); if it's raised to, say, 400, then we can take a spike of 400 accesses, which then takes 13 minutes to decay. Regards, Martin From lac at openend.se Sat Aug 4 14:50:06 2007 From: lac at openend.se (Laura Creighton) Date: Sat, 04 Aug 2007 14:50:06 +0200 Subject: [Catalog-sig] why is the wiki being hit so hard? In-Reply-To: Message from =?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?= of "Sat, 04 Aug 2007 09:42:45 +0200." <46B42DF5.2010404@v.loewis.de> References: <200708040724.l747OFAB012962@theraft.openend.se> <46B42DF5.2010404@v.loewis.de> Message-ID: <200708041250.l74Co6Fm003169@theraft.openend.se> Thank you. The wiki seems very responsive now, which is nice in itself. Laura From lac at openend.se Sun Aug 5 07:59:20 2007 From: lac at openend.se (Laura Creighton) Date: Sun, 05 Aug 2007 07:59:20 +0200 Subject: [Catalog-sig] why is the wiki being hit so hard? In-Reply-To: Message from =?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?= of "Sat, 04 Aug 2007 09:42:45 +0200." <46B42DF5.2010404@v.loewis.de> References: <200708040724.l747OFAB012962@theraft.openend.se> <46B42DF5.2010404@v.loewis.de> Message-ID: <200708050559.l755xKg5014573@theraft.openend.se> In a message of Sat, 04 Aug 2007 09:42:45 +0200, "Martin v. L?wis" writes: >> If they do not respect them, then you can use this program: >> http://danielwebb.us/software/bot-trap/ to catch them. 
>> If you are doing this, Martin, use the German version instead: >> http://www.spider-trap.de/ >> because it has a few useful additions. I forget what now. >> >> Most scrapers, these days, respect robots.txt which will make this >> program useless for catching them. But some days you can get lucky. > >That would also be an idea. I'll see how the throttling works out; >if it fails (either because it still gets overloaded - which shouldn't >happen - or because legitimate users complain), I'll try that one. > >Regards, >Martin pardon for this completely useless quoting of irrelevant text but I tried just telling catalog-sig to go read this url http://search.msn.com.my/docs/siteowner.aspx?t=SEARCH_WEBMASTER_FAQ_MSNBotIndexing.htm&FORM=WFDD#D and check "MSNbot is crawling my site too frequently". And I got "suspicious header", which is what all the python.org groups say when they think you are sending them spam, and it was not in the header. So if your text is basically a url, and you want to send it to a python.org group, you are screwed. So I found an article and am replying to it. Go read that. I think it says that we could set our crawl delay to some number -- why 120 I have no clue -- and our spider will be made to behave. Or possibly we can hack the bot trap for those that do not respect crawl-delay. At any rate it seems relevant to our problem. Laura From martin at v.loewis.de Sun Aug 5 10:26:13 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 05 Aug 2007 10:26:13 +0200 Subject: [Catalog-sig] why is the wiki being hit so hard? 
In-Reply-To: <200708050559.l755xKg5014573@theraft.openend.se> References: <200708040724.l747OFAB012962@theraft.openend.se> <46B42DF5.2010404@v.loewis.de> <200708050559.l755xKg5014573@theraft.openend.se> Message-ID: <46B589A5.1000600@v.loewis.de> > pardon for this completely useless quoting of irrelevant text > but I tried just telling catalog-sig to go read this url > http://search.msn.com.my/docs/siteowner.aspx?t=SEARCH_WEBMASTER_FAQ_MSNBotIndexing.htm&FORM=WFDD#D > and check MSNbot is crawling my site too frequently. msnbot is currently locked out entirely from crawling the wiki, not by robots.txt, but by giving 403 for the IPs it comes from. I have now added a robots.txt with a crawl-speed of 20. IIUC, this requests that crawlers should access the site not more often than once every 20s. I then unblocked Yahoo! Slurp and msnbot. Regards, Martin From petri.savolainen at iki.fi Mon Aug 6 15:53:15 2007 From: petri.savolainen at iki.fi (Petri Savolainen) Date: Mon, 6 Aug 2007 16:53:15 +0300 Subject: [Catalog-sig] new pypi categories for symbian/series60 mobile devices? Message-ID: Hello, I'd like to propose the following addition to PyPI categorization: Operating System :: Symbian :: Series60 Basically, Series60 is a Nokia-developed (licensed to and used by others as well) platform/environment or version of the Symbian OS. For more information, please see http://wiki.opensource.nokia.com/projects/Python_for_S60 I'd also like to propose changing the Environment :: Handhelds/PDA's category by adding something to it that makes it include also mobile phones, as in: Handhelds/PDA's/Phones, or, just change it completely into something simpler but more generic, such as: Environment :: Mobile (which I'd personally find better). Thoughts? 
Petri From martin at v.loewis.de Tue Aug 7 23:06:47 2007 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 07 Aug 2007 23:06:47 +0200 Subject: [Catalog-sig] PyPI and Wiki crawling Message-ID: <46B8DEE7.2080703@v.loewis.de> I hope I have now solved the overload problem that massive crawling has caused to the wiki, and, in consequence, caused PyPI outage. Following Laura's advice, I added Crawl-delay into robots.txt. Several robots have picked that up, not just msnbot and slurp, but also e.g. MJ12bot. For the others, I had to fine-tune my throttling code, after observing that the expensive URLs are those with a query string. They now account for 3 regular queries (might have to bump this to 5), so you can only do one of them every 6s. For statistics of the load, see http://ximinez.python.org/munin/localdomain/localhost.localdomain-pypitime.html I added accounting of moin.fcgi run times, which shows that Moin produced 15% CPU load on average (PyPI 3%, Postgres 2%) Regards, Martin From andrew.kuchling at gmail.com Thu Aug 9 03:30:35 2007 From: andrew.kuchling at gmail.com (Andrew Kuchling) Date: Wed, 8 Aug 2007 21:30:35 -0400 Subject: [Catalog-sig] Fwd: PyPI Idea In-Reply-To: <46B34B0F.9070702@eepatents.com> References: <46B34B0F.9070702@eepatents.com> Message-ID: ---------- Forwarded message ---------- From: Ed Suominen Date: Aug 3, 2007 11:34 AM Subject: PyPI Idea To: webmaster at python.org Here's an idea for the PyPI site that you may want to consider. (I hereby dedicate anything novel about it to the public domain.) Currently, the keywords for a given project are just listed and don't do anything. It would be cool if each keyword worked as a tag, being a hyperlink to a listing of all projects that share the same keyword. Also, I suggest inserting a space after the comma separating each keyword. 
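Ed's suggestion amounts to rendering the stored keyword string as links instead of plain text, with a space after each comma. A sketch of what the template code could do; the search URL used here is illustrative, not an existing PyPI endpoint.

```python
from html import escape
from urllib.parse import quote_plus


def keyword_links(keyword_string):
    """Turn a comma-separated keyword string into comma-space-separated
    HTML links, one per keyword (the href scheme is hypothetical)."""
    words = [w.strip() for w in keyword_string.split(",") if w.strip()]
    return ", ".join(
        '<a href="/pypi?:action=search&amp;term=%s">%s</a>'
        % (quote_plus(w), escape(w))
        for w in words
    )
```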
Best regards, Ed From KrystalRacine at yahoo.com Fri Aug 10 22:01:50 2007 From: KrystalRacine at yahoo.com (Krystal Racine) Date: Fri, 10 Aug 2007 13:01:50 -0700 (PDT) Subject: [Catalog-sig] PyPI outage In-Reply-To: <46B43A3D.70609@v.loewis.de> References: <46B240DE.3090402@v.loewis.de> <64ddb72c0708031922j41ea1ffascc940c4b3712da20@mail.gmail.com> <46B41915.4040702@v.loewis.de> <46B42BA7.5020400@v.loewis.de> <64ddb72c0708040056l31b3c7dbnbb232c9eb3cc93a8@mail.gmail.com> <46B43A3D.70609@v.loewis.de> Message-ID: <12098124.post@talk.nabble.com> Have you thought about testing the load with a hosted web service? Some offer testing by GEO location. "Martin v. Löwis" wrote: > >> Nice one. I tried clicking around on the wiki as quickly as I could, >> and it didn't seem to block me :) >> >> It feels more responsive, compared to when I was using it yesterday. >> Saving pages seems to be especially faster than before at the moment. > > That might depend on the time of the day - load is low at the moment. > So far, nobody got locked out but myself, in testing. > >> Do you consider the cases where multiple people use one ip? Like at >> conferences, companies, and from large isps that use proxies (eg AOL)? >> It sounds like you have. > > Not really - it's only that the formula allows for quite many > simultaneous access, as long as they don't run for a long period of > time. E.g if "normal" people read 20 pages per hour (which they > don't sustain over several hours), we can have 90 such users > simultaneously. > > The busy hour is 19:00..20:00 GMT, the wiki gets roughly 3600 > requests in that hour total (on average in July) - > so allowing one request every two seconds from a bot is fairly > permissive. > > At a conference, if people are told simultaneously to look at the > same page, we can only accommodate 30 people doing so. The next > 30 people will have to wait 15s. So if the entire conference of > 200 people access the page within 30s, some will see the overload > page. 
If this turns out to be a problem, the limit of 30 can be > raised (without raising the allowed request rate); if it's raised > to, say, 400, then we can take a spike of 400 accesses, which > then takes 13 minutes to decay. > > Regards, > Martin > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig > > -- View this message in context: http://www.nabble.com/PyPI-outage-tf4214712.html#a12098124 Sent from the Python - catalog-sig mailing list archive at Nabble.com. From martin at v.loewis.de Fri Aug 10 23:14:31 2007 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Fri, 10 Aug 2007 23:14:31 +0200 Subject: [Catalog-sig] PyPI outage In-Reply-To: <12098124.post@talk.nabble.com> References: <46B240DE.3090402@v.loewis.de> <64ddb72c0708031922j41ea1ffascc940c4b3712da20@mail.gmail.com> <46B41915.4040702@v.loewis.de> <46B42BA7.5020400@v.loewis.de> <64ddb72c0708040056l31b3c7dbnbb232c9eb3cc93a8@mail.gmail.com> <46B43A3D.70609@v.loewis.de> <12098124.post@talk.nabble.com> Message-ID: <46BCD537.7060607@v.loewis.de> Krystal Racine schrieb: > Have you thought about testing the load with a hosted web service? No. That would require a volunteer to do it, and no such volunteer is available. Regards, Martin From richardjones at optushome.com.au Sat Aug 11 01:52:47 2007 From: richardjones at optushome.com.au (Richard Jones) Date: Sat, 11 Aug 2007 09:52:47 +1000 Subject: [Catalog-sig] new pypi categories for symbian/series60 mobile devices? In-Reply-To: References: Message-ID: <200708110952.47584.richardjones@optushome.com.au> On Mon, 6 Aug 2007, Petri Savolainen wrote: > I'd like to propose the following addition to PyPI categorization: > > Operating System :: Symbian :: Series60 > > Basically, Series60 is a Nokia-developed (licensed to and used by > others as well) platform/environment or version of the Symbian OS.
For > more information, please see > http://wiki.opensource.nokia.com/projects/Python_for_S60 I don't have any objections to this. In the absence of any other comments I'd be happy to add it. > I'd also like to propose changing the Environment :: Handhelds/PDA's > category by adding something to it that makes it include also mobile > phones, as in: > > Handhelds/PDA's/Phones, or, just change it completely into something > simpler but more generic, such as: Environment :: Mobile (which I'd > personally find better). I like "mobile" better because it's more succinct, but I'll leave it up to you to make the call either way. Richard From ben at groovie.org Sun Aug 12 21:07:53 2007 From: ben at groovie.org (Ben Bangert) Date: Sun, 12 Aug 2007 12:07:53 -0700 Subject: [Catalog-sig] PyPI and Wiki crawling, and a CDN In-Reply-To: <46B8DEE7.2080703@v.loewis.de> References: <46B8DEE7.2080703@v.loewis.de> Message-ID: <9258992C-D3BC-409F-A2F1-521CB80D5D77@groovie.org> On Aug 7, 2007, at 2:06 PM, Martin v. Löwis wrote: > I hope I have now solved the overload problem that massive > crawling has caused to the wiki, and, in consequence, > caused the PyPI outage. > > Following Laura's advice, I added Crawl-delay into robots.txt. > Several robots have picked that up, not just msnbot and slurp, > but also e.g. MJ12bot. > > For the others, I had to fine-tune my throttling code, after > observing that the expensive URLs are those with a query string. > They now account for 3 regular queries (might have to bump this > to 5), so you can only do one of them every 6s. I don't suppose there are enough resources to just have PyPI on a separate box entirely, so that whatever else is running (the wiki, etc) won't have the opportunity to drag down the package repository? On a side note, has anyone checked into a CDN for packages to speed up their delivery and remove more of the traffic load off the PyPI host?
That would also lower the bar for other sites that wanted to mirror PyPI, since they wouldn't have to host all the actual eggs as well. Cheers, Ben -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2472 bytes Desc: not available Url : http://mail.python.org/pipermail/catalog-sig/attachments/20070812/7ee6ccc3/attachment.bin From martin at v.loewis.de Sun Aug 12 23:50:11 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 12 Aug 2007 23:50:11 +0200 Subject: [Catalog-sig] PyPI and Wiki crawling, and a CDN In-Reply-To: <9258992C-D3BC-409F-A2F1-521CB80D5D77@groovie.org> References: <46B8DEE7.2080703@v.loewis.de> <9258992C-D3BC-409F-A2F1-521CB80D5D77@groovie.org> Message-ID: <46BF8093.8060106@v.loewis.de> > I don't suppose there's enough resources to just have PyPI on a separate > box entirely, so that whatever else is running (the wiki, etc) won't > have the opportunity to drag down the package repository? People have offered hardware (i.e. internet-connected machines). What's missing is volunteers to maintain them. Regards, Martin From michael at d2m.at Mon Aug 13 09:47:01 2007 From: michael at d2m.at (Michael Haubenwallner) Date: Mon, 13 Aug 2007 09:47:01 +0200 Subject: [Catalog-sig] pypi and wiki down Message-ID: pypi and wiki seem to be down for some 15 hours now. Is there a place (maybe in IRC) to report problems with the webservice? Michael -- http://www.zope.org/Members/d2m http://planetzope.org From martin at v.loewis.de Mon Aug 13 10:58:00 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 13 Aug 2007 10:58:00 +0200 Subject: [Catalog-sig] pypi and wiki down In-Reply-To: References: Message-ID: <46C01D18.7070301@v.loewis.de> Michael Haubenwallner schrieb: > pypi and wiki seem to be down for some 15 hours now. > > Is there a place (maybe in IRC) to report problems with the webservice?
Posting to this list is the best way. It is fixed now. Regards, Martin From ben at groovie.org Mon Aug 13 20:23:04 2007 From: ben at groovie.org (Ben Bangert) Date: Mon, 13 Aug 2007 11:23:04 -0700 Subject: [Catalog-sig] PyPI and Wiki crawling, and a CDN In-Reply-To: <46BF8093.8060106@v.loewis.de> References: <46B8DEE7.2080703@v.loewis.de> <9258992C-D3BC-409F-A2F1-521CB80D5D77@groovie.org> <46BF8093.8060106@v.loewis.de> Message-ID: <261B15DD-496C-4535-B735-F5A3EDE4B215@groovie.org> On Aug 12, 2007, at 2:50 PM, Martin v. Löwis wrote: > People have offered hardware (i.e. internet-connected machines). > What's missing is volunteers to maintain them. I believe ideally there should be at least 2 machines to handle PyPI, so that maintenance can be performed without taking PyPI down. I can volunteer myself right now, and can ask if there are some sysadmins willing to volunteer maintenance time on the Pylons and TurboGears lists, as our frameworks rely rather heavily on PyPI being available. Are the people offering hardware/hosting still willing? Cheers, Ben -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2472 bytes Desc: not available Url : http://mail.python.org/pipermail/catalog-sig/attachments/20070813/7dcedd94/attachment.bin From martin at v.loewis.de Mon Aug 13 22:20:15 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 13 Aug 2007 22:20:15 +0200 Subject: [Catalog-sig] ip 194.183.146.189 blocked In-Reply-To: References: <8F1F0605-B424-4597-BADF-1496BDBFC2C1@lovelysystems.com> <4689F923.8030304@v.loewis.de> <5B0A8BC7-CC65-49E6-AA15-CCF591A0EA41@lovelysystems.com> Message-ID: <46C0BCFF.1090406@v.loewis.de> Jodok Batlogg schrieb: > i'm sorry, this ip still seems to be blocked. > to make sure the outgoing network connection is working i just connected > to the next higher ip (svn.python.org) - and this works. > > would you mind fixing it?
I've checked now - this IP was null-routed, probably because it caused an overload at some point in the past. I've removed the routing entry, so please try again now. Regards, Martin From jodok at lovelysystems.com Mon Aug 13 20:55:32 2007 From: jodok at lovelysystems.com (Jodok Batlogg) Date: Mon, 13 Aug 2007 20:55:32 +0200 Subject: [Catalog-sig] ip 194.183.146.189 blocked In-Reply-To: <5B0A8BC7-CC65-49E6-AA15-CCF591A0EA41@lovelysystems.com> References: <8F1F0605-B424-4597-BADF-1496BDBFC2C1@lovelysystems.com> <4689F923.8030304@v.loewis.de> <5B0A8BC7-CC65-49E6-AA15-CCF591A0EA41@lovelysystems.com> Message-ID: i'm sorry, this ip still seems to be blocked. to make sure the outgoing network connection is working i just connected to the next higher ip (svn.python.org) - and this works. would you mind fixing it? thanks jodok flutschi:/home/ppix lovely$ telnet pypi.python.org 80 Trying 82.94.237.219... ^C flutschi:/home/ppix lovely$ host pypi.python.org pypi.python.org is an alias for ximinez.python.org. ximinez.python.org has address 82.94.237.219 flutschi:/home/ppix lovely$ telnet 82.94.237.220 80 Trying 82.94.237.220... Connected to svn.python.org. Escape character is '^]'. GET / HTTP/1.0 HTTP/1.1 200 OK Date: Mon, 13 Aug 2007 18:51:28 GMT Server: Apache/2.2.3 (Debian) DAV/2 SVN/1.4.2 mod_ssl/2.2.3 OpenSSL/ 0.9.8c Last-Modified: Tue, 02 May 2006 00:48:08 GMT ETag: "3c610-161-864da200" Accept-Ranges: bytes Content-Length: 353 Connection: close Content-Type: text/html svn.python.org

svn.python.org

Connection closed by foreign host. On 03.07.2007, at 11:02, Jodok Batlogg wrote: > On 03.07.2007, at 09:22, Martin v. L?wis wrote: > >>> is it possible that our outgoing proxy server is beeing blocked by >>> cheeseshop? it's ip address is 194.183.146.189 >> >> I can't see anything like that in the configuration of ximinez. >> >> Furthermore, I cannot see that this IP addresses made any attempt >> to contact ximinez. I got several accesses from 194.183.146.178, >> for various versions of zc.buildout, through setuptools, and >> I got requests from 194.183.146.185 through Firefox, but none >> from the IP address that you mention. Going back until December >> 2006 (if I can trust the logs), that machine never made any >> access to the Cheeseshop. > > it seems to happen on the network level. i can't ping the machine > from this ip address :) > > coming from 194.183.146.189: > > traceroute to ximinez.python.org (82.94.237.219), 64 hops max, 60 > byte packets > 1 lsfw01 (192.168.34.254) 0.727 ms 0.406 ms 0.345 ms > 2 194-183-146-177.tele.net (194.183.146.177) 1.212 ms 1.061 ms > 3.801 ms > 3 cr4-swz1.net.tele.net (194.183.134.8) 6.733 ms 5.034 ms > 4.472 ms > 4 fas0-1-70-cr3-swz1.net.tele.net (194.183.133.188) 4.550 ms > 4.581 ms 4.627 ms > 5 atm0-0-r1-hoe1.net.tele.net (194.183.135.34) 5.743 ms 5.471 > ms 5.362 ms > 6 giga0-2.r2-buh1.net.tele.net (194.183.135.194) 7.449 ms 6.484 > ms 5.843 ms > 7 83.144.194.17 (83.144.194.17) 8.407 ms 8.736 ms 8.444 ms > 8 g4-0-211.core01.zrh01.atlas.cogentco.com (149.6.83.129) 9.269 > ms 8.669 ms 8.727 ms > 9 p6-0.core01.str01.atlas.cogentco.com (130.117.0.53) 11.924 ms > 11.825 ms 10.960 ms > 10 p3-0.core01.fra03.atlas.cogentco.com (130.117.0.217) 13.820 > ms 14.551 ms 13.941 ms > 11 p3-0.core01.ams03.atlas.cogentco.com (130.117.0.145) 21.411 > ms 21.266 ms 20.842 ms > 12 t3-1.mpd01.ams03.atlas.cogentco.com (130.117.0.34) 20.100 ms > 21.003 ms 20.880 ms > 13 ams-ix.sara.xs4all.net (195.69.144.48) 20.878 ms 20.983 ms > 28.193 ms > 14 
0.so-6-0-0.xr1.3d12.xs4all.net (194.109.5.1) 21.045 ms 21.486 > ms 20.892 ms > 15 0.so-3-0-0.cr1.3d12.xs4all.net (194.109.5.58) 49.436 ms > 29.076 ms 103.199 ms > 16 * * * > 17 * * * > 18 * * * > > > coming from 194.183.146.179: > > traceroute to ximinez.python.org (82.94.237.219), 64 hops max, 60 > byte packets > 1 lsfw01 (192.168.34.254) 2.030 ms 1.495 ms 1.461 ms > 2 * 194-183-146-177.tele.net (194.183.146.177) 1.834 ms 1.646 ms > 3 cr4-swz1.net.tele.net (194.183.134.8) 4.873 ms 6.393 ms > 5.318 ms > 4 fas4-0-70-cr1-swz1.net.tele.net (194.183.133.190) 8.466 ms > 196.174 ms 5.562 ms > 5 194.183.142.2 (194.183.142.2) 6.540 ms 6.462 ms 21.969 ms > 6 giga0-2.r2-buh1.net.tele.net (194.183.135.194) 6.642 ms 6.871 > ms 7.797 ms > 7 83.144.194.17 (83.144.194.17) 18.965 ms 9.923 ms 10.459 ms > 8 g4-0-211.core01.zrh01.atlas.cogentco.com (149.6.83.129) 10.003 > ms 9.462 ms 9.945 ms > 9 p6-0.core01.str01.atlas.cogentco.com (130.117.0.53) 13.728 ms > 11.831 ms 12.375 ms > 10 p3-0.core01.fra03.atlas.cogentco.com (130.117.0.217) 14.568 > ms 16.176 ms 15.069 ms > 11 p3-0.core01.ams03.atlas.cogentco.com (130.117.0.145) 124.421 > ms 134.435 ms 205.047 ms > 12 t3-1.mpd01.ams03.atlas.cogentco.com (130.117.0.34) 21.689 ms > 21.962 ms 22.313 ms > 13 ams-ix.tc2.xs4all.net (195.69.144.166) 21.655 ms 21.213 ms > 23.011 ms > 14 0.so-7-0-0.xr2.3d12.xs4all.net (194.109.5.13) 21.531 ms > 21.966 ms 0.so-7-0-0.xr1.3d12.xs4all.net (194.109.5.9) 21.673 ms > 15 0.so-2-0-0.cr1.3d12.xs4all.net (194.109.5.74) 21.526 ms > 0.so-3-0-0.cr1.3d12.xs4all.net (194.109.5.58) 24.606 ms 22.263 ms > 16 ximinez.python.org (82.94.237.219) 23.363 ms 21.890 ms > 25.506 ms > > thanks a lot for your help > > jodok > >> >> Regards, >> Martin > > -- > "Simple is better than complex." 
> -- The Zen of Python, by Tim Peters > > Jodok Batlogg, Lovely Systems > Schmelzhütterstraße 26a, 6850 Dornbirn, Austria > phone: +43 5572 908060, fax: +43 5572 908060-77 > > > _______________________________________________ > Catalog-SIG mailing list > Catalog-SIG at python.org > http://mail.python.org/mailman/listinfo/catalog-sig -- "In the face of ambiguity, refuse the temptation to guess." -- The Zen of Python, by Tim Peters Jodok Batlogg, Lovely Systems Schmelzhütterstraße 26a, 6850 Dornbirn, Austria phone: +43 5572 908060, fax: +43 5572 908060-77 -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2454 bytes Desc: not available Url : http://mail.python.org/pipermail/catalog-sig/attachments/20070813/11fe205c/attachment.bin From jodok at lovelysystems.com Mon Aug 13 22:34:24 2007 From: jodok at lovelysystems.com (Jodok Batlogg) Date: Mon, 13 Aug 2007 22:34:24 +0200 Subject: [Catalog-sig] ip 194.183.146.189 blocked In-Reply-To: <46C0BCFF.1090406@v.loewis.de> References: <8F1F0605-B424-4597-BADF-1496BDBFC2C1@lovelysystems.com> <4689F923.8030304@v.loewis.de> <5B0A8BC7-CC65-49E6-AA15-CCF591A0EA41@lovelysystems.com> <46C0BCFF.1090406@v.loewis.de> Message-ID: <9B342026-7B4F-4BFB-AF84-81DF6ACAFBB3@lovelysystems.com> On 13.08.2007, at 22:20, Martin v. Löwis wrote: > Jodok Batlogg schrieb: >> i'm sorry, this ip still seems to be blocked. >> to make sure the outgoing network connection is working i just >> connected >> to the next higher ip (svn.python.org) - and this works. >> >> would you mind fixing it? > > I've checked now - this IP was null-routed, probably because it caused > an overload at some point in the past. I've removed the routing entry, > so please try again now. works like a charm thanks jodok > > Regards, > Martin -- "Flat is better than nested."
-- The Zen of Python, by Tim Peters Jodok Batlogg, Lovely Systems Schmelzhütterstraße 26a, 6850 Dornbirn, Austria phone: +43 5572 908060, fax: +43 5572 908060-77 -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2454 bytes Desc: not available Url : http://mail.python.org/pipermail/catalog-sig/attachments/20070813/6ddcde2a/attachment.bin From bjorn at exoweb.net Tue Aug 14 09:18:37 2007 From: bjorn at exoweb.net (=?ISO-8859-1?Q?Bj=F8rn_Stabell?=) Date: Tue, 14 Aug 2007 15:18:37 +0800 Subject: [Catalog-sig] A first step at improving PyPI: the "egg" command In-Reply-To: <46C01D18.7070301@v.loewis.de> References: <46C01D18.7070301@v.loewis.de> Message-ID: <8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net> Hi all, I think there's a lot to gain for Python by improving PyPI, and I'm willing to help. I did help a bit with PyPI at last year's EuroPython sprint, and was then made aware of http://wiki.python.org/moin/CheeseShopDev - are these the most up-to-date plans for PyPI? If you're in a hurry and don't want to read everything:
1) I've created a little app to help prototype how we can do better egg/package management at http://contrib.exoweb.net/trac/browser/egg/
2) I'd like feedback, and pointers to how I can help more.
Basically, the problems I would like to work on solving are:
1) Simplifying/enabling discovery of packages
2) Simplifying/enabling management of packages
3) Improving quality and usefulness of the package index
From a usability point of view I'd like to focus on the requirements for the Python newbie, someone that has just discovered Python, but is probably used to package management systems from Linux distributions, FreeBSD, and other dynamic languages like Perl and Ruby (these are also the systems I have experience with, so I'm pulling ideas from them).
Ideally everything should be (following Steve Krug's "Don't Make Me Think" recommendations) self-evident, and if that's not possible, at least self-explanatory. Someone put in front of a keyboard without having read any docs should be able to find, install, manage, and perhaps even create Python packages. Better usability will of course benefit everyone, not just beginners. I'm frankly amazed at how people that have programmed Python for years don't really know or use PyPI. I'm convinced making more of the Python package system discoverable and easily accessible will greatly improve the adoption of Python, the number of Python packages, and the quality of these packages. I think the typical use cases would be (in order of importance, based on what a typical user would encounter first):
* Find available eggs for a particular topic online
* Get more information about an egg
* Install an egg (and its dependencies)
* See which eggs are installed
* Upgrade some or all outdated eggs
* Remove/uninstall an egg
* Create an egg
* Find eggs that are plugins for some framework online

NAMING

So, first of all we'll need either one command, or a set of similarly named commands, to do discovery, installation, and management of packages, as these are common end-user actions. Creation of packages is a bit more advanced, and could be in another command. If there's general agreement that Python eggs are the future way of distributing packages, why not call the command "egg", similar to the way many other package managers are named after the packages, e.g., rpm, port, gem? I'll assume that's the case. Next, where do you find eggs? This might not be a big issue if the "egg" command is configured properly by default, but I'd offer my thoughts. I know the cheeseshop just changed name back to PyPI again.
In my opinion, neither of the names is good in that they don't help people remember; any Monty Python connection is lost on the big masses, and PyPI is hard to spell, not very obvious, and a confusing clash with the also-prominent PyPy project. Why not call the place for eggs just eggs? I.e., http://eggs.python.org/
So we'd have the command "egg" for managing eggs that are by default found at "eggs.python.org". I think it's hard to make Python package management more obvious than this. The goal is to get someone that is new to Python to remember how to get and where to find packages, so obvious is a good thing.

THE COMMAND LINE PACKAGE MANAGEMENT TOOL

The "egg" command should enable you to at least find, show info for, install, and uninstall packages. I think the most common way to do command line tools like this is to offer sub-commands, a la bzr, port, svn, apt-get, gem, so I suggest:
egg - list out a help of commands
egg search - search for eggs (aliases: find/list)
egg info - show info for egg (aliases: show/details)
egg install - install named eggs
egg uninstall - uninstall eggs (aliases: remove/purge/delete)
so you can do: egg search bittorrent to find all packages that have anything to do with bittorrent (full-text search of the package index), and then: egg install iTorrent to actually download and install the package.

PROTOTYPE

I've built a command that works this way, implementing most (except the last) of the use cases at least partially.
You can give it a go as follows:
# install prerequisites on your platform
# e.g., sudo apt-get install python-setuptools sqlite3 libsqlite3-0 python-pysqlite2
svn co http://contrib.exoweb.net/svn/egg/
cd egg
sudo python setup.py develop # should install storm for you
gzip -dc pypi.sql.gz | sqlite3 ~/.pythoneggs.db # bootstrap cache
egg sync # update cache
It's still incomplete, lacking tests, might only work on unix-y computers, and is lacking support for lots of features like activation/deactivation, and upgrades, but it works for basic stuff like finding, installing, and uninstalling packages. Summary of the design:
* Local and PyPI package information is synchronized into a local SQLite database for easy access
* Storm is used for ORM (but could easily be changed)
* Installation is handled by passing off the "egg install" command to "easy_install"
* I'm using a non-standard command-line parser (but could easily be changed)
* For interactive use on terminals that support it: colorizes and adjusts text to fit
While doing the synchronization with PyPI I discovered a couple of issues, described below, that make the application unfit for common use yet. (E.g., it has to query PyPI for each of the packages.) Most subcommands take arguments that can be a free mix of set names and query strings. I thought this would make for the most forgiving and user-friendly interface. These are filters; by default all eggs match.
SETS: Eggs have a few attributes that can be used to limit to a subset of all eggs, e.g., whether an egg is installed, active, outdated, local, or remote. Specifying several of these intersects the sets, further limiting the number of eggs.
QUERY STRINGS: If none of the set names are matched, the argument is assumed to be a query string. Many subcommands like "search" do a full-text search of the package cache database. Others, like "list", will do a substring match of package names.
Others, like "install", will require you to match the name exactly. You can specify a specific version by adding a slash, e.g., "name/version". Here are some example commands:
egg list installed sql - list all installed eggs having sql in their name
egg search installed sql - list all installed eggs mentioning sql anywhere in the package metadata
egg list outdated installed - list all outdated installed eggs
egg list outdated active - list all outdated and active (and installed) eggs
egg uninstall outdated - uninstall all outdated eggs
egg info pysqlite - show information about pysqlite
egg info pysqlite/2.0.0 - show information about version 2.0.0 of pysqlite
egg sync local - rescan local packages and update the cache db

PYPI IMPROVEMENT SUGGESTIONS

While doing the application I discovered one important missing feature: PyPI doesn't offer a way to programmatically bulk-download information about all eggs, as is customary for many other packaging systems. This means "egg sync" will have to fetch the information for each package individually. I think it wouldn't be hard to offer a compressed XML file with all of the package information, suitable for download. A minor nuisance is that there's no way to get only eggs/distributions; PyPI lists packages, and some packages don't even have any eggs. The "egg" command will try to download each of these empty packages at each sync (since it treats empty packages as "packages for which we haven't downloaded eggs yet"). It might be better to list eggs/distributions instead of packages. There's a lot of opportunity in improving the consistency and usefulness of package metainformation. Once you have it all sync'ed to a local SQLite database and start snooping around, it'll be pretty obvious; very few packages use the dependencies etc. (In fact, I think the dependencies/obsoletes definitions are overengineered; we could get by with just a simple package >= version number).
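The "simple package >= version number" form suggested above is straightforward to support. A minimal sketch of a parser for that simplified dependency syntax; the grammar (name, optional operator, optional version) is hypothetical and deliberately far simpler than PyPI's actual metadata format:

```python
import re

def parse_requirement(spec):
    """Parse a minimal 'name [op] [version]' requirement string.

    Only ==, >= and <= are recognised; anything richer (extras,
    obsoletes, version ranges) is intentionally out of scope here.
    """
    m = re.match(r'^\s*([A-Za-z0-9_.-]+)\s*(==|>=|<=)?\s*([\w.]+)?\s*$', spec)
    if not m:
        raise ValueError("bad requirement: %r" % spec)
    name, op, version = m.groups()
    return name, op, version
```

A bare name such as "Storm" parses to (name, None, None), so "any version" stays expressible without extra syntax.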
Many people use other platform-specific packaging systems to manage Python packages, probably both because this handles dependencies on other non-Python packages, and because PyPI hasn't been very useful or easy to use. It may even be asked what the role of PyPI is since it's never going to replace platform-specific packaging systems; then should it support them? How? In any case, installing Python packages from different packaging systems would result in problems, and currently "egg" can't find Python packages installed using other systems. ("Yolk" has some support for discovering Python packages installed using Gentoo.) Optional: These days XMLRPC (and the WS-Deathstar) seems to be losing steam to REST, so I think we'd gain a lot of "hackability" by enabling a REST interface for accessing packages. Eventually we probably need to enforce package signing.

EGG IDEAS

It'd be good for "egg" to support both system- and user-wide configurations, and to support downloading from several package indexes, like apt-get does. Perhaps "egg" should keep the uninstalled packages in a cache, like apt-get and, I believe, buildout. Perhaps "egg" should provide a simple web server to allow browsing (and perhaps installation from) local packages (I believe the Ruby guys have this). If this web server were discoverable via Bonjour/Zeroconf, then all that's needed to set up a cache of PyPI is to run an egg server (that people on the net auto-discover) and regularly download all packages. How could "egg" work with "buildout"? Should buildout be used for project-specific egg installations?
Rgds, Bjorn From jim at zope.com Tue Aug 14 15:40:54 2007 From: jim at zope.com (Jim Fulton) Date: Tue, 14 Aug 2007 09:40:54 -0400 Subject: [Catalog-sig] A first step at improving PyPI: the "egg" command In-Reply-To: <8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net> References: <46C01D18.7070301@v.loewis.de> <8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net> Message-ID: <1389093E-9567-45FA-9BD6-3A7CEDB95167@zope.com> On Aug 14, 2007, at 3:18 AM, Bjørn Stabell wrote: > Hi all, > > > I think there's a lot to gain for Python by improving PyPI, and I'm > willing to help. Great! > I did help a bit with PyPI at last year's > EuroPython sprint, and was then made aware of http://wiki.python.org/ > moin/CheeseShopDev - are these the most up-to-date plans for PyPI? > > If you're in a hurry and don't want to read everything: > > 1) I've created a little app to help prototype how we can do better > egg/package management at http://contrib.exoweb.net/trac/browser/egg/ I get prompted for a password for that. (More ideas than I have time to absorb at the moment snipped.) ... I think you need to raise this on the distutils sig as well. That's where setuptools is discussed and much of what you describe is addressed to some degree by setuptools. It is still my opinion that the distutils-sig and catalog-sig should be combined. Jim -- Jim Fulton mailto:jim at zope.com Python Powered!
CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From constant.beta at gmail.com Tue Aug 14 17:49:13 2007 From: constant.beta at gmail.com (=?ISO-8859-2?Q?Micha=B3_Kwiatkowski?=) Date: Tue, 14 Aug 2007 17:49:13 +0200 Subject: [Catalog-sig] A first step at improving PyPI: the "egg" command In-Reply-To: <8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net> References: <46C01D18.7070301@v.loewis.de> <8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net> Message-ID: <5e8b0f6b0708140849u784e40edvd60e17cbdd91f205@mail.gmail.com> On 8/14/07, Bjørn Stabell wrote: > THE COMMAND LINE PACKAGE MANAGEMENT TOOL > The "egg" command should enable you to at least find, show info for, > install, and uninstall packages. I think the most common way to do > command line tools like this is to offer sub-commands, a la bzr, > port, svn, apt-get, gem, so I suggest: [snip] > It's still incomplete, lacking tests, might only work on unix-y > computers, and is lacking support for lots of features like > activation/deactivation, and upgrades, but it works for basic stuff > like finding, installing, and uninstalling packages. Please take a look at yolk (http://tools.assembla.com/yolk/), a tool for obtaining information about PyPI and locally installed packages. It's been developed for more than half a year now, so I'm sure that you'll find stable pieces of code there for inclusion. Maybe a merge would be the best thing to do? I'm CC-ing Rob Cakebread, yolk author, so he can voice his opinion.
Cheers, mk From bjorn at exoweb.net Tue Aug 14 18:15:41 2007 From: bjorn at exoweb.net (=?ISO-8859-1?Q?Bj=F8rn_Stabell?=) Date: Wed, 15 Aug 2007 00:15:41 +0800 Subject: [Catalog-sig] A first step at improving PyPI: the "egg" command In-Reply-To: <1389093E-9567-45FA-9BD6-3A7CEDB95167@zope.com> References: <46C01D18.7070301@v.loewis.de> <8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net> <1389093E-9567-45FA-9BD6-3A7CEDB95167@zope.com> Message-ID: On Aug 14, 2007, at 21:40, Jim Fulton wrote: >> If you're in a hurry and don't want to read everything: >> >> 1) I've created a little app to help prototype how we can do better >> egg/package management at http://contrib.exoweb.net/trac/browser/ >> egg/ > > I get prompted for a password for that. Hi Jim, Oops! Thanks for the heads up. It should work now. > (More ideas than I have time to absorb at the moment snipped.) Yeah, I was afraid I was trying to communicate too much in one single email. > ... > > I think you need to raise this on the distutils sig as well. > That's where setuptools is discussed and much of what you describe > is addressed to some degree by setuptools. Okay, I'll subscribe. > It is still my opinion that the distutils-sig and catalog-sig > should be combined. Sounds like a good idea. I was finding a lot of the ideas/thoughts were related to setuptools and PyPI at the same time. They're really the client- and server-side components of the same thing. Rgds, Bjorn From bjorn at exoweb.net Tue Aug 14 18:24:59 2007 From: bjorn at exoweb.net (=?ISO-8859-1?Q?Bj=F8rn_Stabell?=) Date: Wed, 15 Aug 2007 00:24:59 +0800 Subject: [Catalog-sig] A first step at improving PyPI: the "egg" command In-Reply-To: <5e8b0f6b0708140849u784e40edvd60e17cbdd91f205@mail.gmail.com> References: <46C01D18.7070301@v.loewis.de> <8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net> <5e8b0f6b0708140849u784e40edvd60e17cbdd91f205@mail.gmail.com> Message-ID: On Aug 14, 2007, at 23:49, Michał
Kwiatkowski wrote: > On 8/14/07, Bjørn Stabell wrote: >> THE COMMAND LINE PACKAGE MANAGEMENT TOOL >> The "egg" command should enable you to at least find, show info for, >> install, and uninstall packages. I think the most common way to do >> command line tools like this is to offer sub-commands, a la bzr, >> port, svn, apt-get, gem, so I suggest: > [snip] >> It's still incomplete, lacking tests, might only work on unix-y >> computers, and is lacking support for lots of features like >> activation/deactivation, and upgrades, but it works for basic stuff >> like finding, installing, and uninstalling packages. > > Please take a look at yolk (http://tools.assembla.com/yolk/), a tool > for obtaining information about PyPI and locally installed packages. > It's been developed for more than half a year now, so I'm sure that > you'll find stable pieces of code there for inclusion. Maybe a merge > would be the best thing to do? I'm CC-ing Rob Cakebread, yolk author, > so he can voice his opinion. I already looked at yolk (I liked it) and enstaller (only Windows, it seems), and blogged about it at: http://stabell.org/2007/07/28/pypi-yolk-httplib2/ And now I just discovered there's something called PythonEggTools in the PyPI. I agree we should join forces.
I'm doing the egg thing because I wanted:
* to see how a subcommand interface would work (a la svn/gem/port/aptitude)
* a cache (like apt-get) that's easily queryable (I'm in China; the net is slow)
* to link into easy_install/uninstall etc so it's a comprehensive utility
Rgds, Bjorn From paul at boddie.org.uk Wed Aug 15 00:37:57 2007 From: paul at boddie.org.uk (Paul Boddie) Date: Wed, 15 Aug 2007 00:37:57 +0200 Subject: [Catalog-sig] A first step at improving PyPI: the "egg" command Message-ID: <200708150037.57652.paul@boddie.org.uk> Bjørn Stabell wrote: > > Basically, the problems I would like to work on solving are: > > 1) Simplifying/enabling discovery of packages > 2) Simplifying/enabling management of packages > 3) Improving quality and usefulness of package index I think we can all agree that these are noble objectives. :-) > From a usability point of view I'd like to focus on the requirements > for the Python newbie, someone that has just discovered Python, but > is probably used to package management systems from Linux > distributions, FreeBSD, and other dynamic languages like Perl and > Ruby (these are also the systems I have experience with, so I'm > pulling ideas from them). I've been moderately negative about evolving a parallel infrastructure to other package and dependency management systems in the past, and I'm not enthusiastic about things like CPAN or language-specific equivalents. The first thing most people using a GNU/Linux or *BSD distribution are likely to wonder is, "Where are the Python packages in my package selector?" There are exceptions, of course. Some people may be sufficiently indoctrinated in the ways of Python, which I doubt is the case for a lot of people looking for packages. Others may be working in restricted environments where system package management tools don't really help.
And people coming from Perl might wonder where the CPAN equivalent is, but they should also remind themselves what the system provides - they have manpages for Perl, after all. It's nice to see someone looking at existing tools, though. > Ideally everything should be (following Steve Krug's "Don't Make Me > Think" recommendations) self-evident, and if that's not possible, at > least self-explanatory. Someone put in front of a keyboard without > having read any docs should be able to find, install, manage, and > perhaps even create Python packages. Better usability will of course > benefit everyone, not just beginners. I'm frankly amazed at how > people who have programmed Python for years don't really know or use > PyPI. I'm convinced making more of the Python package system > discoverable and easily accessible will greatly improve the adoption > of Python, the number of Python packages, and the quality of these > packages. There are many people who don't know about other parts of the python.org infrastructure besides PyPI, notably the Wiki. However, you have to take into account communities which are not centred on python.org. [...] I've read through the text that I've mercilessly cut from this response, and I admire the scope of this effort, but I do wonder whether we couldn't make use of existing projects (as others have noted), and not only at the Python-specific level, especially since the user interface to the "egg" tool seems to strongly resemble other established tools - as you seem to admit in this and later messages, Bjørn. > PYPI IMPROVEMENT SUGGESTIONS > > While doing the application I discovered one important missing > feature: PyPI doesn't offer a way to programmatically bulk-download > information about all eggs, as is customary for many other packaging > systems. This means "egg sync" will have to fetch the information > for each package individually. 
> I think it wouldn't be hard to offer > a compressed XML file with all of the package information, suitable > for download. I was thinking of re-using the Debian indexing strategy. It's very simple, perhaps almost quaintly so, but a lot of the problems revealed with the current strategies around PyPI (not exactly mitigated by bizarre tool-related constraints) could be solved by adopting existing well-worn techniques. [...] > There's a lot of opportunity in improving the consistency and > usefulness of package metainformation. Once you have it all sync'ed > to a local SQLite database and start snooping around, it'll be pretty > obvious; very few packages use the dependencies etc. (In fact, I > think the dependencies/obsoletes definitions are overengineered; we > could get by with just a simple package >= version number). If I recall correctly, the PEP concerned just "bailed" on the version numbering and dependency management issue, despite seeming to be inspired by Debian or RPM-style syntax. > Many people use other platform-specific packaging systems to manage > Python packages, probably both because this gives dependencies to > other non-Python packages, but also because PyPI hasn't been very > useful or easy to use. It may even be asked what the role of PyPI is > since it's never going to replace platform-specific packaging > systems; then should it support them? How? In any case, installing > Python packages from different packaging systems would result in > problems, and currently "egg" can't find Python packages installed > using other systems. ("Yolk" has some support for discovering Python > packages installed using Gentoo.) As I've said before, it's arguably best to work with whatever is already there, particularly because of the "interface" issue you mention with non-Python packages. 
I suppose the apparent lack of an open and widespread package/dependency management system on Windows (and some UNIX flavours) can be used as a justification to write something entirely new, but I imagine that only very specific tools need writing in order to make existing distribution mechanisms work with Windows - there's no need to duplicate existing work from end to end "just because". > Optional: These days XMLRPC (and the WS-Deathstar) seems to be losing > steam to REST, so I think we'd gain a lot of "hackability" by > enabling a REST interface for accessing packages. > > Eventually we probably need to enforce package signing. Agreed. And by adopting existing mechanisms, we can hopefully avoid having to reinvent their feature sets, too. Paul P.S. Sorry if this sounds a bit negative, but I've been reading the archives of the catalog-sig for a while now, and it's a bit painful reading about how sensitive various projects are to downtime in PyPI, how various workarounds have been devised with accompanying whisper campaigns to tell people where unofficial mirrors are, all whilst the business of package distribution continues uninterrupted in numerous other communities. If I had a critical need to get Python packages directly from their authors to run on a Windows machine, for example, I'd want to know how to do so via a Debian package channel or something like that. This isn't original thought: I'm sure that Ximian Red Carpet and Red Hat Network address many related issues. From bjorn at exoweb.net Wed Aug 15 02:15:48 2007 From: bjorn at exoweb.net (=?ISO-8859-1?Q?Bj=F8rn_Stabell?=) Date: Wed, 15 Aug 2007 08:15:48 +0800 Subject: [Catalog-sig] PyPI - Evolve our own or reuse existing package systems? 
In-Reply-To: <200708150037.57652.paul@boddie.org.uk> References: <200708150037.57652.paul@boddie.org.uk> Message-ID: <212C47AD-A14B-4D81-B6DE-3AEF13846D95@exoweb.net> (Since my email was a bit long and wide I'm trying to update the subject when the response is rather focused.) On Aug 15, 2007, at 06:37, Paul Boddie wrote: [...] There seem to be two issues: 1) Should Python have its own package management system (with dependencies etc) in parallel with what's already on many platforms (at least Linux and OS X)? Anyone who has worked with two parallel package management systems knows that dependencies are hellish. 
* If you mix and match you often end up with two of everything. * It'll be incomplete because you can't easily specify dependencies to non-Python packages. 2) If we agree Python should have a package management system, should we build or repurpose some other one? * I think it's a matter of pride and proof of concept to have one written in Python. That doesn't mean we can't get ideas from others. * It's also not that hard to do. The prototype I threw up took one weekend + half a day, and consists of about 500 lines of new code. It could be refactored and made smaller, but even if a complete version is ten times the size of that, it's still not a huge undertaking. * With a Python version we could relatively easily innovate beyond what traditional packaging systems do; ports and apt are pretty much stagnated. I think RubyGems seems to have some cool features, features that probably wouldn't have happened if they were using ports or apt-get (but then they could piggyback on innovations in those tools, I guess). If it works for them, why shouldn't it work for us? * It would have to be as portable as Python is; many packaging systems are by nature relatively platform-specific. * If we don't build our own, doesn't that mean we throw out eggs? * Packaging systems are useful for mega frameworks like Zope, TurboGears, and Django, and slightly less so for projects you roll on your own, to manage distribution and installation of plugins and addons. Relying on platform-specific packaging systems for these may not work that well. (But I could be wrong about that.) That said, it might be possible to do some kind of hybrid, for PyPI to be a "meta package" repository that can easily feed into platform specific packaging systems. And to perhaps also have a client-side "meta package manager" that will call upon the platform-specific package manager to install stuff. It looks like, for example, ports have targets to build to other systems, e.g., pkg, mpkg, dmg, rpm, srpm, dpkg. 
So maintaining package information in (or compatible with) ports could make it easy to feed packages into other package systems. * Benefit: We're working with other package systems, just making it easier to get Python packages into them. * Drawback: They may not want to include all packages, at the speed at which we want, or the way we want to. (I.e., there may still be packages you'd want that are only available on PyPI.) * Drawback: Some systems don't have package systems. Which brings me to: If we're just distributing source files why don't we use a source control system such as svn, bzr, or hg? The package developers have trunk, PyPI is a branch, the platform-specific package maintainers have a branch, and what's installed onto your system is in the end a branch (serially connected). Some systems, like Subversion, can also include externals like I did with cliutils on the egg package. Just a thought. Rgds, Bjorn From arve.knudsen at gmail.com Wed Aug 15 15:34:41 2007 From: arve.knudsen at gmail.com (Arve Knudsen) Date: Wed, 15 Aug 2007 15:34:41 +0200 Subject: [Catalog-sig] [Distutils] PyPI - Evolve our own or reuse existing package systems? In-Reply-To: <212C47AD-A14B-4D81-B6DE-3AEF13846D95@exoweb.net> References: <200708150037.57652.paul@boddie.org.uk> <212C47AD-A14B-4D81-B6DE-3AEF13846D95@exoweb.net> Message-ID: Hei Bjørn :) These are some interesting points you are making. I have in fact been developing a general software deployment system, Conduit, in Python for some time, that is capable of supporting several major platforms (at the moment: Linux, Windows and OS X). It's not reached any widespread use, but we (at the Simula Research Laboratory) are using it to distribute software to students attending some courses at the University of Oslo. Right now we are in the middle of preparing for the semester start, which is next week. The system is designed to be general, both with regard to the target platform and the deployable software. 
I've solved the distribution problem by constructing an XML-RPC Web service that serves information about software projects in RDF (based on the DOAP format). This distribution service is general and independent of the installation system, which acts as a client of the latter. If this sounds interesting to you I'd love it if you checked it out and gave me some feedback. It is an experimental project, and as such we are definitely interested in ideas/help from others. Arve On 8/15/07, Bjørn Stabell wrote: [...] From bjorn at exoweb.net Thu Aug 16 10:28:12 2007 From: bjorn at exoweb.net (=?ISO-8859-1?Q?Bj=F8rn_Stabell?=) Date: Thu, 16 Aug 2007 16:28:12 +0800 Subject: [Catalog-sig] [Distutils] PyPI - Evolve our own or reuse existing package systems? In-Reply-To: References: <200708150037.57652.paul@boddie.org.uk> <212C47AD-A14B-4D81-B6DE-3AEF13846D95@exoweb.net> Message-ID: <865235D6-2674-4B78-ABE8-4DFC5919576E@exoweb.net> On Aug 15, 2007, at 21:34, Arve Knudsen wrote: > These are some interesting points you are making. I have in fact > been developing a general software deployment system, Conduit, in > Python for some time, that is capable of supporting several major > platforms (at the moment: Linux, Windows and OS X). It's not > reached any widespread use, but we (at the Simula Research > Laboratory) are using it to distribute software to students > attending some courses at the University of Oslo. Right now we are > in the middle of preparing for the semester start, which is next week. > > The system is designed to be general, both with regard to the > target platform and the deployable software. I've solved the > distribution problem by constructing an XML-RPC Web service, that > serves information about software projects in RDF (based on the > DOAP format). 
This distribution service is general and independent > of the installation system, which acts as a client of the latter. > > If this sounds interesting to you I'd love if you checked it out > and gave me some feedback. It is an experimental project, and as > such we are definitely interested in ideas/help from others. Hi Arve, That's an interesting coincidence! :) Without turning it into a big research project, it would be interesting to hear what you (honestly) thought were the strengths and weaknesses of Conduit compared to, say, deb/rpm/ports/emerge, whichever you have experience with. I did download and look at Conduit, but haven't tried it yet. There are so many ways to take this, and so many "strategic" decisions that I'd hope people on the list could help out with. Personally I think it would be great if we had a strong Python-based central package system, perhaps based on Conduit. I'm pretty sure Conduit would have to have the client and server-side components even more clearly separated, though, and the interface between them open and clearly defined (which I think it is, but it would have to be discussed). I see Conduit (and PyPI) supports DOAP, and looking around I also found http://python.codezoo.com/ by O'Reilly; it also seems to have a few good ideas, for example voting and some quality control (although that's a very difficult decision, I guess). Rgds, Bjorn From eu at lbruno.org Thu Aug 16 19:12:12 2007 From: eu at lbruno.org (=?UTF-8?Q?Lu=C3=ADs_Bruno?=) Date: Thu, 16 Aug 2007 18:12:12 +0100 Subject: [Catalog-sig] [Distutils] Simpler Python package management: the "egg" command In-Reply-To: <5500FDAF-34F5-4075-9ED7-2C849CA1F266@exoweb.net> References: <8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net> <5500FDAF-34F5-4075-9ED7-2C849CA1F266@exoweb.net> Message-ID: <7555ca2e0708161012j25b957d4q1f1a04c73715b7ec@mail.gmail.com> 'lo there! 
Bjørn Stabell: > * Find available eggs for a particular topic online > * Get more information about an egg > * Install an egg (and its dependencies) > * See which eggs are installed > * Upgrade some or all outdated eggs > * Remove/uninstall an egg > * Create an egg > * Find eggs that are plugins for some framework online Having a checklist of use cases is useful, as others can add to it (or shoot items down). Thanks. > egg - list out a help of commands > egg search - search for eggs (aliases: find/list) > egg info - show info for egg (aliases: show/details) > egg install - install named eggs > egg uninstall - uninstall eggs (aliases: remove/purge/delete) > [...] > egg list installed sql - list all installed eggs having sql in their name > egg search installed sql - list all installed eggs mentioning sql anywhere [...] > egg list outdated installed - list all outdated installed eggs > egg list outdated active - list all outdated and active (and installed) eggs > egg uninstall outdated - uninstall all outdated eggs > egg info pysqlite - show information about pysqlite > egg info pysqlite/2.0.0 - show information about version 2.0.0 of pysqlite > egg sync local - rescan local packages and update cache db Sorry, but I think you meant apt-get instead of egg. No, I didn't search the archives. But making an apt-get repository (yum, emerge...) can't be *that* hard; it also can't be an uncommon idea. Someone must have suggested it before. On second thought, if I recall correctly Debian-style repositories have to update a master Packages catalog for *each* and *every* *single* upload. That's a -1. I think you've asked for a "sync local" master and I snipped it. Any other -1? Well, getting the repositories updated for each single upload becomes O(N), but it's a small N anyway. One per supported repository format. 
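The Debian-style master catalog being discussed can be sketched in a few lines: regenerating a flat, stanza-per-package index on every upload is exactly the O(N) pass mentioned above (the package entries here are invented):

```python
def render_index(packages):
    """Render a Debian-Packages-style flat index: one RFC 822-ish stanza
    per package, blank-line separated, sorted by name so the file is
    stable across regenerations and can be served as a single static file."""
    fields = ("Name", "Version", "Summary")
    stanzas = []
    for meta in sorted(packages, key=lambda m: m["Name"].lower()):
        stanzas.append("\n".join("%s: %s" % (f, meta[f]) for f in fields))
    return "\n\n".join(stanzas) + "\n"

catalogue = [  # invented example entries
    {"Name": "yolk", "Version": "0.0.6", "Summary": "Query PyPI and installed packages"},
    {"Name": "pysqlite", "Version": "2.0.0", "Summary": "DB-API 2.0 interface for SQLite"},
]
print(render_index(catalogue))
```

Clients can then mirror or parse one file instead of fetching per-package pages, which is the bulk-download property the thread keeps coming back to.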
> Optional: These days XMLRPC (and the WS-Deathstar) seems to be losing > steam to REST, so I think we'd gain a lot of "hackability" by > enabling a REST interface for accessing packages. Yep. Me likes dispatching on Accept: to get different responses. I think Apache can do it with a type-map. Gotta read up on the performance of it. That was an idea I stumbled upon during the recent PyPI discussions. Well, me likes flat files. > Eventually we probably need to enforce package signing. Heh, like the .deb and .rpm signatures? As I've said previously, I'd like to have a standard-type-repository for PyPI. If we're distributing binaries (as Phillip Eby said, sdist works *fine* for source tarballs) there are already people working on that subject. Package signing's one of the for-free wheels we don't have to invent. Squared wheels and all that. So digital signatures are a +1. > EGG IDEAS > > [snip] > Perhaps "egg" should provide a simple web server to allow browsing > (and perhaps installation from) local packages. D*mn. Right now you just serve your .../site-packages and you can easy_install from it (I think Phillip Eby said as much recently). This standard-type-repository idea makes that a tad more difficult. > If this web server should be discoverable via > Bonjour/Zeroconf, then all that's needed to set up a cache of PyPI is > to run an egg server (that people on the net auto-discover) and > regularly download all packages. Maybe regenerating a bunch of static files isn't that difficult anyway; do it before serving content. Well, you're gonna run a local PyPI copy; might as well run the PyPI code anyway. And now the collective asks: who the fsck is this Luis Bruno idiot? Just one more Python user with some free time on his hands. 
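The dispatching-on-Accept idea can be illustrated with a toy negotiator; a real implementation (or Apache's type-map) also honours q-values and wildcards like */*, which this sketch deliberately ignores:

```python
def pick_representation(accept_header, offered=("application/json", "text/html")):
    """Return the first media type the client lists that the server offers,
    falling back to the server's first offer. Toy version: q-values and
    wildcards such as */* are ignored."""
    wanted = [part.split(";")[0].strip() for part in accept_header.split(",")]
    for media_type in wanted:
        if media_type in offered:
            return media_type
    return offered[0]

print(pick_representation("text/html,application/xhtml+xml;q=0.9"))  # → text/html
print(pick_representation("application/xml"))  # falls back → application/json
```

The same URL can then serve machine-readable package metadata to tools and an HTML page to browsers.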
Greetings, -- Luis Bruno From eucci.group at gmail.com Thu Aug 16 20:20:21 2007 From: eucci.group at gmail.com (Jeff Shell) Date: Thu, 16 Aug 2007 12:20:21 -0600 Subject: [Catalog-sig] [Distutils] PyPI - Evolve our own or reuse existing package systems? In-Reply-To: <865235D6-2674-4B78-ABE8-4DFC5919576E@exoweb.net> References: <200708150037.57652.paul@boddie.org.uk> <212C47AD-A14B-4D81-B6DE-3AEF13846D95@exoweb.net> <865235D6-2674-4B78-ABE8-4DFC5919576E@exoweb.net> Message-ID: <88d0d31b0708161120u54fc1d38ib50366d2125acc7c@mail.gmail.com> On 8/14/07, Bjørn Stabell wrote: > There seem to be two issues: > > 1) Should Python have its own package management system (with > dependencies etc) in parallel with what's already on many platforms > (at least Linux and OS X)? Anyone who has worked with two parallel > package management systems knows that dependencies are hellish. > > * If you mix and match you often end up with two of everything. > > * It'll be incomplete because you can't easily specify > dependencies to non-Python packages. On that second bullet, tools like 'buildout' seem better equipped for handling those situations. Yesterday I saw a `buildout.cfg` for building and testing `lxml` against fresh downloads and builds of libxml2 and libxslt. It downloaded and built those two things locally before getting, building, and installing the `lxml` egg locally. Platform package management terrifies me. I work in Python far more than I work in a particular operating system (even though our office is pretty much Mac OS X and FreeBSD based). It's very easy for our servers to get stuck at a particular FreeBSD version, while the ports move on. Eventually they get so out of date that ports are pretty much unusable. > 2) If we agree Python should have a package management system, should > we build or repurpose some other one? 
> > [snip] > > * With a Python version we could relatively easily innovate beyond > what traditional packaging systems do; ports and apt are pretty much > stagnated. I think RubyGems seems to have some cool features, > features that probably wouldn't have happened if they were using > ports or apt-get (but then they could piggyback on innovations in > those tools, I guess). If it works for them, why shouldn't it work > for us? I agree. > * It would have to be as portable as Python is; many packaging > systems are by nature relatively platform-specific. You could change "have to" to "gets to" there. :). This is a big plus -- I know how easy_install and `gem` work as I use them far more frequently on both my desktop and various servers than any platform-specific packaging system. > * Packaging systems are useful for mega frameworks like Zope, > TurboGears, and Django, and slightly less so for projects you roll on > your own, to manage distribution and installation of plugins and > addons. Relying on platform-specific packaging systems for these may > not work that well. (But I could be wrong about that.) Personally, I think packaging systems are worse here. But I just may be a control freak... And I've had the luxury of Zope being a big self contained package for quite some time. Now that it's breaking into smaller pieces, it gets a bit more complex, but the combination of `setuptools` and `buildout` seem to be doing their jobs admirably. Relatively admirably. Once you have Ruby and Gems, Ruby on Rails installs with just one line:: gem install rails --include-dependencies I think Pylons and/or Turbogears does just about the same..? It's been a while since I looked at either of them. But that one line is a lot easier to work with than:: If running Debian, run ``apt-get ....`` If running RedHat or RPM system, .... If running Mac OS X with MacPorts, run ... If running ... then ... 
> That said, it might be possible to do some kind of hybrid, for PyPI > to be a "meta package" repository that can easily feed into platform > specific packaging systems. And to perhaps also have a client-side > "meta package manager" that will call upon the platform-specific > package manager to install stuff. From my own experience, that sounds worse. However, it would be nice if 'egg' could detect that certain things were installed by a non-egg system (i.e., having `py-sqlite` from MacPorts) and not install it. This goes into a deeper frustration I've had in the past: I installed MySQL on my desktop (Mac OS X) using a disk image / .pkg installer downloaded from MySQL's web site. Then I think I tried installing a Python package from MacPorts (maybe just the mysql bindings?) that had a MySQL dependency. It didn't detect that I already had MySQL installed, and MacPorts then tried installing it on its own. At that point, I stopped using ports for just about anything Python related, aside from getting Python and Py-Readline. It was easier to use easy_install or regular distutils and the like. The dependencies were met, but not advertised in a way that was friendly to the packaging system in question. > It looks like, for example, ports have targets to build to other > systems, e.g., pkg, mpkg, dmg, rpm, srpm, dpkg. So maintaining > package information in (or compatible with) ports could make it easy > to feed packages into other package systems. > > * Benefit: We're working with other package systems, just making > it easier to get Python packages into them. > > * Drawback: They may not want to include all packages, at the > speed at which we want, or the way we want to. (I.e., there may > still be packages you'd want that are only available on PyPI.) Or packages you only want internally. Or packages you don't want available on PyPI because they're very specific to a large framework/toolkit like Zope 3. > * Drawback: Some systems don't have package systems. 
And some administrators don't use them beyond (maybe) initially setting up the system. I also don't know how well those package systems deal with concepts like local-installs. Not just local to a single user, but local to a single package. `zc.buildout` is good about this, almost to a fault. There is a rough balance there between desktop and personal machine global-install ease of use and being able to set up fine-tuned self-contained setups. Anyways, I'd vote pure-python. Even on the most barren of machines, it's relatively easy to build and install Python from source. Even on a fairly old installation, it's easy to build and install a new version of Python from source - probably far easier than wrestling with the package manager about updating its database and then updating package after package after package after package that one doesn't want. I think that Python should be all that you need in order to get other Python packages. `easy_install` pretty much gives us this today. There are improvements I'd love to see - reports of what I have installed, what's active, what's taking precedence in my environment, etc. Your tool may do this, I haven't had time to look yet. Ruby's 'gem' command does this beautifully. And I hardly ever touch Ruby or gems; it was just very easy to use for the few things I've wanted to try.
-- Jeff Shell From barry at python.org Thu Aug 16 21:46:56 2007 From: barry at python.org (Barry Warsaw) Date: Thu, 16 Aug 2007 15:46:56 -0400 Subject: [Catalog-sig] [Distutils] Simpler Python package management: the "egg" command In-Reply-To: <5500FDAF-34F5-4075-9ED7-2C849CA1F266@exoweb.net> References: <8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net> <5500FDAF-34F5-4075-9ED7-2C849CA1F266@exoweb.net> Message-ID: <1D4917BC-BD94-44CE-BFEC-64F385B69561@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 14, 2007, at 7:15 PM, Bjørn Stabell wrote: > If there's > general agreement that Python eggs is the future way of distributing > packages, why not call the command "egg", similar to the way many > other package managers are named after the packages, e.g., rpm, port, > gem? +1 > Next, where do you find eggs? This might not be a big issue if the > "egg" command is configured properly by default, but I'd offer my > thoughts. I know the cheeseshop just changed name back to PyPI > again. In my opinion, neither of the names are good in that they > don't help people remember; any Monty Python connection is lost on > the big masses, and PyPI is hard to spell, not very obvious, and a > confusing clash with the also-prominent PyPy project. Why not call > the place for eggs just eggs? I.e., http://eggs.python.org/ +1 -- nice! > THE COMMAND LINE PACKAGE MANAGEMENT TOOL > > The "egg" command should enable you to at least find, show info for, > install, and uninstall packages.
I think the most common way to do > command line tools like this is to offer sub-commands, a la bzr, > port, svn, apt-get, gem, so I suggest: > > egg - list out a help of commands > egg search - search for eggs (aliases: find/list) > egg info - show info for egg (aliases: show/details) > egg install - install named eggs > egg uninstall - uninstall eggs (aliases: remove/purge/delete) > > so you can do: > > egg search bittorrent > > to find all packages that have anything to do with bittorrent (full-text > search of the package index), and then: > > egg install iTorrent > > to actually download and install the package. Yes, yes, yes, +1. > Optional: These days XMLRPC (and the WS-Deathstar) seems to be losing > steam to REST, so I think we'd gain a lot of "hackability" by > enabling a REST interface for accessing packages. +1 > Eventually we probably need to enforce package signing. +1 > It'd be good for "egg" to support both system- and user-wide > configurations, and to support downloading from several package > indexes, like apt-get does. And it would be nice if Python could be adapted to provide for user-specific site-packages so that PYTHONPATH hackery isn't necessary. Bjorn, I wish I had time to help, but I like where you're going with this. I think it would greatly improve the utility of eggs.
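[Editor's note: the sub-command layout quoted above can be sketched with argparse. The command set and aliases come from the proposal in this thread; the parser code itself is a hypothetical illustration, not the actual "egg" prototype.]

```python
# Sketch of the proposed "egg" sub-command interface using argparse.
# Only the command-line surface is modeled; no package index is contacted.
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="egg")
    sub = parser.add_subparsers(dest="command")

    search = sub.add_parser("search", aliases=["find", "list"],
                            help="full-text search of the package index")
    search.add_argument("term")

    info = sub.add_parser("info", aliases=["show", "details"],
                          help="show info for an egg")
    info.add_argument("name")

    install = sub.add_parser("install", help="install named eggs")
    install.add_argument("names", nargs="+")

    uninstall = sub.add_parser("uninstall",
                               aliases=["remove", "purge", "delete"],
                               help="uninstall eggs")
    uninstall.add_argument("names", nargs="+")
    return parser


if __name__ == "__main__":
    # e.g. the "egg search bittorrent" example from the proposal
    args = build_parser().parse_args(["search", "bittorrent"])
    print(args.command, args.term)
```

A dispatch table mapping `args.command` (and its aliases) to handler functions would complete the prototype.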
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRsSpsXEjvBPtnXfVAQIQnwP9FibXQYRMlhG9VScTbkr1lKB84k0+Awl8 NFIvl+h8ADkiItJsAmGYlCRO/dAUgE9imKoPD4Z35LbVvz9y6oiTRU6KYJwFossk ytIYBLQTf+727NQD4860+1Q23O1mFwf612/M4W4niO6H7GDCVZnxbSFJZIoaYNcH VUBp4F8WAy0= =AYko -----END PGP SIGNATURE----- From bjorn at exoweb.net Fri Aug 17 02:18:46 2007 From: bjorn at exoweb.net (=?ISO-8859-1?Q?Bj=F8rn_Stabell?=) Date: Fri, 17 Aug 2007 08:18:46 +0800 Subject: [Catalog-sig] [Distutils] Simpler Python package management: the "egg" command In-Reply-To: <7555ca2e0708161012j25b957d4q1f1a04c73715b7ec@mail.gmail.com> References: <8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net> <5500FDAF-34F5-4075-9ED7-2C849CA1F266@exoweb.net> <7555ca2e0708161012j25b957d4q1f1a04c73715b7ec@mail.gmail.com> Message-ID: <7C93A25C-B7B9-471C-AFF3-437D551DF282@exoweb.net> On Aug 17, 2007, at 01:12, Luís Bruno wrote: > Bjørn Stabell: [...] >> egg info pysqlite - show information about >> pysqlite >> egg info pysqlite/2.0.0 - show information about version >> 2.0.0 of pysqlite >> egg sync local - rescan local packages and >> update cache db > > Sorry, but I think you meant apt-get instead of egg. No, I didn't > search the archives. But making an apt-get repository (yum, emerge...) > can't be *that* hard; it also can't be an uncommon idea. Someone must > have suggested it before. The "egg" prototype already does the above commands. > On second thought, if I recall correctly Debian-style repositories > have to update a master Packages catalog for *each* and *every* > *single* upload. That's a -1. I think you've asked for a "sync local" > master and I snipped it. Any other -1? > > Well, getting the repositories updated for each single upload becomes > O(N), but it's a small N anyway. One per supported repository format. The "egg sync" stuff was to get the latest package information from PyPI and from your locally installed packages so that you can do fast and offline queries against it.
If you don't sync, you'll have to rescan every time; sync'ing is just an optimization, and since it gets put in a little database, it makes making queries etc much easier as another benefit. [...] >> Perhaps "egg" should provide a simple web server to allow browsing >> (and perhaps installation from) local packages. > > D*mn. Right now you just serve your .../site-packages and you can > easy_install from it (I think Phillip Eby said as much recently). I haven't seen that done, but since eggs in uninstalled and installed form are the same, it should be easy. Rgds, Bjorn From eu at lbruno.org Fri Aug 17 11:51:45 2007 From: eu at lbruno.org (Luis Bruno) Date: Fri, 17 Aug 2007 10:51:45 +0100 Subject: [Catalog-sig] [Distutils] PyPI - Evolve our own or reuse existing package systems? In-Reply-To: <88d0d31b0708161120u54fc1d38ib50366d2125acc7c@mail.gmail.com> References: <200708150037.57652.paul@boddie.org.uk> <212C47AD-A14B-4D81-B6DE-3AEF13846D95@exoweb.net> <865235D6-2674-4B78-ABE8-4DFC5919576E@exoweb.net> <88d0d31b0708161120u54fc1d38ib50366d2125acc7c@mail.gmail.com> Message-ID: <7555ca2e0708170251s62e5fc84me5de6589a13001c2@mail.gmail.com> Hello there, Jeff Shell wrote: > This goes into a deeper frustration I've had in the past: I installed > MySQL on my desktop (Mac OS X) using a disk image / .pkg installer > downloaded from MySQL's web site. Then I think I tried installing a > python package from MacPorts (maybe just the mysql bindings?) that had > a MySQL dependency. It didn't detect that I already had MySQL > installed, and MacPorts then tried installing it on its own. IIRC there are variants in some MacPorts that removed dependencies from their setup.py-like file and used the system ones. It can also be dealt with by creating phantom-packages which provide the virtual "name" mysql-client, for example (which is how it's been done in apt-get repositories, etc, etc.).
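[Editor's note: as a concrete illustration of the phantom-package idea mentioned above, a Debian-style control stanza along these lines declares an empty package that `Provides` the virtual name, so a vendor-installed MySQL satisfies the dependency. The package name and version here are made up for the example.]

```text
Package: mysql-client-placeholder
Version: 1.0
Architecture: all
Provides: mysql-client
Description: marks a MySQL client installed outside the package system
 Satisfies dependencies on mysql-client for a MySQL installed from the
 vendor's own .pkg installer, so apt does not try to install its own copy.
```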
> Even on the most barren of machines, it's relatively > easy to build and install Python from source. I can agree with that. From eu at lbruno.org Fri Aug 17 11:59:26 2007 From: eu at lbruno.org (Luis Bruno) Date: Fri, 17 Aug 2007 10:59:26 +0100 Subject: [Catalog-sig] [Distutils] Simpler Python package management: the "egg" command In-Reply-To: <7C93A25C-B7B9-471C-AFF3-437D551DF282@exoweb.net> References: <8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net> <5500FDAF-34F5-4075-9ED7-2C849CA1F266@exoweb.net> <7555ca2e0708161012j25b957d4q1f1a04c73715b7ec@mail.gmail.com> <7C93A25C-B7B9-471C-AFF3-437D551DF282@exoweb.net> Message-ID: <7555ca2e0708170259v1f946099nea35e00412d7fdbb@mail.gmail.com> Hello again, I'm trying to ASCII-fy so that I don't send another base64 blob. Bjorn Stabell wrote: > Luis Bruno wrote: > > Bjorn Stabell wrote: > > > egg info pysqlite > > > egg info pysqlite/2.0.0 > > > egg sync local > > > > Sorry, but I think you meant apt-get instead of egg. No, I didn't > > search the archives. But making an apt-get repository (yum, emerge...) > > can't be *that* hard; it also can't be an uncommon idea. Someone must > > have suggested it before. > > The "egg" prototype already does the above commands. I didn't have a look around your code. My whole post could be summarized as: you're reinventing the apt-get paraphernalia. I'd prefer to drop a python.list into /etc/apt/sources.list.d/ with: deb http://eggs.python.org/apt And use the rest of the tools I already have. The really *big* -1 this has is that I'm basically gonna be using --single-version-externally-managed eggs (which makes it impossible to have multiple "inactive" versions and require() them, if I understood Phillip Eby correctly). > The "egg sync" stuff was to get the latest package information from > PyPI and from your locally installed packages so that you can do fast > and offline queries against it. 
If you don't sync, you'll have to > rescan every time; sync'ing is just an optimization, and since it > gets put in a little database, it makes making queries etc much > easier as another benefit. I was thinking "sync local" re-gets the repository's Packages master-list. Then you read in the locally installed ones (which is a matter of traversing sys.path and looking for the .egg-info files; I think those are now (as of 2.5) expected to be there). All this has been done before; that's why I'm being so bone-headed about it. > > D*mn. Right now you just serve your .../site-packages and you can > > easy_install from it. > > I haven't seen that done, but since eggs in uninstalled and installed > form are the same, it should be easy. I think easy_install -f can work against an Apache directory index. I thought that was the whole point behind it, really. -- Luis "Bone-headed describes me so well" Bruno From arve.knudsen at gmail.com Fri Aug 17 12:47:21 2007 From: arve.knudsen at gmail.com (Arve Knudsen) Date: Fri, 17 Aug 2007 12:47:21 +0200 Subject: [Catalog-sig] [Distutils] PyPI - Evolve our own or reuse existing package systems? In-Reply-To: <865235D6-2674-4B78-ABE8-4DFC5919576E@exoweb.net> References: <200708150037.57652.paul@boddie.org.uk> <212C47AD-A14B-4D81-B6DE-3AEF13846D95@exoweb.net> <865235D6-2674-4B78-ABE8-4DFC5919576E@exoweb.net> Message-ID: Very glad to hear you're interested in my system, Bjørn. On 8/16/07, Bjørn Stabell wrote: > > On Aug 15, 2007, at 21:34, Arve Knudsen wrote: > > These are some interesting points you are making. I have in fact > > been developing a general software deployment system, Conduit, in > > Python for some time, that is capable of supporting several major > > platforms (at the moment: Linux, Windows and OS X). It's not > > reached any widespread use, but we (at the Simula Research > > Laboratory) are using it to distribute software to students > > attending some courses at the University of Oslo.
Right now we are > > in the middle of preparing for the semester start, which is next week. > > > > The system is designed to be general, both with regard to the > > target platform and the deployable software. I've solved the > > distribution problem by constructing an XML-RPC Web service, that > > serves information about software projects in RDF (based on the > > DOAP format). This distribution service is general and independent > > of the installation system, which acts as a client of the latter. > > > > If this sounds interesting to you I'd love if you checked it out > > and gave me some feedback. It is an experimental project, and as > > such we are definitely interested in ideas/help from others. > > Hi Arve, > > That's an interesting coincidence! :) > > Without turning it into a big research project, it would be > interesting to hear what you (honestly) thought were the strengths > and weaknesses of Conduit compared to, say, deb/rpm/ports/emerge, > whichever you have experience with. I did download and look at > Conduit, but haven't tried it yet. I would say the main difference lies in how Conduit is designed to be a completely general solution for distributing software and deploying it on user's systems, with as loose coupling as possible. You could say that what I am trying to achieve is closer to MacroVision's Install Anywhere / Flexnet Connect than to monolithic package managers such as APT, Emerge etc. The former offers a complete solution to independent providers for letting them deliver software and maintain it (with updates) over time, while the latter is a tightly integrated service which is even used to implement operating systems (e.g. Debian, Gentoo). Conduit tries to offer the best of both worlds by building a central software portal from independent project representations. 
The idea is that software providers maintain their own profile within the portal service, and associate with this a number of projects which are described in RDF (an extension of the DOAP vocabulary). The portal service accumulates these data, and exposes them to installation agents via a public XML-RPC API. I've written a framework for Conduit agents that currently supports installing on Linux, Windows (XP/Vista) and OS X. I find it a great strength to be able to offer a common installation system for all three platforms, but the weakness is that generally it doesn't integrate as well with the operating systems as native installers do. On Windows at least I plan to piggy-back on the native installation service (Windows Installer), to achieve better integration without having to reinvent the wheel. On Linux it is worse since there is no well-defined native installation service, but instead a bunch of different packaging systems which overlap with my own deployment model (specification of dependencies etc.). > There are so many ways to take this, and so many "strategic" > decisions that I'd hope people on the list could help out with. > > Personally I think it would be great if we had a strong Python-based > central package system, perhaps based on Conduit. I'm pretty sure > Conduit would have to have the client and server-side components even > more clearly separated, though, and the interface between them open > and clearly defined (which I think it is, but it would have to be > discussed). The client and server components should be clearly separated as-is, but the server API should definitely be reviewed and properly defined. Conduit-specific support exists on the server as extensions (namespace "conduit") of the RDF vocabulary.
> I see Conduit (and PyPI) supports DOAP, and looking around I also > found http://python.codezoo.com/ by O'Reilly; it also seems to have a > few good ideas, for example voting and some quality control (although > that's a very difficult decision, I guess). CodeZoo is a very interesting initiative. I let myself be inspired in part by CodeZoo when I started designing Conduit, but mostly by SWED (http://swed.org.uk) which has a similar model of accumulating decentralized information in RDF for centralized access (via a Web interface). I would actually like Conduit's distribution service to evolve into something with similar functionality to CodeZoo. A rich Web interface for navigating the catalog of software would be awesome (an alternative to sourceforge?). I've also pondered the possibility of user profiles in the portal so that one can keep preferences centrally, for instance as a way to define personal "installation sets" (e.g., after installing a new Linux, restore your previous installations). Arve -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/catalog-sig/attachments/20070817/69e3cfbd/attachment.htm From bwinton at latte.ca Fri Aug 17 15:07:20 2007 From: bwinton at latte.ca (Blake Winton) Date: Fri, 17 Aug 2007 09:07:20 -0400 Subject: [Catalog-sig] [Distutils] Simpler Python package management: the "egg" command In-Reply-To: <7555ca2e0708170259v1f946099nea35e00412d7fdbb@mail.gmail.com> References: <8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net> <5500FDAF-34F5-4075-9ED7-2C849CA1F266@exoweb.net> <7555ca2e0708161012j25b957d4q1f1a04c73715b7ec@mail.gmail.com> <7C93A25C-B7B9-471C-AFF3-437D551DF282@exoweb.net> <7555ca2e0708170259v1f946099nea35e00412d7fdbb@mail.gmail.com> Message-ID: <46C59D88.1040303@latte.ca> Luis Bruno wrote: >>>> egg info pysqlite >>>> egg info pysqlite/2.0.0 >>>> egg sync local >>> Sorry, but I think you meant apt-get instead of egg.
>> The "egg" prototype already does the above commands. > I'd prefer to drop a python.list into /etc/apt/sources.list.d/ with: > deb http://eggs.python.org/apt > And use the rest of the tools I already have. Me too! Uh, except the closest thing I have to /etc/apt/sources.list.d/ is C:\Program Files\Apt\sources.list.d\... And I don't have any tools that will work with it. Fortunately, "egg info pysqlite" and "egg sync local" should work out just fine for me. :) Later, Blake. From me at lbruno.org Mon Aug 20 15:58:05 2007 From: me at lbruno.org (Luis Bruno) Date: Mon, 20 Aug 2007 14:58:05 +0100 Subject: [Catalog-sig] [Distutils] Simpler Python package management: the "egg" command In-Reply-To: <46C59D88.1040303@latte.ca> References: <8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net> <5500FDAF-34F5-4075-9ED7-2C849CA1F266@exoweb.net> <7555ca2e0708161012j25b957d4q1f1a04c73715b7ec@mail.gmail.com> <7C93A25C-B7B9-471C-AFF3-437D551DF282@exoweb.net> <7555ca2e0708170259v1f946099nea35e00412d7fdbb@mail.gmail.com> <46C59D88.1040303@latte.ca> Message-ID: <7555ca2e0708200658q7584d9c3vd0cf89305c131ab5@mail.gmail.com> Blake Winton wrote: > Me too! Uh, except the closest thing I have to /etc/apt/sources.list.d/ > is C:\Program Files\Apt\sources.list.d\... And I don't have any tools > that will work with it. Good point; I had forgotten there isn't a Windows fetch-X-from-repositories. The closest thing that comes to mind is the whole Policy Object enchilada which I never really understood. 
-- Luis "and I gotta +1 your marvelous use of irony" Bruno From martin at v.loewis.de Mon Aug 20 18:26:19 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 20 Aug 2007 18:26:19 +0200 Subject: [Catalog-sig] [Distutils] Simpler Python package management: the "egg" command In-Reply-To: <7555ca2e0708200658q7584d9c3vd0cf89305c131ab5@mail.gmail.com> References: <8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net> <5500FDAF-34F5-4075-9ED7-2C849CA1F266@exoweb.net> <7555ca2e0708161012j25b957d4q1f1a04c73715b7ec@mail.gmail.com> <7C93A25C-B7B9-471C-AFF3-437D551DF282@exoweb.net> <7555ca2e0708170259v1f946099nea35e00412d7fdbb@mail.gmail.com> <46C59D88.1040303@latte.ca> <7555ca2e0708200658q7584d9c3vd0cf89305c131ab5@mail.gmail.com> Message-ID: <46C9C0AB.8070204@v.loewis.de> > Good point; I had forgotten there isn't a Windows > fetch-X-from-repositories. The closest thing that comes to mind is the > whole Policy Object enchilada which I never really understood. That works actually very well, but requires that computers are domain members. Then, you can deploy selected software on a selected subset of the machines in the domain, or for a selected subset of the users. Regards, Martin From pje at telecommunity.com Mon Aug 20 20:01:02 2007 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Mon, 20 Aug 2007 14:01:02 -0400 Subject: [Catalog-sig] [Distutils] Simpler Python package management: the "egg" command In-Reply-To: <7555ca2e0708170259v1f946099nea35e00412d7fdbb@mail.gmail.com> References: <8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net> <5500FDAF-34F5-4075-9ED7-2C849CA1F266@exoweb.net> <7555ca2e0708161012j25b957d4q1f1a04c73715b7ec@mail.gmail.com> <7C93A25C-B7B9-471C-AFF3-437D551DF282@exoweb.net> <7555ca2e0708170259v1f946099nea35e00412d7fdbb@mail.gmail.com> Message-ID: <20070820175840.63A683A408D@sparrow.telecommunity.com> At 10:59 AM 8/17/2007 +0100, Luis Bruno wrote: >The really *big* -1 this has is that I'm basically gonna be using >--single-version-externally-managed eggs (which makes it impossible to >have multiple "inactive" versions and require() them, if I understood >Phillip Eby correctly). You can have inactive versions and require() them, they just have to be .egg files or directories. You can have a "default" version that's installed --single-version, e.g. by a system package manager such as RPM. >I was thinking "sync local" re-gets the repository's Packages >master-list. Then you read in the locally installed ones (which is a >matter of traversing sys.path and looking for the .egg-info files; I >think those are now (as of 2.5) expected to be there). Please, please, *please* use the published APIs in pkg_resources for this. Too many people are writing tools that inspect egg files and directories directly -- and get it only partly right, making assumptions about the formats that aren't valid across platforms, Python versions, etc., etc. In general, if you are doing absolutely *anything* with on-disk formats of eggs, and you didn't read enough of the docs to find the equivalent APIs, it's a near-certainty that you don't understand the format well enough to write your own versions. Meanwhile, pkg_resources is proposed for inclusion in the Python 2.6 stdlib, so it's not like it's going to be hard to get a hold of.
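[Editor's note: the advice above, to go through the published pkg_resources API instead of scanning sys.path for .egg-info files by hand, can be sketched as follows. This is a minimal example assuming setuptools (and hence pkg_resources) is installed; the function name is ours, not part of any tool discussed in the thread.]

```python
# Enumerate locally installed distributions through the published
# pkg_resources API rather than reading egg metadata off disk directly.
import pkg_resources


def installed_distributions():
    """Map each installed project name to the versions found for it."""
    env = pkg_resources.Environment()  # indexes sys.path by default
    return {name: [dist.version for dist in env[name]] for name in env}


if __name__ == "__main__":
    for name, versions in sorted(installed_distributions().items()):
        print(name, ", ".join(versions))
```

Environment keys are normalized (lower-cased) project names, and each entry can hold several discovered versions, active or not, which is exactly the bookkeeping that hand-rolled scanners tend to get wrong.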
In this particular example, by the way, if you want to find all locally installed packages, you probably want to be using an Environment instance, which indexes all installed packages by package name, and gives you objects you can inspect in a variety of ways, including using .get_metadata('PKG-INFO') calls to read the .egg-info files -- or .egg-info/PKG-INFO, or EGG-INFO/PKG-INFO, or whatever file is actually involved. (This is why you need to use the API -- there are a lot of devils in the details.) >I think easy_install -f can work against an Apache directory >index. Yes. >I thought that was the whole point behind it, really. One of them, anyway. There are other aspects besides -f that work for directory indexes, such as PyPI "home page" and "download" URL links. From jim at zope.com Mon Aug 27 17:32:29 2007 From: jim at zope.com (Jim Fulton) Date: Mon, 27 Aug 2007 11:32:29 -0400 Subject: [Catalog-sig] PyPI slowdowns Message-ID: I've been mirroring PyPI with a cron job that runs once a minute. It uses a lock file and fails when the file is locked and I get an email when this happens. From this, I can tell when PyPI is having problems because usually the cron job runs in a few seconds. In the interest of giving people running PyPI data on problem periods, PyPI struggled on several occasions over the past few days: Aug 25, 16:01-16:07 Aug 26, 9:00- 9:21 Aug 26, 14:56-15:00 Aug 27, 03:56-04:06 All of these times are UTC. I haven't otherwise noticed problems like this for quite a while. Jim -- Jim Fulton mailto:jim at zope.com Python Powered! 
CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From jim at zope.com Tue Aug 28 00:59:53 2007 From: jim at zope.com (Jim Fulton) Date: Mon, 27 Aug 2007 18:59:53 -0400 Subject: [Catalog-sig] simple package index has links back into the human interface Message-ID: <07CF16C8-B9EC-472F-90F9-C53B56C4936D@zope.com> A while ago, I created an experimental PyPI mirror: http://download.zope.org/ppix/ Recently, I've been working on a mirror of the new simple index: http://download.zope.org/simple/ This mirrors the pages at: http://cheeseshop.python.org/simple/ In experimenting with this, I found that buildouts were taking much longer (e.g. 70 seconds vs. 40 seconds) using the simple mirror than using the ppix mirror. I added some additional logging and found that when using the simple index, buildout was getting a lot of non-simple pages. A common practice is to use the package index page for a project as the project home page. There's no point in a simple page including a link to the non-simple page as it contains the same or less information. I filter these pages out in the ppix index. The simple index doesn't. For example, the simple page for zc.buildout: http://cheeseshop.python.org/simple/zc.buildout has home page links to http://www.python.org/pypi/zc.buildout. Martin, can you filter links like this out of the simple output? (If not, I'll filter them out when I mirror.) Jim -- Jim Fulton mailto:jim at zope.com Python Powered!
CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From benji at benjiyork.com Tue Aug 28 02:59:52 2007 From: benji at benjiyork.com (Benji York) Date: Mon, 27 Aug 2007 20:59:52 -0400 Subject: [Catalog-sig] simple package index has links back into the human interface In-Reply-To: <07CF16C8-B9EC-472F-90F9-C53B56C4936D@zope.com> References: <07CF16C8-B9EC-472F-90F9-C53B56C4936D@zope.com> Message-ID: <46D37388.3040502@benjiyork.com> Jim Fulton wrote: > Martin, can you filter links like this out of the simple output? (If > not, I'll filter them out when I mirror.) If PyPI's simple version makes this change, it would mean that the simple variant would be (approximately) as fast as your ppix version, right? If so, it sounds like a very nice addition (or, rather, subtraction). -- Benji York http://benjiyork.com From jim at zope.com Tue Aug 28 13:54:59 2007 From: jim at zope.com (Jim Fulton) Date: Tue, 28 Aug 2007 07:54:59 -0400 Subject: [Catalog-sig] simple package index has links back into the human interface In-Reply-To: <46D37388.3040502@benjiyork.com> References: <07CF16C8-B9EC-472F-90F9-C53B56C4936D@zope.com> <46D37388.3040502@benjiyork.com> Message-ID: On Aug 27, 2007, at 8:59 PM, Benji York wrote: > Jim Fulton wrote: >> Martin, can you filter links like this out of the simple output? >> (If not, I'll filter them out when I mirror.) > > If PyPI's simple version makes this change, it would mean that the > simple variant would be (approximately) as fast as your ppix > version, right? It will get much closer. Jim -- Jim Fulton mailto:jim at zope.com Python Powered! 
CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From martin at v.loewis.de Thu Aug 30 13:34:35 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Aug 2007 13:34:35 +0200 Subject: [Catalog-sig] simple package index has links back into the human interface In-Reply-To: <07CF16C8-B9EC-472F-90F9-C53B56C4936D@zope.com> References: <07CF16C8-B9EC-472F-90F9-C53B56C4936D@zope.com> Message-ID: <46D6AB4B.4010201@v.loewis.de> > Martin, can you filter links like this out of the simple output? I had already filtered out cheeseshop.python.org/pypi and pypi.python.org/pypi, and now also filter www.python.org/pypi. So these should be gone now. Please let me know if there are further problems. Regards, Martin From jim at zope.com Thu Aug 30 13:47:59 2007 From: jim at zope.com (Jim Fulton) Date: Thu, 30 Aug 2007 07:47:59 -0400 Subject: [Catalog-sig] simple package index has links back into the human interface In-Reply-To: <46D6AB4B.4010201@v.loewis.de> References: <07CF16C8-B9EC-472F-90F9-C53B56C4936D@zope.com> <46D6AB4B.4010201@v.loewis.de> Message-ID: Much thanks! Jim On Aug 30, 2007, at 7:34 AM, Martin v. Löwis wrote: >> Martin, can you filter links like this out of the simple output? > > I had already filtered out cheeseshop.python.org/pypi and > pypi.python.org/pypi, and now also filter www.python.org/pypi. > > So these should be gone now. Please let me know if there are > further problems. > > Regards, > Martin -- Jim Fulton mailto:jim at zope.com Python Powered! CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From pje at telecommunity.com Thu Aug 30 17:47:33 2007 From: pje at telecommunity.com (Phillip J.
Eby) Date: Thu, 30 Aug 2007 11:47:33 -0400 Subject: [Catalog-sig] simple package index has links back into the human interface In-Reply-To: <46D6AB4B.4010201@v.loewis.de> References: <07CF16C8-B9EC-472F-90F9-C53B56C4936D@zope.com> <46D6AB4B.4010201@v.loewis.de> Message-ID: <20070830154507.8EC023A40A5@sparrow.telecommunity.com> At 01:34 PM 8/30/2007 +0200, Martin v. Löwis wrote: > > Martin, can you filter links like this out of the simple output? > >I had already filtered out cheeseshop.python.org/pypi and >pypi.python.org/pypi, and now also filter www.python.org/pypi. > >So these should be gone now. Please let me know if there are >further problems. Martin, how safe would it be for me to make the next version of setuptools begin using the "simple" index? I mean, is it an official API now? From martin at v.loewis.de Thu Aug 30 21:36:52 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Aug 2007 21:36:52 +0200 Subject: [Catalog-sig] simple package index has links back into the human interface In-Reply-To: <20070830154507.8EC023A40A5@sparrow.telecommunity.com> References: <07CF16C8-B9EC-472F-90F9-C53B56C4936D@zope.com> <46D6AB4B.4010201@v.loewis.de> <20070830154507.8EC023A40A5@sparrow.telecommunity.com> Message-ID: <46D71C54.4040205@v.loewis.de> > Martin, how safe would it be for me to make the next version of > setuptools begin using the "simple" index? I mean, is it an official > API now? It's an official API, and I'd encourage using it. Of course, it may have bugs, but I'll try to get them fixed when I find the time to do so.
Regards, Martin From martin at v.loewis.de Fri Aug 31 14:25:15 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 31 Aug 2007 14:25:15 +0200 Subject: [Catalog-sig] PyPI slowdowns In-Reply-To: References: Message-ID: <46D808AB.1030200@v.loewis.de> > In the interest of giving people running PyPI data on problem > periods, PyPI struggled on several occasions over the past few days: > > Aug 25, 16:01-16:07 > Aug 26, 9:00- 9:21 > Aug 26, 14:56-15:00 > Aug 27, 03:56-04:06 > > All of these times are UTC. > > I haven't otherwise noticed problems like this for quite a while. Thanks. I couldn't quite match all these incidents with the log files, but apparently, what happens is this: - some application gets overloaded (probably the Wiki, but I'm not certain), for some reason - FastCGI finds that the application does not respond quickly enough, and kills it - it does that a number of times, and then decides to back off restarting - as a consequence, all Apache processes start blocking for that application. This can be seen at http://ximinez.python.org/munin/localdomain/localhost.localdomain-apache_processes.html when there are 256 Apache processes. - as a consequence, the entire web server is inaccessible, as the MaxClients limit is exhausted. I don't know how to detect this problem before it happens. I have added response-time measuring to MoinMoin; if a response takes more than 10s, it will refuse all requests with a QUERY_STRING, for 120s. As the expensive MoinMoin requests are those with query parameters, I hope that this will cause fast processing of any backlog that may have been built up. Regards, Martin From jim at zope.com Fri Aug 31 16:08:42 2007 From: jim at zope.com (Jim Fulton) Date: Fri, 31 Aug 2007 10:08:42 -0400 Subject: [Catalog-sig] PyPI slowdowns In-Reply-To: <46D808AB.1030200@v.loewis.de> References: <46D808AB.1030200@v.loewis.de> Message-ID: On Aug 31, 2007, at 8:25 AM, Martin v. Löwis wrote: ...
Thanks for looking into this. Would you like me to keep sending this data to you? Jim -- Jim Fulton mailto:jim at zope.com Python Powered! CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From martin at v.loewis.de Fri Aug 31 16:12:43 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 31 Aug 2007 16:12:43 +0200 Subject: [Catalog-sig] PyPI slowdowns In-Reply-To: References: <46D808AB.1030200@v.loewis.de> Message-ID: <46D821DB.20808@v.loewis.de> > Would you like me to keep sending this data to you? How easy would it be to extract a "pypi watcher" out of that, which sends an email for an outage >2min? I'd run that myself somewhere; I'd then possibly get a chance to look into the problem while it occurs, rather than post-mortem. Regards, Martin From jim at zope.com Fri Aug 31 16:20:32 2007 From: jim at zope.com (Jim Fulton) Date: Fri, 31 Aug 2007 10:20:32 -0400 Subject: [Catalog-sig] PyPI slowdowns In-Reply-To: <46D821DB.20808@v.loewis.de> References: <46D808AB.1030200@v.loewis.de> <46D821DB.20808@v.loewis.de> Message-ID: On Aug 31, 2007, at 10:12 AM, Martin v. Löwis wrote: >> Would you like me to keep sending this data to you? > > How easy would it be to extract a "pypi watcher" out of that, > which sends an email for an outage >2min? I'd run that myself > somewhere; I'd then possibly get a chance to look into the problem > while it occurs, rather than post-mortem. I could probably do that. Or I can just add your address to my existing cron definition. That would be easiest. Jim -- Jim Fulton mailto:jim at zope.com Python Powered!
CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From paul at boddie.org.uk Fri Aug 31 20:36:08 2007 From: paul at boddie.org.uk (Paul Boddie) Date: Fri, 31 Aug 2007 20:36:08 +0200 Subject: [Catalog-sig] PyPI slowdowns In-Reply-To: <46D808AB.1030200@v.loewis.de> References: <46D808AB.1030200@v.loewis.de> Message-ID: <200708312036.08982.paul@boddie.org.uk> On Friday 31 August 2007 14:25:15 Martin v. Löwis wrote: > > I don't know how to detect this problem before it happens. I have > added response-time measuring to MoinMoin; if a response takes more > than 10s, it will refuse all requests with a QUERY_STRING, for > 120s. As the expensive MoinMoin requests are those with query > parameters, I hope that this will cause fast processing of any > backlog that may have been built up. I've received various errors from the Wiki recently, most commonly one which seems to involve a FastCGI timeout, but where the edited page does get saved. Another seems to involve checking permissions to see if the requester can save pages, where the software seems to get held up communicating with an XML-RPC service on moinmoin.de, possibly for anti-spam blacklist purposes. Perhaps some of this extravagance can be turned off, especially for people who are registered users with elevated privileges. Paul P.S. If there's a place to discuss the Wiki then please point me right to it. PyPI works well enough for me, but then I don't have software relying on it serving certain pages on a 24x7 basis.
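[Editor's note: the load-shedding scheme Martin describes and Paul quotes above — refuse query-string requests for 120 seconds once any response has taken longer than 10 — can be sketched roughly as follows. This is a hypothetical standalone illustration, not the actual MoinMoin patch; the handler signature, the `run_app` stand-in, and the 503 response shape are all assumptions.]

```python
import time

SLOW_THRESHOLD = 10.0   # seconds; a response slower than this trips the back-off
REFUSE_WINDOW = 120.0   # seconds during which query-string requests are refused

_refuse_until = 0.0     # shared across requests within one worker process


def handle_request(environ, run_app,
                   slow_threshold=SLOW_THRESHOLD,
                   refuse_window=REFUSE_WINDOW,
                   clock=time):
    """Run one request, shedding expensive ones after a slow response.

    `environ` is a WSGI-style dict; `run_app` is a stand-in for the real
    request handler and returns the response body. Requests carrying a
    QUERY_STRING (the expensive ones, e.g. searches) are refused while
    the back-off window is open, so a backlog of cheap requests drains.
    """
    global _refuse_until
    now = clock.time()
    if environ.get("QUERY_STRING") and now < _refuse_until:
        return "503 Service Unavailable", "temporarily refusing expensive requests"
    body = run_app(environ)
    if clock.time() - now > slow_threshold:
        _refuse_until = clock.time() + refuse_window
    return "200 OK", body
```

[Note that plain requests without query parameters are never refused, which matches the stated goal of processing the built-up backlog quickly.]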
From martin at v.loewis.de Fri Aug 31 22:54:52 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 31 Aug 2007 22:54:52 +0200 Subject: [Catalog-sig] PyPI slowdowns In-Reply-To: <200708312036.08982.paul@boddie.org.uk> References: <46D808AB.1030200@v.loewis.de> <200708312036.08982.paul@boddie.org.uk> Message-ID: <46D8801C.5050207@v.loewis.de> > Another seems to involve checking permissions to see if the requester can > save pages, where the software seems to get held up communicating with an > XML-RPC service on moinmoin.de, possibly for anti-spam blacklist purposes. Ah, that's a clue. Do you know where I could find more about that? What files should I look at that may have configuration to that effect? Regards, Martin From martin at v.loewis.de Fri Aug 31 23:21:09 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 31 Aug 2007 23:21:09 +0200 Subject: [Catalog-sig] PyPI slowdowns In-Reply-To: <46D8801C.5050207@v.loewis.de> References: <46D808AB.1030200@v.loewis.de> <200708312036.08982.paul@boddie.org.uk> <46D8801C.5050207@v.loewis.de> Message-ID: <46D88645.4090403@v.loewis.de> > Ah, that's a clue. Do you know where I could find more about that? What > files should I look at that may have configuration to that effect? I found it. It fetches BadContent once every hour from moinmaster.wikiwikiweb.de:8000. I changed it to do that once every 12h. If urgent action is necessary, you can still edit LocalBadContent (as you do, anyway). I've bumped several timeout values - although I do wonder why some moin requests take 20s to complete. Regards, Martin
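[Editor's note: a minimal "pypi watcher" of the kind Martin asks for above — poll the index and send one mail once an outage passes two minutes — might look like the sketch below. The URL, mail hook, and polling intervals are placeholders; the clock and probe are injectable only so the loop can be exercised without a network.]

```python
import time
import urllib.request

PYPI_URL = "http://pypi.python.org/pypi"
ALERT_AFTER = 120     # seconds of downtime before mailing
POLL_EVERY = 30       # seconds between probes


def is_up(url, timeout=15):
    """True if the URL answers at all within the timeout."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except Exception:
        return False


def watch(send_mail, url=PYPI_URL, probe=is_up, clock=time, max_checks=None):
    """Poll `url`; call `send_mail(subject)` once per outage over ALERT_AFTER.

    `send_mail` would wrap smtplib in a real deployment; `max_checks`
    bounds the loop for testing (None means run forever, e.g. under cron
    supervision or as a daemon).
    """
    down_since = None
    alerted = False
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if probe(url):
            down_since, alerted = None, False
        else:
            now = clock.time()
            if down_since is None:
                down_since = now          # outage starts
            elif not alerted and now - down_since >= ALERT_AFTER:
                send_mail("PyPI unreachable for over %d seconds" % ALERT_AFTER)
                alerted = True            # one mail per outage, not per poll
        clock.sleep(POLL_EVERY)
```

[The one-mail-per-outage latch avoids flooding the recipient while the problem is being looked at, which fits the stated goal of investigating while it occurs rather than post-mortem.]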
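[Editor's note: the change Martin describes — fetching BadContent at most once every 12 hours instead of hourly — amounts to a simple time-based cache. A sketch under assumptions follows; the real MoinMoin code differs, and `fetch` here merely stands in for the XML-RPC call to the moinmaster server.]

```python
import time

FETCH_INTERVAL = 12 * 3600   # 12 hours, as in the change described above

_cache = {"value": None, "fetched_at": None}


def get_bad_content(fetch, clock=time):
    """Return the anti-spam blacklist, refetching at most once per interval.

    `fetch` is a zero-argument callable standing in for the network call;
    a fetched value is reused until it is older than FETCH_INTERVAL, so a
    slow or unreachable master server can stall a request at most once
    per 12 hours rather than once per hour.
    """
    now = clock.time()
    if (_cache["fetched_at"] is None
            or now - _cache["fetched_at"] >= FETCH_INTERVAL):
        _cache["value"] = fetch()
        _cache["fetched_at"] = now
    return _cache["value"]
```

[Urgent additions would still go in LocalBadContent, edited by hand, exactly as Martin notes.]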