From gentoodev at gmail.com Thu Jun 14 22:35:44 2007
From: gentoodev at gmail.com (Rob Cakebread)
Date: Thu, 14 Jun 2007 13:35:44 -0700
Subject: [Catalog-sig] Cheese Shop Meltdown
Message-ID: <9b06ffb10706141335h547cdb74x3d476fbcedd04708@mail.gmail.com>

Until the problems with "dumb spiders"[1] are worked around, couldn't
something simple be done, such as using a spider trap[2]?

Add this to robots.txt:

User-agent: *
Disallow: /pypi?:action=bot_ban

Add a hidden link on the front page, then grab the IP of the offending
spider and block it by adding it to .htaccess.

Just a thought, as PyPI has been painfully slow for hours at a time lately.

[1] http://mail.python.org/pipermail/catalog-sig/2007-April/001055.html
[2] http://danielwebb.us/software/bot-trap/

From martin at v.loewis.de Sat Jun 23 08:51:13 2007
From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 23 Jun 2007 08:51:13 +0200
Subject: [Catalog-sig] Cheeseshop performance improved
Message-ID: <467CC2E1.3010708@v.loewis.de>

I now took some measures to improve the cheeseshop performance.

If you are still experiencing that the cheeseshop is unresponsive,
please report exact date, the operation you tried to perform,
and the time it took for that operation to complete (or the time
it took for a timeout to occur).

Regards,
Martin

From jodok at lovelysystems.com Mon Jun 18 23:40:05 2007
From: jodok at lovelysystems.com (Jodok Batlogg)
Date: Mon, 18 Jun 2007 23:40:05 +0200
Subject: [Catalog-sig] improving pypi
Message-ID:

hi,

cheeseshop seems to attract more and more traffic. during the last
weeks it was pretty slow and now and then not reachable at all.

we're using zc.buildout to deploy our egg-based applications in
production and really rely on cheeseshop (and download.zope.org).

the current situation is really bad and should be improved. here's
what i propose:

a) change the pypi software so it bakes static pages that can be served/cached easily.
b) start an effort to have some mirrors worldwide.

lovely systems is willing to contribute money for a) and hardware /
bandwidth for b).

are there volunteers to work on it? are there other companies willing
to support it?

until we've solved the issues we'd like to mirror
http://cheeseshop.python.org/packages/source/ (no dynamically generated
pages) to one of our local servers and start working with pypicache
(http://trac.wiretooth.com/public/wiki/pypicache). perhaps we can even
start setting up rsync between trusted servers?

thanks
jodok

--
"Now is better than never."
  -- The Zen of Python, by Tim Peters

Jodok Batlogg, Lovely Systems
Schmelzhütterstraße 26a, 6850 Dornbirn, Austria
phone: +43 5572 908060, fax: +43 5572 908060-77

From jim at zope.com Mon Jun 25 15:40:49 2007
From: jim at zope.com (Jim Fulton)
Date: Mon, 25 Jun 2007 09:40:49 -0400
Subject: [Catalog-sig] Cheeseshop performance improved
In-Reply-To: <467CC2E1.3010708@v.loewis.de>
References: <467CC2E1.3010708@v.loewis.de>
Message-ID:

Could you tell us what you did?

Jim

On Jun 23, 2007, at 2:51 AM, Martin v. Löwis wrote:

> I now took some measures to improve the cheeseshop performance.
>
> If you are still experiencing that the cheeseshop is unresponsive,
> please report exact date, the operation you tried to perform,
> and the time it took for that operation to complete (or the time
> it took for a timeout to occur).
>
> Regards,
> Martin
> _______________________________________________
> Catalog-sig mailing list
> Catalog-sig at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig

--
Jim Fulton mailto:jim at zope.com Python Powered!
CTO (540) 361-1714 http://www.python.org
Zope Corporation http://www.zope.com http://www.zope.org

From jim at zope.com Mon Jun 25 16:16:42 2007
From: jim at zope.com (Jim Fulton)
Date: Mon, 25 Jun 2007 10:16:42 -0400
Subject: [Catalog-sig] Cheeseshop performance improved
In-Reply-To:
References: <467CC2E1.3010708@v.loewis.de>
Message-ID:

It's very slow atm.

Jim

On Jun 25, 2007, at 9:40 AM, Jim Fulton wrote:

> Could you tell us what you did?
>
> Jim
>
> On Jun 23, 2007, at 2:51 AM, Martin v. Löwis wrote:
>
>> I now took some measures to improve the cheeseshop performance.
>>
>> If you are still experiencing that the cheeseshop is unresponsive,
>> please report exact date, the operation you tried to perform,
>> and the time it took for that operation to complete (or the time
>> it took for a timeout to occur).
>>
>> Regards,
>> Martin
>
> --
> Jim Fulton mailto:jim at zope.com Python Powered!
> CTO (540) 361-1714 http://www.python.org
> Zope Corporation http://www.zope.com http://www.zope.org

--
Jim Fulton mailto:jim at zope.com Python Powered!
CTO (540) 361-1714 http://www.python.org
Zope Corporation http://www.zope.com http://www.zope.org

From martin at v.loewis.de Mon Jun 25 22:03:28 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 25 Jun 2007 22:03:28 +0200
Subject: [Catalog-sig] Cheeseshop performance improved
In-Reply-To:
References: <467CC2E1.3010708@v.loewis.de>
Message-ID: <46801F90.50807@v.loewis.de>

Jim Fulton wrote:
> Could you tell us what you did?
Sure: I noticed that whenever the cheeseshop became
unresponsive, nearly all swap space was consumed, and the
system load was about 40 (I never saw values above 50,
probably because of Apache's MaxClients setting).

So I installed a daemon that watches the system load,
and restarts Apache when the load goes above 25. Whenever
I did that manually, all swap space would be released
immediately, and the system load went down to below 10.

In the recent past, this caused a restart at these points
in time (all times CEST = GMT+2):

Jun 24, 17:17
Jun 25, 00:15
Jun 25, 04:43
Jun 25, 05:00
Jun 25, 05:08
Jun 25, 10:15
Jun 25, 15:30
Jun 25, 16:36
Jun 25, 18:00
Jun 25, 21:38
Jun 25, 21:47

Regards,
Martin

From martin at v.loewis.de Mon Jun 25 22:04:44 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 25 Jun 2007 22:04:44 +0200
Subject: [Catalog-sig] Cheeseshop performance improved
In-Reply-To:
References: <467CC2E1.3010708@v.loewis.de>
Message-ID: <46801FDC.4060502@v.loewis.de>

Jim Fulton wrote:
> It's very slow atm.

Can you be more precise? What does that mean? What operation
are you performing, and how did you find out that it is
very slow?

Regards,
Martin

From jim at zope.com Mon Jun 25 22:32:35 2007
From: jim at zope.com (Jim Fulton)
Date: Mon, 25 Jun 2007 16:32:35 -0400
Subject: [Catalog-sig] Cheeseshop performance improved
In-Reply-To: <46801FDC.4060502@v.loewis.de>
References: <467CC2E1.3010708@v.loewis.de> <46801FDC.4060502@v.loewis.de>
Message-ID: <65F50ECE-9555-4F7E-B450-4ECD19E18795@zope.com>

On Jun 25, 2007, at 4:04 PM, Martin v. Löwis wrote:

> Jim Fulton wrote:
>> It's very slow atm.
>
> Can you be more precise? What does that mean? What operation
> are you performing, and how did you find out that it is
> very slow?

Well, I noticed that buildouts, which use setuptools, are running slow.
Then I tried accessing the pypi home page, which was very slow.

Jim

--
Jim Fulton mailto:jim at zope.com Python Powered!
CTO (540) 361-1714 http://www.python.org
Zope Corporation http://www.zope.com http://www.zope.org

From constant.beta at gmail.com Mon Jun 25 22:34:47 2007
From: constant.beta at gmail.com (=?ISO-8859-2?Q?Micha=B3_Kwiatkowski?=)
Date: Mon, 25 Jun 2007 22:34:47 +0200
Subject: [Catalog-sig] Cheeseshop performance improved
In-Reply-To: <46801F90.50807@v.loewis.de>
References: <467CC2E1.3010708@v.loewis.de> <46801F90.50807@v.loewis.de>
Message-ID: <5e8b0f6b0706251334y2bbc12cn7b0d4e5c98c22d1a@mail.gmail.com>

On 6/25/07, "Martin v. Löwis" wrote:
> Jim Fulton wrote:
> > Could you tell us what you did?
>
> Sure: I noticed that whenever the cheeseshop became
> unresponsive, nearly all swap space was consumed, and the
> system load was about 40 (I never saw values above 50,
> probably because of Apache's MaxClients setting).
>
> So I installed a daemon that watches the system load,
> and restarts Apache when the load goes above 25. Whenever
> I did that manually, all swap space would be released
> immediately, and the system load went down to below 10.

In what way is this a *performance* improvement? It looks like a
workaround. Do you have any ideas on the cause of this swap-consuming
behaviour?

Cheers,
mk

From martin at v.loewis.de Mon Jun 25 22:48:16 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 25 Jun 2007 22:48:16 +0200
Subject: [Catalog-sig] Cheeseshop performance improved
In-Reply-To: <65F50ECE-9555-4F7E-B450-4ECD19E18795@zope.com>
References: <467CC2E1.3010708@v.loewis.de> <46801FDC.4060502@v.loewis.de>
 <65F50ECE-9555-4F7E-B450-4ECD19E18795@zope.com>
Message-ID: <46802A10.8080205@v.loewis.de>

>> Can you be more precise? What does that mean? What operation
>> are you performing, and how did you find out that it is
>> very slow?
>
> Well, I noticed that buildouts, which use setuptools, are running slow.
> Then I tried accessing the pypi home page, which was very slow.
Can you quantify that (milliseconds, seconds, minutes, hours)?

In any case, that was apparently shortly before a restart, so
apparently, this was already at a point when performance was
degrading.

Regards,
Martin

From martin at v.loewis.de Mon Jun 25 22:52:14 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 25 Jun 2007 22:52:14 +0200
Subject: [Catalog-sig] Cheeseshop performance improved
In-Reply-To: <5e8b0f6b0706251334y2bbc12cn7b0d4e5c98c22d1a@mail.gmail.com>
References: <467CC2E1.3010708@v.loewis.de> <46801F90.50807@v.loewis.de>
 <5e8b0f6b0706251334y2bbc12cn7b0d4e5c98c22d1a@mail.gmail.com>
Message-ID: <46802AFE.1040605@v.loewis.de>

> In what way is this a *performance* improvement? It looks like a
> workaround. Do you have any ideas on the cause of this swap-consuming
> behaviour?

It improves the responsiveness of the system, and thus improves
performance. I agree it is a work-around - but even a work-around
is (or can be) an improvement.

I'm still uncertain what causes the memory consumption, in particular
as the problems occur at times of the day when I'm asleep (daytime in
America), so it's difficult for me to investigate.

In any case, I created

http://cheeseshop.python.org/gc

which lets you inspect a Python memory tally for a specific web server
process (reload to inspect a different server process).

Regards,
Martin

From jim at zope.com Mon Jun 25 23:34:15 2007
From: jim at zope.com (Jim Fulton)
Date: Mon, 25 Jun 2007 17:34:15 -0400
Subject: [Catalog-sig] Cheeseshop performance improved
In-Reply-To: <46802A10.8080205@v.loewis.de>
References: <467CC2E1.3010708@v.loewis.de> <46801FDC.4060502@v.loewis.de>
 <65F50ECE-9555-4F7E-B450-4ECD19E18795@zope.com> <46802A10.8080205@v.loewis.de>
Message-ID:

On Jun 25, 2007, at 4:48 PM, Martin v. Löwis wrote:

>>> Can you be more precise? What does that mean? What operation
>>> are you performing, and how did you find out that it is
>>> very slow?
>>
>> Well, I noticed that buildouts, which use setuptools, are running
>> slow.
>> Then I tried accessing the pypi home page, which was very slow.
>
> Can you quantify that (milliseconds, seconds, minutes, hours)?

In this case, I'd say around 30 seconds per request. Next time I'll
time it more precisely.

> In any case, that was apparently shortly before a restart, so
> apparently, this was already at a point when performance was
> degrading.

I appreciate your efforts. I suspect that this will help a little.

I'm certain that baking will help a lot. We just need to find someone
with enough time to implement it.

Can you tell if the memory leak is coming from PyPI or from the Wiki?
I think that another short term thing to try would be to put PyPI on
its own machine to protect it from Wiki load. I don't know if this
will help, but it might be worth trying.

Jim

--
Jim Fulton mailto:jim at zope.com Python Powered!
CTO (540) 361-1714 http://www.python.org
Zope Corporation http://www.zope.com http://www.zope.org

From lac at openend.se Mon Jun 25 23:44:07 2007
From: lac at openend.se (Laura Creighton)
Date: Mon, 25 Jun 2007 23:44:07 +0200
Subject: [Catalog-sig] Cheeseshop performance improved
In-Reply-To: Message from Jim Fulton of "Mon, 25 Jun 2007 17:34:15 EDT."
References: <467CC2E1.3010708@v.loewis.de> <46801FDC.4060502@v.loewis.de>
 <65F50ECE-9555-4F7E-B450-4ECD19E18795@zope.com> <46802A10.8080205@v.loewis.de>
Message-ID: <200706252144.l5PLi7cs032424@theraft.openend.se>

In a message of Mon, 25 Jun 2007 17:34:15 EDT, Jim Fulton writes:
>
>On Jun 25, 2007, at 4:48 PM, Martin v. Löwis wrote:
>
>>>> Can you be more precise? What does that mean? What operation
>>>> are you performing, and how did you find out that it is
>>>> very slow?
>>>
>>> Well, I noticed that buildouts, which use setuptools, are running
>>> slow.
>>> Then I tried accessing the pypi home page, which was very slow.
>>
>> Can you quantify that (milliseconds, seconds, minutes, hours)?
>
>In this case, I'd say around 30 seconds per request. next time I'll
>time it more precisely.
>
>> In any case, that was apparently shortly before a restart, so
>> apparently, this was already at a point when performance was
>> degrading.
>
>I appreciate your efforts. I suspect that this will help a little.
>
>I'm certain that baking will help a lot. We just need to find someone
>with enough time to implement it.
>
>Can you tell if the memory leak is coming from PyPI or from the
>Wiki? I think that another short term thing to try would be to put
>PyPI on its own machine to protect it from Wiki load. I don't know if
>this will help, but it might be worth trying.
>
>Jim
>
>--
>Jim Fulton mailto:jim at zope.com Python Powered!
>CTO (540) 361-1714 http://www.python.org
>Zope Corporation http://www.zope.com http://www.zope.org

Jodok Batlogg of Lovely Systems has already proposed donating a machine
for this. What would it take to set him up in business and get the
urls to point to his machine and the like?

Laura

From martin at v.loewis.de Tue Jun 26 00:01:53 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 26 Jun 2007 00:01:53 +0200
Subject: [Catalog-sig] Cheeseshop performance improved
In-Reply-To:
References: <467CC2E1.3010708@v.loewis.de> <46801FDC.4060502@v.loewis.de>
 <65F50ECE-9555-4F7E-B450-4ECD19E18795@zope.com> <46802A10.8080205@v.loewis.de>
Message-ID: <46803B51.6030000@v.loewis.de>

> Can you tell if the memory leak is coming from PyPI or from the Wiki?

No. When it becomes slow, take a look and report

http://cheeseshop.python.org/gc

Do this a couple of times, verifying that you talk to different
processes (close your browser/use a command line tool if your browser
insists on keeping the connection open). I can't study it, as I'm not
awake at the times of the day when it becomes loaded.

> I
> think that another short term thing to try would be to put PyPI on its
> own machine to protect it from Wiki load.
> I don't know if this will
> help, but it might be worth trying.

Unfortunately, that's not a short-term solution. It takes a lot of
effort (one day) to migrate the installation, and there are no
volunteers that have that much free time available. I, myself,
am booked until February 2008, at which point I might be able
to perform the migration of the installation to a new machine.

I dislike the wording "protect from the Wiki load", though.
I'm not so certain who needs protection from whom, here;
I personally consider the Wiki of equal importance for the
Python community as the Cheeseshop (I personally use it more
than the Cheeseshop).

Regards,
Martin

From martin at v.loewis.de Tue Jun 26 00:05:36 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 26 Jun 2007 00:05:36 +0200
Subject: [Catalog-sig] Cheeseshop performance improved
In-Reply-To: <200706252144.l5PLi7cs032424@theraft.openend.se>
References: <467CC2E1.3010708@v.loewis.de> <46801FDC.4060502@v.loewis.de>
 <65F50ECE-9555-4F7E-B450-4ECD19E18795@zope.com> <46802A10.8080205@v.loewis.de>
 <200706252144.l5PLi7cs032424@theraft.openend.se>
Message-ID: <46803C30.9070901@v.loewis.de>

> Jodok Batlogg of Lovely Systems has already proposed donating a machine
> for this. What would it take to set him up in business and get the
> urls to point to his machine and the like?

A volunteer to implement that change. That volunteer would have
to cooperate with various people for specific tasks that he
doesn't have permission to do, but that could be arranged.

That volunteer should be aware that he is likely in charge of
the new installation after that switch ("Den letzten beißen
die Hunde" - "the weakest go to the wall"?)

Regards,
Martin

From renesd at gmail.com Tue Jun 26 02:03:29 2007
From: renesd at gmail.com (=?ISO-8859-1?Q?Ren=E9_Dudfield?=)
Date: Tue, 26 Jun 2007 10:03:29 +1000
Subject: [Catalog-sig] Caching and expires for the cheese shop.
Message-ID: <64ddb72c0706251703l42118fbbmcfede59653d05596@mail.gmail.com>

Hello,

here are some examples of caching and expires config you can add to
your apache instance to improve performance.

First I talk about caching stuff, then at the bottom there is some
expires config for apache.

Cheers,


Caching.
============

You'll probably need to tweak the settings quite a bit to get optimal
performance.

I volunteered to get this working before, but I was never sent the
cheese shop apache config so I could test it. It's a very simple
change, which can give good performance increases.

CacheRoot "/var/tmp/proxy2/cheeseshop"
CacheEnable disk /
CacheSize 4000000
# CacheMinFileSize setting this so that 403 forbidden pages are not cached.
CacheMinFileSize 400
CacheDirLevels 5
CacheDirLength 3
#CacheGcInterval 4
CacheMaxExpire 24
CacheLastModifiedFactor 0.1
CacheDefaultExpire 1
#CacheForceCompletion 100

You may need to add some last modified headers to the cheese shop
output, since it doesn't appear that is being done yet.

It appears updates are only happening 10 times a day. So caching of
the cheese shop should give a big increase.

If someone wants to be my hands on the server for an hour or so, I
could do any tweaking necessary.

I'll look at making a patch to cheese shop to put in last modified
headers. feedparser.py has pretty good last modified handling if
someone wants to look there as an example. If someone could help me
with the cheeseshop code, that'd be great. But if not, I'll dig into
it and do a complete patch.

Here is the pseudo code for adding Last-Modified handling to cheese shop.
import time
# Standard-library date helpers, used here in place of feedparser._parse_date.
from email.utils import formatdate, mktime_tz, parsedate_tz

# How long a served page may be out of date, in seconds (10 minutes).
HOW_LONG_PAGES_CAN_BE_WRONG_FOR = 10 * 60


def get_when_changed_header_from_http_client(request):
    """Return the client's If-Modified-Since as a UTC timestamp, or 0."""
    value = request.get('If-Modified-Since')
    if not value:
        # No header sent: set modified since to 0.
        return 0
    # Split the If-Modified-Since (Netscape < v6 gets this wrong).
    value = value.split(";")[0]
    # Turn the client request If-Modified-Since into a timestamp.
    parsed = parsedate_tz(value)
    return mktime_tz(parsed) if parsed else 0


def get_date_for_most_recently_changed_part_of_page():
    # Placeholder: would probably do a database look up to see when this
    # page last changed. Would need select statements for each type of
    # page, eg, main page, single project page, category page etc.
    raise NotImplementedError


def handle_page(request, response):
    # `request` is assumed to be a dict of request headers and `response`
    # a stand-in object with set_status(), set_header() and
    # output_normal_page_stuff(); neither is the real cheese shop API.
    modified_since = get_when_changed_header_from_http_client(request)
    when_changed = get_date_for_most_recently_changed_part_of_page()

    if when_changed <= modified_since:
        response.set_status('304 Not Modified')
        return

    expires = time.time() + HOW_LONG_PAGES_CAN_BE_WRONG_FOR
    response.set_header('Expires', formatdate(expires, usegmt=True))
    response.set_header('Last-Modified', formatdate(when_changed, usegmt=True))
    response.output_normal_page_stuff()


Expires
========

# Setting up the expires stuff should be ok, if changes to images,
# javascript, and css are not frequent. If they do change, then
# references to the external scripts should change too.
# eg, add variables like style.css?r=801 so that browsers have to
# download new ones.
#
# Setting the expires stuff can make it so that web browsers don't even
# attempt to download stuff they already have.
ExpiresActive On
ExpiresByType image/gif A604800
ExpiresByType image/png A604800
ExpiresByType image/jpeg A604800
#ExpiresByType text/* A86400
ExpiresByType text/css A604800
ExpiresByType text/javascript A604800
ExpiresByType application/x-javascript A604800
ExpiresByType application/x-shockwave-flash A604800

From jim at zope.com Tue Jun 26 02:41:56 2007
From: jim at zope.com (Jim Fulton)
Date: Mon, 25 Jun 2007 20:41:56 -0400
Subject: [Catalog-sig] Cheeseshop performance improved
In-Reply-To: <46803B51.6030000@v.loewis.de>
References: <467CC2E1.3010708@v.loewis.de> <46801FDC.4060502@v.loewis.de>
 <65F50ECE-9555-4F7E-B450-4ECD19E18795@zope.com> <46802A10.8080205@v.loewis.de>
 <46803B51.6030000@v.loewis.de>
Message-ID:

On Jun 25, 2007, at 6:01 PM, Martin v. Löwis wrote:
>
>> I think that another short term thing to try would be to put PyPI on
>> its own machine to protect it from Wiki load. I don't know if this
>> will help, but it might be worth trying.
>
> Unfortunately, that's not a short-term solution.

Or at least not a quick solution.

> It takes a lot of
> effort (one day) to migrate the installation, and there are no
> volunteers that have that much free time available. I, myself,
> am booked until February 2008, at which point I might be able
> to perform the migration of the installation to a new machine.

Yeah, I was a bit afraid of that. I think that about 2/3 of the
effort of implementing baking is getting enough knowledge to create a
development/testing environment. What you're talking about is
comparable.

> I dislike the wording "protect from the Wiki load", though.
> I'm not so certain who needs protection from whom, here;
> I personally consider the Wiki of equal importance for the
> Python community as the Cheeseshop (I personally use it more
> than the Cheeseshop).

Of course, I didn't mean to cast aspersions on the wiki. They're
probably both important enough to have their own machines.

Jim

--
Jim Fulton mailto:jim at zope.com Python Powered!
CTO (540) 361-1714 http://www.python.org
Zope Corporation http://www.zope.com http://www.zope.org

From martin at v.loewis.de Tue Jun 26 06:14:43 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 26 Jun 2007 06:14:43 +0200
Subject: [Catalog-sig] Caching and expires for the cheese shop.
In-Reply-To: <64ddb72c0706251703l42118fbbmcfede59653d05596@mail.gmail.com>
References: <64ddb72c0706251703l42118fbbmcfede59653d05596@mail.gmail.com>
Message-ID: <468092B3.8050305@v.loewis.de>

> I volunteered to get this working before, but I was never sent the
> cheese shop apache config so I could test it.

That's easy enough resolved; I'll send it to you in private.

Regards,
Martin

From renesd at gmail.com Tue Jun 26 06:26:20 2007
From: renesd at gmail.com (=?ISO-8859-1?Q?Ren=E9_Dudfield?=)
Date: Tue, 26 Jun 2007 14:26:20 +1000
Subject: [Catalog-sig] Caching and expires for the cheese shop.
In-Reply-To: <468092B3.8050305@v.loewis.de>
References: <64ddb72c0706251703l42118fbbmcfede59653d05596@mail.gmail.com>
 <468092B3.8050305@v.loewis.de>
Message-ID: <64ddb72c0706252126q2505c63dv763afca49cdd3bb0@mail.gmail.com>

Thanks.

On 6/26/07, "Martin v. Löwis" wrote:
> > I volunteered to get this working before, but I was never sent the
> > cheese shop apache config so I could test it.
>
> That's easy enough resolved; I'll send it to you in private.
>
> Regards,
> Martin

From jafo at tummy.com Tue Jun 26 12:52:01 2007
From: jafo at tummy.com (Sean Reifschneider)
Date: Tue, 26 Jun 2007 04:52:01 -0600
Subject: [Catalog-sig] Cheeseshop performance improved
In-Reply-To: <200706252144.l5PLi7cs032424@theraft.openend.se>
References: <467CC2E1.3010708@v.loewis.de> <46801FDC.4060502@v.loewis.de>
 <65F50ECE-9555-4F7E-B450-4ECD19E18795@zope.com> <46802A10.8080205@v.loewis.de>
 <200706252144.l5PLi7cs032424@theraft.openend.se>
Message-ID: <20070626105201.GA14025@tummy.com>

On Mon, Jun 25, 2007 at 11:44:07PM +0200, Laura Creighton wrote:
>Jodok Batlogg of Lovely Systems has already proposed donating a machine

The problem is that nobody has the 8-ish hours of time required to migrate
the services to a new machine.

The quick fix would be to engage XS4ALL to upgrade the RAM in that box,
leaving the box otherwise untouched. The system has only 1GB of RAM in it.
It's got a 2.8GHz Xeon CPU in it, so I would expect it can take at least
4GB of RAM, if not 8 or 16GB.

Thomas: If the PSF threw a grand or two at XS4ALL, could we get the memory
in ximinez upgraded? Preferably to 4 or 8GB of RAM?

From what I've seen, the problem really boils down to 50 instances of
Apache at 20MB each consuming all RAM. 20MB each for Apache processes is
not unusual. Our web-site, which is fairly light-weight apache+mod_python,
uses around 20MB per Apache process resident, for example.

The Apache processes consuming memory cause other problems. As memory is
consumed, disc caching is reduced to nearly nothing, causing more data to
have to be pulled from disc, causing fewer requests able to be processed
per second. For example, right now I'm looking at the system and it's
spending half its time waiting for the disc. This is after a restart of
Apache.
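
The arithmetic behind the paragraphs above can be sketched as follows. This is only an illustration, not a measurement: the figures (50 Apache processes, about 20MB each, 1GB of RAM) are the ones quoted in the message, while the function name `apache_memory_footprint` and the simplification that no memory is shared between processes are assumptions made for the example.

```python
def apache_memory_footprint(processes, mb_per_process, total_ram_mb):
    """Estimate RAM used by Apache workers and what is left over.

    Assumes (pessimistically) that no memory is shared between processes.
    Returns (used_mb, left_mb); left_mb goes negative when the workers
    alone would exceed physical RAM and the box has to swap.
    """
    used_mb = processes * mb_per_process
    left_mb = total_ram_mb - used_mb
    return used_mb, left_mb


# Figures quoted in the message: 50 processes, ~20MB each, 1GB of RAM.
used, left = apache_memory_footprint(50, 20, 1024)
print(f"Apache: {used}MB, left for disc cache and everything else: {left}MB")
```

Under these assumptions the workers alone account for roughly 1000MB of a 1024MB machine, which is consistent with the observation that almost nothing remains for the disc cache.
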
I think it's a little premature to call 20 to 40MB Apache processes
"leaking". As I said, our web site is running 20MB Apache processes and
it's not doing that much. One of our clients runs a fairly high traffic
web site that is fairly dynamic, using mod_perl, and their Apache processes
stabilize around 30MB each. Their problem is that they want to handle a
concurrency of 200 or more, which adds up to 6GB right there, not counting
the database.

Once system load starts going up above 1, it is probably just going to
result in a death spiral because memory accesses are now taking
milliseconds instead of tens of nanoseconds.

I've taken the fairly drastic measure of pushing down the requests that
Apache will handle before restarting from 1,000 to 10. Whether the problem
is that something is leaking memory in the Apache processes, or that we
just don't have enough RAM for the Apache processes to stabilize at
whatever their natural limit is once they have everything they need loaded,
this should help.

We still have 60 concurrent clients allowed, so we're less vulnerable to
running out of hooks due to slow clients or spiders with high concurrency.

So far, things are looking good. I don't have much real-world time with it
running this way, but I'm optimistic that this will help prevent the Apache
restart process from being required much if any.

However, I'd really like to see this box get 4 to 8GB of RAM. This is
absolutely a place where the PSF can throw money at a problem to make it go
away. The donation of another machine to host pypi is very generous, but
it's not something that we really have the manpower to be able to take
advantage of, it seems.

Anyway, the current config seems to be working well based on the baseline I
took and some other changes I played around with. By now the baseline would
have been consuming 20MB per Apache instance.

Thomas: Can you contact someone at XS4ALL about getting us a quote for upgrading the RAM in ximinez to 4 or 8GB?
This will require the least amount of time from the community.

Sorry I haven't been able to look at this in more depth sooner. Busy busy.

Sean
--
 "I was on IRC once and got mistaken for Dan Bernstein. I still have
 nightmares." -- Donnie Barnes
Sean Reifschneider, Member of Technical Staff
tummy.com, ltd. - Linux Consulting since 1995: Ask me about High Availability

From renesd at gmail.com Tue Jun 26 15:09:20 2007
From: renesd at gmail.com (=?ISO-8859-1?Q?Ren=E9_Dudfield?=)
Date: Tue, 26 Jun 2007 23:09:20 +1000
Subject: [Catalog-sig] Caching and expires for the cheese shop.
In-Reply-To: <468092B3.8050305@v.loewis.de>
References: <64ddb72c0706251703l42118fbbmcfede59653d05596@mail.gmail.com>
 <468092B3.8050305@v.loewis.de>
Message-ID: <64ddb72c0706260609j2011bee4wb321017890ebfb0f@mail.gmail.com>

Hi,

I've been trying to set up the cheeseshop for testing cache changes for
pypi. I've been able to set up a test environment, so I should be able
to work on the caching changes now.

It seems the /pypi page, and the /pypi/ pages are different than what
is shown on the real cheeseshop. Are there modifications on the live
site which aren't checked into subversion?

My local copy shows information about pypi itself. ie, the header for
the page is "pypi 2005-08-01"

Anyone have ideas about that?

Cheers,

On 6/26/07, "Martin v. Löwis" wrote:
> > I volunteered to get this working before, but I was never sent the
> > cheese shop apache config so I could test it.
>
> That's easy enough resolved; I'll send it to you in private.
>
> Regards,
> Martin

From martin at v.loewis.de Tue Jun 26 20:37:27 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 26 Jun 2007 20:37:27 +0200
Subject: [Catalog-sig] Caching and expires for the cheese shop.
In-Reply-To: <64ddb72c0706260609j2011bee4wb321017890ebfb0f@mail.gmail.com>
References: <64ddb72c0706251703l42118fbbmcfede59653d05596@mail.gmail.com>
 <468092B3.8050305@v.loewis.de>
 <64ddb72c0706260609j2011bee4wb321017890ebfb0f@mail.gmail.com>
Message-ID: <46815CE7.2070406@v.loewis.de>

> It seems the /pypi page, and the /pypi/ pages are different than what
> is shown on the real cheeseshop. Are there modifications on the live
> site which aren't checked into subversion?

Nothing essential. The path to the config.ini is different.

I'm not surprised you see something different, though - you don't
have the database of registered packages, so *of course* you cannot
see the identical page.

> My local copy shows information about pypi itself. ie, the header for
> the page is "pypi 2005-08-01"
>
> Anyone have ideas about that?

I'm not sure what you are referring to. What is the exact page
that you get, and in what way do you think it is incorrect?

Regards,
Martin

From jafo at tummy.com Tue Jun 26 22:36:55 2007
From: jafo at tummy.com (Sean Reifschneider)
Date: Tue, 26 Jun 2007 14:36:55 -0600
Subject: [Catalog-sig] Cheeseshop performance improved
In-Reply-To: <467CC2E1.3010708@v.loewis.de>
References: <467CC2E1.3010708@v.loewis.de>
Message-ID: <20070626203655.GA2745@tummy.com>

On Sat, Jun 23, 2007 at 08:51:13AM +0200, "Martin v. Löwis" wrote:
>I now took some measures to improve the cheeseshop performance.

Just a follow-up, it looks like Apache hasn't restarted since I made the
Apache changes around 9 hours ago. We aren't out of the woods yet, I'd
say, but at the moment we have reasonable amounts of memory in cache, and
haven't gone into thrashing. The processes seem to be hovering around
22MB.

Load is higher than I'd like, mostly due to just straight disc I/O. More
RAM would help with that by increasing the cache (the easy way), as would
adding more disc spindles (the hard way).
For our hosting service we've recently been deploying relatively
inexpensive dual quad core 1.6GHz machines with 4 to 16GB of RAM and 8
drives in a RAID-10 array. These things have tons of performance and are
relatively inexpensive, something in the $4k neighborhood (6 to 9 months
ago a similar system would have easily been 5 to 10 times that much).

I would like to propose that the PSF look at getting a box like this, with
no immediate plans for use. Just have it sitting there in reserve for when
someone actually finds the time to move some services there or deploy new
services there.

Perhaps we could get XS4ALL to give us pricing for hosting a box like this?
Just to give you a ballpark idea, tummy is offering this class of machine
for around $400/month, so I'd expect it to be something the PSF can afford
no problem. I'd recommend XS4ALL to provide it, if they can.

This is in addition to the RAM upgrade for ximinez.

Sean
--
 "Fixing Unix is easier than living with NT." -- Jonathan Gilpin
Sean Reifschneider, Member of Technical Staff
tummy.com, ltd. - Linux Consulting since 1995: Ask me about High Availability
Back off man. I'm a scientist.
http://HackingSociety.org/

From jodok at lovelysystems.com Tue Jun 26 14:10:17 2007
From: jodok at lovelysystems.com (Jodok Batlogg)
Date: Tue, 26 Jun 2007 14:10:17 +0200
Subject: [Catalog-sig] Cheeseshop performance improved
In-Reply-To: <20070626105201.GA14025@tummy.com>
References: <467CC2E1.3010708@v.loewis.de> <46801FDC.4060502@v.loewis.de>
	<65F50ECE-9555-4F7E-B450-4ECD19E18795@zope.com>
	<46802A10.8080205@v.loewis.de>
	<200706252144.l5PLi7cs032424@theraft.openend.se>
	<20070626105201.GA14025@tummy.com>
Message-ID: <5B34A6A5-1123-44CB-8690-45A1B1704C2F@lovelysystems.com>

On 26.06.2007, at 12:52, Sean Reifschneider wrote:

> On Mon, Jun 25, 2007 at 11:44:07PM +0200, Laura Creighton wrote:
>> Jodok Batlogg of Lovely Systems has already proposed donating a
>> machine
>
> The problem is that nobody has the 8-ish hours of time required to
> migrate the services to a new machine.
>
> The quick fix would be to engage XS4ALL to upgrade the RAM in that
> box, leaving the box otherwise untouched. The system has only 1GB of
> RAM in it. It's got a 2.8GHz Xeon CPU in it, so I would expect it can
> take at least 4GB of RAM, if not 8 or 16GB.
>
> Thomas: If the PSF threw a grand or two at XS4ALL, could we get the
> memory in ximinez upgraded? Preferably to 4 or 8GB of RAM?

very good idea. memory is cheap. that should solve the issues without
a lot of work.

regarding the cheeseshop-software: it seems hard to find qualified
persons that can improve the pypi software (for money). me and parts of
my team will be at europython and have some time for sprinting on code
improvements (baking static files,...)
probably we can organize a cheeseshop sprint there?

jodok

> From what I've seen, the problem really boils down to 50 instances of
> Apache at 20MB each consuming all RAM. 20MB each for Apache processes
> is not unusual. Our web-site, which is fairly light-weight
> apache+mod_python, uses around 20MB per Apache process resident, for
> example.
>
> The Apache processes consuming memory cause other problems. As memory
> is consumed, disc caching is reduced to nearly nothing, so more data
> has to be pulled from disc and fewer requests can be processed per
> second. For example, right now I'm looking at the system and it's
> spending half its time waiting for the disc. This is after a restart
> of Apache.
>
> I think it's a little premature to call 20 to 40MB Apache processes
> "leaking". As I said, our web site is running 20MB Apache processes
> and it's not doing that much. One of our clients runs a fairly high
> traffic web site that is fairly dynamic, using mod_perl, and their
> Apache processes stabilize around 30MB each. Their problem is that
> they want to handle a concurrency of 200 or more, which adds up to 6GB
> right there, not counting the database.
>
> Once system load starts going up above 1, it is probably just going to
> result in a death spiral because memory accesses are now taking
> milliseconds instead of tens of nanoseconds.
>
> I've taken the fairly drastic measure of pushing down the number of
> requests that each Apache process will handle before being restarted
> from 1,000 to 10. Whether the problem is that something is leaking
> memory in the Apache processes, or that we just don't have enough RAM
> for the Apache processes to stabilize at whatever their natural limit
> is once they have everything they need loaded, this should help.
>
> We still have 60 concurrent clients allowed, so we're less vulnerable
> to running out of hooks due to slow clients or spiders with high
> concurrency.
>
> So far, things are looking good. I don't have much real-world time
> with it running this way, but I'm optimistic that this will help
> prevent the Apache restart process from being required much, if at
> all.
>
> However, I'd really like to see this box get 4 to 8GB of RAM. This is
> absolutely a place where the PSF can throw money at a problem to make
> it go away.
>
> The donation of another machine to host pypi is very generous, but
> it's not something that we really have the manpower to be able to take
> advantage of, it seems.
>
> Anyway, the current config seems to be working well based on the
> baseline I took and some other changes I played around with. By now
> the baseline would have been consuming 20MB per Apache instance.
>
> Thomas: Can you contact someone at XS4ALL about getting us a quote for
> upgrading the RAM in ximinez to 4 or 8GB? This will require the least
> amount of time from the community.
>
> Sorry I haven't been able to look at this in more depth sooner. Busy
> busy.
>
> Sean
> --
> "I was on IRC once and got mistaken for Dan Bernstein. I still have
> nightmares." -- Donnie Barnes
> Sean Reifschneider, Member of Technical Staff
> tummy.com, ltd. - Linux Consulting since 1995: Ask me about High
> Availability
>
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig

--
"Flat is better than nested."
   -- The Zen of Python, by Tim Peters

Jodok Batlogg, Lovely Systems
Schmelzhütterstraße 26a, 6850 Dornbirn, Austria
phone: +43 5572 908060, fax: +43 5572 908060-77

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 2454 bytes
Desc: not available
Url : http://mail.python.org/pipermail/catalog-sig/attachments/20070626/ece2a40a/attachment.bin

From renesd at gmail.com Wed Jun 27 01:05:53 2007
From: renesd at gmail.com (=?ISO-8859-1?Q?Ren=E9_Dudfield?=)
Date: Wed, 27 Jun 2007 09:05:53 +1000
Subject: [Catalog-sig] Caching and expires for the cheese shop.
In-Reply-To: <46815CE7.2070406@v.loewis.de>
References: <64ddb72c0706251703l42118fbbmcfede59653d05596@mail.gmail.com>
	<468092B3.8050305@v.loewis.de>
	<64ddb72c0706260609j2011bee4wb321017890ebfb0f@mail.gmail.com>
	<46815CE7.2070406@v.loewis.de>
Message-ID: <64ddb72c0706261605y6890d09arf65f37193e7e2f5e@mail.gmail.com>

It looks like some rewrite rules may be missing. The package pages
don't appear to work.

I got a db dump from Richard a while ago.

Here's a screen shot of the front page.
http://rene.f0o.com/~rene/stuff/screen_shot_2007_06_27_09_01_28.jpg

It looks like the package pages aren't working. Maybe because a path
is wrong somewhere, or rewrite rules are missing.

eg. this url does not work:
/pypi/4Suite/1.0b1

On 6/27/07, "Martin v. Löwis" wrote:
> > It seems the /pypi page, and the /pypi/ pages are different than what
> > is shown on the real cheeseshop. Are there modifications on the live
> > site which aren't checked into subversion?
>
> Nothing essential. The path to the config.ini is different.
>
> I'm not surprised you see something different, though - you don't
> have the database of registered packages, so *of course* you cannot
> see the identical same page.
>
>
> > My local copy shows information about pypi itself. ie, the header for
> > the page is "pypi 2005-08-01"
> >
> > Anyone have ideas about that?
>
> I'm not sure what you are referring to. What is the exact page that
> you get, and in what way do you think it is incorrect?
>
> Regards,
> Martin
>

From martin at v.loewis.de Wed Jun 27 05:20:36 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 27 Jun 2007 05:20:36 +0200
Subject: [Catalog-sig] Cheeseshop performance improved
In-Reply-To: <20070626203655.GA2745@tummy.com>
References: <467CC2E1.3010708@v.loewis.de> <20070626203655.GA2745@tummy.com>
Message-ID: <4681D784.3030407@v.loewis.de>

> I would like to propose that the PSF look at getting a box like this, with
> no immediate plans for use. Just have it sitting there in reserve for when
> someone actually finds the time to move some services there or deploy new
> services there. Perhaps we could get XS4ALL to give us pricing for hosting
> a box like this?

IIRC, the machine itself would not be that much of a problem - rack
space is. We could return creosote to XS4ALL, and reuse that space.

Regards,
Martin

From jafo at tummy.com Wed Jun 27 05:27:13 2007
From: jafo at tummy.com (Sean Reifschneider)
Date: Tue, 26 Jun 2007 21:27:13 -0600
Subject: [Catalog-sig] Cheeseshop performance improved
In-Reply-To: <4681D784.3030407@v.loewis.de>
References: <467CC2E1.3010708@v.loewis.de> <20070626203655.GA2745@tummy.com>
	<4681D784.3030407@v.loewis.de>
Message-ID: <20070627032713.GD13405@tummy.com>

On Wed, Jun 27, 2007 at 05:20:36AM +0200, "Martin v. Löwis" wrote:
>IIRC, the machine itself would not be that much of a problem - rack
>space is. We could return creosote to XS4ALL, and reuse that space.

Even if we purchased additional rack space? I don't know what the
possibilities are, just throwing some things out.

Is creosote completely idle, or would it require our time to get it
decommissioned?

Sean
--
 Gone Postal Sort: Iterate over elements, any element that is out of order
 you blow away. -- Evelyn, Kevin, and Sean, watching Monty Python and
 reading DDJ
Sean Reifschneider, Member of Technical Staff
tummy.com, ltd.
- Linux Consulting since 1995: Ask me about High Availability

From martin at v.loewis.de Wed Jun 27 05:28:46 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 27 Jun 2007 05:28:46 +0200
Subject: [Catalog-sig] Caching and expires for the cheese shop.
In-Reply-To: <64ddb72c0706261605y6890d09arf65f37193e7e2f5e@mail.gmail.com>
References: <64ddb72c0706251703l42118fbbmcfede59653d05596@mail.gmail.com>
	<468092B3.8050305@v.loewis.de>
	<64ddb72c0706260609j2011bee4wb321017890ebfb0f@mail.gmail.com>
	<46815CE7.2070406@v.loewis.de>
	<64ddb72c0706261605y6890d09arf65f37193e7e2f5e@mail.gmail.com>
Message-ID: <4681D96E.6060507@v.loewis.de>

> It looks like the package pages aren't working. Maybe because a path
> is wrong somewhere, or rewrite rules are missing.

The path_info that mod_python provides must be incorrect. Apparently,
for your URL, it is /pypi, when it should be an empty string. This
causes pypi to render its own package registration, rather than
the home page.

Regards,
Martin

From renesd at gmail.com Wed Jun 27 05:32:50 2007
From: renesd at gmail.com (=?ISO-8859-1?Q?Ren=E9_Dudfield?=)
Date: Wed, 27 Jun 2007 13:32:50 +1000
Subject: [Catalog-sig] Caching and expires for the cheese shop.
In-Reply-To: <4681D96E.6060507@v.loewis.de>
References: <64ddb72c0706251703l42118fbbmcfede59653d05596@mail.gmail.com>
	<468092B3.8050305@v.loewis.de>
	<64ddb72c0706260609j2011bee4wb321017890ebfb0f@mail.gmail.com>
	<46815CE7.2070406@v.loewis.de>
	<64ddb72c0706261605y6890d09arf65f37193e7e2f5e@mail.gmail.com>
	<4681D96E.6060507@v.loewis.de>
Message-ID: <64ddb72c0706262032u611b6335ma4de867f08f473d0@mail.gmail.com>

This extra apache config that you sent me is needed for that to work.

I'll update the cheeseshop dev wiki a little with the missing bits when
I've got it all working.
Cheers,

# Redirect RSS to a static file
Alias /pypi/?:action=rss /pathto/pypi_rss.xml

AddHandler cgi-script .cgi
Options Indexes

# Rewrite rules
RewriteEngine on
# Point to package directory
RewriteRule /packages(/.*)?$ /data/packages$1 [last]
RewriteRule /icons/(.*$) /usr/share/apache2/icons/$1 [last]
RedirectMatch permanent ^/$ "http://cheeseshop.python.org/pypi"

On 6/27/07, "Martin v. Löwis" wrote:
> > It looks like the package pages aren't working. Maybe because a path
> > is wrong somewhere, or rewrite rules are missing.
>
> The path_info that mod_python provides must be incorrect. Apparently,
> for your URL, it is /pypi, when it should be an empty string. This
> causes pypi to render its own package registration, rather than
> the home page.
>
> Regards,
> Martin
>

From renesd at gmail.com Wed Jun 27 05:43:59 2007
From: renesd at gmail.com (=?ISO-8859-1?Q?Ren=E9_Dudfield?=)
Date: Wed, 27 Jun 2007 13:43:59 +1000
Subject: [Catalog-sig] Caching and expires for the cheese shop.
In-Reply-To: <4681D96E.6060507@v.loewis.de>
References: <64ddb72c0706251703l42118fbbmcfede59653d05596@mail.gmail.com>
	<468092B3.8050305@v.loewis.de>
	<64ddb72c0706260609j2011bee4wb321017890ebfb0f@mail.gmail.com>
	<46815CE7.2070406@v.loewis.de>
	<64ddb72c0706261605y6890d09arf65f37193e7e2f5e@mail.gmail.com>
	<4681D96E.6060507@v.loewis.de>
Message-ID: <64ddb72c0706262043w7d51fb35lb3b1b7b8cd9ec31a@mail.gmail.com>

Thanks.

The rewrites didn't help for this. But getting rid of the extra /pypi
in the path did.

Cheers,

On 6/27/07, "Martin v. Löwis" wrote:
> > It looks like the package pages aren't working. Maybe because a path
> > is wrong somewhere, or rewrite rules are missing.
>
> The path_info that mod_python provides must be incorrect. Apparently,
> for your URL, it is /pypi, when it should be an empty string. This
> causes pypi to render its own package registration, rather than
> the home page.
>
> Regards,
> Martin
>

From martin at v.loewis.de Wed Jun 27 05:46:17 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 27 Jun 2007 05:46:17 +0200
Subject: [Catalog-sig] Cheeseshop performance improved
In-Reply-To: <20070627032713.GD13405@tummy.com>
References: <467CC2E1.3010708@v.loewis.de> <20070626203655.GA2745@tummy.com>
	<4681D784.3030407@v.loewis.de> <20070627032713.GD13405@tummy.com>
Message-ID: <4681DD89.8040603@v.loewis.de>

> Is creosote completely idle, or would it require our time to get it
> decommissioned?

To my knowledge, it does not run any services anymore, and I would hope
that it does not contain any data anymore. It might take time to agree
on that status - or you could help running a backup once before
decommissioning it (hoping that this would be little effort) :-)

Regards,
Martin

From jafo at tummy.com Wed Jun 27 06:08:50 2007
From: jafo at tummy.com (Sean Reifschneider)
Date: Tue, 26 Jun 2007 22:08:50 -0600
Subject: [Catalog-sig] Cheeseshop performance improved
In-Reply-To: <4681DD89.8040603@v.loewis.de>
References: <467CC2E1.3010708@v.loewis.de> <20070626203655.GA2745@tummy.com>
	<4681D784.3030407@v.loewis.de> <20070627032713.GD13405@tummy.com>
	<4681DD89.8040603@v.loewis.de>
Message-ID: <20070627040850.GE13405@tummy.com>

On Wed, Jun 27, 2007 at 05:46:17AM +0200, "Martin v. Löwis" wrote:
>on that status - or you could help running a backup once before
>decommissioning it (hoping that this would be little effort) :-)

I've started a copy. According to my math it'll take around 3 days to
complete. For some reason it's not running particularly fast, 100KB/sec or
so.

I'll let you know when it's done, and will probably shut down creosote at
that point.

Thanks,
Sean
--
 Nothing is illegal if one hundred businessmen decide to do it.
                 -- Andrew Young
Sean Reifschneider, Member of Technical Staff
tummy.com, ltd.
- Linux Consulting since 1995: Ask me about High Availability

From jafo at tummy.com Fri Jun 29 04:23:21 2007
From: jafo at tummy.com (Sean Reifschneider)
Date: Thu, 28 Jun 2007 20:23:21 -0600
Subject: [Catalog-sig] creosote shut down (was: Re: Cheeseshop performance
	improved)
In-Reply-To: <4681DD89.8040603@v.loewis.de>
References: <467CC2E1.3010708@v.loewis.de> <20070626203655.GA2745@tummy.com>
	<4681D784.3030407@v.loewis.de> <20070627032713.GD13405@tummy.com>
	<4681DD89.8040603@v.loewis.de>
Message-ID: <20070629022321.GA24128@tummy.com>

On Wed, Jun 27, 2007 at 05:46:17AM +0200, "Martin v. Löwis" wrote:
>> Is creosote completely idle, or would it require our time to get it
>> decommissioned?
>
>To my knowledge, it does not run any services anymore, and I would hope
>that it does not contain any data anymore. It might take time to agree
>on that status - or you could help running a backup once before
>decommissioning it (hoping that this would be little effort) :-)

Creosote was halted after its backup to our server completed. If you
need any files off it, either restart it or contact tummy.com and we can
get files back for you.

Thanks,
Sean
--
 Good idea: Slave Girls of Gor
 Bad idea: Slave Girls of Al Gore.
Sean Reifschneider, Member of Technical Staff
tummy.com, ltd. - Linux Consulting since 1995: Ask me about High Availability
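
As a rough cross-check of the three-day estimate given earlier in the thread for the creosote copy, the sketch below works out how much data a transfer at about 100KB/sec moves in that time. The actual size of the data on creosote is not stated anywhere in the thread; the helper names are hypothetical, and 1KB is assumed to mean 1000 bytes.

```python
# Transfer-time arithmetic for the creosote backup estimate.
# Assumptions: 1KB = 1000 bytes, 1GB = 10**9 bytes, and a constant rate.

SECONDS_PER_DAY = 86_400


def implied_size_gb(rate_kb_per_s, days):
    """Amount of data (GB) moved in `days` at a steady `rate_kb_per_s`."""
    total_bytes = rate_kb_per_s * 1000 * days * SECONDS_PER_DAY
    return total_bytes / 1e9


def transfer_days(size_gb, rate_kb_per_s):
    """Days needed to move `size_gb` at a steady `rate_kb_per_s`."""
    seconds = size_gb * 1e9 / (rate_kb_per_s * 1000)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    size = implied_size_gb(100, 3)
    print(f"3 days at 100KB/sec moves about {size:.1f}GB")
    print(f"The same data at 1000KB/sec: {transfer_days(size, 1000):.2f} days")
```

Under these assumptions the quoted estimate corresponds to roughly 26GB of data.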