From donald.stufft at gmail.com Tue Dec 18 14:14:22 2012
From: donald.stufft at gmail.com (Donald Stufft)
Date: Tue, 18 Dec 2012 08:14:22 -0500
Subject: [Catalog-sig] PyPI changelog API giving wrong results
Message-ID: <257321C7C867418DB95550C0080BDCE0@gmail.com>

PyPI is claiming that events are happening 5 hours prior to when they actually occurred.

See https://gist.github.com/4327695

The gist above illustrates grabbing the "current" time from PyPI via /daytime, submitting a package, and then going backwards in time 10 seconds at a time until we find when PyPI claims this package was registered. It ended up being 5 hours earlier than the "current" time grabbed from PyPI.

(On a side note, the current time on /daytime is off by 15 minutes or so).

From holger.krekel at gmail.com Tue Dec 18 15:54:50 2012
From: holger.krekel at gmail.com (Holger Krekel)
Date: Tue, 18 Dec 2012 15:54:50 +0100
Subject: [Catalog-sig] disabling the serving of links from description_html?
Message-ID:

Hi Richard, hi all,

While reading the pypi main and other sources i wondered how we could switch off serving links from description_html, at least on a per-project basis. It's really annoying that when you start to add some links to a long_description that installation of your package will thus slow down around the world. Even if you remove the links from the next release.

How could we arrange for a maintainer to communicate to the pypi-server that a particular project should not ever serve links from description_html (and maybe not even from the homepage while we are at it)?

Preferably it should be something that can be done from existing setup.py files, like adding a special trove-classifier or keyword. But a little custom tool or a web page form would be ok as well.
If maintainers could easily switch off these extra links, then this means less stress for the pypi server and a global considerable speedup of installing python packages as often most of the pip/easy_install time is spent with checking out these URLs.

best,
holger

From donald.stufft at gmail.com Tue Dec 18 16:56:50 2012
From: donald.stufft at gmail.com (Donald Stufft)
Date: Tue, 18 Dec 2012 10:56:50 -0500
Subject: [Catalog-sig] disabling the serving of links from description_html?
In-Reply-To:
References:
Message-ID:

On Tuesday, December 18, 2012 at 9:54 AM, Holger Krekel wrote:
> Hi Richard, hi all,
>
> While reading the pypi main and other sources i wondered how we could switch off serving links from description_html, at least on a per-project basis. It's really annoying that when you start to add some links to a long_description that installation of your package will thus slow down around the world. Even if you remove the links from the next release.
>
> How could we arrange for a maintainer to communicate to the pypi-server that a particular project should not ever serve links from description_html (and maybe not even from the homepage while we are at it)?
>
> Preferably it should be something that can be done from existing setup.py files, like adding a special trove-classifier or keyword. But a little custom tool or a web page form would be ok as well.
>
> If maintainers could easily switch off these extra links, then this means less stress for the pypi server and a global considerable speedup of installing python packages as often most of the pip/easy_install time is spent with checking out these URLs.
>
> best,
> holger

+1

From mal at egenix.com Tue Dec 18 17:46:52 2012
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 18 Dec 2012 17:46:52 +0100
Subject: [Catalog-sig] disabling the serving of links from description_html?
In-Reply-To:
References:
Message-ID: <50D09DFC.7020007@egenix.com>

On 18.12.2012 15:54, Holger Krekel wrote:
> Hi Richard, hi all,
>
> While reading the pypi main and other sources i wondered how we could
> switch off serving links from description_html, at least on a per-project
> basis. It's really annoying that when you start to add some links to a
> long_description that installation of your package will thus slow down
> around the world. Even if you remove the links from the next release.
>
> How could we arrange for a maintainer to communicate to the pypi-server
> that a particular project should not ever serve links from description_html
> (and maybe not even from the homepage while we are at it)?
>
> Preferably it should be something that can be done from existing setup.py
> files, like adding a special trove-classifier or keyword. But a little
> custom tool or a web page form would be ok as well.
>
> If maintainers could easily switch off these extra links, then this means
> less stress for the pypi server and a global considerable speedup of
> installing python packages as often most of the pip/easy_install time is
> spent with checking out these URLs.

Are you sure about this?

AFAIK, setuptools/distribute only looks at links with rel="homepage"
or rel="download" attributes, not all links on the PyPI project page.
The links from the description don't receive such attributes.

See e.g. http://pypi.python.org/simple/pytest/

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Dec 18 2012)
>>> Python Projects, Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
2012-12-14: Released mxODBC.Connect 2.0.2 ... http://egenix.com/go38
2012-12-05: Released eGenix pyOpenSSL 0.13 ... http://egenix.com/go37
2013-01-22: Python Meeting Duesseldorf ... 35 days to go

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From holger.krekel at gmail.com Tue Dec 18 18:54:29 2012
From: holger.krekel at gmail.com (Holger Krekel)
Date: Tue, 18 Dec 2012 18:54:29 +0100
Subject: [Catalog-sig] disabling the serving of links from description_html?
In-Reply-To: <50D09DFC.7020007@egenix.com>
References: <50D09DFC.7020007@egenix.com>
Message-ID:

On Tue, Dec 18, 2012 at 5:46 PM, M.-A. Lemburg wrote:

> On 18.12.2012 15:54, Holger Krekel wrote:
> > Hi Richard, hi all,
> >
> > While reading the pypi main and other sources i wondered how we could
> > switch off serving links from description_html, at least on a per-project
> > basis. It's really annoying that when you start to add some links to a
> > long_description that installation of your package will thus slow down
> > around the world. Even if you remove the links from the next release.
> >
> > How could we arrange for a maintainer to communicate to the pypi-server
> > that a particular project should not ever serve links from description_html
> > (and maybe not even from the homepage while we are at it)?
> >
> > Preferably it should be something that can be done from existing setup.py
> > files, like adding a special trove-classifier or keyword. But a little
> > custom tool or a web page form would be ok as well.
> >
> > If maintainers could easily switch off these extra links, then this means
> > less stress for the pypi server and a global considerable speedup of
> > installing python packages as often most of the pip/easy_install time is
> > spent with checking out these URLs.
>
> Are you sure about this?
>
> AFAIK, setuptools/distribute only looks at links with rel="homepage"
> or rel="download" attributes, not all links on the PyPI project page.
> The links from the description don't receive such attributes.
>
> See e.g. http://pypi.python.org/simple/pytest/

You are right, Marc. Only the download and home page links (from all versions ever published) are considered by pip/easy_install. I just examined it more closely via urlsnarf. There were so many in some projects and mixed with the other links so i didn't see it clearly before (although i did notice the rel classification).

So to avoid the overhead one could retroactively remove all download links and maybe also all homepage links except the one for the latest version or so. But that can be done without changes to pypi itself i guess.

best & thanks for the clarification,
holger

From mal at egenix.com Tue Dec 18 19:36:14 2012
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 18 Dec 2012 19:36:14 +0100
Subject: [Catalog-sig] disabling the serving of links from description_html?
In-Reply-To:
References: <50D09DFC.7020007@egenix.com>
Message-ID: <50D0B79E.9090208@egenix.com>

On 18.12.2012 18:54, Holger Krekel wrote:
> On Tue, Dec 18, 2012 at 5:46 PM, M.-A. Lemburg wrote:
>
>> On 18.12.2012 15:54, Holger Krekel wrote:
>>> Hi Richard, hi all,
>>>
>>> While reading the pypi main and other sources i wondered how we could
>>> switch off serving links from description_html, at least on a per-project
>>> basis. It's really annoying that when you start to add some links to a
>>> long_description that installation of your package will thus slow down
>>> around the world. Even if you remove the links from the next release.
>>>
>>> How could we arrange for a maintainer to communicate to the pypi-server
>>> that a particular project should not ever serve links from description_html
>>> (and maybe not even from the homepage while we are at it)?
>>>
>>> Preferably it should be something that can be done from existing setup.py
>>> files, like adding a special trove-classifier or keyword. But a little
>>> custom tool or a web page form would be ok as well.
>>>
>>> If maintainers could easily switch off these extra links, then this means
>>> less stress for the pypi server and a global considerable speedup of
>>> installing python packages as often most of the pip/easy_install time is
>>> spent with checking out these URLs.
>>
>> Are you sure about this?
>>
>> AFAIK, setuptools/distribute only looks at links with rel="homepage"
>> or rel="download" attributes, not all links on the PyPI project page.
>> The links from the description don't receive such attributes.
>>
>> See e.g. http://pypi.python.org/simple/pytest/
>>
>>
> You are right, Marc. Only the download and home page links (from all
> versions ever published) are considered by pip/easy_install. I just
> examined it more closely via urlsnarf. There were so many in some projects
> and mixed with the other links so i didn't see it clearly before (although
> i did notice the rel classification).
>
> So to avoid the overhead one could retroactively remove all download links
> and maybe also all homepage links except the one for the latest version or
> so. But that can be done without changes to pypi itself i guess.

It may be useful to add rel="description" to the links from the descriptions. That way, a download tool could more easily detect the origin of the links. And perhaps rel="distribution_file" to links of the distribution files.

Given that the simple index lists links for all releases, it may also be useful to add a new version="x.y.z" attribute to the links, so that a download tool can more easily determine which links belong to which release. (More correct would be to add the version to the rel attribute, but doing so would break setuptools, since it does a substring search rather than parse the HTML.)

--
Marc-Andre Lemburg
eGenix.com

From richard at python.org Wed Dec 19 03:31:46 2012
From: richard at python.org (Richard Jones)
Date: Wed, 19 Dec 2012 13:31:46 +1100
Subject: [Catalog-sig] PyPI changelog API giving wrong results
In-Reply-To: <257321C7C867418DB95550C0080BDCE0@gmail.com>
References: <257321C7C867418DB95550C0080BDCE0@gmail.com>
Message-ID:

Erk. PyPI host has significant clock error. NTP wasn't installed so I just did.

Richard

On 19 December 2012 00:14, Donald Stufft wrote:
> PyPI is claiming that events are happening 5 hours prior
> to when they actually occurred.
>
> See https://gist.github.com/4327695
>
> The gist above illustrates grabbing the "current" time from
> PyPI via /daytime, submitting a package, and then going
> backwards in time 10 seconds at a time until we find when
> PyPI claims this package was registered. It ended up
> being 5 hours earlier than the "current" time grabbed
> from PyPI.
>
> (On a side note, the current time on /daytime is off by 15 minutes
> or so).
>
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig
>

From noah at coderanger.net Wed Dec 19 03:34:20 2012
From: noah at coderanger.net (Noah Kantrowitz)
Date: Tue, 18 Dec 2012 18:34:20 -0800
Subject: [Catalog-sig] PyPI changelog API giving wrong results
In-Reply-To:
References: <257321C7C867418DB95550C0080BDCE0@gmail.com>
Message-ID: <64695B7A-AFD2-4C57-86FE-6BAF3C91EAB7@coderanger.net>

NTP was definitely installed, it is in the base chef role and syncs to the local mirror at OSU. What commands did you run?

--Noah

On Dec 18, 2012, at 6:31 PM, Richard Jones wrote:

> Erk. PyPI host has significant clock error. NTP wasn't installed so I just did.
>
>
> Richard
>
> On 19 December 2012 00:14, Donald Stufft wrote:
>> PyPI is claiming that events are happening 5 hours prior
>> to when they actually occurred.
>>
>> See https://gist.github.com/4327695
>>
>> The gist above illustrates grabbing the "current" time from
>> PyPI via /daytime, submitting a package, and then going
>> backwards in time 10 seconds at a time until we find when
>> PyPI claims this package was registered. It ended up
>> being 5 hours earlier than the "current" time grabbed
>> from PyPI.
>>
>> (On a side note, the current time on /daytime is off by 15 minutes
>> or so).

From donald.stufft at gmail.com Wed Dec 19 03:39:13 2012
From: donald.stufft at gmail.com (Donald Stufft)
Date: Tue, 18 Dec 2012 21:39:13 -0500
Subject: [Catalog-sig] PyPI changelog API giving wrong results
In-Reply-To:
References: <257321C7C867418DB95550C0080BDCE0@gmail.com>
Message-ID: <6980028BDB554BD7BDE220CE783A9877@gmail.com>

On Tuesday, December 18, 2012 at 9:31 PM, Richard Jones wrote:
> Erk. PyPI host has significant clock error. NTP wasn't installed so I just did.

That sounds like it solves the ~15 minutes off issue, but the system time shouldn't have affected the changelog being off. I wasn't comparing to my local time; I was comparing to what PyPI was giving me via /daytime. So even if the clock was 5 hours off, /daytime and the changelog should have agreed.

>
>
> Richard
>
> On 19 December 2012 00:14, Donald Stufft wrote:
> > PyPI is claiming that events are happening 5 hours prior
> > to when they actually occurred.
> >
> > See https://gist.github.com/4327695
> >
> > The gist above illustrates grabbing the "current" time from
> > PyPI via /daytime, submitting a package, and then going
> > backwards in time 10 seconds at a time until we find when
> > PyPI claims this package was registered. It ended up
> > being 5 hours earlier than the "current" time grabbed
> > from PyPI.
> >
> > (On a side note, the current time on /daytime is off by 15 minutes
> > or so).

From noah at coderanger.net Wed Dec 19 03:39:31 2012
From: noah at coderanger.net (Noah Kantrowitz)
Date: Tue, 18 Dec 2012 18:39:31 -0800
Subject: [Catalog-sig] PyPI changelog API giving wrong results
In-Reply-To: <64695B7A-AFD2-4C57-86FE-6BAF3C91EAB7@coderanger.net>
References: <257321C7C867418DB95550C0080BDCE0@gmail.com> <64695B7A-AFD2-4C57-86FE-6BAF3C91EAB7@coderanger.net>
Message-ID:

Just looked through the peerstats, it got desync'd at some point and ntpd refused to long-step it back into line. I'll throw a once a day hard reset into the base config :)

56280 8070.605 128.193.10.15 9024 1010.388545253 0.000672201 1.937515635 0.000157584
56280 8072.605 128.193.10.15 9024 1010.388462110 0.000721831 0.937521306 0.000199891
56280 9083.994 128.193.10.15 9044 -0.000038184 0.000713046 7.937500363 0.000000238
56280 9085.994 128.193.10.15 9044 -0.000196773 0.000617402 3.937508044 0.000158589

--Noah

On Dec 18, 2012, at 6:34 PM, Noah Kantrowitz wrote:

> NTP was definitely installed, it is in the base chef role and syncs to the local mirror at OSU. What commands did you run?
>
> --Noah
>
> On Dec 18, 2012, at 6:31 PM, Richard Jones wrote:
>
>> Erk. PyPI host has significant clock error. NTP wasn't installed so I just did.
>>
>>
>> Richard
>>
>> On 19 December 2012 00:14, Donald Stufft wrote:
>>> PyPI is claiming that events are happening 5 hours prior
>>> to when they actually occurred.
>>>
>>> See https://gist.github.com/4327695
>>>
>>> The gist above illustrates grabbing the "current" time from
>>> PyPI via /daytime, submitting a package, and then going
>>> backwards in time 10 seconds at a time until we find when
>>> PyPI claims this package was registered. It ended up
>>> being 5 hours earlier than the "current" time grabbed
>>> from PyPI.
>>>
>>> (On a side note, the current time on /daytime is off by 15 minutes
>>> or so).

From donald.stufft at gmail.com Wed Dec 19 03:40:31 2012
From: donald.stufft at gmail.com (Donald Stufft)
Date: Tue, 18 Dec 2012 21:40:31 -0500
Subject: [Catalog-sig] PyPI changelog API giving wrong results
In-Reply-To: <6980028BDB554BD7BDE220CE783A9877@gmail.com>
References: <257321C7C867418DB95550C0080BDCE0@gmail.com> <6980028BDB554BD7BDE220CE783A9877@gmail.com>
Message-ID: <074D37510A354A06B2114C56983669F8@gmail.com>

> That sounds like it solves the ~15 minutes off issue, but the system time
> shouldn't have affected the changelog being off. I wasn't comparing to my
> local time; I was comparing to what PyPI was giving me via /daytime. So even
> if the clock was 5 hours off, /daytime and the changelog should have agreed.

Note the changelog is 5 hours behind /daytime down to the second.
It's likely a timezone issue somewhere but I couldn't track down where in the source it would be coming from.

From pje at telecommunity.com Wed Dec 19 04:37:25 2012
From: pje at telecommunity.com (PJ Eby)
Date: Tue, 18 Dec 2012 22:37:25 -0500
Subject: [Catalog-sig] disabling the serving of links from description_html?
In-Reply-To: <50D09DFC.7020007@egenix.com>
References: <50D09DFC.7020007@egenix.com>
Message-ID:

On Tue, Dec 18, 2012 at 11:46 AM, M.-A. Lemburg wrote:
> AFAIK, setuptools/distribute only looks at links with rel="homepage"
> or rel="download" attributes, not all links on the PyPI project page.
> The links from the description don't receive such attributes.

Those are the only links that are unconditionally followed, yes. But all links it sees are parsed to see if they appear to be a direct download link (e.g. .tgz, .zip, .egg, "#egg=" link, etc.). They're just not *followed* unless they appear to be a direct link to a desired version of something, or are marked as a homepage or download link.

All other on-page links are ignored, whether they're part of the description or otherwise. (Any given link is also retrieved at most once per run of easy_install.)

From jimmyislive at gmail.com Mon Dec 24 19:37:42 2012
From: jimmyislive at gmail.com (Jimmy John)
Date: Mon, 24 Dec 2012 10:37:42 -0800
Subject: [Catalog-sig] PyPI JSON API
Message-ID:

Hello,

I am trying to build a personal tool for which I am looking at the PyPI JSON API. I can get details of a particular project, e.g. http://pypi.python.org/pypi/Django/json

But is there an endpoint that gives me the index of all the packages available? Something like what this page has: http://pypi.python.org/pypi?%3Aaction=index but in JSON form. This will allow me to iterate over all the packages and extract details from them.

thx
Jim

From doug.hellmann at gmail.com Thu Dec 27 21:43:13 2012
From: doug.hellmann at gmail.com (Doug Hellmann)
Date: Thu, 27 Dec 2012 15:43:13 -0500
Subject: [Catalog-sig] setting up a "full" mirror
Message-ID: <976B18CD-4B80-494D-AE40-3C5E62329D51@gmail.com>

DreamHost is considering setting up a mirror of PyPI. Before we commit to doing so, we would like to find out how much space is currently required and how fast that space requirement has been growing over the past few months.

Are those stats available easily?

Doug

From donald.stufft at gmail.com Thu Dec 27 21:51:20 2012
From: donald.stufft at gmail.com (Donald Stufft)
Date: Thu, 27 Dec 2012 15:51:20 -0500
Subject: [Catalog-sig] setting up a "full" mirror
In-Reply-To: <976B18CD-4B80-494D-AE40-3C5E62329D51@gmail.com>
References: <976B18CD-4B80-494D-AE40-3C5E62329D51@gmail.com>
Message-ID:

When I started Crate.io in February 2012 a full mirror was 27GB. When I did a fresh sync the other day it was 38GB.

On Thursday, December 27, 2012 at 3:43 PM, Doug Hellmann wrote:
> DreamHost is considering setting up a mirror of PyPI. Before we commit to doing so, we would like to find out how much space is currently required and how fast that space requirement has been growing over the past few months.
>
> Are those stats available easily?
>
> Doug
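
For anyone sizing a mirror from the two figures above (27GB in February 2012, 38GB in late December 2012), a rough growth estimate can be sketched as below. This is only a back-of-the-envelope calculation: the function names are made up for illustration, the exact measurement days are assumed (only the months are given in the thread), and the projection assumes growth stays linear, which two data points cannot confirm.

```python
from datetime import date


def monthly_growth_gb(start_gb, end_gb, start, end):
    """Average growth in GB per whole calendar month between two measurements."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if months <= 0:
        raise ValueError("end must be at least one calendar month after start")
    return (end_gb - start_gb) / months


def projected_size_gb(current_gb, growth_per_month, months_ahead):
    """Linear extrapolation; an assumption, not something the thread states."""
    return current_gb + growth_per_month * months_ahead


# Sizes quoted in the thread; the days of the month are assumed.
growth = monthly_growth_gb(27, 38, date(2012, 2, 1), date(2012, 12, 27))
print(f"{growth:.1f} GB/month")  # 1.1 GB/month

# Hypothetical planning horizon of 12 months.
print(f"{projected_size_gb(38, growth, 12):.1f} GB")  # 51.2 GB
```

Counting whole calendar months keeps the arithmetic simple (11GB over 10 months); a mirror operator would likely want extra headroom beyond the linear figure.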