From pje at telecommunity.com Wed Jul 5 17:36:03 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed, 05 Jul 2006 11:36:03 -0400
Subject: [Catalog-sig] [Distutils] Specification for package indexes?
In-Reply-To:
References: <5.1.1.6.0.20060623162538.01eabc20@sparrow.telecommunity.com>
Message-ID: <5.1.1.6.0.20060705112046.021c5720@sparrow.telecommunity.com>

At 04:04 AM 7/5/2006 -0400, Jim Fulton wrote:

>On Jun 23, 2006, at 4:51 PM, Jim Fulton wrote:
>...
>>
>>That's a lot of screen scraping. :)
>>
>>It would be good to capture this as part of the documentation IMO
>>
>>>I'm considering adding XML-RPC support to easy_install in 0.7,
>>>though. PyPI now has a nice XML-RPC API that is more responsive
>>>than the web UI, and it supports case-insensitive partial match
>>>searches, making it suitable for easy_install to query when a typed-
>>>in name doesn't exactly match the spelling of a PyPI entry.
>>
>>I think that would be much better.
>
>I just wanted to emphasize that I think this would be a good
>idea.

Patches welcome. :) Note that there should still be a fallback to the screen scraping code in case of a problem with the XML-RPC, to allow people to continue using static mirrors of PyPI or imitation PyPIs without needing to support XML-RPC.

> I was just talking to Richard, and he pointed out that the
>current approach is a problem for him, because it means he can't
>evolve the pypi UI without risking breaking setuptools.

What I would suggest is creating a "microformat" for marking up web pages with sniffable information. For example, adding rel="homepage" and rel="download" to the links that go to those URLs. In other words, invisible hints on the page to supplement the visible information. Then, I could change easy_install to start using the invisible hints, and drop the visible ones, freeing PyPI to evolve the UI again.
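
To make the "invisible hints" idea concrete, here is a minimal sketch of how a client might read rel-marked links. It assumes modern Python 3 and the standard html.parser module, which is not what easy_install used at the time. The rel values "homepage" and "download" come from the proposal above; the class name, the helper function, and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class RelLinkParser(HTMLParser):
    """Collect href values of <a> tags, grouped by their rel attribute."""

    def __init__(self):
        super().__init__()
        self.links = {}  # rel token -> list of hrefs, in page order

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        rel = attrs.get("rel")
        if href and rel:
            # A rel attribute may carry several space-separated tokens.
            for token in rel.split():
                self.links.setdefault(token, []).append(href)


def find_rel_links(html):
    """Return a dict mapping each rel token to the hrefs that carry it."""
    parser = RelLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical project page: the visible labels could change freely,
# because only the rel attributes are consulted.
page = """
<table>
  <tr><td>Home Page</td>
      <td><a rel="homepage" href="http://example.org/">site</a></td></tr>
  <tr><td>Download</td>
      <td><a rel="download" href="http://example.org/dist/">files</a></td></tr>
  <tr><td><a href="http://example.org/other">unrelated</a></td></tr>
</table>
"""
links = find_rel_links(page)
```

Links without a rel attribute are ignored, so an index is free to add any other navigation it likes.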

While the XML-RPC API would be great, I still want easy_install to be able to use a package index that's made from static files, and that requires some kind of screen scraping. So, let's make it invisible scraping of a documented format, so that anybody can use it, with whatever visual formats they like.

Currently, easy_install gets most of its information from URLs; the only actual scraping of visible data is of the title, the download MD5's, and the table cells that identify links as being to the home page or download URL (since it needs to specifically identify these in order to spider them).

The MD5 information dependency could be removed if PyPI included "#md5=..." at the end of the download URLs; easy_install can see that information and use it. The table cell checking could be removed by adding 'rel="easy_install"' or something like that to the spiderable links.

The title checking is used to distinguish pages that list multiple packages from pages that list single packages. I don't have any ready ideas as to how that could or should be represented in a semantic (as opposed to visual) way. Your thoughts?

From richardjones at optusnet.com.au Thu Jul 6 11:09:23 2006
From: richardjones at optusnet.com.au (richardjones at optusnet.com.au)
Date: Thu, 06 Jul 2006 19:09:23 +1000
Subject: [Catalog-sig] [Distutils] Specification for package indexes?
Message-ID: <200607060909.k6699NLd004491@mail12.syd.optusnet.com.au>

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://mail.python.org/pipermail/catalog-sig/attachments/20060706/b5b6bc57/attachment.asc

From fdrake at gmail.com Thu Jul 6 15:58:23 2006
From: fdrake at gmail.com (Fred Drake)
Date: Thu, 6 Jul 2006 09:58:23 -0400
Subject: [Catalog-sig] [Distutils] Specification for package indexes?
In-Reply-To: <200607060909.k6699NLd004491@mail12.syd.optusnet.com.au>
References: <200607060909.k6699NLd004491@mail12.syd.optusnet.com.au>
Message-ID: <9cee7ab80607060658i31dfd34l4c1eeed7e5330129@mail.gmail.com>

On 7/6/06, richardjones at optusnet.com.au wrote:
> Phillip J. Eby wrote:
> > Patches welcome. :) Note that there should still be a fallback to the
> > screen scraping code in case of a problem with the XML-RPC, to allow
> > people to continue using static mirrors of PyPI or imitation PyPIs
> > without needing to support XML-RPC.
>
> Why?

So we can easily have alternate or additional package repositories implemented simply as (simple) HTML files and the downloadable packages. We want to be able to have an internal repository that plays the easy_install game without running our own PyPI.

  -Fred

--
Fred L. Drake, Jr.
"Every sin is the result of a collaboration."
   --Lucius Annaeus Seneca

From pje at telecommunity.com Thu Jul 6 16:03:56 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu, 06 Jul 2006 10:03:56 -0400
Subject: [Catalog-sig] [Distutils] Specification for package indexes?
In-Reply-To: <200607060909.k6699NLd004491@mail12.syd.optusnet.com.au>
Message-ID: <5.1.1.6.0.20060706094755.02bf1d48@sparrow.telecommunity.com>

At 07:09 PM 7/6/2006 +1000, richardjones at optusnet.com.au wrote:
>Phillip J. Eby wrote:
> > Patches welcome. :) Note that there should still be a fallback to the
> > screen scraping code in case of a problem with the XML-RPC, to allow
> > people to continue using static mirrors of PyPI or imitation PyPIs
> > without needing to support XML-RPC.
>
>Why?

Why not? ;)

From easy_install's point of view, PyPI is just a place to find links for a given package name. Preferably links that go directly to downloads, but also to pages that might contain downloads. If someone doesn't want to use PyPI as the source of download links, shouldn't they be able to use their own, without having to implement an XML-RPC interface?
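
The fallback arrangement argued for here can be sketched roughly as follows. This is only an illustration in modern Python 3, not easy_install's code: the function name find_links and the two lookup callables are hypothetical stand-ins for "ask the XML-RPC API" and "scrape the static pages", and the example URL is made up.

```python
def find_links(project, xmlrpc_lookup, scrape_lookup):
    """Return download links for `project`, preferring the XML-RPC source.

    Both lookups are callables taking a project name and returning a list
    of URLs.  They are hypothetical stand-ins, not real setuptools APIs.
    """
    try:
        links = xmlrpc_lookup(project)
        if links:
            return links
    except Exception:
        # Any XML-RPC problem (no endpoint, bad response, timeout) simply
        # falls through to the scraping path, so static mirrors keep working.
        pass
    return scrape_lookup(project)


# A static mirror has no XML-RPC endpoint at all:
def broken_xmlrpc(project):
    raise OSError("index has no XML-RPC endpoint")


def static_scrape(project):
    return ["http://example.org/packages/%s-1.0.tar.gz" % project]


links = find_links("Foo", broken_xmlrpc, static_scrape)
```

Treating an empty XML-RPC answer the same as a failure is a design choice made for this sketch; a client could equally decide that an empty answer is authoritative.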

Actually, the question of "how do I get easy_install to use something other than PyPI?" has been becoming somewhat of a FAQ recently. Well, two people have asked about it in the last couple of weeks, anyway. And it would've sucked to have to say "well, first you need an XML-RPC server..." :)

Nonetheless, there are various aspects of easy_install's behavior and performance that could be significantly improved by using XML-RPC, so I definitely want it to do that in 0.7. I'm just wary of removing the existing behavior until it's clear that it's unnecessary for it to.

From richardjones at optusnet.com.au Thu Jul 6 17:03:33 2006
From: richardjones at optusnet.com.au (richardjones at optusnet.com.au)
Date: Fri, 07 Jul 2006 01:03:33 +1000
Subject: [Catalog-sig] [Distutils] Specification for package indexes?
Message-ID: <200607061503.k66F3X7S005870@mail20.syd.optusnet.com.au>

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://mail.python.org/pipermail/catalog-sig/attachments/20060707/d96eaade/attachment.pot

From richardjones at optusnet.com.au Thu Jul 6 17:39:12 2006
From: richardjones at optusnet.com.au (richardjones at optusnet.com.au)
Date: Fri, 07 Jul 2006 01:39:12 +1000
Subject: [Catalog-sig] PyPI XML-RPC changes
Message-ID: <200607061539.k66FdCV0007024@mail07.syd.optusnet.com.au>

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://mail.python.org/pipermail/catalog-sig/attachments/20060707/54e6525a/attachment.asc

From exarkun at divmod.com Thu Jul 6 17:43:48 2006
From: exarkun at divmod.com (Jean-Paul Calderone)
Date: Thu, 6 Jul 2006 11:43:48 -0400
Subject: [Catalog-sig] [Distutils] Specification for package indexes?
In-Reply-To: <200607061503.k66F3X7S005870@mail20.syd.optusnet.com.au>
Message-ID: <20060706154348.29014.1424458626.divmod.quotient.24966@ohm>

On Fri, 07 Jul 2006 01:03:33 +1000, richardjones at optusnet.com.au wrote:
>> Phillip J. Eby wrote:
>> Why not? ;)
>
>That was actually what I was afraid the reasoning was ;)
>
>I guess I just go all wobbly in the knees at the thought of having to maintain a "screen scraping" interface.
>
>Funnily enough, Johannes Gisjbers, Andrew Dalke and I were talking about this very issue last night. I proposed that we detect the user-agent of the setuptools client, and in response send back really minimalist HTML (no surrounding page template). Probably overkill, but this may have been after we'd had beer :)

Making this explicit actually makes it a good idea.

  http://host/pypiurl?format=simple

It doesn't even have to be html at this point, either. Return a nicely structured xml or csv or whatever document. And then you can completely omit setuptools hints from the actual markup, since setuptools won't ever care about that.

Jean-Paul

From fdrake at gmail.com Thu Jul 6 17:44:16 2006
From: fdrake at gmail.com (Fred Drake)
Date: Thu, 6 Jul 2006 11:44:16 -0400
Subject: [Catalog-sig] PyPI XML-RPC changes
In-Reply-To: <200607061539.k66FdCV0007024@mail07.syd.optusnet.com.au>
References: <200607061539.k66FdCV0007024@mail07.syd.optusnet.com.au>
Message-ID: <9cee7ab80607060844n397a96c5i498ee5cd8465d19a@mail.gmail.com>

On 7/6/06, richardjones at optusnet.com.au wrote:
> The XML-RPC interface is being reviewed at present, and some quite reasonable changes are being suggested. These will alter the actual method names, so I'd like to know whether anyone on this list is already actually *using* the methods as posted to this list.

Do calls to the XML-RPC methods get logged? That would be a good indicator.

  -Fred

--
Fred L. Drake, Jr.
"Every sin is the result of a collaboration."
   --Lucius Annaeus Seneca

From richardjones at optusnet.com.au Thu Jul 6 17:50:38 2006
From: richardjones at optusnet.com.au (richardjones at optusnet.com.au)
Date: Fri, 07 Jul 2006 01:50:38 +1000
Subject: [Catalog-sig] PyPI XML-RPC changes
Message-ID: <200607061550.k66Foc5W030225@mail26.syd.optusnet.com.au>

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://mail.python.org/pipermail/catalog-sig/attachments/20060707/fb9e8066/attachment.asc

From richardjones at optusnet.com.au Thu Jul 6 17:52:36 2006
From: richardjones at optusnet.com.au (richardjones at optusnet.com.au)
Date: Fri, 07 Jul 2006 01:52:36 +1000
Subject: [Catalog-sig] [Distutils] Specification for package indexes?
Message-ID: <200607061552.k66Fqan8021109@mail07.syd.optusnet.com.au>

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://mail.python.org/pipermail/catalog-sig/attachments/20060707/c478e8ab/attachment.pot

From fdrake at gmail.com Thu Jul 6 18:36:16 2006
From: fdrake at gmail.com (Fred Drake)
Date: Thu, 6 Jul 2006 12:36:16 -0400
Subject: [Catalog-sig] PyPI XML-RPC changes
In-Reply-To: <200607061550.k66Foc5W030225@mail26.syd.optusnet.com.au>
References: <200607061550.k66Foc5W030225@mail26.syd.optusnet.com.au>
Message-ID: <9cee7ab80607060936x58f03254h8a54dd080adbd127@mail.gmail.com>

On 7/6/06, richardjones at optusnet.com.au wrote:
> Yep, they do, and someone here also suggested this approach. There is a chance though that someone has distributed code that hasn't been exercised yet. Slim, of course.

That code's only interesting if it gets executed. If it was distributed without even a test run, it's probably buggy anyway. ;-)

Frequency of execution is harder to make predictions about, though. I figure if you look through a few months (like 3) of logs, that's plenty. Any calls that aren't found can be changed with impunity.

Of course, one more round of emails on the topic means it'll be cheaper to leave the existing names regardless of how much better the new names are.

  -Fred

--
Fred L. Drake, Jr.
"Every sin is the result of a collaboration."
   --Lucius Annaeus Seneca

From pje at telecommunity.com Thu Jul 6 18:56:03 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu, 06 Jul 2006 12:56:03 -0400
Subject: [Catalog-sig] [Distutils] Specification for package indexes?
In-Reply-To: <200607061503.k66F3X7S005870@mail20.syd.optusnet.com.au>
Message-ID: <5.1.1.6.0.20060706122834.02056f00@sparrow.telecommunity.com>

At 01:03 AM 7/7/2006 +1000, richardjones at optusnet.com.au wrote:
> > Phillip J. Eby wrote:
> > Why not? ;)
>
>That was actually what I was afraid the reasoning was ;)
>
>I guess I just go all wobbly in the knees at the thought of having to
>maintain a "screen scraping" interface.

You don't need to -- at least not in the long term. Once setuptools 0.7 supports the XML-RPC interface, it won't need the other scraping tricks to read PyPI. Those would be left in for people who are creating their own package indexes, not constraining further development of PyPI itself.

Please keep in mind that easy_install makes *extremely* minimal assumptions about PyPI's interface:

1. It assumes that baseURL/projectname will get to the current version of projectname, or a page with a list of projectname's active versions

2. It assumes that links within PyPI of the form baseURL/something1/something2 are links to version 'something2' of a project named 'something1'

3. It assumes that going to baseURL directly will result in a page with links to all available packages in the form described in #2.

4. It assumes that if baseURL/projectname returns a page containing the text "Index of Packages", it is a list of links of the form described in #2.

5. It looks for and follows the first links following the strings "Home Page" and "Download URL" in a project page.

6. It makes assumptions about how to find MD5 data on a PyPI page, but if it fails to do so, it simply won't check the MD5 of downloads.

Also note that even with an XML-RPC interface, easy_install will *still* need to read an HTML page to gather links, because it's valid for people to provide links in their long_description using reStructuredText. It's just that assumptions 1, 3, and 4 (and maybe 5) would not be necessary.

Also note that in a pinch, you can put the strings easy_install is looking for inside HTML comments. Easy_install really isn't that bright. ;)

However, if you can provide *all* of this data via the API (including an html-formatted long description), then the screen scraping can go away as far as PyPI is concerned.

>Funnily enough, Johannes Gisjbers, Andrew Dalke and I were talking about
>this very issue last night. I proposed that we detect the user-agent of
>the setuptools client, and in response send back really minimalist HTML
>(no surrounding page template). Probably overkill, but this may have been
>after we'd had beer :)

There's a simpler solution that could be implemented: adding a 'rel="easy-install"' attribute to links that easy_install should follow. Currently, those links are the project's home page URL, download URL, and the links to specific versions that show up when you go to a project that has multiple active versions. Adding it to these, and *only* these links would give easy_install enough information to do the right thing. However, support would have to wait for setuptools 0.7 anyhow, so there's little reason to do this.

Hm. I just tried to make multiple versions of PEAK active, and it seems like you can't get the page that lists multiple versions any more. No wonder some people have been having problems downloading older versions of certain packages. :( How are people supposed to get to older package versions now? That is, what's the point of being able to have multiple active versions if you can't find them?
Is this an intended change, or a bug?

>Could you provide a clear list of all the specific changes you wish for us
>to make at the Sprint?

I've provided a list above of what changes I want you *not* to make. How's that? ;)

> > Nonetheless, there are various aspects of easy_install's behavior and
> > performance that could be significantly improved by using XML-RPC, so I
> > definitely want it to do that in 0.7. I'm just wary of removing the
> > existing behavior until it's clear that it's unnecessary for it to.
>
>Oh - another thing that occurred to me -- does setuptools auto update itself?

What do you mean? You can run "easy_install -u setuptools" to upgrade to the latest release at any time. But it doesn't go out looking for updates on its own.

From richardjones at optusnet.com.au Fri Jul 7 11:41:50 2006
From: richardjones at optusnet.com.au (richardjones at optusnet.com.au)
Date: Fri, 07 Jul 2006 19:41:50 +1000
Subject: [Catalog-sig] [Distutils] Specification for package indexes?
Message-ID: <200607070941.k679fo2f025860@mail23.syd.optusnet.com.au>

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://mail.python.org/pipermail/catalog-sig/attachments/20060707/3fa8eff3/attachment.pot

From jim at zope.com Fri Jul 7 12:55:44 2006
From: jim at zope.com (Jim Fulton)
Date: Fri, 7 Jul 2006 06:55:44 -0400
Subject: [Catalog-sig] [Distutils] Specification for package indexes?
In-Reply-To: <5.1.1.6.0.20060706122834.02056f00@sparrow.telecommunity.com>
References: <5.1.1.6.0.20060706122834.02056f00@sparrow.telecommunity.com>
Message-ID: <1DEFC153-D9BB-4FA9-A1FA-65E5F4056AF2@zope.com>

I'd like to suggest that we take a step back. It feels as though we are reacting rather than designing.

I think we have the following goals:

1. setuptools should be able to read indexes robustly and efficiently.

2. It should be straightforward, and preferably *easy* for people to implement their own indexes. This is very important to me. :)

Perhaps:

3. It should be easy to mirror an index

4. It should be possible to create a read-only index as a static HTTP server.

And I suggest:

5. It should be possible to provide an end-user experience for an index without affecting the setuptools interface

6. It should be possible to write other setuptools-like applications for accessing indexes. This means that the web-service (small w-s) should be well defined and/or that setuptools should expose a Python API for accessing indexes.

From a design perspective:

a. screen scraping is bad

b. the web API should be simple and well defined.

I suggest, as others have suggested, that we create an *alternate* web API for reading an index focussed on cleanliness and on making the API as easy as possible to implement for both index and client developers. If we agree with all of the goals stated above, I think this should be static HTTP interface using XHTML or some other XML dialect. Perhaps we could even use specific HTML class attrs to make it possible to combine the pypi and user interfaces if an index implementor desires.

Thoughts?

Jim

--
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

From pje at telecommunity.com Fri Jul 7 17:52:29 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri, 07 Jul 2006 11:52:29 -0400
Subject: [Catalog-sig] [Distutils] Specification for package indexes?
In-Reply-To: <200607070941.k679fo2f025860@mail23.syd.optusnet.com.au>
Message-ID: <5.1.1.6.0.20060707114135.02d91280@sparrow.telecommunity.com>

At 07:41 PM 7/7/2006 +1000, richardjones at optusnet.com.au wrote:
> > 3. It assumes that going to baseURL directly will result in a page with
> > links to all available packages in the form described in #2.
>
>This has been removed as it seems completely unnecessary (a flat listing
>of all 1400+ packages, that is). The XML-RPC interface provides the
>functionality you require here.

Note that this will BREAK easy_install in the field, as it will no longer be possible for easy_install to find packages with odd punctuation in their names, or which the user has incorrectly specified the case of. E.g. if someone asks for "sqlobject" instead of "SQLObject", it will no longer work. This is a pretty serious breakage.

If you really *must* remove this, then you need to add name canonicalization so that going to /pypi/sqlobject works the same as /pypi/SQLObject. Similarly, going to "/pypi/foo-bar" should work the same as "/pypi/Foo & Bar", and so on. That is, a case-insensitive, "safe_name()" match (see pkg_resources code or docs for the definition of safe_name()).

If you can't support this (which I've previously asked you to over the last year so that I could remove the list dependency), PLEASE put the package list back, because you just broke easy_install's ability to support user friendly names.

You've known for a *year* that easy_install depended on this feature of PyPI:

  http://mail.python.org/pipermail/catalog-sig/2005-June/000654.html

If you needed it to go away, giving me some notice would have been nice. Not just to me, but to all the people who use PyPI via easy_install. This gratuitous breakage is not nice.

From pje at telecommunity.com Fri Jul 7 18:07:19 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri, 07 Jul 2006 12:07:19 -0400
Subject: [Catalog-sig] [Distutils] Specification for package indexes?
In-Reply-To: <5.1.1.6.0.20060707114135.02d91280@sparrow.telecommunity.com>
References: <200607070941.k679fo2f025860@mail23.syd.optusnet.com.au>
Message-ID: <5.1.1.6.0.20060707115709.03fe2ea8@sparrow.telecommunity.com>

At 11:52 AM 7/7/2006 -0400, Phillip J. Eby wrote:
>If you needed it to go away, giving me some notice would have been
>nice. Not just to me, but to all the people who use PyPI via
>easy_install. This gratuitous breakage is not nice.
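
As a rough illustration of the canonical lookup being requested, the sketch below resolves a requested name against a list of registered names: exact match first, then a case-insensitive safe_name() match. The safe_name() body uses the regular expression quoted from pkg_resources elsewhere in this thread; the canonical() and lookup() helpers and the sample registry are hypothetical and are not PyPI's actual implementation.

```python
import re


def safe_name(name):
    """Replace runs of characters other than letters, digits and '.' with '-'."""
    return re.sub('[^A-Za-z0-9.]+', '-', name)


def canonical(name):
    """Case-insensitive form of safe_name(), used only for comparison."""
    return safe_name(name).lower()


def lookup(requested, registered):
    """Return registered project names matching `requested`.

    An exact match wins outright; otherwise every name whose canonical
    form equals that of `requested` is returned (possibly none).
    """
    if requested in registered:
        return [requested]
    key = canonical(requested)
    return [name for name in registered if canonical(name) == key]


# Hypothetical registry contents:
registered = ["SQLObject", "Foo & Bar", "PEAK"]

by_case = lookup("sqlobject", registered)   # wrong case
by_punct = lookup("foo-bar", registered)    # punctuation normalized
```

Returning a list rather than a single name leaves room for the ambiguous case, where more than one registered project shares a canonical form.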

It looks like easy_install is still actually working at the moment; I guess that means you kept the list working if the user-agent is easy_install. Thank you, and sorry for jumping to conclusions. I took "This has been removed" to mean that it had been, well, removed. :)

I will mention again, though, that it *can* be removed entirely, without needing any in-the-field upgrades, if project name matches in URLs can be made case-insensitive and canonical via safe_name:

    import re

    def safe_name(name):
        """Convert an arbitrary string to a standard distribution name

        Any runs of non-alphanumeric/. characters are replaced with a single '-'.
        """
        return re.sub('[^A-Za-z0-9.]+', '-', name)

If you can fall back to matching safe_name(pkg_name).lower() against search_name.lower() if the URL is not exact, then easy_install will not need the full package listing *or* the XML-RPC interface. It will work as-is in the field today, with no patching required.

The only thing the full package listing is used for is to do this search. (Likewise, it would've been the main thing the XML-RPC API would've been used for.)

From pje at telecommunity.com Fri Jul 7 18:18:32 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri, 07 Jul 2006 12:18:32 -0400
Subject: [Catalog-sig] [Distutils] Specification for package indexes?
In-Reply-To: <1DEFC153-D9BB-4FA9-A1FA-65E5F4056AF2@zope.com>
References: <5.1.1.6.0.20060706122834.02056f00@sparrow.telecommunity.com> <5.1.1.6.0.20060706122834.02056f00@sparrow.telecommunity.com>
Message-ID: <5.1.1.6.0.20060707120827.02d8f2b8@sparrow.telecommunity.com>

At 06:55 AM 7/7/2006 -0400, Jim Fulton wrote:
> From a design perspective:
>
>a. screen scraping is bad

As long as you define "screen scraping" as "dependency on visible characteristics of HTML", then I agree. easy_install shouldn't be relying on the visible bits of HTML that it currently uses to scope out PyPI.

Relying on a particular URL layout is not screen-scraping, though, and using the URL layout as part of the API has some good properties for ease of implementation in static form. So does using href's to obtain link information.

What we should be doing is adding non-visible markup (e.g. class="" or rel="") information to the links to allow index creators to direct easy_install without affecting visible page characteristics.

>b. the web API should be simple and well defined.
>
>I suggest, as others have suggested, that we create an *alternate*
>web API for reading an index focussed on cleanliness and on making
>the API as easy as possible to implement for both index and client
>developers. If we agree with all of the goals stated above, I think
>this should be static HTTP interface using XHTML or some other XML
>dialect. Perhaps we could even use specific HTML class attrs to
>make it possible to combine the pypi and user interfaces if an index
>implementor desires.
>
>Thoughts?

+1 on static pages. I don't, however, see a reason to require valid XML. Or rather, I don't expect to implement XML parsing in easy_install; if the spec is too complex to implement with regular expression matching, it's probably too complex for people to throw together an index with what's at hand. In particular, I'd like it to be practical to put together a simple index just using Apache's built-in directory indexes, as long as they use the right URL hierarchy. That means that class or rel attributes should only be required for links that are requesting non-index pages to be spidered.

From richardjones at optusnet.com.au Fri Jul 7 18:51:27 2006
From: richardjones at optusnet.com.au (richardjones at optusnet.com.au)
Date: Sat, 08 Jul 2006 02:51:27 +1000
Subject: [Catalog-sig] [Distutils] Specification for package indexes?
Message-ID: <200607071651.k67GpRvT020914@mail20.syd.optusnet.com.au>

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://mail.python.org/pipermail/catalog-sig/attachments/20060708/a445612a/attachment.pot

From richardjones at optusnet.com.au Fri Jul 7 19:01:37 2006
From: richardjones at optusnet.com.au (richardjones at optusnet.com.au)
Date: Sat, 08 Jul 2006 03:01:37 +1000
Subject: [Catalog-sig] [Distutils] Specification for package indexes?
Message-ID: <200607071701.k67H1b22026443@mail16.syd.optusnet.com.au>

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://mail.python.org/pipermail/catalog-sig/attachments/20060708/c514dd7d/attachment.asc

From jim at zope.com Fri Jul 7 19:32:42 2006
From: jim at zope.com (Jim Fulton)
Date: Fri, 7 Jul 2006 13:32:42 -0400
Subject: [Catalog-sig] [Distutils] Specification for package indexes?
In-Reply-To: <5.1.1.6.0.20060707120827.02d8f2b8@sparrow.telecommunity.com>
References: <5.1.1.6.0.20060706122834.02056f00@sparrow.telecommunity.com> <5.1.1.6.0.20060706122834.02056f00@sparrow.telecommunity.com> <5.1.1.6.0.20060707120827.02d8f2b8@sparrow.telecommunity.com>
Message-ID:

On Jul 7, 2006, at 12:18 PM, Phillip J. Eby wrote:

> At 06:55 AM 7/7/2006 -0400, Jim Fulton wrote:
>> From a design perspective:
>>
>> a. screen scraping is bad
>
> As long as you define "screen scraping" as "dependency on visible
> characteristics of HTML", then I agree. easy_install shouldn't be
> relying on the visible bits of HTML that it currently uses to scope
> out PyPI.

Yup

> Relying on a particular URL layout is not screen-scraping, though,
> and using the URL layout as part of the API has some good
> properties for ease of implementation in static form. So does
> using href's to obtain link information.

Yes.

> What we should be doing is adding non-visible markup (e.g. class=""
> or rel="") information to the links to allow index creators to
> direct easy_install without affecting visible page characteristics.

Yes

>> b. the web API should be simple and well defined.
>>
>> I suggest, as others have suggested, that we create an *alternate*
>> web API for reading an index focussed on cleanliness and on making
>> the API as easy as possible to implement for both index and client
>> developers. If we agree with all of the goals stated above, I think
>> this should be static HTTP interface using XHTML or some other XML
>> dialect. Perhaps we could even use specific HTML class attrs to
>> make it possible to combine the pypi and user interfaces if an index
>> implementor desires.
>>
>> Thoughts?
>
> +1 on static pages. I don't, however, see a reason to require
> valid XML. Or rather, I don't expect to implement XML parsing in
> easy_install; if the spec is too complex to implement with regular
> expression matching, it's probably too complex for people to throw
> together an index with what's at hand. In particular, I'd like it
> to be practical to put together a simple index just using Apache's
> built-in directory indexes, as long as they use the right URL
> hierarchy. That means that class or rel attributes should only be
> required for links that are requesting non-index pages to be spidered.

I would find parsing much easier with an XML parser than with regular expressions. I think it would be much more robust too.

I do want to see something that is well documented and pretty easy to implement.

Jim

--
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

From pje at telecommunity.com Fri Jul 7 20:02:43 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri, 07 Jul 2006 14:02:43 -0400
Subject: [Catalog-sig] [Distutils] Specification for package indexes?
In-Reply-To: <200607071701.k67H1b22026443@mail16.syd.optusnet.com.au>
Message-ID: <5.1.1.6.0.20060707134531.02d1d778@sparrow.telecommunity.com>

At 03:01 AM 7/8/2006 +1000, richardjones at optusnet.com.au wrote:
>And I will mention again that I'm not willing to just whack this code into
>the package index without some thought to the ramifications given that
>distutils has no such name mangling or limitation.

...which is arguably a serious bug in the distutils. Actually, it's several bugs having to do with inconsistent filename mangling throughout the distutils' various commands, and a design limitation of not anticipating that people would register Python packages with names like "Bob's Incredible Package for Python". :)

The ramifications of adding a fallback search to URL lookup (and you need *only* add it to URL lookup for "/pypi/projectname") are these:

1. Exact matches will work exactly as they do now

2. URLs that now produce "Not Found" would produce a list of links that easy_install is already capable of sorting through -- just a shorter one than the whole package index

3. There is no ramification three. :)

4. Users who manually type a URL to find a package will get a nice helpful list of links. :)

5. You can remove the full package listing.

>We will have potential (however remote) of name collision and we'll have
>to deal with that somehow.

Yes, and we should deal with it by rejecting registration of colliding names. The difference between "you can't register an identically-named package" and "you can't register a package that differs only in letter case and punctuation from another package" is very small, but also very *good* for human users, who are not going to be able to remember whether they want "Bob's very Incredible Package" or "Bob's very incredible package", or be able to tell the difference between "My Super Package" and "My Super  Package" at a glance. (One of those names has an extra space in it, in case you can't tell.)
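
The "reject colliding names at registration" rule can be sketched as a small check. This is only an illustration of the idea, not PyPI code: safe_name() uses the regular expression quoted from pkg_resources earlier in the thread, while find_collision() and the sample registry are hypothetical.

```python
import re


def safe_name(name):
    """Replace runs of characters other than letters, digits and '.' with '-'."""
    return re.sub('[^A-Za-z0-9.]+', '-', name)


def find_collision(new_name, registered):
    """Return the already-registered name that `new_name` collides with.

    Two names collide when they differ only in letter case and in runs
    of punctuation or whitespace.  Returns None if there is no collision.
    """
    key = safe_name(new_name).lower()
    for existing in registered:
        if safe_name(existing).lower() == key:
            return existing
    return None


# Hypothetical registry contents:
registered = ["SQLObject", "My Super Package", "foo bar"]

case_clash = find_collision("sqlobject", registered)
space_clash = find_collision("My Super  Package", registered)  # extra space
punct_clash = find_collision("Foo!Bar", registered)
no_clash = find_collision("Zope", registered)
```

A registration handler following this approach would refuse the new name whenever find_collision() returns something other than None.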

In truth, the name collision problem is already *possible*, and PyPI should simply reject registration for colliding package names. Allowing people to register say, "SQLObject" and "sqlobject" or "Zope" and "zope" or "foo bar" and "Foo!Bar" as if they were actually different packages isn't doing anybody a favor.

Meanwhile, if you go to /pypi/SQLObject now, you get back a list of links that includes SQLObject2, so there's obviously some kind of mangling taking place for URL searches already. If you're concerned about it being ambiguous, just make the fallback search result in a list of matching packages, *exactly the same way that going to "SQLObject" does now*, just with a broader match criterion:

  http://cheeseshop.python.org/pypi/SQLObject

From fdrake at gmail.com Fri Jul 7 20:04:12 2006
From: fdrake at gmail.com (Fred Drake)
Date: Fri, 7 Jul 2006 14:04:12 -0400
Subject: [Catalog-sig] [Distutils] Specification for package indexes?
In-Reply-To:
References: <5.1.1.6.0.20060706122834.02056f00@sparrow.telecommunity.com> <5.1.1.6.0.20060707120827.02d8f2b8@sparrow.telecommunity.com>
Message-ID: <9cee7ab80607071104wb987d3avd2dd9e872822346c@mail.gmail.com>

On 7/7/06, Jim Fulton wrote:
> > +1 on static pages. I don't, however, see a reason to require
> > valid XML. Or rather, I don't expect to implement XML parsing in
> > easy_install; if the spec is too complex to implement with regular
> > expression matching, it's probably too complex for people to throw
> > together an index with what's at hand. In particular, I'd like it
> > to be practical to put together a simple index just using Apache's
> > built-in directory indexes, as long as they use the right URL
> > hierarchy. That means that class or rel attributes should only be
> > required for links that are requesting non-index pages to be spidered.
>
> I would find parsing much easier with an XML parser than with
> regular expressions.
> I think it would be much more robust too.
XHTML would be best, though I agree we shouldn't care about validity so much as just well-formedness (which is required). I think it should be possible to do it with valid XHTML, though, since whether that's desired or not is a python.org policy concern. (Not that I suspect we'll ever really care about that.) Of course, it should be possible to parse with htmllib and HTMLParser as well. -Fred -- Fred L. Drake, Jr. "Every sin is the result of a collaboration." --Lucius Annaeus Seneca From pje at telecommunity.com Fri Jul 7 20:31:01 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri, 07 Jul 2006 14:31:01 -0400 Subject: [Catalog-sig] [Distutils] Specification for package indexes? In-Reply-To: <9cee7ab80607071104wb987d3avd2dd9e872822346c@mail.gmail.com > References: <5.1.1.6.0.20060706122834.02056f00@sparrow.telecommunity.com> <5.1.1.6.0.20060707120827.02d8f2b8@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060707142217.02d15bd0@sparrow.telecommunity.com> At 02:04 PM 7/7/2006 -0400, Fred Drake wrote: >On 7/7/06, Jim Fulton wrote: > > > +1 on static pages. I don't, however, see a reason to require > > > valid XML. Or rather, I don't expect to implement XML parsing in > > > easy_install; if the spec is too complex to implement with regular > > > expression matching, it's probably too complex for people to throw > > > together an index with what's at hand. In particular, I'd like it > > > to be practical to put together a simple index just using Apache's > > > built-in directory indexes, as long as they use the right URL > > > hierarchy. That means that class or rel attributes should only be > > > required for links that are requesting non-index pages to be spidered. > > > > I would find parsing much easier with an XML parser than with > > regular expressions. > > I think it would be much more robust too. > >XHTML would be best, though I agree we shouldn't care about validity >so much as just well-formedness (which is required). 
I think it >should be possible to do it with valid XHTML, though, since whether >that's desired or not is a python.org policy concern. (Not that I >suspect we'll ever really care about that.) > >Of course, it should be possible to parse with htmllib and HTMLParser as well. I still think requiring even HTML validity or well-formedness is YAGNI; one could indeed just pull all well-formed URLs from the page. EasyInstall uses this case-insensitive regular expression to find only href'd urls: href\s*=\s*['"]?([^'"> ]+) In the absence of a requirement for more information than this (perhaps coupled with a "rel" attribute in the same element), I'm wary of starting out by requiring even well-formedness, because it's way overkill for the requirements as I understand them. One of the advantages of defining the URL layout as part of the API is that it gives you enough contextual information to decide what links should be followed, and which ones are purely informational. Indeed, the only reason to look at anything *but* hrefs is to indicate that an *external* (i.e. non-index) link should be followed, to spider for other download links. So if following external links is out of scope for the API we want to define, then *any* information other than the URLs in an API page is YAGNI. Now, all of this is based on my assumption that the use case here is somebody wants to throw together a rough-and-ready package index that tools should be able to use to find *downloadable distributions*. If you and Jim have much more elaborate use cases in mind, then of course some well-formedness might be useful. On the other hand, if such rigor is required, then it seems like we should just be using machine-readable data in the first place, rather than using a dual-purpose format like HTML or XHTML. Just go with a specialized XML dialect or some kind of text format (ZConfig? ;) ) and be done with it. From pje at telecommunity.com Fri Jul 7 20:50:36 2006 From: pje at telecommunity.com (Phillip J.
Eby) Date: Fri, 07 Jul 2006 14:50:36 -0400 Subject: [Catalog-sig] [Distutils] Specification for package indexes? In-Reply-To: <5.1.1.6.0.20060707142217.02d15bd0@sparrow.telecommunity.com> References: <9cee7ab80607071104wb987d3avd2dd9e872822346c@mail.gmail.com> <5.1.1.6.0.20060706122834.02056f00@sparrow.telecommunity.com> <5.1.1.6.0.20060707120827.02d8f2b8@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060707144633.01edd910@sparrow.telecommunity.com> At 02:31 PM 7/7/2006 -0400, Phillip J. Eby wrote: >On the other hand, if such rigor is required, then it seems like we should >just be using machine-readable data in the first place, rather than using a >dual-purpose format like HTML or XHTML. Just go with a specialized XML >dialect or some kind of text format (ZConfig? ;) ) and be done with it. FWIW, I just discovered DOAP, and it seems like a good basis for this sort of thing. PyPI isn't generating 'file-release' info in its DOAP output, but probably should, if this is going to be used for that sort of thing. Likewise, I think that the reST-to-HTML translation should be carried over into the long-description content. (This is all assuming, of course, that you have use cases for which the richness of DOAP is relevant, anyway, as opposed to just trying to find download links for projects with known names.) From jim at zope.com Fri Jul 7 20:52:51 2006 From: jim at zope.com (Jim Fulton) Date: Fri, 7 Jul 2006 14:52:51 -0400 Subject: [Catalog-sig] [Distutils] Specification for package indexes? In-Reply-To: <5.1.1.6.0.20060707142217.02d15bd0@sparrow.telecommunity.com> References: <5.1.1.6.0.20060706122834.02056f00@sparrow.telecommunity.com> <5.1.1.6.0.20060707120827.02d8f2b8@sparrow.telecommunity.com> <5.1.1.6.0.20060707142217.02d15bd0@sparrow.telecommunity.com> Message-ID: <46433695-C678-4CE9-AC62-253D01C3EF69@zope.com> On Jul 7, 2006, at 2:31 PM, Phillip J.
Eby wrote: > At 02:04 PM 7/7/2006 -0400, Fred Drake wrote: >> On 7/7/06, Jim Fulton wrote: >> > > +1 on static pages. I don't, however, see a reason to require >> > > valid XML. Or rather, I don't expect to implement XML parsing in >> > > easy_install; if the spec is too complex to implement with >> regular >> > > expression matching, it's probably too complex for people to >> throw >> > > together an index with what's at hand. In particular, I'd >> like it >> > > to be practical to put together a simple index just using >> Apache's >> > > built-in directory indexes, as long as they use the right URL >> > > hierarchy. That means that class or rel attributes should >> only be >> > > required for links that are requesting non-index pages to be >> spidered. >> > >> > I would find parsing much easier with an XML parser than with >> > regular expressions. >> > I think it would be much more robust too. >> >> XHTML would be best, though I agree we shouldn't care about validity >> so much as just well-formedness (which is required). I think it >> should be possible to do it with valid XHTML, though, since whether >> that's desired or not is a python.org policy concern. (Not that I >> suspect we'll ever really care about that.) >> >> Of course, it should be possible to parse with htmllib and >> HTMLParser as well. > > I still think requiring even HTML validity or well-formedness is > YAGNI; one could indeed just pull all well-formed URLs from the > page. EasyInstall uses this case-insensitive regular expression to > find only href'd urls: > > href\s*=\s*['"]?([^'"> ]+) > > In the absence of a requirement for more information than this > (perhaps coupled with a "rel" attribute in the same element), I'm > wary of starting out by requiring even well-formedness, because > it's way overkill for the requirements as I understand them. But I thought we *were* talking about adding rel or class tags so that we could determine information about the intended use of a URL. 
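For readers following along, here is a small sketch of how the href pattern quoted above behaves when applied to a page. The regular expression is the one given in the message; the wrapper function name and the sample markup are illustrative assumptions, and this is not claimed to be easy_install's actual code.

```python
import re

# The case-insensitive pattern quoted in the thread.
HREF = re.compile(r"""href\s*=\s*['"]?([^'"> ]+)""", re.IGNORECASE)


def find_links(html):
    """Return every href target found in `html`, in document order.

    No well-formedness is required: quoted, unquoted, and oddly-cased
    attributes are all picked up, which is the point being argued.
    """
    return HREF.findall(html)
```

Because the pattern only looks at the attribute itself, it carries no information about a link's intended use; that is the gap the rel or class discussion is about.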
> One of the advantages of defining the URL layout as part of the API > is that it gives you enough contextual information to decide what > links should be followed, and which ones are purely informational. Perhaps someone should propose an API and we'll see. :) > Indeed, the only reason to look at anything *but* hrefs is to > indicate that an *external* (i.e. non-index) link should be > followed, to spider for other download links. So if following > external links is out of scope for the API we want to define, then > *any* information other than the URLs in an API page is YAGNI. Who said following external links is out of scope? > Now, all of this is based on my assumption that the use case here > is somebody wants to throw together a rough-and-ready package index > that tools should be able to use to find *downloadable > distributions*. If you and Jim have much more elaborate use cases > in mind, then of course some well-formedness might be useful. setuptools has a notion of an index. That notion is not at all well defined. Currently, the index has links that are followed to find package links elsewhere. This seems reasonably useful. I dunno. I'm not sure I care. What I do care about is that the index API should be well defined so that we can implement alternate indexes and alternate tools to read indexes. I'm not looking to satisfy use cases beyond what we have now. All I want is an API. :) I'm not bent on XML. Jim -- Jim Fulton mailto:jim at zope.com Python Powered! CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From pje at telecommunity.com Fri Jul 7 22:20:11 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri, 07 Jul 2006 16:20:11 -0400 Subject: [Catalog-sig] [Distutils] Specification for package indexes?
In-Reply-To: <46433695-C678-4CE9-AC62-253D01C3EF69@zope.com> References: <5.1.1.6.0.20060707142217.02d15bd0@sparrow.telecommunity.com> <5.1.1.6.0.20060706122834.02056f00@sparrow.telecommunity.com> <5.1.1.6.0.20060707120827.02d8f2b8@sparrow.telecommunity.com> <5.1.1.6.0.20060707142217.02d15bd0@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060707160846.032ffc88@sparrow.telecommunity.com> At 02:52 PM 7/7/2006 -0400, Jim Fulton wrote: >On Jul 7, 2006, at 2:31 PM, Phillip J. Eby wrote: > >>At 02:04 PM 7/7/2006 -0400, Fred Drake wrote: >>>On 7/7/06, Jim Fulton wrote: >>> > > +1 on static pages. I don't, however, see a reason to require >>> > > valid XML. Or rather, I don't expect to implement XML parsing in >>> > > easy_install; if the spec is too complex to implement with >>>regular >>> > > expression matching, it's probably too complex for people to >>>throw >>> > > together an index with what's at hand. In particular, I'd >>>like it >>> > > to be practical to put together a simple index just using >>>Apache's >>> > > built-in directory indexes, as long as they use the right URL >>> > > hierarchy. That means that class or rel attributes should >>>only be >>> > > required for links that are requesting non-index pages to be >>>spidered. >>> > >>> > I would find parsing much easier with an XML parser than with >>> > regular expressions. >>> > I think it would be much more robust too. >>> >>>XHTML would be best, though I agree we shouldn't care about validity >>>so much as just well-formedness (which is required). I think it >>>should be possible to do it with valid XHTML, though, since whether >>>that's desired or not is a python.org policy concern. (Not that I >>>suspect we'll ever really care about that.) >>> >>>Of course, it should be possible to parse with htmllib and >>>HTMLParser as well. >> >>I still think requiring even HTML validity or well-formedness is >>YAGNI; one could indeed just pull all well-formed URLs from the >>page. 
EasyInstall uses this case-insensitive regular expression to >>find only href'd urls: >> >> href\s*=\s*['"]?([^'"> ]+) >> >>In the absence of a requirement for more information than this >>(perhaps coupled with a "rel" attribute in the same element), I'm >>wary of starting out by requiring even well-formedness, because >>it's way overkill for the requirements as I understand them. > >But I thought we *were* talking about adding rel or class tags so >that we >could determine information about the intended use of a URL. Yes -- but they're only needed to support following second-order external links: i.e., links to non-index HTML pages. >>One of the advantage of defining the URL layout as part of the API >>is that it gives you enough contextual information to decide what >>links should be followed, and which ones are purely informational. > >Perhaps someone should propose an API and we'll see. :) I thought I already did. :) Here it is again: baseURL/ should return a page containing href links to projects baseURL/projectname should return a page containing href links to version pages baseURL/projectname/version should return a page with download links (ideally with MD5 info) Links are found via href="" attributes URLs' trailing path components are used to identify distributions. This is a sufficient API to allow querying packages for downloading purposes, as long as all download links are found in the index's pages. Additional information is only needed to allow following external links to *other index pages*. Coincidentally, easy_install is already mostly compatible with such an API; it would mostly be a matter of *removing* things from easy_install, rather than adding them. >>Indeed, the only reason to look at anything *but* hrefs is to >>indicate that an *external* (i.e. non-index) link should be >>followed, to spider for other download links. 
So if following >>external links is out of scope for the API we want to define, then >>*any* information other than the URLs in an API page are YAGNI. > >Who said following external links is out of scope. Nobody; I was just saying that *if* it were out of scope, the class/rel stuff would become unnecessary. >>Now, all of this is based on my assumption that the use case here >>is somebody wants to throw together a rough-and-ready package index >>that tools should be able to use to find *downloadable >>distributions*. If you and Jim have much more elaborate use cases >>in mind, then of course some well-formedness might be useful. > >setuptools has a notion of an index. That notion is not at all well >defined. It's mostly operationally defined in terms of what PyPI did when it was written. >Currently, the index has linkes that are followed to find package >links elsewhere. >This seems reasonably useful. I dunno. I'm not sure I care. What I >do care >about is that the index API should be well defined so that we can >implement >alternate indexes and alternate tools to read indexes. I'm not >looking to >satisfy use cases beyond what we have now. Sure. I'm just saying we only need something beyond href="" links if they are intended to be followed by tools looking for package links. The reason this is necessary, is that it's not sufficient to just follow links that point outside the package index; PyPI has links on its pages that go to other parts of python.org, so there needs to be something that distinguishes "links that might help find downloads". Links that *are* downloads are detected via URL content. From jim at zope.com Fri Jul 7 22:45:59 2006 From: jim at zope.com (Jim Fulton) Date: Fri, 7 Jul 2006 16:45:59 -0400 Subject: [Catalog-sig] [Distutils] Specification for package indexes? 
In-Reply-To: <5.1.1.6.0.20060707160846.032ffc88@sparrow.telecommunity.com> References: <5.1.1.6.0.20060707142217.02d15bd0@sparrow.telecommunity.com> <5.1.1.6.0.20060706122834.02056f00@sparrow.telecommunity.com> <5.1.1.6.0.20060707120827.02d8f2b8@sparrow.telecommunity.com> <5.1.1.6.0.20060707142217.02d15bd0@sparrow.telecommunity.com> <5.1.1.6.0.20060707160846.032ffc88@sparrow.telecommunity.com> Message-ID: <9A0CD067-84D0-4BF0-A5D9-9306AFCDD42E@zope.com> On Jul 7, 2006, at 4:20 PM, Phillip J. Eby wrote: > At 02:52 PM 7/7/2006 -0400, Jim Fulton wrote: ... >> Perhaps someone should propose an API and we'll see. :) > > I thought I already did. :) Here it is again: > > baseURL/ should return a page containing href links to projects > baseURL/projectname should return a page containing href links to > version pages > baseURL/projectname/version should return a page with download > links (ideally with MD5 info) > Links are found via href="" attributes > URLs' trailing path components are used to identify distributions. Hm. I hadn't seen this before. Perhaps I'm missing some messages from this thread. By "download links", do you mean links to distributions? Or to links to pages containing links to distributions. Can the links to projects, links to version pages, or download links point off site? Can any of these pages contain other links? > This is a sufficient API to allow querying packages for downloading > purposes, as long as all download links are found in the index's > pages. Additional information is only needed to allow following > external links to *other index pages*. so, for example: http://www.python.org/pypi/ZODB3/3.6.0 Has a link to http://www.zope.org/Products/ZODB3.6. Is this a download link? Or an off-site index link. I'm having a little trouble following the jargon. >> setuptools has a notion of an index. That notion is not at all well >> defined. > > It's mostly operationally defined in terms of what PyPI did when it > was written. 
Right, not well defined. :) I'm not criticizing. What it does was great as a prototype, but it would be good to move beyond this. >> Currently, the index has links that are followed to find package >> links elsewhere. >> This seems reasonably useful. I dunno. I'm not sure I care. What I >> do care >> about is that the index API should be well defined so that we can >> implement >> alternate indexes and alternate tools to read indexes. I'm not >> looking to >> satisfy use cases beyond what we have now. > > Sure. I'm just saying we only need something beyond href="" links > if they are intended to be followed by tools looking for package > links. > > The reason this is necessary, is that it's not sufficient to just > follow links that point outside the package index; PyPI has links > on its pages that go to other parts of python.org, so there needs > to be something that distinguishes "links that might help find > downloads". Links that *are* downloads are detected via URL content. Right. That's why I think the hrefs we care about should be marked with class attributes or some such. Jim -- Jim Fulton mailto:jim at zope.com Python Powered! CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From pje at telecommunity.com Sat Jul 8 03:12:03 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri, 07 Jul 2006 21:12:03 -0400 Subject: [Catalog-sig] [Distutils] Specification for package indexes?
In-Reply-To: <9A0CD067-84D0-4BF0-A5D9-9306AFCDD42E@zope.com> References: <5.1.1.6.0.20060707160846.032ffc88@sparrow.telecommunity.com> <5.1.1.6.0.20060707142217.02d15bd0@sparrow.telecommunity.com> <5.1.1.6.0.20060706122834.02056f00@sparrow.telecommunity.com> <5.1.1.6.0.20060707120827.02d8f2b8@sparrow.telecommunity.com> <5.1.1.6.0.20060707142217.02d15bd0@sparrow.telecommunity.com> <5.1.1.6.0.20060707160846.032ffc88@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060707175405.01edc3a8@sparrow.telecommunity.com> At 04:45 PM 7/7/2006 -0400, Jim Fulton wrote: >On Jul 7, 2006, at 4:20 PM, Phillip J. Eby wrote: > >>At 02:52 PM 7/7/2006 -0400, Jim Fulton wrote: >... >>>Perhaps someone should propose an API and we'll see. :) >> >>I thought I already did. :) Here it is again: >> >>baseURL/ should return a page containing href links to projects >>baseURL/projectname should return a page containing href links to >>version pages >>baseURL/projectname/version should return a page with download >>links (ideally with MD5 info) >>Links are found via href="" attributes >>URLs' trailing path components are used to identify distributions. > >Hm. I hadn't seen this before. Perhaps I'm missing some messages from >this thread. > >By "download links", do you mean links to distributions? Yes. >Or to links >to pages containing links to distributions. No, these would be either "index pages", or "external links" >Can the links to projects, links to version pages, or download links >point off site? Download links can be anywhere, since they are identified from the tail of the URL. The links to project or version pages are defined by the URL hierarchy of the API. >Can any of these pages contain other links? All of them can contain download links. Index pages can link to other index pages. Index pages linked to anything else are ignored, unless we allow "external links", in which case a method of identifying them is required. 
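The link rules just described could be sketched roughly as follows (Python 3). This assumes a hypothetical list of distribution filename suffixes and a base URL ending in "/"; neither is specified in the thread, and the function name is illustrative rather than anything easy_install defines.

```python
from urllib.parse import urljoin, urlparse

# Assumption: filename endings treated as distributions. The thread only
# says downloads are recognized "from the tail of the URL".
DIST_SUFFIXES = (".tar.gz", ".tgz", ".zip", ".egg", ".exe")


def classify(base_url, page_url, href):
    """Classify one href found on an index page.

    Returns "download" for anything whose path looks like a distribution
    (wherever it is hosted), "index" for pages under the index's own URL
    hierarchy, and "ignored" for everything else.
    """
    url = urljoin(page_url, href)
    path = urlparse(url).path  # excludes any "#md5=..." fragment
    if path.lower().endswith(DIST_SUFFIXES):
        return "download"
    if url.startswith(base_url):
        return "index"
    return "ignored"
```

With only these three outcomes, links to non-index HTML pages are dropped, so supporting "external links" needs some extra marker on the link itself.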
Currently, easy_install only uses two kinds of external links: home page and "download URL". These are identified via HTML snippets that PyPI uses. This is one of only two pieces of "screen scraping" (as opposed to URL inspection and link detection) that easy_install has. (The other is used to distinguish between a page that lists links to projects, from an actual project page, as sometimes PyPI can display the former at a URL that is nominally for the latter.) >>This is a sufficient API to allow querying packages for downloading >>purposes, as long as all download links are found in the index's >>pages. Additional information is only needed to allow following >>external links to *other index pages*. > >so, for example: > > http://www.python.org/pypi/ZODB3/3.6.0 > >Has a link to http://www.zope.org/Products/ZODB3.6. >Is this a download link? Or an off-site index link. I'm having a >little trouble >following the jargon. It's an "external link", and thus only followed if it's seen to be the "home page" or "download URL" on a package version page. >>Sure. I'm just saying we only need something beyond href="" links >>if they are intended to be followed by tools looking for package >>links. >> >>The reason this is necessary, is that it's not sufficient to just >>follow links that point outside the package index; PyPI has links >>on its pages that go to other parts of python.org, so there needs >>to be something that distinguishes "links that might help find >>downloads". Links that *are* downloads are detected via URL content. > >Right. That's why I think the hrefs we care about should be marked >with class >attributes or some such. Yes, as long as we care about supporting the external links. I'm not certain we do, at least for the "third-party index" case.
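If external links are kept, one way to mark them without sniffing the surrounding HTML is the rel="homepage" / rel="download" idea floated earlier in the thread. The sketch below (Python 3, standard-library HTMLParser) is only an illustration of that idea; the class and function names are hypothetical, and no such markup had been agreed on at this point in the discussion.

```python
from html.parser import HTMLParser

# rel values proposed earlier in the thread for links worth spidering.
FOLLOW_RELS = {"homepage", "download"}


class ExternalLinkFinder(HTMLParser):
    """Collect hrefs of <a> elements explicitly marked as followable."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        # rel may hold several space-separated tokens, or be absent.
        rels = set((attrs.get("rel") or "").lower().split())
        if attrs.get("href") and rels & FOLLOW_RELS:
            self.links.append(attrs["href"])


def external_links(html):
    """Return the marked external links on a package version page."""
    finder = ExternalLinkFinder()
    finder.feed(html)
    return finder.links
```

Unmarked links, such as PyPI's navigation links to other parts of python.org, are simply never returned, so the visible page layout can change without affecting the tool.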
From richardjones at optusnet.com.au Sat Jul 8 10:24:50 2006 From: richardjones at optusnet.com.au (richardjones at optusnet.com.au) Date: Sat, 08 Jul 2006 18:24:50 +1000 Subject: [Catalog-sig] Ickiness Message-ID: <200607080824.k688OpFE026560@mail27.syd.optusnet.com.au> An embedded and charset-unspecified text was scrubbed... Name: not available Url: http://mail.python.org/pipermail/catalog-sig/attachments/20060708/31c2e915/attachment.pot From jim at zope.com Sat Jul 8 13:38:06 2006 From: jim at zope.com (Jim Fulton) Date: Sat, 8 Jul 2006 07:38:06 -0400 Subject: [Catalog-sig] [Distutils] Specification for package indexes? In-Reply-To: <5.1.1.6.0.20060707175405.01edc3a8@sparrow.telecommunity.com> References: <5.1.1.6.0.20060707160846.032ffc88@sparrow.telecommunity.com> <5.1.1.6.0.20060707142217.02d15bd0@sparrow.telecommunity.com> <5.1.1.6.0.20060706122834.02056f00@sparrow.telecommunity.com> <5.1.1.6.0.20060707120827.02d8f2b8@sparrow.telecommunity.com> <5.1.1.6.0.20060707142217.02d15bd0@sparrow.telecommunity.com> <5.1.1.6.0.20060707160846.032ffc88@sparrow.telecommunity.com> <5.1.1.6.0.20060707175405.01edc3a8@sparrow.telecommunity.com> Message-ID: On Jul 7, 2006, at 9:12 PM, Phillip J. Eby wrote: > At 04:45 PM 7/7/2006 -0400, Jim Fulton wrote: ... >> By "download links", do you mean links to distributions? > > Yes. > > >> Or to links >> to pages containing links to distributions. > > No, these would be either "index pages", or "external links" Which seems to be an important use case now. > >> Can the links to projects, links to version pages, or download links >> point off site? > > Download links can be anywhere, since they are identified from the > tail of the URL. The links to project or version pages are defined > by the URL hierarchy of the API. Hm. Why does it matter? I understand that you want to be able to go to index_url/project first, but I don't see that it matters where versions actually are. 
For that matter, I could see value in a minimal index that just pointed to external project pages. In which case, going to index_url/project might even be allowed to redirect to an offsite project page. Of course, this couldn't be implemented with a static server, but could still be a valuable option. > >> Can any of these pages contain other links? > > All of them can contain download links. Index pages can link to > other index pages. Index pages linked to anything else are > ignored, unless we allow "external links", in which case a method > of identifying them is required. I think we want external links. We have them now. In fact, I think there is value in a project index that has no distributions or even version information but provides a central place to find project pages. Note that, in a separate discussion, you pointed out that some considered it bad form to put interim project releases on pypi. If pypi could have links to remote project pages, then those sites could have different policies as needed by a project. > Currently, easy_install identifies only uses two kinds of external > links: home page and "download URL". These are identified via HTML > snippets that PyPI uses. This is one of only two pieces of "screen > scraping" (as opposed to URL inspection and link detection) that > easy_install has. (The other is used to distinguish between a page > that lists links to projects, from an actual project page, as > sometimes PyPI can display the former at a URL that is nominally > for the latter.) > >>> This is a sufficient API to allow querying packages for downloading >>> purposes, as long as all download links are found in the index's >>> pages. Additional information is only needed to allow following >>> external links to *other index pages*. >> >> so, for example: >> >> http://www.python.org/pypi/ZODB3/3.6.0 >> >> Has a link to http://www.zope.org/Products/ZODB3.6. >> Is this a download link? Or an off-site index link. 
I'm having a >> little trouble >> following the jargon. > > It's an "external link", and thus only followed if it's seen to be > the "home page" or "download URL" on a package version page. Right, which is currently identified by sniffing the surrounding HTML. > >>> Sure. I'm just saying we only need something beyond href="" links >>> if they are intended to be followed by tools looking for package >>> links. >>> >>> The reason this is necessary, is that it's not sufficient to just >>> follow links that point outside the package index; PyPI has links >>> on its pages that go to other parts of python.org, so there needs >>> to be something that distinguishes "links that might help find >>> downloads". Links that *are* downloads are detected via URL >>> content. >> >> Right. That's why I think the hrefs we care about should be marked >> with class >> attributes or some such. > > Yes, as long as we care about supporting the external links. I'm > not certain we do, at least for the "third-party index" case. I think we do. I'm pretty sure we do for pypi and I sure as heck don't want a different api for pypi and for other indexes. I'd really like to see a single index api. I would *like* to see the possibility of allowing off-site (off-index) projects, although I could live without this. I have to say again that all of these details can get quite confusing. Maybe I'm alone in being confused by this, but I don't think so. I've spent a lot of time on and off over the last few months trying to leverage setuptools and now pypi and while I've had a lot of success, it has been harder than I think it should be. I think that this is an impediment to greater adoption of and benefit from setuptools. I think we need to do a good job of documenting and explaining this API. I also think we need to write up some best practices or rationale to guide people toward better use of setuptools and pypi together.
I'm happy to help with this once we have agreement and once I understand what we agree to. :) Jim -- Jim Fulton mailto:jim at zope.com Python Powered! CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From richardjones at optusnet.com.au Sat Jul 8 15:02:10 2006 From: richardjones at optusnet.com.au (richardjones at optusnet.com.au) Date: Sat, 08 Jul 2006 23:02:10 +1000 Subject: [Catalog-sig] safe_names again Message-ID: <200607081302.k68D2AaW011573@mail05.syd.optusnet.com.au> An embedded and charset-unspecified text was scrubbed... Name: not available Url: http://mail.python.org/pipermail/catalog-sig/attachments/20060708/ca90f3e2/attachment.pot From jim at zope.com Sat Jul 8 15:11:02 2006 From: jim at zope.com (Jim Fulton) Date: Sat, 8 Jul 2006 09:11:02 -0400 Subject: [Catalog-sig] safe_names again In-Reply-To: <200607081302.k68D2AaW011573@mail05.syd.optusnet.com.au> References: <200607081302.k68D2AaW011573@mail05.syd.optusnet.com.au> Message-ID: <9558562B-825A-45A4-89D3-6FDA90815A50@zope.com> On Jul 8, 2006, at 9:02 AM, richardjones at optusnet.com.au wrote: > [sorry for the terrible email quoting / formatting - I'm stuck in > webmail ATM. I'm also having trouble following the discussion PJE > and Jim are having - and that's a concern to me because some of the > stuff I'm reading worries me.] Don't feel bad, I have trouble following this too. :/ > I have created a branch for this and have begun the slow process of > working in the setuptools name mangling. It's going to take some > time since package names are used all over the place and form an > integral part of the database referential structure. IMO, no one should be writing code at this point. I think we should slow down, decide on and carefully document an API before we do any more coding. (Where "we" mostly refers to you and Phillip. :) Jim -- Jim Fulton mailto:jim at zope.com Python Powered! 
CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From richardjones at optusnet.com.au Sat Jul 8 15:26:18 2006 From: richardjones at optusnet.com.au (richardjones at optusnet.com.au) Date: Sat, 08 Jul 2006 23:26:18 +1000 Subject: [Catalog-sig] safe_names again Message-ID: <200607081326.k68DQIV1024749@mail31.syd.optusnet.com.au> An embedded and charset-unspecified text was scrubbed... Name: not available Url: http://mail.python.org/pipermail/catalog-sig/attachments/20060708/bb64c71e/attachment.asc From jim at zope.com Sat Jul 8 16:08:17 2006 From: jim at zope.com (Jim Fulton) Date: Sat, 8 Jul 2006 10:08:17 -0400 Subject: [Catalog-sig] safe_names again In-Reply-To: <200607081326.k68DQIV1024749@mail31.syd.optusnet.com.au> References: <200607081326.k68DQIV1024749@mail31.syd.optusnet.com.au> Message-ID: <1D30D765-834F-4F15-AC3D-09B280B63C07@zope.com> On Jul 8, 2006, at 9:26 AM, richardjones at optusnet.com.au wrote: >> Jim Fulton wrote: >> Don't feel bad, I have trouble following this too. :/ > > Great, Johannes (who is next to me and in a similar state of mind) > and I feel much better now :) > > >>> I have created a branch for this and have begun the slow process of >>> working in the setuptools name mangling. It's going to take some >>> time since package names are used all over the place and form an >>> integral part of the database referential structure. >> >> IMO, no one should be writing code at this point. I think we should >> slow down, decide on and carefully document an API before we do any >> more coding. (Where "we" mostly refers to you and Phillip. :) > > My position on all this is that: > > 1. PyPI is the central index for metadata about python packages. I > am not aware of any other such indexes, nor am I aware of an > intention to create one. I would not be in favour of doing so, > since it would fragment the community. 
I am mostly concerned that we have a well-defined API, however, I think there is a potential place for other indexes. There are all sorts of reasons one might not want or be able to put software in PyPI. For example, an organization may have proprietary information that they can't put in PyPI. They should be able to run their own index. This would not be quite so important if setuptools provides an option to not look in an index or, if having found packages via find-links, it was willing to not look at an index.

> 2. I'm not particularly interested in developing APIs for PyPI

I'm gonna interpret this as meaning that you don't want to have to develop (define) an API yourself, but you would be happy if there was an API. I hope I'm right.

> though I am happy to add in XML-RPC functions as needed since
> they're trivial to write.

So you are unconcerned by the use case for static indexes? Or static mirrors of indexes? Do you really want to prevent such a use case from being met?

> 3. I would really like to see an emphasis on moving away from
> scraping HTML since the HTML is changed for *cosmetic* reasons. If
> need be we can produce "static" XML (like the DOAP) to go alongside
> it which will not change the next time the pydotorg layout is changed.

I think everyone agrees with you. I think screen scraping was a valuable expedient when setuptools was being prototyped, but it's time to move beyond that. Obviously, we have to agree what to do, but that means an API.

> 4. Patches, and additional svn committers, are always welcome.

I'd be willing to help out once we agree *clearly* what we're doing. That means, among other things, an API.

Jim

--
Jim Fulton mailto:jim at zope.com Python Powered!
CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From richardjones at optusnet.com.au Sat Jul 8 16:37:21 2006 From: richardjones at optusnet.com.au (richardjones at optusnet.com.au) Date: Sun, 09 Jul 2006 00:37:21 +1000 Subject: [Catalog-sig] safe_names again Message-ID: <200607081437.k68EbMEC016454@mail18.syd.optusnet.com.au> An embedded and charset-unspecified text was scrubbed... Name: not available Url: http://mail.python.org/pipermail/catalog-sig/attachments/20060709/224b061c/attachment.asc From pje at telecommunity.com Sat Jul 8 20:07:31 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Sat, 08 Jul 2006 14:07:31 -0400 Subject: [Catalog-sig] safe_names again In-Reply-To: <200607081302.k68D2AaW011573@mail05.syd.optusnet.com.au> Message-ID: <5.1.1.6.0.20060708135905.01ec4910@sparrow.telecommunity.com> At 11:02 PM 7/8/2006 +1000, richardjones at optusnet.com.au wrote: >[sorry for the terrible email quoting / formatting - I'm stuck in webmail >ATM. I'm also having trouble following the discussion PJE and Jim are >having - and that's a concern to me because some of the stuff I'm reading >worries me.] > >I have created a branch for this and have begun the slow process of >working in the setuptools name mangling. It's going to take some time >since package names are used all over the place and form an integral part >of the database referential structure. > >My "plan" for implementation of this is as follows: > >1. Convert meta-data supplied by end users using safe_name. We call this >"name" internally (replacing the current use of that column). The original >name is retained for display purposes, stored as "display_name" on the >packages table. > >2. All user input of names for filtering must be mangled before searching >is performed. > >3. Find all places where we display names and convert them to use the >"display_name" column. It's just a suggestion, but you might find it easier to phase in by simply: 1. 
Reject creating a package if its safe-name conflicts with another package

2. Provide search facilities that search on mangled name (which is used by #1 to verify a package that's about to be added).

This seems like a more conservative route with a lot less effort involved if you want to ease into it. In the simplest case it requires no database schema changes, although if performance is an issue, adding a mangled name column for the searches would be useful, but not strictly necessary. More to the point, these two changes do not require wholesale refactoring of PyPI.

Of course, if you feel that all searches in PyPI should work this way, then your plan makes more sense. I wasn't sure if you felt that way or not. On the other hand, by leaving "name" alone, and having a "safe_name" column for searching, you can make the change more gradual, if that's a concern. You could make safe_name be uniquely indexed, while leaving name as the primary key, and just trap the error when inserting a conflicting safe_name.

Anyway, just some random thoughts to see if there's an easier way for you to do this.

From pje at telecommunity.com Sat Jul 8 20:33:48 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat, 08 Jul 2006 14:33:48 -0400
Subject: [Catalog-sig] Ickiness
In-Reply-To: <200607080824.k688OpFE026560@mail27.syd.optusnet.com.au>
Message-ID: <5.1.1.6.0.20060708142629.01f321b0@sparrow.telecommunity.com>

At 06:24 PM 7/8/2006 +1000, richardjones at optusnet.com.au wrote: >With the new display formatting implemented yesterday, there is now a > in the package view page so that setuptools >may scrape "
Download URL" etc. You can also do this: right before the link. EasyInstall is very stupid and doesn't actually parse any HTML. The same thing is true for the "'Index of Packages'" string, which just has to appear somewhere in the page if it's a multi-package list. While these strings obviously aren't as "clean" as using rel="" info in the links, they will work for now and allow you to do whatever you want with the visual appearance. The only other visual dependency is on MD5 information, which is extracted using this pattern: PYPI_MD5 = re.compile( '([^<]+)\n\s+\\(md5\\)' ) However, if this pattern doesn't match, then EasyInstall will simply proceed without MD5 verification. From dangoor at gmail.com Sun Jul 9 02:19:12 2006 From: dangoor at gmail.com (Kevin Dangoor) Date: Sat, 8 Jul 2006 20:19:12 -0400 Subject: [Catalog-sig] PyPI XML-RPC changes In-Reply-To: <200607061539.k66FdCV0007024@mail07.syd.optusnet.com.au> References: <200607061539.k66FdCV0007024@mail07.syd.optusnet.com.au> Message-ID: On Jul 6, 2006, at 11:39 AM, richardjones at optusnet.com.au wrote: > The XML-RPC interface is being reviewed at present, and some quite > reasonable changes are being suggested. These will alter the actual > method names, so I'd like to know whether anyone on this list is > already actually *using* the methods as posted to this list. If so, > then I'll leave aliases in the code. If not, I'll just do the rename. > > The changes, BTW, are: > > package_urls -> release_urls > package_data -> release_data > package_stable_version -> package_stable_release I have code that uses package_data. Kevin From richardjones at optusnet.com.au Sun Jul 9 09:51:07 2006 From: richardjones at optusnet.com.au (richardjones at optusnet.com.au) Date: Sun, 09 Jul 2006 17:51:07 +1000 Subject: [Catalog-sig] Ickiness Message-ID: <200607090751.k697pCvN001127@mail15.syd.optusnet.com.au> An embedded and charset-unspecified text was scrubbed... 
Name: not available Url: http://mail.python.org/pipermail/catalog-sig/attachments/20060709/5bc0e7f3/attachment.pot From richardjones at optusnet.com.au Sun Jul 9 10:01:34 2006 From: richardjones at optusnet.com.au (richardjones at optusnet.com.au) Date: Sun, 09 Jul 2006 18:01:34 +1000 Subject: [Catalog-sig] PyPI XML-RPC changes Message-ID: <200607090801.k6981YwX012215@mail31.syd.optusnet.com.au> An embedded and charset-unspecified text was scrubbed... Name: not available Url: http://mail.python.org/pipermail/catalog-sig/attachments/20060709/8f2a7a3b/attachment.asc From richardjones at optusnet.com.au Sun Jul 9 10:03:57 2006 From: richardjones at optusnet.com.au (richardjones at optusnet.com.au) Date: Sun, 09 Jul 2006 18:03:57 +1000 Subject: [Catalog-sig] Ickiness Message-ID: <200607090804.k69842gT027039@mail14.syd.optusnet.com.au> An embedded and charset-unspecified text was scrubbed... Name: not available Url: http://mail.python.org/pipermail/catalog-sig/attachments/20060709/ccb86b9e/attachment.pot From pje at telecommunity.com Tue Jul 11 03:09:57 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 10 Jul 2006 21:09:57 -0400 Subject: [Catalog-sig] "Package Index API" draft Message-ID: <5.1.1.6.0.20060710210032.03ac2e20@sparrow.telecommunity.com> This draft reflects the in-development versions of setuptools 0.7a1 and 0.6b4; it does not describe older setuptools versions except as noted under the "Backward Compatibility" section. The items described under "Backward Compatibility" need to be kept in PyPI until everyone in the field has upgraded to setuptools 0.6b4 or better. (Note that 0.6b4 is not released yet!) Package Index "API" ------------------- Custom package indexes (and PyPI) must follow the following rules for EasyInstall to be able to look up and download packages: 1. Except where stated otherwise, "pages" are HTML or XHTML, and "links" refer to ``href`` attributes. 2. 
Individual project version pages' URLs must be of the form ``base/projectname/version``, where ``base`` is the package index's base URL. 3. Omitting the ``/version`` part of a project page's URL (but keeping the trailing ``/``) should result in a page that is either: a) The single active version of that project, as though the version had been explicitly included, OR b) A page with links to all of the active version pages for that project. 4. Individual version pages should contain direct links to downloadable distributions where possible. It is explicitly permitted for a project's "long_description" to include URLs, and these should be formatted as HTML links by the package index, as EasyInstall does no special processing to identify what parts of a page are index-specific and which are part of the project's supplied description. 5. Where available, MD5 information should be added to download URLs by appending a fragment identifier of the form ``#md5=...``, where ``...`` is the 32-character hex MD5 digest. EasyInstall will verify that the downloaded file's MD5 digest matches the given value. 6. Individual project version pages should identify any "homepage" or "download" URLs using ``rel="homepage"`` and ``rel="download"`` attributes on the HTML elements linking to those URLs. Use of these attributes will cause EasyInstall to always follow the provided links, unless it can be determined by inspection that they are downloadable distributions. If the links are not to downloadable distributions, they are retrieved, and if they are HTML, they are scanned for download links. They are *not* scanned for additional "homepage" or "download" links, as these are only processed for pages that are part of a package index site. 7. The root URL of the index, if retrieved with a trailing ``/``, must result in a page containing links to *all* projects' active version pages. 
(Note: This requirement is a workaround for the absence of case- insensitive ``safe_name()`` matching of project names in URL paths. If project names are matched in this fashion (e.g. via the PyPI server, mod_rewrite, or a similar mechanism), then it is not necessary to include this all-packages listing page.) 8. If a package index is accessed via a ``file://`` URL, then EasyInstall will automatically use ``index.html`` files, if present, when trying to read a directory with a trailing ``/`` on the URL. Backward Compatibility ~~~~~~~~~~~~~~~~~~~~~~ Package indexes that wish to support setuptools versions prior to 0.6b4 should also follow these rules: * Homepage and download links must be preceded with ``"Home Page"`` or ``"Download URL"``, in addition to (or instead of) the ``rel=""`` attributes on the actual links. These marker strings do not need to be visible, or uncommented, however! For example, the following is a valid homepage link that will work with any version of setuptools::
    <li>
    <strong>Home Page:</strong>
    <!-- <th>Home Page -->
    <a rel="homepage" href="http://sqlobject.org">http://sqlobject.org</a>
  Even though the marker string is in an HTML comment, older versions of EasyInstall will still "see" it and know that the link that follows is the project's home page URL.

* The pages described by paragraph 3(b) of the preceding section *must* contain the string ``"Index of Packages"`` somewhere in their text. This can be inside of an HTML comment, if desired, and it can be anywhere in the page. (Note: this string MUST NOT appear on normal project pages, as described in paragraphs 2 and 3(a)!)

In addition, for compatibility with PyPI versions that do not use ``#md5=`` fragment IDs, EasyInstall uses the following regular expression to match PyPI's displayed MD5 info (broken onto two lines for readability)::

    <a href="([^"#]+)">([^<]+)</a>\n\s+\(<a href="[^?]+\?:action=show_md5
    &amp;digest=([0-9a-f]{32})">md5</a>\)

From pje at telecommunity.com Tue Jul 11 21:12:16 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 11 Jul 2006 15:12:16 -0400
Subject: [Catalog-sig] setuptools 0.6b4 released
Message-ID: <5.1.1.6.0.20060711145922.03ca5aa0@sparrow.telecommunity.com>

I have just released version 0.6b4 of setuptools, the last beta release of setuptools 0.6. Please upgrade and test at your earliest convenience, as I would like to issue a release candidate version next week.

Changes include numerous bug fixes and tweaks for corner cases in easy_install processing, mostly discovered by Jim Fulton in the process of developing his "zc.buildout" tool (a Make-like system that supports simple installation of complex environments).

In addition, there is now a formally-documented "package index API" that easy_install supports:

http://peak.telecommunity.com/DevCenter/EasyInstall#package-index-api

So, people who wish to create their own "package indexes" for easy_install can do so, even using static HTML -- even without a web server.

Last, but not least, the ability was added to turn off SVN revision numbers or dates from the command line, so that you don't have to edit setup.cfg in order to issue a release.
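
As an aside on the package index API mentioned above: rule 5 of the draft asks indexes to append ``#md5=...`` to download URLs so that a client can verify what it fetched. The following is only a minimal sketch of that idea in present-day Python, not EasyInstall's actual code; the function names ``split_md5_fragment`` and ``md5_matches`` are made up for illustration.

```python
import hashlib
from urllib.parse import urldefrag


def split_md5_fragment(url):
    """Split a download link into (url_without_fragment, md5_hex_or_None).

    Hypothetical helper: only a fragment of the form ``md5=<32 hex digits>``
    is treated as a digest; anything else is ignored.
    """
    base, fragment = urldefrag(url)
    if fragment.startswith("md5="):
        digest = fragment[len("md5="):].lower()
        if len(digest) == 32 and all(c in "0123456789abcdef" for c in digest):
            return base, digest
    return base, None


def md5_matches(data, expected_md5):
    """Return True if no digest was supplied or the data's MD5 equals it."""
    if expected_md5 is None:
        # No digest in the link: proceed without verification.
        return True
    return hashlib.md5(data).hexdigest() == expected_md5
```

A client would call ``split_md5_fragment`` on each download link, fetch the fragment-free URL, and reject the file when ``md5_matches`` returns False. Links without a digest simply skip verification, which mirrors the fallback behaviour described for the backward-compatibility regex.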
There are a few outstanding requests that I was *not* able to consider for 0.6 because the required changes would've been too disruptive to stability. They have been bumped to the 0.7 list to receive further consideration after 0.6 final is released: * A request to change the PYTHONPATH resolution order * A request to allow installed scripts to use -O * Adding XML-RPC support for PyPI From bob at redivi.com Tue Jul 11 21:33:16 2006 From: bob at redivi.com (Bob Ippolito) Date: Tue, 11 Jul 2006 12:33:16 -0700 Subject: [Catalog-sig] [Distutils] setuptools 0.6b4 released In-Reply-To: <5.1.1.6.0.20060711145922.03ca5aa0@sparrow.telecommunity.com> References: <5.1.1.6.0.20060711145922.03ca5aa0@sparrow.telecommunity.com> Message-ID: On Jul 11, 2006, at 12:12 PM, Phillip J. Eby wrote: > Last, but not least, the ability was added to turn off SVN revision > numbers > or dates from the command line, so that you don't have to edit > setup.cfg in > order to issue a release. Would've been convenient if you said what the option was... took me a few minutes to figure out where to look. Presumably you're referring to the --no-svn-revision option? Options for 'egg_info' command: --egg-base (-e) directory containing .egg-info directories (default: top of the source tree) --tag-svn-revision (-r) Add subversion revision ID to version number --tag-date (-d) Add date stamp (e.g. 
20050528) to version number --tag-build (-b) Specify explicit tag to add to version number --no-svn-revision (-R) Don't add subversion revision ID [default] --no-date (-D) Don't include date stamp [default] --tag-build (-b) Specify explicit tag to add to version number Note that tag-build shows up twice, here's a patch: Index: setuptools/command/egg_info.py =================================================================== --- setuptools/command/egg_info.py (revision 50587) +++ setuptools/command/egg_info.py (working copy) @@ -28,7 +28,6 @@ ('no-svn-revision', 'R', "Don't add subversion revision ID [default]"), ('no-date', 'D', "Don't include date stamp [default]"), - ('tag-build=', 'b', "Specify explicit tag to add to version number"), ] boolean_options = ['tag-date', 'tag-svn-revision'] -bob From pje at telecommunity.com Tue Jul 11 21:53:01 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 11 Jul 2006 15:53:01 -0400 Subject: [Catalog-sig] [Distutils] setuptools 0.6b4 released In-Reply-To: References: <5.1.1.6.0.20060711145922.03ca5aa0@sparrow.telecommunity.com> <5.1.1.6.0.20060711145922.03ca5aa0@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060711154900.03a9ff88@sparrow.telecommunity.com> At 12:33 PM 7/11/2006 -0700, Bob Ippolito wrote: >On Jul 11, 2006, at 12:12 PM, Phillip J. Eby wrote: >>Last, but not least, the ability was added to turn off SVN revision >>numbers >>or dates from the command line, so that you don't have to edit >>setup.cfg in >>order to issue a release. > >Would've been convenient if you said what the option was... took me a >few minutes to figure out where to look. Presumably you're referring >to the --no-svn-revision option? Yep. 
See also: http://peak.telecommunity.com/DevCenter/setuptools#managing-continuous-releases-using-subversion In particular, the new subsection: http://peak.telecommunity.com/DevCenter/setuptools#making-official-non-snapshot-releases Quick tip: you can use: http://peak.telecommunity.com/DevCenter/setuptools?action=diff to check on the latest changes to setuptools' doc after a release. The wiki also allows you to subscribe to receive change notices via email. This also goes for the EasyInstall and PkgResources doc pages. >Note that tag-build shows up twice, here's a patch: Argh. :( Thanks for catching that. I've fixed it in the trunk and 0.6 branch. From bob at redivi.com Thu Jul 13 06:00:06 2006 From: bob at redivi.com (Bob Ippolito) Date: Wed, 12 Jul 2006 21:00:06 -0700 Subject: [Catalog-sig] [Distutils] setuptools 0.6b4 released In-Reply-To: <5.1.1.6.0.20060711145922.03ca5aa0@sparrow.telecommunity.com> References: <5.1.1.6.0.20060711145922.03ca5aa0@sparrow.telecommunity.com> Message-ID: <8D2232FF-D626-47F6-97AF-0CD1515FB59F@redivi.com> On Jul 11, 2006, at 12:12 PM, Phillip J. Eby wrote: > I have just released version 0.6b4 of setuptools, the last beta > release of > setuptools 0.6. Please upgrade and test at your earliest > convenience, as I > would like to issue a release candidate version next week. It seems there's another bug in 0.6b4. I don't know what conditions cause it, however: File "/Volumes/Data/developer-external/py2app/setuptools-0.6b4- py2.4.egg/setuptools/dist.py", line 274, in fetch_build_egg AttributeError: Distribution instance has no attribute 'dependency_links' It looks like there's no default for dependency_links in some cases.. so that needs to change to a getattr, or it needs to set some default. 
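
The "change to a getattr" option can be shown in isolation. This is just an illustrative sketch: ``FakeDistribution`` and ``collect_links`` are hypothetical stand-ins, not setuptools code, and the real attribute handling in ``setuptools/dist.py`` may differ.

```python
class FakeDistribution:
    """Hypothetical stand-in for a distribution object.

    It only sets the attributes it is given, so 'dependency_links'
    may be missing entirely, as in the traceback above.
    """

    def __init__(self, **attrs):
        for name, value in attrs.items():
            setattr(self, name, value)


def collect_links(dist):
    """Return a copy of dist.dependency_links, or [] if it was never set."""
    # getattr with a default avoids the AttributeError; "or []" also
    # covers the case where the attribute exists but is None.
    return list(getattr(dist, "dependency_links", None) or [])
```

The other option, setting a default on the instance when it is created, would have the same effect for callers.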
-bob From bob at redivi.com Thu Jul 13 06:09:20 2006 From: bob at redivi.com (Bob Ippolito) Date: Wed, 12 Jul 2006 21:09:20 -0700 Subject: [Catalog-sig] [Distutils] setuptools 0.6b4 released In-Reply-To: <8D2232FF-D626-47F6-97AF-0CD1515FB59F@redivi.com> References: <5.1.1.6.0.20060711145922.03ca5aa0@sparrow.telecommunity.com> <8D2232FF-D626-47F6-97AF-0CD1515FB59F@redivi.com> Message-ID: <1C91F591-0072-4E7C-9E5B-F1DC0F79FA3D@redivi.com> On Jul 12, 2006, at 9:00 PM, Bob Ippolito wrote: > > On Jul 11, 2006, at 12:12 PM, Phillip J. Eby wrote: > >> I have just released version 0.6b4 of setuptools, the last beta >> release of >> setuptools 0.6. Please upgrade and test at your earliest >> convenience, as I >> would like to issue a release candidate version next week. > > It seems there's another bug in 0.6b4. I don't know what conditions > cause it, however: > > File "/Volumes/Data/developer-external/py2app/setuptools-0.6b4- > py2.4.egg/setuptools/dist.py", line 274, in fetch_build_egg > AttributeError: Distribution instance has no attribute > 'dependency_links' > > It looks like there's no default for dependency_links in some cases.. > so that needs to change to a getattr, or it needs to set some default. Ok, it seems this happens whenever you do develop, install, etc. when all dependencies aren't available. easy_install seems to work though. 
This patch seems to make everything work as expected: =================================================================== --- setuptools/dist.py (revision 50587) +++ setuptools/dist.py (working copy) @@ -271,7 +271,7 @@ for key in opts.keys(): if key not in keep: del opts[key] # don't use any other settings - if self.dependency_links: + if getattr(self, 'dependency_links', None): links = self.dependency_links[:] if 'find_links' in opts: links = opts['find_links'][1].split() + links -bob From bob at redivi.com Thu Jul 13 09:43:32 2006 From: bob at redivi.com (Bob Ippolito) Date: Thu, 13 Jul 2006 00:43:32 -0700 Subject: [Catalog-sig] [Distutils] setuptools 0.6b4 released In-Reply-To: <5.1.1.6.0.20060711145922.03ca5aa0@sparrow.telecommunity.com> References: <5.1.1.6.0.20060711145922.03ca5aa0@sparrow.telecommunity.com> Message-ID: <2E9807AF-1C61-4E49-88A9-DC32CD9D0336@redivi.com> On Jul 11, 2006, at 12:12 PM, Phillip J. Eby wrote: > I have just released version 0.6b4 of setuptools, the last beta > release of > setuptools 0.6. Please upgrade and test at your earliest > convenience, as I > would like to issue a release candidate version next week. Here's another patch I'd like to see ASAP. Some vendors have Q&A procedures that disallow empty files without special exception, so we should ensure that setuptools doesn't create empty safety flag files. 
Index: setuptools/command/bdist_egg.py =================================================================== --- setuptools/command/bdist_egg.py (revision 50611) +++ setuptools/command/bdist_egg.py (working copy) @@ -360,7 +360,9 @@ if safe is None or bool(safe)<>flag: os.unlink(fn) elif safe is not None and bool(safe)==flag: - open(fn,'w').close() + f = open(fn,'w') + f.write(fn) + f.close() safety_flags = { True: 'zip-safe', -bob From jim at zope.com Thu Jul 13 21:57:45 2006 From: jim at zope.com (Jim Fulton) Date: Thu, 13 Jul 2006 15:57:45 -0400 Subject: [Catalog-sig] [Distutils] setuptools 0.6b4 released In-Reply-To: <5.1.1.6.0.20060711145922.03ca5aa0@sparrow.telecommunity.com> References: <5.1.1.6.0.20060711145922.03ca5aa0@sparrow.telecommunity.com> Message-ID: On Jul 11, 2006, at 3:12 PM, Phillip J. Eby wrote: > I have just released version 0.6b4 of setuptools, the last beta > release of > setuptools 0.6. Please upgrade and test at your earliest > convenience, as I > would like to issue a release candidate version next week. With the new version, I'm still having the problem that the --no-deps option is ignored by setup.py develop. Jim -- Jim Fulton mailto:jim at zope.com Python Powered! CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From richardjones at optushome.com.au Sat Jul 15 06:42:46 2006 From: richardjones at optushome.com.au (Richard Jones) Date: Sat, 15 Jul 2006 14:42:46 +1000 Subject: [Catalog-sig] Cheese Shop Tutorial Message-ID: <200607151442.46661.richardjones@optushome.com.au> I've added a wiki page with a basic tutorial: http://wiki.python.org/moin/CheeseShopTutorial This is linked from the Cheese Shop sidebar. Improvements welcome :) Richard -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/catalog-sig/attachments/20060715/7eb3c60d/attachment.htm From bob at redivi.com Mon Jul 17 16:17:18 2006 From: bob at redivi.com (Bob Ippolito) Date: Mon, 17 Jul 2006 07:17:18 -0700 Subject: [Catalog-sig] [Distutils] setuptools 0.6b4 released In-Reply-To: References: <5.1.1.6.0.20060711145922.03ca5aa0@sparrow.telecommunity.com> Message-ID: <32A53630-2832-4FD0-8752-5E15B4F4A4FE@redivi.com> On Jul 11, 2006, at 12:33 PM, Bob Ippolito wrote: > > On Jul 11, 2006, at 12:12 PM, Phillip J. Eby wrote: > >> Last, but not least, the ability was added to turn off SVN revision >> numbers >> or dates from the command line, so that you don't have to edit >> setup.cfg in >> order to issue a release. > > Would've been convenient if you said what the option was... took me a > few minutes to figure out where to look. Presumably you're referring > to the --no-svn-revision option? It seems that --no-svn-revision is only part of the solution to making releases less of a hassle. The problem is that setup.cfg is still included in sdist, so any user that builds your egg from source is going to have it tagged with ".dev_r0". This is especially problematic for dependencies because it may download the correct version of source, but ".dev_r0" ranks lower so it will say the dependency is not satisfied. What's the best solution to this, barring deleting setup.cfg from the release branch? -bob From pje at telecommunity.com Mon Jul 17 23:35:32 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 17 Jul 2006 17:35:32 -0400 Subject: [Catalog-sig] [Distutils] setuptools 0.6b4 released In-Reply-To: <32A53630-2832-4FD0-8752-5E15B4F4A4FE@redivi.com> References: <5.1.1.6.0.20060711145922.03ca5aa0@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060717173328.03393048@sparrow.telecommunity.com> At 07:17 AM 7/17/2006 -0700, Bob Ippolito wrote: >It seems that --no-svn-revision is only part of the solution to >making releases less of a hassle. 
The problem is that setup.cfg is >still included in sdist, so any user that builds your egg from source >is going to have it tagged with ".dev_r0". This is especially >problematic for dependencies because it may download the correct >version of source, but ".dev_r0" ranks lower so it will say the >dependency is not satisfied. > >What's the best solution to this, barring deleting setup.cfg from the >release branch? Argh. Well, you could put an exclude in MANIFEST.in to drop the setup.cfg, but that sucks. Maybe the right thing to do is to include the setup.cfg, but to update it with the tag and version settings that were in effect when the sdist was built. From pje at telecommunity.com Tue Jul 18 01:10:09 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 17 Jul 2006 19:10:09 -0400 Subject: [Catalog-sig] [Distutils] setuptools 0.6b4 released In-Reply-To: <05D15BCF-8880-4DA1-99E0-52FE69EA1DB2@redivi.com> References: <5.1.1.6.0.20060717173328.03393048@sparrow.telecommunity.com> <5.1.1.6.0.20060711145922.03ca5aa0@sparrow.telecommunity.com> <5.1.1.6.0.20060717173328.03393048@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060717190924.02447288@sparrow.telecommunity.com> At 02:49 PM 7/17/2006 -0700, Bob Ippolito wrote: >On Jul 17, 2006, at 2:35 PM, Phillip J. Eby wrote: >>Argh. Well, you could put an exclude in MANIFEST.in to drop the >>setup.cfg, but that sucks. Maybe the right thing to do is to >>include the setup.cfg, but to update it with the tag and version >>settings that were in effect when the sdist was built. > >That's not a bad idea (update setup.cfg on sdist w/ --no-svn- revision). >Any chance of getting this in setuptools 0.6 or should I >start adding MANIFEST.in files to the relevant projects? That depends on how big the change ends up being. If it looks simple and unlikely to fail, I could live with adding it to 0.6. 
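
To make the "update setup.cfg when the sdist is built" idea concrete, here is a rough sketch using the standard-library ``configparser``. It is an assumption about how such a rewrite could look, not the change that went into setuptools; the helper name ``freeze_egg_info_tags`` is invented, and only the ``[egg_info]`` option names ``tag_build``, ``tag_date`` and ``tag_svn_revision`` come from setuptools' documented configuration.

```python
import configparser
import io


def freeze_egg_info_tags(cfg_text, tag_build):
    """Return setup.cfg text with the [egg_info] tags pinned.

    Hypothetical helper: sets tag_build to the string that was in effect
    and switches off date and SVN-revision tagging, so a build from the
    sdist yields the same version. Note that ConfigParser drops comments
    and normalises formatting when it writes the file back out.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section("egg_info"):
        parser.add_section("egg_info")
    parser.set("egg_info", "tag_build", tag_build)
    parser.set("egg_info", "tag_date", "0")
    parser.set("egg_info", "tag_svn_revision", "0")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

For an official release built with ``--no-svn-revision``, ``tag_build`` would be the empty string, so the frozen setup.cfg no longer asks for a ``.dev`` or revision suffix.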
From bob at redivi.com Mon Jul 17 23:49:31 2006 From: bob at redivi.com (Bob Ippolito) Date: Mon, 17 Jul 2006 14:49:31 -0700 Subject: [Catalog-sig] [Distutils] setuptools 0.6b4 released In-Reply-To: <5.1.1.6.0.20060717173328.03393048@sparrow.telecommunity.com> References: <5.1.1.6.0.20060711145922.03ca5aa0@sparrow.telecommunity.com> <5.1.1.6.0.20060717173328.03393048@sparrow.telecommunity.com> Message-ID: <05D15BCF-8880-4DA1-99E0-52FE69EA1DB2@redivi.com> On Jul 17, 2006, at 2:35 PM, Phillip J. Eby wrote: > At 07:17 AM 7/17/2006 -0700, Bob Ippolito wrote: >> It seems that --no-svn-revision is only part of the solution to >> making releases less of a hassle. The problem is that setup.cfg is >> still included in sdist, so any user that builds your egg from source >> is going to have it tagged with ".dev_r0". This is especially >> problematic for dependencies because it may download the correct >> version of source, but ".dev_r0" ranks lower so it will say the >> dependency is not satisfied. >> >> What's the best solution to this, barring deleting setup.cfg from the >> release branch? > > Argh. Well, you could put an exclude in MANIFEST.in to drop the > setup.cfg, but that sucks. Maybe the right thing to do is to > include the setup.cfg, but to update it with the tag and version > settings that were in effect when the sdist was built. That's not a bad idea (update setup.cfg on sdist w/ --no-svn- revision). Any chance of getting this in setuptools 0.6 or should I start adding MANIFEST.in files to the relevant projects? 
-bob From richardjones at optushome.com.au Tue Jul 18 10:55:56 2006 From: richardjones at optushome.com.au (Richard Jones) Date: Tue, 18 Jul 2006 18:55:56 +1000 Subject: [Catalog-sig] "Package Index API" draft In-Reply-To: <5.1.1.6.0.20060710210032.03ac2e20@sparrow.telecommunity.com> References: <5.1.1.6.0.20060710210032.03ac2e20@sparrow.telecommunity.com> Message-ID: <200607181855.56357.richardjones@optushome.com.au> On Tuesday 11 July 2006 11:09, Phillip J. Eby wrote: > This draft reflects the in-development versions of setuptools 0.7a1 and > 0.6b4; it does not describe older setuptools versions except as noted under > the "Backward Compatibility" section. The items described under "Backward > Compatibility" need to be kept in PyPI until everyone in the field has > upgraded to setuptools 0.6b4 or better. (Note that 0.6b4 is not released > yet!) This looks good - wanna add it to the wiki with a link on the CheeseShopDev page please? Richard From pje at telecommunity.com Tue Jul 18 17:53:46 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 18 Jul 2006 11:53:46 -0400 Subject: [Catalog-sig] "Package Index API" draft In-Reply-To: <200607181855.56357.richardjones@optushome.com.au> References: <5.1.1.6.0.20060710210032.03ac2e20@sparrow.telecommunity.com> <5.1.1.6.0.20060710210032.03ac2e20@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060718115157.0204be78@sparrow.telecommunity.com> At 06:55 PM 7/18/2006 +1000, Richard Jones wrote: >On Tuesday 11 July 2006 11:09, Phillip J. Eby wrote: > > This draft reflects the in-development versions of setuptools 0.7a1 and > > 0.6b4; it does not describe older setuptools versions except as noted under > > the "Backward Compatibility" section. The items described under "Backward > > Compatibility" need to be kept in PyPI until everyone in the field has > > upgraded to setuptools 0.6b4 or better. (Note that 0.6b4 is not released > > yet!) 
> >This looks good - wanna add it to the wiki with a link on the CheeseShopDev >page please? I added a link to: http://peak.telecommunity.com/DevCenter/EasyInstall#package-index-api in the "Developing the Cheese Shop" section of the CheeseShopDev page. The above link is the home of the HTML version of the doc for the currently-released stable version of setuptools (0.6b4 right now). From pje at telecommunity.com Tue Jul 18 18:59:11 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 18 Jul 2006 12:59:11 -0400 Subject: [Catalog-sig] [Distutils] setuptools 0.6b4 released In-Reply-To: <05D15BCF-8880-4DA1-99E0-52FE69EA1DB2@redivi.com> References: <5.1.1.6.0.20060717173328.03393048@sparrow.telecommunity.com> <5.1.1.6.0.20060711145922.03ca5aa0@sparrow.telecommunity.com> <5.1.1.6.0.20060717173328.03393048@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060718125558.0203e0c8@sparrow.telecommunity.com> At 02:49 PM 7/17/2006 -0700, Bob Ippolito wrote: >That's not a bad idea (update setup.cfg on sdist w/ --no-svn- revision). >Any chance of getting this in setuptools 0.6 or should I >start adding MANIFEST.in files to the relevant projects? Okay, it's in the trunk now as of 0.7a1dev-r50702 and 0.6c1dev-r50703. It even handles date and SVN revision tags correctly, by converting them to a single --tag-build string and disabling the other tagging options. So if you just build from an sdist without doing anything special, you get the exact same version the sdist was built with, regardless of how the version was originally specified. From pje at telecommunity.com Thu Jul 20 04:00:03 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed, 19 Jul 2006 22:00:03 -0400 Subject: [Catalog-sig] Recent UI changes on PyPI Message-ID: <5.1.1.6.0.20060719214938.02622318@sparrow.telecommunity.com> No, don't worry, this isn't about problems with EasyInstall. :) I just noticed that most of the package metadata has been moved to the bottom, below the long description. 
However, this makes it harder to tell at a glance who's responsible for the package, what license it has, etc., if the long description is more than a paragraph or two. Most of my packages include a slice of their release notes in the long description, usually bumping the total page size such that the other metadata no longer appears on the first screenful. I don't think this is a good thing, as it seems to create an impression that the package's description is being provided by the Cheese Shop. That is, that the Cheese Shop is somehow *responsible* for the package. Putting the author, home page, and so on at the top previously provided a strong hint that the description was just part of a bunch of data supplied by the package author, and not an article or review being written by the maintainers of the CheeseShop itself. As you might guess from this, I think the change is bad and at least the metadata should be put back to the top, although I do not know what the original reason for making this change was. I don't think it makes a difference whether the files go at the top or the bottom, but the metadata *really* belongs up-top, and the description fields should probably be prefixed with something like "Package Description" to help hint that this is author-supplied info. From pje at telecommunity.com Thu Jul 20 04:40:23 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed, 19 Jul 2006 22:40:23 -0400 Subject: [Catalog-sig] FYI: UI change also broke EasyInstall MD5 checks :( Message-ID: <5.1.1.6.0.20060719223347.02027348@sparrow.telecommunity.com> This post *is* about EasyInstall breakage in the new UI change, I'm afraid, albeit a relatively minor one. Per the API description here: http://peak.telecommunity.com/DevCenter/EasyInstall#backward-compatibility EasyInstall expects a PyPI-listed package's MD5 to be found in parentheses, not square brackets. 
Whoever changed this, also didn't implement the non-visual way to indicate MD5s (using "#md5=" link fragments, as described in paragraph 5 of the API doc), so deployed versions of EasyInstall can no longer check md5's for packages downloaded from PyPI. EasyInstall only screenscrapes MD5's because PyPI doesn't include them in its download URLs; if they are included in the URLs, then the backward-compatibility regex can be removed, and it will work properly with deployed versions of EasyInstall. From richardjones at optusnet.com.au Thu Jul 20 05:45:09 2006 From: richardjones at optusnet.com.au (richardjones at optusnet.com.au) Date: Thu, 20 Jul 2006 13:45:09 +1000 Subject: [Catalog-sig] FYI: UI change also broke EasyInstall MD5 checks :( Message-ID: <200607200345.k6K3j9Ws000488@mail25.syd.optusnet.com.au> An embedded and charset-unspecified text was scrubbed... Name: not available Url: http://mail.python.org/pipermail/catalog-sig/attachments/20060720/8bcab24e/attachment.pot From richardjones at optusnet.com.au Thu Jul 20 05:47:09 2006 From: richardjones at optusnet.com.au (richardjones at optusnet.com.au) Date: Thu, 20 Jul 2006 13:47:09 +1000 Subject: [Catalog-sig] Recent UI changes on PyPI Message-ID: <200607200347.k6K3l96P027850@mail26.syd.optusnet.com.au> An embedded and charset-unspecified text was scrubbed... Name: not available Url: http://mail.python.org/pipermail/catalog-sig/attachments/20060720/804517e9/attachment.asc From pje at telecommunity.com Thu Jul 20 05:54:52 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed, 19 Jul 2006 23:54:52 -0400 Subject: [Catalog-sig] FYI: UI change also broke EasyInstall MD5 checks :( In-Reply-To: <200607200345.k6K3j9Ws000488@mail25.syd.optusnet.com.au> Message-ID: <5.1.1.6.0.20060719235028.02622318@sparrow.telecommunity.com> At 01:45 PM 7/20/2006 +1000, richardjones at optusnet.com.au wrote: > > Phillip J. 
Eby wrote: > > EasyInstall expects a PyPI-listed package's MD5 to be found in parentheses, > > not square brackets. > >Crap, sorry, I thought I'd caught all the changes, but I missed this one. >I'll fix it ASAP. Note that if you add the #md5= anchor described in paragraph 5 of the API spec, neither of us will need to maintain this bit of backward-compatibility hackery any more. I'd love to have one more bit of screen-scraping I can remove from 0.6c1. I know you've mentioned before that you're not sure what browsers will do with it, but the only way to find that out for sure is to give it a try. ;-) Seriously, if browsers had a problem with downloads that have trailing junk on their URLs, then a whole lot of SourceForge projects, porn sites, and other dynamic downloading applications would be quite seriously out of luck. :) From pje at telecommunity.com Thu Jul 20 06:35:05 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu, 20 Jul 2006 00:35:05 -0400 Subject: [Catalog-sig] FYI: UI change also broke EasyInstall MD5 checks :( In-Reply-To: <200607200345.k6K3j9Ws000488@mail25.syd.optusnet.com.au> Message-ID: <5.1.1.6.0.20060720003006.02622318@sparrow.telecommunity.com> At 01:45 PM 7/20/2006 +1000, richardjones at optusnet.com.au wrote: > > Phillip J. Eby wrote: > > EasyInstall expects a PyPI-listed package's MD5 to be found in parentheses, > > not square brackets. > >Crap, sorry, I thought I'd caught all the changes, but I missed this one. >I'll fix it ASAP. FYI, the fix doesn't work either, as there's still a title="" attribute in there. Please see the regex in the API doc. (Yes, it's a terribly crufty regex; it was always intended as merely a backward-compatibility stopgap until PyPI started supporting #md5 anchors.) 
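For readers following the "#md5=" discussion above, here is a minimal sketch of how a client could read the checksum from a download link's fragment and check a downloaded file against it. This is not EasyInstall's actual code. The function names (split_md5_fragment, md5_matches) and the example.com URL are hypothetical and chosen only for illustration; the only thing taken from the thread is the convention of appending "#md5=<hex digest>" to the download URL.

```python
import hashlib
from urllib.parse import urlsplit


def split_md5_fragment(url):
    """Split a download link into (url_without_fragment, expected_md5).

    expected_md5 is None when the link carries no "#md5=..." fragment.
    Hypothetical helper, not part of setuptools.
    """
    parts = urlsplit(url)
    expected = None
    if parts.fragment.startswith("md5="):
        # Hex digests are case-insensitive; normalise to lowercase.
        expected = parts.fragment[len("md5="):].lower()
    # The fragment is never sent to the server, so drop it before fetching.
    return parts._replace(fragment="").geturl(), expected


def md5_matches(data, expected):
    """Return True if data hashes to expected, or if no checksum was given."""
    if expected is None:
        return True
    return hashlib.md5(data).hexdigest() == expected


if __name__ == "__main__":
    payload = b"pretend this is a source tarball"
    link = "http://example.com/Foo-1.0.tar.gz#md5=" + hashlib.md5(payload).hexdigest()
    download_url, expected = split_md5_fragment(link)
    print(download_url)
    print(md5_matches(payload, expected))
```

Because the fragment is handled entirely on the client side, a static mirror can serve the same links without any server support, which is the property the thread is after.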
From constant.beta at gmail.com Thu Jul 20 12:19:58 2006 From: constant.beta at gmail.com (=?ISO-8859-2?Q?Micha=B3_Kwiatkowski?=) Date: Thu, 20 Jul 2006 12:19:58 +0200 Subject: [Catalog-sig] Recent UI changes on PyPI In-Reply-To: <200607200347.k6K3l96P027850@mail26.syd.optusnet.com.au> References: <200607200347.k6K3l96P027850@mail26.syd.optusnet.com.au> Message-ID: <5e8b0f6b0607200319s48690cafpaa324d19d56f8a6c@mail.gmail.com> On 7/20/06, richardjones at optusnet.com.au wrote: > Yes, with longer descriptions the "content" of the page can > be lost down the bottom. What do other people think? I'm > leaning towards changing it back. I agree with Phillip, metadata should be on the top. Cheers, mk -- . o . >> http://joker.linuxstuff.pl << . . o It's easier to get forgiveness for being wrong o o o than forgiveness for being right. From richardjones at optusnet.com.au Fri Jul 21 02:16:19 2006 From: richardjones at optusnet.com.au (Richard Jones) Date: Fri, 21 Jul 2006 10:16:19 +1000 Subject: [Catalog-sig] FYI: UI change also broke EasyInstall MD5 checks :( In-Reply-To: <5.1.1.6.0.20060719235028.02622318@sparrow.telecommunity.com> References: <5.1.1.6.0.20060719235028.02622318@sparrow.telecommunity.com> Message-ID: <200607211016.19235.richardjones@optusnet.com.au> On Thursday 20 July 2006 13:54, Phillip J. Eby wrote: > Seriously, if browsers had a problem with downloads that have trailing junk > on their URLs, then a whole lot of SourceForge projects, porn sites, and > other dynamic downloading applications would be quite seriously out of > luck. :) Is there really prior art for this? Sourceforge doesn't do this, AFAIK. 
Richard From richardjones at optusnet.com.au Fri Jul 21 02:19:13 2006 From: richardjones at optusnet.com.au (Richard Jones) Date: Fri, 21 Jul 2006 10:19:13 +1000 Subject: [Catalog-sig] FYI: UI change also broke EasyInstall MD5 checks :( In-Reply-To: <5.1.1.6.0.20060720003006.02622318@sparrow.telecommunity.com> References: <5.1.1.6.0.20060720003006.02622318@sparrow.telecommunity.com> Message-ID: <200607211019.13422.richardjones@optusnet.com.au> On Thursday 20 July 2006 14:35, Phillip J. Eby wrote: > At 01:45 PM 7/20/2006 +1000, richardjones at optusnet.com.au wrote: > > > Phillip J. Eby wrote: > > > EasyInstall expects a PyPI-listed package's MD5 to be found in > > > parentheses, not square brackets. > > > >Crap, sorry, I thought I'd caught all the changes, but I missed this one. > >I'll fix it ASAP. > > FYI, the fix doesn't work either, as there's still a title="" attribute in > there. Please see the regex in the API doc. OK, I've removed the title attrs. Are we convinced that screen-scraping is bad yet? Richard From pje at telecommunity.com Fri Jul 21 05:15:53 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu, 20 Jul 2006 23:15:53 -0400 Subject: [Catalog-sig] FYI: UI change also broke EasyInstall MD5 checks :( In-Reply-To: <200607211016.19235.richardjones@optusnet.com.au> References: <5.1.1.6.0.20060719235028.02622318@sparrow.telecommunity.com> <5.1.1.6.0.20060719235028.02622318@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060720231014.026223f0@sparrow.telecommunity.com> At 10:16 AM 7/21/2006 +1000, Richard Jones wrote: >On Thursday 20 July 2006 13:54, Phillip J. Eby wrote: > > Seriously, if browsers had a problem with downloads that have trailing junk > > on their URLs, then a whole lot of SourceForge projects, porn sites, and > > other dynamic downloading applications would be quite seriously out of > > luck. :) > >Is there really prior art for this? Sourceforge doesn't do this, AFAIK. 
I'm referring to query strings, actually, with respect to prior art. A few minutes' experimentation shows that the following browsers work just fine with #md5 links: * Mozilla 1.7 * Firefox 1.0 * Internet Explorer 6 * Lynx 2.8 * Opera 8.5.4 I think that's a pretty good indication that most web browsers know how to handle URI fragment identifiers in compliance with the RFCs. :) I would be surprised if there are any other browsers in substantial use on PyPI, with the exception of Safari. From pje at telecommunity.com Fri Jul 21 05:27:40 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu, 20 Jul 2006 23:27:40 -0400 Subject: [Catalog-sig] FYI: UI change also broke EasyInstall MD5 checks :( In-Reply-To: <200607211019.13422.richardjones@optusnet.com.au> References: <5.1.1.6.0.20060720003006.02622318@sparrow.telecommunity.com> <5.1.1.6.0.20060720003006.02622318@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20060720231559.03e77ea0@sparrow.telecommunity.com> At 10:19 AM 7/21/2006 +1000, Richard Jones wrote: >On Thursday 20 July 2006 14:35, Phillip J. Eby wrote: > > At 01:45 PM 7/20/2006 +1000, richardjones at optusnet.com.au wrote: > > > > Phillip J. Eby wrote: > > > > EasyInstall expects a PyPI-listed package's MD5 to be found in > > > > parentheses, not square brackets. > > > > > >Crap, sorry, I thought I'd caught all the changes, but I missed this one. > > >I'll fix it ASAP. > > > > FYI, the fix doesn't work either, as there's still a title="" attribute in > > there. Please see the regex in the API doc. > >OK, I've removed the title attrs. > >Are we convinced that screen-scraping is bad yet? I never argued that this particular bit was *good*. As I said, it was always a stopgap measure until PyPI supported a better way to retrieve the MD5's, preferably as part of the URLs. I don't recall that you've ever proposed any alternative, and I seem to recall that not even the XML-RPC API offers this information.
Please note that I proposed the use of #md5 links well before I implemented the screen scraping, and asked for feedback. Your only feedback was to inquire what happened in browsers, to which I replied -- and that was the end of the thread. What else was I supposed to do, beat you about the head and shoulders? At the time, you said you were busy. However, since you now seem to have some time, maybe it would be a good time to implement some of the things I suggested you implement a year ago so that screen scraping wouldn't be necessary. ;) From strawman at astraw.com Fri Jul 21 20:20:23 2006 From: strawman at astraw.com (Andrew Straw) Date: Fri, 21 Jul 2006 11:20:23 -0700 Subject: [Catalog-sig] [Distutils] setuptools 0.6b4 released In-Reply-To: <5.1.1.6.0.20060718125558.0203e0c8@sparrow.telecommunity.com> References: <5.1.1.6.0.20060717173328.03393048@sparrow.telecommunity.com> <5.1.1.6.0.20060711145922.03ca5aa0@sparrow.telecommunity.com> <5.1.1.6.0.20060717173328.03393048@sparrow.telecommunity.com> <5.1.1.6.0.20060718125558.0203e0c8@sparrow.telecommunity.com> Message-ID: <44C11AE7.3030708@astraw.com> Phillip J. Eby wrote: > At 02:49 PM 7/17/2006 -0700, Bob Ippolito wrote: > >> That's not a bad idea (update setup.cfg on sdist w/ --no-svn-revision). >> Any chance of getting this in setuptools 0.6 or should I >> start adding MANIFEST.in files to the relevant projects? >> > > Okay, it's in the trunk now as of 0.7a1dev-r50702 and 0.6c1dev-r50703. It > even handles date and SVN revision tags correctly, by converting them to a > single --tag-build string and disabling the other tagging options. So if > you just build from an sdist without doing anything special, you get the > exact same version the sdist was built with, regardless of how the version > was originally specified. > I'm glad this is being worked on.
But a related issue is still biting me with setuptools 0.6c1 in my stdeb package (which builds debian source packages from unmodified setup.py scripts) : Any distutils commands using "self.distribution.get_version()" still get tagged (at least with the svn revision), even if they're being built from the sdist-generated .tar.gz package. Not knowing the innards of setuptools very well, one idea would be to add something to the .egg-info built by sdist that tells future runs of setuptools not to add tags. This keeps setup.cfg from getting modified but still has the right effect. There's probably a flaw I haven't thought of, though... Cheers! Andrew From richardjones at optusnet.com.au Sat Jul 22 06:36:05 2006 From: richardjones at optusnet.com.au (Richard Jones) Date: Sat, 22 Jul 2006 14:36:05 +1000 Subject: [Catalog-sig] FYI: UI change also broke EasyInstall MD5 checks :( In-Reply-To: <5.1.1.6.0.20060720231014.026223f0@sparrow.telecommunity.com> References: <5.1.1.6.0.20060719235028.02622318@sparrow.telecommunity.com> <5.1.1.6.0.20060720231014.026223f0@sparrow.telecommunity.com> Message-ID: <200607221436.05554.richardjones@optusnet.com.au> On Friday 21 July 2006 13:15, Phillip J. Eby wrote: > A few minutes experimentation shows that the following browsers work just > fine with #md5 links: > > * Mozilla 1.7 > * Firefox 1.0 > * Internet Explorer 6 > * Lynx 2.8 > * Opera 8.5.4 > > I think that's a pretty good indication that most web browsers know how to > handle URI fragment identifiers in compliance with the RFCs. :) I would > be surprised if there are any other browsers in substantial use on PyPI, > with the exception of Safari. OK, I'll enable it. Let's see how it goes... Richard From ianb at colorstudy.com Tue Jul 25 01:45:15 2006 From: ianb at colorstudy.com (Ian Bicking) Date: Mon, 24 Jul 2006 18:45:15 -0500 Subject: [Catalog-sig] Search gone... 
Message-ID: <44C55B8B.3010706@colorstudy.com> In the UI refactoring the link to search the Cheese Shop seems to have disappeared (except to search through Google, which isn't quite the same). Can we get that back? And also a search box on the front page (preferably one that searches for text in any of title, keywords, description). Also, the Trove classifiers are served as text/html now instead of text/plain: http://www.python.org/pypi?%3Aaction=list_classifiers -- Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org From ianb at colorstudy.com Tue Jul 25 01:47:46 2006 From: ianb at colorstudy.com (Ian Bicking) Date: Mon, 24 Jul 2006 18:47:46 -0500 Subject: [Catalog-sig] Zope framework category Message-ID: <44C55C22.2060802@colorstudy.com> I'd like a Zope framework category, especially since Zope is starting to consume typical distutils packages so they are showing up in the Cheese Shop. But I'm not sure if there should be separate Zope 2 and Zope 3 sections or not, or... I'm not sure. Does anyone here have opinions? With Zope Five the 2/3 distinction becomes fuzzy. -- Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org From fdrake at gmail.com Tue Jul 25 03:54:54 2006 From: fdrake at gmail.com (Fred Drake) Date: Mon, 24 Jul 2006 21:54:54 -0400 Subject: [Catalog-sig] Zope framework category In-Reply-To: <44C55C22.2060802@colorstudy.com> References: <44C55C22.2060802@colorstudy.com> Message-ID: <9cee7ab80607241854l4fa093ddnee9db5699efed951@mail.gmail.com> On 7/24/06, Ian Bicking wrote: > I'd like a Zope framework category, especially since Zope is starting to > consume typical distutils packages so they are showing up in the > Cheese Shop. But I'm not sure if there should be separate Zope 2 and Zope > 3 sections or not, or... I'm not sure. Does anyone here have opinions? I think Zope 2 and Zope 3 should be considered completely separate frameworks, though Jim Fulton's long-term vision suggests they'll be more closely related in the future.
They will likely remain distinct even then, since the audiences are a bit different. > With Zope Five the 2/3 distinction becomes fuzzy. I don't think it does. It just means that *sometimes* Zope 2 users will be interested in Zope 3 components. Components may appear that fit in both categories -- that's ok. -Fred -- Fred L. Drake, Jr. "Every sin is the result of a collaboration." --Lucius Annaeus Seneca From richardjones at optusnet.com.au Tue Jul 25 06:40:42 2006 From: richardjones at optusnet.com.au (richardjones at optusnet.com.au) Date: Tue, 25 Jul 2006 14:40:42 +1000 Subject: [Catalog-sig] Search gone... Message-ID: <200607250440.k6P4egXX002121@mail17.syd.optusnet.com.au> An embedded and charset-unspecified text was scrubbed... Name: not available Url: http://mail.python.org/pipermail/catalog-sig/attachments/20060725/d79c4d41/attachment.pot From jim at zope.com Fri Jul 28 20:36:57 2006 From: jim at zope.com (Jim Fulton) Date: Fri, 28 Jul 2006 14:36:57 -0400 Subject: [Catalog-sig] Zope framework category In-Reply-To: <44C55C22.2060802@colorstudy.com> References: <44C55C22.2060802@colorstudy.com> Message-ID: On Jul 24, 2006, at 7:47 PM, Ian Bicking wrote: > I'd like a Zope framework category, especially since Zope is > starting to > consume typical distutils packages so they are showing up in the > Cheese Shop. But I'm not sure if there should separate Zope 2 and > Zope > 3 sections or not, or... I'm not sure. Does anyone here have > opinions? > With Zope Five the 2/3 distinction becomes fuzzy. I think a Zope 2 framework category makes sense. Zope 3 is a collection of much smaller and independent frameworks. Each should be handled separately as the need arises. I suppose that as people upload distributions that are meant to plug into some framework, they should request that the framework be registered. Jim -- Jim Fulton mailto:jim at zope.com Python Powered! 
CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From richardjones at optushome.com.au Mon Jul 31 09:45:07 2006 From: richardjones at optushome.com.au (Richard Jones) Date: Mon, 31 Jul 2006 17:45:07 +1000 Subject: [Catalog-sig] Zope framework category In-Reply-To: References: <44C55C22.2060802@colorstudy.com> Message-ID: <200607311745.07897.richardjones@optushome.com.au> On Saturday 29 July 2006 04:36, Jim Fulton wrote: > I suppose that as people upload distributions that are meant to plug > into some framework, they should request that the framework be > registered. That's how I'm handling it at the moment :) I've added the info about requesting new packages to the Shop's Tutorial page (it was previously only on the Developers page). Richard From richardjones at optushome.com.au Mon Jul 31 11:14:03 2006 From: richardjones at optushome.com.au (Richard Jones) Date: Mon, 31 Jul 2006 19:14:03 +1000 Subject: [Catalog-sig] Search gone... In-Reply-To: <44C55B8B.3010706@colorstudy.com> References: <44C55B8B.3010706@colorstudy.com> Message-ID: <200607311914.03566.richardjones@optushome.com.au> On Tuesday 25 July 2006 09:45, Ian Bicking wrote: > Also, the Trove classifiers are served as text/html now instead of > text/plain: http://www.python.org/pypi?%3Aaction=list_classifiers This should be fixed the next time the server restarts its processes. 
Richard From jjl at pobox.com Mon Jul 31 13:46:00 2006 From: jjl at pobox.com (John J Lee) Date: Mon, 31 Jul 2006 12:46:00 +0100 (GMT Standard Time) Subject: [Catalog-sig] Zope framework category In-Reply-To: <200607311745.07897.richardjones@optushome.com.au> References: <44C55C22.2060802@colorstudy.com> <200607311745.07897.richardjones@optushome.com.au> Message-ID: On Mon, 31 Jul 2006, Richard Jones wrote: > On Saturday 29 July 2006 04:36, Jim Fulton wrote: >> I suppose that as people upload distributions that are meant to plug >> into some framework, they should request that the framework be >> registered. > > That's how I'm handling it at the moment :) > > I've added the info about requesting new packages to the Shop's Tutorial page > (it was previously only on the Developers page). Do we really have to keep "Framework :: TruboGears :: Applications"? (note the spelling of "TruboGears") John From richardjones at optushome.com.au Mon Jul 31 13:56:24 2006 From: richardjones at optushome.com.au (Richard Jones) Date: Mon, 31 Jul 2006 21:56:24 +1000 Subject: [Catalog-sig] Zope framework category In-Reply-To: References: <44C55C22.2060802@colorstudy.com> <200607311745.07897.richardjones@optushome.com.au> Message-ID: <200607312156.24434.richardjones@optushome.com.au> On Monday 31 July 2006 21:46, you wrote: > On Mon, 31 Jul 2006, Richard Jones wrote: > > On Saturday 29 July 2006 04:36, Jim Fulton wrote: > >> I suppose that as people upload distributions that are meant to plug > >> into some framework, they should request that the framework be > >> registered. > > > > That's how I'm handling it at the moment :) > > > > I've added the info about requesting new packages to the Shop's Tutorial > > page (it was previously only on the Developers page). > > Do we really have to keep "Framework :: TruboGears :: Applications"? > (note the spelling of "TruboGears") No, and it's gone now - no-one was using it :) Richard