From techtonik at gmail.com  Fri Apr  3 10:50:51 2015
From: techtonik at gmail.com (anatoly techtonik)
Date: Fri, 3 Apr 2015 11:50:51 +0300
Subject: [core-workflow] web API to get a list of all module in stdlib
In-Reply-To:
References:
Message-ID:

Here is the hack that does the thing locally, not from web, and rationale.

https://github.com/jackmaney/python-stdlib-list

Quoting here just in case there are still people who can talk with trolls:

Python Standard Library List

This package includes lists of all of the standard libraries for
Python 2.6, 2.7, 3.2, 3.3, and 3.4, along with the code for scraping
the official Python docs to get said lists.

Listing the modules in the standard library? Wait, why on Earth would
you care about that?!

Because knowing whether or not a module is part of the standard library
will come in handy in a project of mine. And I'm not the only one who
would find this useful. Or, the TL;DR answer is that it's handy in
situations when you're analyzing Python code and would like to find
module dependencies.

After googling for a way to generate a list of Python standard
libraries (and looking through the answers to the previously-linked
Stack Overflow question), I decided that I didn't like the existing
solutions. So, I started by writing a scraper for the TOC of the Python
Module Index for each of the versions of Python above.

However, web scraping can be a fragile affair. Thanks to a suggestion
by @ncoghlan, and some further help from @birkenfeld and @epc, the
population of the lists is now done by grabbing and parsing the Sphinx
object inventory for the official Python docs of each relevant version.
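
For readers wondering what "grabbing and parsing the Sphinx object
inventory" involves, here is a minimal sketch. It is not the package's
actual code. It assumes the version 2 inventory layout (four plain-text
header lines, then a zlib-compressed body with one
"name domain:role priority uri dispname" entry per line) and keeps only
the py:module entries. The name modules_from_inventory is made up for
illustration.

```python
import re
import zlib

# One inventory entry: name, domain:role, priority, uri, display name.
_ENTRY = re.compile(r"(.+?)\s+(\S+:\S+)\s+(-?\d+)\s+(\S*)\s+(.*)")


def modules_from_inventory(raw):
    """Return the sorted module names listed in a Sphinx v2 objects.inv payload.

    `raw` is the complete file content as bytes (hypothetical helper,
    assuming the version 2 inventory layout described above).
    """
    # The first four lines are plain-text headers; the rest is zlib data.
    parts = raw.split(b"\n", 4)
    if len(parts) < 5 or not parts[0].startswith(b"# Sphinx inventory version 2"):
        raise ValueError("not a version 2 Sphinx inventory")
    body = zlib.decompress(parts[4]).decode("utf-8")

    names = set()
    for line in body.splitlines():
        match = _ENTRY.match(line)
        # Keep only entries whose role marks them as Python modules.
        if match and match.group(2) == "py:module":
            names.add(match.group(1))
    return sorted(names)
```

Fed the bytes of an inventory such as
https://docs.python.org/2.7/objects.inv (downloaded with any HTTP
client), this should return the documented module names for that
version, dotted submodule names included.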

From techtonik at gmail.com  Mon Apr  6 13:35:46 2015
From: techtonik at gmail.com (anatoly techtonik)
Date: Mon, 6 Apr 2015 14:35:46 +0300
Subject: [core-workflow] web API to get a list of all module in stdlib
In-Reply-To:
References:
Message-ID:

The above library extracts module info from the Sphinx objects.inv
database

https://docs.python.org/2.7/objects.inv

which is a binary format and requires local parsing. Ideally, Sphinx
should publish open data about the stdlib structure, such as:

https://docs.python.org/2.7/dataset/1.0/modules.json
https://docs.python.org/2.7/dataset/1.0/modules.csv
    list all module names, sorted by name

Then you can easily load this data into a web app or use it as a table
for analysis.

http://www.w3.org/2013/csvw/wiki/Main_Page

From francismb at email.de  Mon Apr  6 16:44:32 2015
From: francismb at email.de (francis)
Date: Mon, 06 Apr 2015 16:44:32 +0200
Subject: [core-workflow] web API to get a list of all module in stdlib
In-Reply-To:
References:
Message-ID: <55229BD0.2060207@email.de>

Hi Anatoly,

On 03/23/2015 01:06 PM, anatoly techtonik wrote:
> Hi,
>
> I am doing an exercise as a part of agile ux data mining
> team, and I need to get a list of Python modules:
>
> https://stackoverflow.com/questions/6463918/how-can-i-get-a-list-of-all-the-python-standard-library-modules
>
> But this gives only the modules that were compiled into
> specific interpreter, and I need a list of modules that are
> de-facto included in stdlib standard.
>
> I also need this for all Python versions, and be able to
> fetch it as csv, json or html table format over web, so
> that result of my work could be validated and experiment
> repeated as necessary.
>
>
> I see the data as the necessary step to organize a work
> around "externally evolving standard library", so a way
> to query it should be somewhat sustainable and obvious.
>
> It might be possible to generate something from docs, like:
>
> https://docs.python.org/2.7.2/dataset/modules.json
>
> This way you get static information without ability to
> version or refresh the info (still good to have anyway to
> compare docs and other sources).

+1 for the idea to publish the final results to avoid "reparsing the
wheel".

IMHO it could be interesting for new versions to have some kind of
"sys.stdlib_module_names" (as stated in SO). Why not propose it on
python-ideas?

Regards,
francis
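
A rough sketch of how the modules.json and modules.csv files proposed
earlier in the thread could be produced from any list of module names.
The JSON layout ({"modules": [...]}) and the single CSV column named
"module" are assumptions, since the thread does not define a schema,
and render_module_lists is a made-up name.

```python
import csv
import io
import json


def render_module_lists(names):
    """Return (json_text, csv_text) listing module names sorted by name.

    Hypothetical helper; the output schema is an assumption, not
    something the thread or the Python docs specify.
    """
    ordered = sorted(set(names))  # de-duplicate, then sort by name

    json_text = json.dumps({"modules": ordered}, indent=2)

    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["module"])  # header row (assumed column name)
    writer.writerows([name] for name in ordered)

    return json_text, buf.getvalue()
```

The input can come from anywhere: a parsed docs inventory, a
hand-maintained list, or, on interpreters that provide it (CPython 3.10
and later), sys.stdlib_module_names.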