[Distutils] dump of all PyPI project metadata available?

Wes Turner wes.turner at gmail.com
Thu Jul 23 06:12:21 CEST 2015


On Jul 22, 2015 5:12 PM, "Brett Cannon" <bcannon at gmail.com> wrote:
>
>
>
> On Wed, Jul 22, 2015 at 2:19 PM Wes Turner <wes.turner at gmail.com> wrote:
>>
>> https://github.com/dstufft/pypi-stats
>>
>> https://github.com/dstufft/pypi-external-stats
>
>
> I'm not quite sure what I'm supposed to get from those links, Wes, as
> that code still scrapes every project individually and downloads them,
> while all I'm trying to do is avoid having to scrape PyPI and instead
> just download a single file (plus I don't want the files, just the
> metadata already returned by the JSON API).

An online query or an offline dump?

>
> -Brett
>
>>
>> - [ ] a flat BigQuery table w/ pandas.io.gbq, à la GitHub Archive, would be great

http://pandas.pydata.org/pandas-docs/version/0.16.2/io.html#io-bigquery

>> - [ ] it's probably worth it to add RDFa to PyPI and Warehouse pages (in
>> addition to the auxiliary executed/extracted JSON) for #search

https://github.com/pypa/warehouse/blob/master/warehouse/packaging/models.py

https://github.com/pypa/warehouse/blob/master/tests/unit/packaging/test_models.py

https://github.com/pypa/warehouse/blob/master/warehouse/packaging/views.py

https://github.com/pypa/warehouse/blob/master/warehouse/templates/packaging/detail.html

https://github.com/pypa/warehouse/blob/master/warehouse/routes.py

https://github.com/pypa/warehouse/blob/master/tests/unit/legacy/api/test_json.py

https://github.com/pypa/warehouse/blob/master/warehouse/legacy/api/json.py

>>
>> On Jul 22, 2015 4:08 PM, "Brett Cannon" <bcannon at gmail.com> wrote:
>>>
>>> When I wrote https://nothingbutsnark.svbtle.com/python-3-support-on-pypi
>>> I wrote a script to download every project's JSON metadata by scraping the
>>> simple index and then making the appropriate GET request for the JSON
>>> metadata. It worked, but it was somewhat of a hassle.
>>>
>>> Is there some dump somewhere that is built daily, weekly, or monthly of
>>> all the metadata on PyPI for offline analysis?
>>>
>>> _______________________________________________
>>> Distutils-SIG maillist  -  Distutils-SIG at python.org
>>> https://mail.python.org/mailman/listinfo/distutils-sig
>>>
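
For reference, the scrape-then-GET approach described in the quoted message
could look roughly like the sketch below. This is not Brett's actual script;
it is a minimal illustration that assumes the simple index lives at
https://pypi.org/simple/ and the per-project JSON at
https://pypi.org/pypi/<name>/json. The helper names (project_names, json_url,
fetch_metadata, dump_all) are hypothetical, and it does no error handling,
retrying, or rate limiting.

```python
import json
from html.parser import HTMLParser
from urllib.request import urlopen

# Assumed endpoints; adjust if the index is hosted elsewhere.
SIMPLE_INDEX_URL = "https://pypi.org/simple/"
JSON_URL_TEMPLATE = "https://pypi.org/pypi/{name}/json"


class SimpleIndexParser(HTMLParser):
    """Collect the link text of every <a> element on the simple index page."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        # Only text inside an anchor is treated as a project name.
        if self._in_anchor and data.strip():
            self.names.append(data.strip())


def project_names(index_html):
    """Return the project names listed in the simple index HTML."""
    parser = SimpleIndexParser()
    parser.feed(index_html)
    parser.close()
    return parser.names


def json_url(name):
    """Build the JSON metadata URL for one project."""
    return JSON_URL_TEMPLATE.format(name=name)


def fetch_metadata(name):
    """GET and decode one project's JSON metadata (raises on HTTP errors)."""
    with urlopen(json_url(name)) as response:
        return json.loads(response.read().decode("utf-8"))


def dump_all(path):
    """Write one JSON document per line for every project in the index."""
    with urlopen(SIMPLE_INDEX_URL) as response:
        names = project_names(response.read().decode("utf-8"))
    with open(path, "w", encoding="utf-8") as out:
        for name in names:
            out.write(json.dumps(fetch_metadata(name)) + "\n")
```

Calling dump_all("pypi-metadata.jsonl") would issue one request per project,
which is exactly the hassle a prebuilt dump would avoid.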