[Distutils] Data on requirement files on GitHub

Paul Eipper lkraider at gmail.com
Thu Mar 9 17:41:11 EST 2017


PS: It took 2 hours on my notebook to parse the dataset into the linearized
version (stored as "parsed.json").
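
For anyone who wants a starting point from the raw file, here is a minimal
sketch of counting the top packages. It is not the code from the notebook.
The key name "content" is an assumption about the dataset (the thread only
says each record holds the repo name, the file path and the file contents),
and package_names / count_packages are hypothetical helper names. The line
parsing is deliberately rough: it skips pip options and URL requirements and
only looks at the leading project name.

```python
import json
import re
from collections import Counter

# Assumption: each JSON object stores the file body under a key named
# "content"; pass a different content_key if the dataset uses another name.
CONTENT_KEY = "content"

# Leading project name of a requirement line, e.g. "Django" in "Django>=1.8".
NAME_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def package_names(requirements_text):
    """Yield normalized package names from the text of one requirements file."""
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        # Skip blanks, pip options (-r, -e, ...) and URL/VCS requirements.
        if not line or line.startswith("-") or "://" in line:
            continue
        match = NAME_RE.match(line)
        if match:
            # Normalize the way PEP 503 does: runs of "-", "_", "." become "-".
            yield re.sub(r"[-_.]+", "-", match.group(0)).lower()


def count_packages(lines, content_key=CONTENT_KEY):
    """Count in how many requirements files each package appears."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)  # one JSON object per line
        # set() so a package listed twice in one file is counted once.
        counts.update(set(package_names(record.get(content_key) or "")))
    return counts
```

Passing the open file object for data.json to count_packages and calling
most_common(10) on the result should give a rough top-ten list.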


--
Paul Eipper

On Thu, Mar 9, 2017 at 7:39 PM, Paul Eipper <lkraider at gmail.com> wrote:

> I had some fun parsing and plotting the data (very simple, just the top
> packages for now). See here:
> https://github.com/lkraider/requirements-dataset/blob/master/index.ipynb
>
> Let me know if you would accept a pull request so others can use that as a
> starting point.
>
> Regards,
>
>
> --
> Paul Eipper
>
> On Wed, Mar 8, 2017 at 1:36 PM, Nick Timkovich <prometheus235 at gmail.com>
> wrote:
>
>> Looks like a fun chunk of data. What query did you use? Can you add a
>> README to the repo with some description, in case others want to iterate
>> on it (maybe look into setup.py files)?
>>
>> Nick
>>
>> On Tue, Mar 7, 2017 at 5:06 AM, Jannis Gebauer <ja.geb at me.com> wrote:
>>
>>> Hi,
>>>
>>> I ran a couple of queries against GitHub's public BigQuery dataset [0]
>>> last week. I’m interested in requirement files in particular, so I ran a
>>> query extracting all available requirement files.
>>>
>>> Since queries against this dataset are rather expensive ($7 on all
>>> repos), I thought I’d share the raw data here [1]. The data contains the
>>> repo name, the requirements file path and the contents of the file. Every
>>> line is a JSON blob; read it with:
>>>
>>> import json
>>>
>>> with open('data.json') as f:
>>>     for line in f:  # iterate lazily instead of loading every line at once
>>>         data = json.loads(line)  # one JSON object per line
>>>
>>> Maybe that’s of interest to some of you.
>>>
>>> If you have any ideas on what to do with the data, please let me know.
>>>
>>> --
>>> Jannis Gebauer
>>>
>>>
>>>
>>> [0]: https://cloud.google.com/bigquery/public-data/github
>>> [1]: https://github.com/jayfk/requirements-dataset
>>>
>>> _______________________________________________
>>> Distutils-SIG maillist  -  Distutils-SIG at python.org
>>> https://mail.python.org/mailman/listinfo/distutils-sig
>>>
>>>
>>
>