[SciPy-dev] Is "Creative Commons: Attribution" an acceptable license for datasets included in scipy ?

Tue Jan 12 11:34:25 EST 2010

On Tue, Jan 12, 2010 at 10:52 AM, Bruce Southey <bsouthey at gmail.com> wrote:
> On 01/12/2010 09:12 AM, josef.pktd at gmail.com wrote:
>> On Tue, Jan 12, 2010 at 3:07 AM, Gael Varoquaux
>> <gael.varoquaux at normalesup.org>  wrote:
>>
>>> On Tue, Jan 12, 2010 at 05:01:14PM +0900, David Cournapeau wrote:
>>>
>>>> Hi,
>>>>
>>>
>>>>       Everything is in the title - I have some new IO code for
>>>> scipy.sparse I would like to include in scipy, and the tests include
>>>> some dataset under this license. Should I remove them before inclusion ?
>>>>
>>> I believe you must: the attribution clause is not free by OSI definition.
>>> In addition, I am pretty sure that none of the CC licenses are DFSG-free
>>> up to version 3.0 (don't ask me why).
>>>
>> cc-by looks pretty innocent for bundling with a package, especially if
>> it's only used for tests and for examples and not part of the main
>> program (like icons or sound).
>>
>> Bundling doesn't look infectious and the user is free to make use of
>> them or not. Attribution for bundling doesn't look more restrictive
>> than including the copyright statement for BSD-licensed code.
>>
>> In statsmodels, we have several datasets,  some public domain, some
>> with authorization by the author, but sometimes it is not very clear
>> whether a dataset is copyrightable or not.
>>
>> Although, I haven't seen any cc-by datasets in econometrics that I
>> remember,  and cc-by-nc looks clearly inconsistent.
>>
>> Are there guidelines somewhere on what would be consistent with this
>> kind of bundling of datasets (tests and examples)?
>>
>> US government data is nice because it's all public domain.
>>
>> But there are a lot of efforts to make data more widely available, e.g.
>> http://www.ckan.net
>> http://opendefinition.org/licenses
>>
>> Sorry if this expands too much on the original question, but this has
>> been bugging me for a while.
>>
>> Josef
>>
>>
> A little off topic, but search google for 'is data copyrightable'.
> For example:
> http://answers.google.com/answers/threadview/id/778789.html
> http://scienceblogs.com/commonknowledge/2009/01/data_copyrights_and_slogans_oh.php
> http://sciencecommons.org/resources/faq/databases#dbcopyright
>
> The important case that is referred to is Feist vs Rural:
> http://en.wikipedia.org/wiki/Feist_Publications_v._Rural_Telephone_Service

Thanks for the links; the Wikipedia article especially is pretty clear.
It looks like datasets in R (based on published or publicly available
information) are pretty much fair game, since we only use the facts and
not the R code.

>
> The answer really depends on what country, what the data is ('facts' are
> not copyrightable), how (and when) it was collected and who  collected it.

Do we need a disclaimer: "don't look at the data if you are in Australia"?

>
> I agree with Robert with regard to data with tests. As for examples, it
> depends on the point you want to make; I would suggest simulated data
> or well-known datasets that are most likely in the public domain.

In statsmodels, we don't just want to have tests; we also want to
verify that we can replicate known results, and to use the datasets
for illustration as part of the documentation. (Maybe it's
functional/acceptance tests versus unit tests.)

Thanks,

Josef

>
> Bruce