[SciPy-dev] Dataset for examples and license

David Cournapeau david at ar.media.kyoto-u.ac.jp
Tue Apr 24 09:01:20 EDT 2007


Anne Archibald wrote:
> On 24/04/07, David Cournapeau <david at ar.media.kyoto-u.ac.jp> wrote:
>> Hi,
>>
>>     I would like to know what should be done when including a dataset
>> in scipy. For example, during the development of my project pymachine,
>> I would like to include some famous datasets like iris or Old Faithful
>> for demos of classic machine learning algorithms. R has some
>> interesting data, but it is licensed under the GPL, and I am not quite
>> sure what the status of the data is with respect to the license. Does
>> the GPL also cover raw data?
>
> Not necessarily appropriate for machine learning, and this doesn't
> answer your question, but there's lots of astronomy data which is
> public (and in fact I think in the public domain as it's a NASA
> product).
>
> For inclusion in scipy, supposing the license is fine, if the data is
> small (a few kilobytes?) it can go in a test case. (Does scipy *have*
> a collection of example code in the distribution? It would be nice...)
> If it's bigger (a few megabytes?) it could go on the Wiki; if it's
> really big it could probably go on the Wikimedia Commons (though do
> they support arbitrary file types?).

Well, I guess once scipy is modularized and can be installed package by
package, having a dataset package à la R would be nice. For now, I have a
small Python script which converts those datasets to HDF5, so they can be
read easily from Python, and if including them in scipy is OK
license-wise, I can easily add the data as a package for distribution
(all of the related data, compressed and pickled, takes about 100 KB).
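A minimal sketch of the "compressed, pickled" packaging idea, using only the standard library (`csv`, `gzip`, `pickle`). This is not the script described above: the function names `parse_csv`, `save_dataset` and `load_dataset` are hypothetical, and the sample is just the first two rows of the iris data. Writing HDF5 instead would need a third-party package such as h5py or PyTables.

```python
import csv
import gzip
import io
import pickle
import tempfile
from pathlib import Path


def parse_csv(text):
    """Parse CSV text with a header row into a dict of column lists (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row[name]
            try:
                value = float(value)  # numeric measurements become floats
            except ValueError:
                pass  # non-numeric fields such as class labels stay strings
            columns[name].append(value)
    return columns


def save_dataset(path, columns):
    """Pickle the columns and gzip-compress them into a single file."""
    with gzip.open(path, "wb") as f:
        pickle.dump(columns, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_dataset(path):
    """Read back a dataset written by save_dataset."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


# First two rows of the iris data, for illustration only.
SAMPLE = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
"""

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "iris_sample.pkl.gz"
        save_dataset(path, parse_csv(SAMPLE))
        data = load_dataset(path)
        print(data["species"])  # ['setosa', 'setosa']
```

Keeping the data as a plain dict of lists keeps the pickle independent of any particular array library. Note that unpickling runs arbitrary code, so this format only suits data shipped with the package itself.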

David


