[SciPy-dev] Machine learning datasets (was Presentation of pymachine, a python package for machine learning)

Bruce Southey bsouthey at gmail.com
Wed May 30 22:24:55 EDT 2007


Hi,
As an example, AirPassengers is not under the GPL. If you run
help("AirPassengers") in R, you will see the source:
" Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (1976) _Time Series
Analysis, Forecasting and Control._ Third Edition.  Holden-Day. Series
G."

Likewise for BJsales, where the help notes that the data were copied from
the Time Series Data Library
(http://www-personal.buseco.monash.edu.au/~hyndman/TSDL/). The
license for this site is free. However, the source provided is:
"G. E. P. Box and G. M. Jenkins (1976): _Time Series Analysis,
Forecasting and Control_, Holden-Day, San Francisco, p. 537."

I don't have either book, so I cannot tell you whether there are any terms
for use of the datasets. In some cases I presume people would argue
'fair use'. Also note that these books predate the GPL (v1 was
released Feb 1989)!


Bruce

On 5/30/07, David Cournapeau <david at ar.media.kyoto-u.ac.jp> wrote:
> Bruce Southey wrote:
> > Hi,
> > You might find the UCI Machine Learning Repository a useful resource for data:
> > http://www.ics.uci.edu/~mlearn/MLRepository.html
> >
> > Standard sources are:
> > Statlib: http://lib.stat.cmu.edu/
> > Netlib: http://www.netlib.org/
> >
> > Even those included with R may be usable, because some are in the public domain.
> The main problem with datasets seems to be the license. For example, you say
> that some of the datasets in R are public domain: do you know which ones,
> and how do you know? I looked for information on this issue, without any
> luck. For all I know, the datasets (at least the ones in R core) are
> under the GPL.
>
> cheers,
>
> David


