[SciPy-User] Datasets in Scipy Code

Skipper Seabold jsseabold at gmail.com
Sat Aug 14 16:57:37 EDT 2010


On Sat, Aug 14, 2010 at 4:31 PM, Tim Michelsen
<timmichelsen at gmx-topmail.de> wrote:
>> Yeah, I think what we have in statsmodels is about as far as it's
>> gotten.  I rewrote a lot of the code and David's NEP at the beginning
>> of the summer based on our needs to keep it maintainable and flexible.
>> There is also an incarnation in scikits-learn with a few differences,
>> but we tried to keep them similar.
> Comparing with Learn at:
> http://scikit-learn.git.sourceforge.net/git/gitweb.cgi?p=scikit-learn/scikit-learn;a=tree;f=scikits/learn/datasets
> You have increased the variety.
>
>> It might make sense to combine the two at some point and distribute as
>> a standalone scikit.
> For time series analysis I'd appreciate having a data set with time
> stamps at a one-minute or finer frequency.
> I currently do not have one free of copyright.
>

What do you have in mind?  I have some US macro data in there at the
quarterly frequency.  I would like to get some higher frequency
finance stuff.

> I will use your code as a starter and submit a good data set as soon as I
> get royalty-free data.

That'd be great.  There are some utility functions and templates so
adding datasets is easy, so let me know when you have some data and I
can walk you through it if you need it.   It should also be documented
in the updated datasets proposal.
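The thread doesn't show what such a template looks like, so here is a minimal, hypothetical sketch of a self-contained dataset module: metadata fields (including a copyright/permission note, which matters given the licensing issue) plus a `load()` function that returns the data. The field names loosely follow the statsmodels convention (COPYRIGHT, TITLE, SOURCE, DESCRSHORT, NOTE), but the `Dataset` class, the inline CSV, and its toy values are assumptions for illustration only, not the actual statsmodels API or any real dataset.

```python
"""Hypothetical dataset module sketch (illustrative only, not statsmodels code)."""
import csv
import io
from dataclasses import dataclass, field

# Metadata fields, loosely modeled on the statsmodels datasets convention.
# All values are placeholders, not a real dataset.
COPYRIGHT = "Placeholder: record the license or written permission here."
TITLE = "Example minute-frequency series"
SOURCE = "Placeholder source description"
DESCRSHORT = "Toy example with one-minute time stamps"
NOTE = "Columns: timestamp, value"

# Inline CSV stands in for a data file shipped with the module (hypothetical).
_RAW = """timestamp,value
2000-01-01 09:30,1.0
2000-01-01 09:31,1.5
2000-01-01 09:32,0.5
"""


@dataclass
class Dataset:
    """Container pairing column names and rows with the module metadata."""
    names: list
    data: list
    meta: dict = field(default_factory=dict)


def load():
    """Parse the bundled CSV and return a Dataset with its metadata attached."""
    reader = csv.reader(io.StringIO(_RAW))
    names = next(reader)  # header row
    rows = [(ts, float(val)) for ts, val in reader]
    meta = {
        "COPYRIGHT": COPYRIGHT,
        "TITLE": TITLE,
        "SOURCE": SOURCE,
        "DESCRSHORT": DESCRSHORT,
        "NOTE": NOTE,
    }
    return Dataset(names=names, data=rows, meta=meta)
```

Keeping the license text next to the data means it travels with every copy of the module, so whoever reuses the dataset can see the permission that was obtained.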

The license is the rub.  Mostly I've contacted the original authors
and have had no problems getting express written permission for
reuse.  Authors have told me that I am the only one who has ever
asked, even for datasets that are included in the R datasets library
and other packages, and I've never gotten a straight answer on the
licensing of datasets in R.  Other stuff is often public domain.

Skipper
