[SciPy-dev] Common data sets for testing purposes

Peter Wang pwang at enthought.com
Tue Jul 11 02:05:18 EDT 2006


Robert Kern wrote ..
> Good idea. The first step would be collecting some datasets and writing
> one scipy/matplotlib (dare I say Chaco?) example per dataset.

I double DOG dare you to say it.

Actually, having some good datasets would be really helpful.  As pretty as Bessel functions are, I'm rather tired of using them for all the basic Chaco demos.

> I would prefer to keep the datasets out of the trunk and the distribution
> tarballs, though. The current download burden is somewhat heavy as it is,
> and some of the worthwhile datasets will probably be substantial in size.

I vote for this as well.  The GIS data used for some of the older Chaco examples is 2.6 MB compressed, and I moved it out of the main enthought/src/lib/ directory structure for that reason.

> If you would like to start a Wiki page on www.scipy.org to collect pointers
> to useful datasets and example code, that would be great.

The UN has a Statistics Division with tons of demographic data:
http://unstats.un.org/unsd/cdb/cdb_help/cdb_quick_start.asp

Now, granted, this is financial and demographic data rather than strictly scientific data, but perhaps the sheer volume and quality of the data outweigh the lack of direct scientific applicability.
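For what it's worth, the sort of per-dataset example being proposed could be quite short.  Here is a minimal sketch, using a made-up table of population figures (hypothetical numbers, standing in for a real UN demographic series) and plain numpy so it runs without any plotting backend:

```python
# Sketch of a "one example per dataset" script. The dataset below is
# synthetic and for illustration only -- a real example would load the
# actual UN series instead.
import numpy as np

# Hypothetical dataset: world population (billions) by year.
years = np.array([1950, 1960, 1970, 1980, 1990, 2000])
pop = np.array([2.5, 3.0, 3.7, 4.4, 5.3, 6.1])

# Fit a linear trend -- the kind of one-liner each example might showcase.
slope, intercept = np.polyfit(years, pop, 1)
print(f"growth: {slope:.4f} billion/year")

# A matplotlib version would add plt.plot(years, pop, 'o') plus the fitted
# line; it is omitted here to keep the sketch dependency-light.
```

A Chaco version of the same script would mostly differ in the plotting boilerplate, so the datasets and the fitting code could be shared across toolkits.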


-Peter
