[SciPy-user] A first proposal for dataset organization
Robert Kern
robert.kern at gmail.com
Wed Sep 19 13:22:19 EDT 2007
David Huard wrote:
> Hi Anne,
>
> 2007/9/19, Anne Archibald <peridot.faceted at gmail.com>:
>
> On 18/09/2007, David Huard <david.huard at gmail.com> wrote:
>
> > For large data sets, I'm not sure I understand what you mean. Do you
> > intend to include netCDF or HDF5 files and provide an interface to access
> > those data sets so users don't have to bother about the underlying engine?
> > Do we really want to distribute a package weighing more than 1 GB?
>
> One of the points of this project, as I understand it, is to make it
> convenient for people to get and use real datasets. In particular, one
> possibility is to not include the data in this package, but instead
> only a script to download it from (say) the HEASARC. Thus big datasets
> are not outrageous, and more to the point, we need to be able to deal
> with them whatever form they are in natively.
>
>
> My understanding was rather:
> "... to make it convenient for people to get and use real datasets for
> use in SciPy and NumPy examples, documentation and tutorials." This
> limits the scope of the dataset package, at least for starters. If some
> tutorial deals with larger-than-memory issues, then using a specialized
> binary format makes sense. However, I think that pretty basic datasets
> can illustrate the use of most SciPy and NumPy functions.
That's an important use case, certainly, but I also had in mind use cases like the
one Anne gave when I suggested parts of the design that David implemented.
The scope is still fairly broad.
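
To make the download-on-demand idea from Anne's message concrete, here is a
minimal sketch, not the design David implemented. The function name
fetch_dataset, its parameters, and the downloader hook are all hypothetical,
invented for illustration. It assumes the package ships only a URL and fetches
the file into a local cache the first time it is requested.

```python
import os
import urllib.request


def fetch_dataset(name, url, cache_dir, downloader=None):
    """Return a local path for `name`, downloading from `url` only if needed.

    Hypothetical helper: `downloader` is an optional callable
    (url, dest_path) -> None, so the transfer step can be swapped out
    (for example, in tests or for a different protocol).
    """
    if downloader is None:
        # Default transfer step: plain HTTP/FTP download from the stdlib.
        def downloader(source_url, dest_path):
            urllib.request.urlretrieve(source_url, dest_path)

    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, name)

    if not os.path.exists(path):
        # Download to a temporary name first so an interrupted transfer
        # never leaves a truncated file under the final name.
        partial = path + ".part"
        downloader(url, partial)
        os.replace(partial, path)

    return path
```

With something like this, a large dataset costs nothing at install time, and
the file is stored in whatever native format the archive provides; reading it
is left to a format-specific loader.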
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco