[SciPy-dev] Starting a datasets package, again

Wed Jun 6 14:58:15 EDT 2007

David Cournapeau wrote:
> Robert Kern wrote:
>> It works fine, provided that all of the relevant packages are registered on the
>> PyPI and are installed as eggs (or with egg metadata) on users' machines.
>>
>>> For example, you install foo, which uses faithful in some examples; when
>>> is the dependency resolved?
>> I would recommend that the example's dependencies be listed as an "extras"
>> dependency. The setup() for, say, scikits.pyem would have these arguments:
>>
>>   ...
>>   install_requires = ['numpy'],
>>   extras_require = {
>>     'examples': ['scipydata.iris', 'scipydata.oldfaithful'],
>>   },
>>   ...
>>
>> Then, if you want to be able to run the examples for scikits.pyem, you would do
>> this:
>>
>>   $ easy_install "scikits.pyem[examples]"
>>
>> However, just running
>>
>>   $ easy_install scikits.pyem
>>
>> won't install the data packages (this is a good thing).
> Why is this a good idea? I guess there is a reason, but I don't see it :)

Because two different things have requirements: the Python package itself and
the package's examples. Someone who only wants the package should not have to
install the example data.

> The case I am worrying about is: someone not too familiar with the whole
> thing installs pyem and wants to go through the examples because that's
> easier than reading the docs. Then he realizes they do not work: what is
> the error message?

If you do nothing special, just the regular ImportError. Of course, you can
catch that error and give whatever error message you like.
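
As a minimal sketch of that second option, an example script could wrap the
import of its data package and re-raise with an install hint. The helper name
load_example_data and the wording of the message are hypothetical, not part of
any existing package:

```python
import importlib


def load_example_data(module_name, install_hint):
    """Import a data module, or raise ImportError with an install hint.

    Both arguments are plain strings; nothing here is specific to
    scipydata or scikits.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Replace the bare "No module named ..." with something a new
        # user can act on, while keeping the original error chained.
        raise ImportError(
            "The examples need the data package %r, which is not installed.\n"
            "Install it with: %s" % (module_name, install_hint)
        ) from exc
```

An example would then call something like
load_example_data("scipydata.oldfaithful", 'easy_install "scikits.pyem[examples]"')
before touching the data.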

> Should I handle this case in my code, or is there
> some kind of mechanism to handle it automatically?

  pkg_resources.require('scikits.pyem[examples]')
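
A rough sketch of how an example script might use this at startup, assuming
setuptools' pkg_resources module and its require() function, which raises a
ResolutionError subclass when a requirement cannot be met. The helper name
check_requirement and the message text are hypothetical:

```python
def check_requirement(spec):
    """Return None if `spec` is satisfied, else a human-readable message.

    `spec` is a requirement string such as "scikits.pyem[examples]".
    """
    try:
        import pkg_resources  # provided by setuptools, may be absent
    except ImportError:
        return "setuptools is not installed; cannot check %r" % spec

    try:
        pkg_resources.require(spec)
    except pkg_resources.ResolutionError as exc:
        # Covers DistributionNotFound, VersionConflict and UnknownExtra.
        return (
            "Requirement %r is not satisfied (%s).\n"
            'Try: easy_install "%s"' % (spec, exc, spec)
        )
    return None
```

The example can print the returned message and exit instead of failing later
with a bare ImportError.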

> There are already so many emails on the scipy ML about installation
> problems (personally, maybe 2/3 of the emails related to my packages)
> that I really worry about this point. I think the lack of a one-click
> "make it work" button hurts the whole numpy/scipy community quite a lot,
> and I am afraid this may be a step away from that goal.

There's no substitute for giving your users a binary with everything it needs in
one tarball, data included. However, that doesn't scale at all. Everything else
is a compromise between ease of installation and scalability. If bundling the
example data into your examples works for your needs, by all means, do it, and
ignore all notions of scipydata packages. There's nothing wrong with
copy-and-paste here.

It's still useful to build a repository of scipydata packages with the metadata
and parsing code already done. Even if you are only concerned with distributing
examples with your packages and do not use the scipydata packages in them
directly, you can still use the repository as a resource when developing your
examples.
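
To make "metadata and parsing code" concrete, here is one possible shape for
such a module. Only the SOURCE and COPYRIGHT variable names come from the
proposal quoted below; the load() function, the embedded CSV layout, and the
values are placeholders for illustration, not a real dataset:

```python
import csv
import io

# Metadata about the data itself (placeholder text, not a real dataset).
SOURCE = "Placeholder: where the dataset was obtained"
COPYRIGHT = "Placeholder: intellectual property status of the data"

# Made-up two-column table standing in for the actual data.
_RAW = """x,y
1.0,10
2.0,20
3.0,30
"""


def load():
    """Parse the embedded CSV into a dict of column name -> list of floats."""
    reader = csv.DictReader(io.StringIO(_RAW))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(float(value))
    return columns
```

Copying a module like this into your own examples would carry the parsing
code and the provenance notes along with the data.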

>> If you want a declaration from me, I would say that the surrounding text and
>> code in scipydata packages should always be under the BSD license. This should
>> be noted using the "License :: OSI Approved :: BSD License" classifier in the
>> setup script and in a *comment* in the code following the copyright notice.
>> However, the copyright notice and license should be accompanied by a note that
>> the data does not fall under this license or copyright, pointing to the
>> metadata that records the status of the data. I'm not good at legal boilerplate, but
>> something like the following would be fine, I think:
>>
>> # The code and descriptive text are copyrighted and offered under the terms of
>> # the BSD License from the authors; see below. However, the actual dataset may
>> # have a different origin and intellectual property status. See the SOURCE and
>> # COPYRIGHT variables for this information.
>> #
>> # Copyright (c) 2007 Enthought, Inc.
>> #
>> # Redistribution and use in source and binary forms, with or without
>> # modification, are permitted provided that the following conditions are met:
>> # ..., etc.
> OK, I will prepare something in this spirit, then. Is including it in
> scikits not possible?

Including what in scikits?

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco