[SciPy-dev] Machine learning datasets (was Presentation of pymachine, a python package for machine learning)

Anne Archibald peridot.faceted at gmail.com
Wed May 30 23:04:08 EDT 2007


On 30/05/07, David Cournapeau <david at ar.media.kyoto-u.ac.jp> wrote:
> Bruce Southey wrote:
> > Hi,
> > An example, AirPassengers  is not under the GPL. If you do
> > help("AirPassengers") you will see the source:
> > " Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (1976) _Time Series
> > Analysis, Forecasting and Control._ Third Edition.  Holden-Day. Series
> > G."
> >
> This is why I feel a bit uneasy about this: when you distribute
> something under a license, say the GPL, you can only apply it to the parts
> whose copyright you own. But R obviously does not own the datasets they
> distribute. This goes well beyond my knowledge of copyright, and I don't
> know what I should do about "fair use" (this concept is pretty much
> specific to the American copyright system anyway, no?). E.g. if the
> package is included in scipy, and tomorrow someone sells super visual
> scipy, can't they be in trouble because they use (distribute) datasets
> which they do not own ?
>
> For GPL, at least, nobody can "close back" the sources, but with BSD,
> this is not that clear, and I don't want to add code which I am not 100%
> sure can be distributed under the scipy license.

It is good that you're concerned. I'm no expert on copyright - it's a
staggeringly complicated subject - but I would say that what R is
doing is probably illegal.

If a piece of code, or text, or an image, or practically anything(*)
has no explicit license, you basically have to assume it is owned by
the author and you can only legally make copies with their explicit
permission. If you follow up some of those R datasets and find the
authors explicitly offer them for wide use, great - although you can't
do anything they didn't say is permissible. If the authors haven't
said anything, they are on firm legal footing if they decide to sue
anyone who makes a copy of R.

There are a few exceptions to this, but they are nation-specific; fair
use, for example, only exists in the United States of America. It
allows the copying of material under certain extremely restrictive
conditions (of which intent is one, leaving one open to litigation
even if one is pure as the driven snow) which are incompatible with
inclusion in an open-source project.

Datasets published in academic papers are no less subject to these
restrictions; generally if you want to use one you must negotiate with
the author.

Yes, this is an ugly situation.

One important exception is that the US federal government has a policy
that it releases material it creates into the public domain, that is,
for free use in any way by anybody. (This does not apply to work
created by contractors for the US government, which can be
copyrighted.) So if you're looking for free materials, US government
websites are a good place to start whether or not you live there. But
always look for an explicit copyright statement; sometimes you need to
email them to find out.

Anne

(*) For example, the arrangement of lights on the Eiffel Tower is
copyrighted by somebody or other, so filmmakers must pay them money to
include a scene of the Eiffel tower at night, and it is technically
illegal to include it in your holiday snaps. Brilliant, eh? -A


