[SciPy-dev] Machine learning datasets (was Presentation of pymachine, a python package for machine learning)

Sun Jun 3 22:48:58 EDT 2007

Robert Kern wrote:
> Anne Archibald wrote:
>
>> Datasets published in academic papers are no less subject to these
>> restrictions; generally if you want to use one you must negotiate with
>> the author.
>
> Not necessarily. There is another US-specific exception. Data is not
> copyrightable in the United States. For something to be copyrightable here, it
> must contain some creative content. Thus, while I may not photocopy a phone book
> and sell the copy (the arrangement, typography, etc. are deemed creative and
> copyrightable), I may write down all of the numbers and typeset my own phone book.
>
> Now, most other countries don't have this rule. Notably, countries in the EU
> tend to recognize "the sweat of the brow" expended in collecting the data as
> being worthy of copyright protection.
>
> IANAL, but my approach would be to get in touch with the original source of the
> data if possible, and ask. The biggest problem you'll face is that few of those
> sources have ever thought about their datasets in terms of copyright licenses,
> particularly *software* copyright licenses that permit modification to their
> precious data. If it's an American source and the data appears to be freely
> distributed, as in the UCI database, I would probably just take it as public
> domain according to US law.
Does that mean you would agree to including those datasets in scipy? (I
sent an email to one of the authors of the UCI database and am waiting for his
answer on the status of the data.) Concerning data such as Iris or Old
Faithful, which appear in books by deceased authors, are they in the public
domain? (I checked whether by any chance they were available from Project
Gutenberg, but unfortunately they are not.)

David