[SciPy-Dev] SciPy-Dev Digest, Vol 174, Issue 31

The Helmbolds helmrp at yahoo.com
Sun Apr 29 08:15:37 EDT 2018


How many of the DATASETS issues could be handled simply by pointing users, in the documentation, to where they can find datasets that are generally considered both "standard" and potentially useful, without "physically" incorporating those datasets into SciPy?
E.g., could the ECG dataset be handled that way?
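One way to point at external data without breaking runnable documentation examples is a download-on-demand helper that caches the file and verifies a published checksum. A minimal sketch, assuming a hypothetical `fetch_dataset` helper, a hosting URL, and a SHA-256 digest listed next to the docs; none of these names are SciPy API.

```python
import hashlib
import os
import urllib.request
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def fetch_dataset(filename, url, expected_sha256, cache_dir=None):
    """Return a local path to `filename`, downloading it only when needed.

    Hypothetical helper: the cache location and the idea of publishing a
    checksum alongside the docs are assumptions for this sketch.
    """
    if cache_dir is None:
        cache_dir = Path.home() / ".cache" / "scipy-data-sketch"
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    target = cache / filename
    # Cache hit: a previously verified copy is reused without network access.
    if target.exists() and sha256_of(target) == expected_sha256:
        return target
    # Download to a temporary name so a failed fetch never leaves a bad file.
    tmp = target.parent / (target.name + ".part")
    urllib.request.urlretrieve(url, tmp)
    if sha256_of(tmp) != expected_sha256:
        tmp.unlink()
        raise ValueError(f"checksum mismatch for {filename}")
    os.replace(tmp, target)
    return target
```

A docstring example would then call the helper once and load the returned path with NumPy; the checksum guards against the upstream file silently changing.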

"You won't find the right answers if you don't ask the right questions!" (Robert Helmbold, 2013) 

    On Saturday, April 28, 2018 11:42:46 PM MST, scipy-dev-request at python.org <scipy-dev-request at python.org> wrote:  
 
 Send SciPy-Dev mailing list submissions to
    scipy-dev at python.org

To subscribe or unsubscribe via the World Wide Web, visit
    https://mail.python.org/mailman/listinfo/scipy-dev
or, via email, send a message with subject or body 'help' to
    scipy-dev-request at python.org

You can reach the person managing the list at
    scipy-dev-owner at python.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of SciPy-Dev digest..."


Today's Topics:

  1. Re: New subpackage: scipy.data (Ralf Gommers)
  2. Re: New subpackage: scipy.data (Robert Kern)
  3. Re: New subpackage: scipy.data (Ralf Gommers)


----------------------------------------------------------------------

Message: 1
Date: Sat, 28 Apr 2018 22:58:44 -0700
From: Ralf Gommers <ralf.gommers at gmail.com>
To: SciPy Developers List <scipy-dev at python.org>
Subject: Re: [SciPy-Dev] New subpackage: scipy.data
Message-ID:
    <CABL7CQjuAKrHbVwSEWXd_V1uzLV-=XbokG=ZokMDt354hTpszw at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

On Tue, Apr 3, 2018 at 1:06 AM, Daπid <davidmenhur at gmail.com> wrote:

>
>
> On 31 March 2018 at 02:17, Ralf Gommers <ralf.gommers at gmail.com> wrote:
>
>>
>>
>> On Fri, Mar 30, 2018 at 12:03 PM, Eric Larson <larson.eric.d at gmail.com>
>> wrote:
>>
>>> Top-level module for them alone sounds overkill, and I'm not sure if
>>>> discoverability alone is enough.
>>>>
>>>
>>> Fine by me. And if we follow the idea that these should be added
>>> sparingly, we can maintain discoverability without it growing out of
>>> hand by populating the See Also sections of each function.
>>>
>>
>> I agree with this; the 2 images and 1 ECG signal (to be added) that we
>> have don't justify a top-level module. We don't want to grow more than
>> the absolute minimum of datasets. The package is already very large, which
>> is problematic in certain cases. E.g. numpy + scipy still fits in the AWS
>> Lambda limit of 50 MB, but there's not much margin.
>>
>
> The biggest subpackage is sparse, and there most of the space is taken by
> _sparsetools.cpython-35m-x86_64-linux-gnu.so. According to 'size -A -d', the
> biggest sections are debug sections. The same goes for the second biggest,
> special. Can it run without those sections? On preliminary checks, it seems
> that stripping .debug_info and .debug_loc trims the size down from 38 to
> 3.7 MB, and the test suite still passes.
>

Should work. That's a lot more gain than I'd realized. Given that we hardly
ever get useful gdb tracebacks, it may be worth considering doing that for
releases.
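A minimal sketch of what stripping debug sections at release time could look like, assuming a Linux install (ELF `.so` extension modules) with GNU binutils' `strip` on PATH. The helper names are hypothetical; note that `strip --strip-debug` removes all debugging sections, which is broader than the two sections named above.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: extension modules are ELF ".so" files (Linux); Windows ".pyd"
# files are reported by size but never passed to GNU strip.
EXTENSION_SUFFIXES = (".so", ".pyd")


def extension_modules(package_dir):
    """Return (size_in_bytes, path) for each compiled extension, largest first."""
    found = [(p.stat().st_size, p)
             for p in Path(package_dir).rglob("*")
             if p.is_file() and p.suffix in EXTENSION_SUFFIXES]
    return sorted(found, key=lambda item: item[0], reverse=True)


def strip_debug_sections(package_dir, dry_run=True):
    """Run `strip --strip-debug` on every ELF extension under package_dir.

    With dry_run=True (the default) nothing is modified; the function only
    returns the files it would strip, largest first.
    """
    strip = shutil.which("strip")
    targets = [path for _, path in extension_modules(package_dir)
               if path.suffix == ".so"]
    if not dry_run:
        if strip is None:
            raise RuntimeError("GNU binutils 'strip' was not found on PATH")
        for path in targets:
            # Removes debugging sections and debug symbols only; regular
            # symbols stay, so the module still imports and runs.
            subprocess.run([strip, "--strip-debug", str(path)], check=True)
    return targets
```

Run it with `dry_run=False` on a copy of an installed `scipy` directory, then rerun `scipy.test()` against that copy to confirm, as above, that the suite still passes.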


>
> If we really need to trim down the size for installing in things like
> Lambda, could we have a scipy-lite for production environments that is the
> same as scipy but without the unnecessary debug sections? I imagine
> tracebacks would not be as informative, but that shouldn't matter for
> production environments. My first thought was to remove docstrings,
> comments, tests, and data, but maybe they don't amount to enough to be worth
> the trouble.
>

Recipes for such things are floating around, and it makes sense to do that.
I'd rather not maintain an official scipy-lite package, though; instead I'd
make choices within scipy that enable third parties to do that.
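A third-party recipe of that kind can be as small as copying the installed package while skipping test suites. A minimal sketch, assuming that dropping every `tests` directory and bytecode cache is acceptable for the target deployment; `make_lite_copy`, `tree_size`, and the pattern list are illustrative, not an existing tool or SciPy policy.

```python
import shutil
from pathlib import Path

# Assumption: for a production-only deployment, test suites (and their data
# files) plus bytecode caches are dead weight. Patterns are illustrative.
DEFAULT_DROP = ("tests", "__pycache__", "*.pyc")


def tree_size(root):
    """Total size in bytes of all regular files under root."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def make_lite_copy(src, dst, drop=DEFAULT_DROP):
    """Copy the package tree src to dst, skipping entries that match `drop`.

    shutil.ignore_patterns is applied in every directory, so each directory
    named "tests" is skipped together with its contents.
    Returns (size_before, size_after) in bytes.
    """
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns(*drop))
    return tree_size(src), tree_size(dst)
```

Pointing `src` at an installed `site-packages/scipy` and deploying `dst` in its place would drop, among other things, the `scipy/special/tests/data/*.npz` files; the trade-off is that the trimmed copy can no longer run its own test suite.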

Ralf



>
>
> On the topic at hand, I would agree to having a few, small datasets to
> showcase functionality. I think a few kilobytes can go a long way to show
> and benchmark. As far as I can see, a top-level module is free: it wouldn't
> add any maintenance burden, and would make them easier to find.
>
> /David.

------------------------------

Message: 2
Date: Sun, 29 Apr 2018 06:21:55 +0000
From: Robert Kern <robert.kern at gmail.com>
To: SciPy Developers List <scipy-dev at python.org>
Subject: Re: [SciPy-Dev] New subpackage: scipy.data
Message-ID:
    <CAF6FJitOZ11k+epRJ8kDS7X40RDTFKou65aCGxhfTE_p=tQTyA at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

On Sat, Apr 28, 2018 at 10:46 PM Ralf Gommers <ralf.gommers at gmail.com>
wrote:
>
> On Mon, Apr 2, 2018 at 11:50 AM, Warren Weckesser <
> warren.weckesser at gmail.com> wrote:
>
>> (c) We actually *use* the dataset in one of *our* docstrings or
>> tutorials.  I don't think our datasets package should become a repository
>> of interesting scientific data with no connection to the scipy code.  Its
>> purpose should be to enrich our documentation.  (Note that by this
>> criterion, the recently added ECG signal would not qualify!)
>
> I'd add the criterion that we should *only* use any dataset in the docs.
> Hence there are zero internal imports, and the whole datasets submodule can
> then very simply be stripped for space-constrained usage scenarios. (in
> those cases a separate package would help even)

I believe that one of the motivations for adding the ECG dataset was to
make some of the scipy.signal unit tests more realistic. Is that something
you'd like to forbid? On the one hand, if you're strapped for space, you
probably want to remove the test suites as well. On the other hand, you do
want to be able to test your stripped installation!

--
Robert Kern
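One way to keep a stripped installation testable is to have dataset-dependent tests skip themselves when the datasets module is absent. A minimal sketch using `unittest.skipUnless`; the module name `scipy.misc` (where the ECG signal is assumed to live as `electrocardiogram` in SciPy 1.1) and the helper `module_available` are assumptions, not an agreed SciPy convention.

```python
import importlib.util
import unittest

# Assumption: the module that holds the bundled datasets. The ECG signal is
# assumed to be scipy.misc.electrocardiogram; change this if it moves.
DATASETS_MODULE = "scipy.misc"


def module_available(name):
    """Return True if `name` can be located (its parent packages get imported)."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ModuleNotFoundError, ValueError):
        return False


@unittest.skipUnless(module_available(DATASETS_MODULE),
                     "datasets were stripped from this installation")
class TestWithECG(unittest.TestCase):
    def test_ecg_is_one_dimensional(self):
        from scipy.misc import electrocardiogram  # location assumed, see above
        ecg = electrocardiogram()
        self.assertEqual(ecg.ndim, 1)
```

With the datasets removed, the rest of the suite still runs and these tests report as skipped rather than erroring on import.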

------------------------------

Message: 3
Date: Sat, 28 Apr 2018 23:41:39 -0700
From: Ralf Gommers <ralf.gommers at gmail.com>
To: SciPy Developers List <scipy-dev at python.org>
Subject: Re: [SciPy-Dev] New subpackage: scipy.data
Message-ID:
    <CABL7CQgeuKMC2-o7LtJShvO2-EvV+3reKS1Tu_9N8GmRO0CvCA at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

On Sat, Apr 28, 2018 at 11:21 PM, Robert Kern <robert.kern at gmail.com> wrote:

> On Sat, Apr 28, 2018 at 10:46 PM Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
> >
> > On Mon, Apr 2, 2018 at 11:50 AM, Warren Weckesser <
> warren.weckesser at gmail.com> wrote:
>
> >> (c) We actually *use* the dataset in one of *our* docstrings or
> tutorials.  I don't think our datasets package should become a repository
> of interesting scientific data with no connection to the scipy code.  Its
> purpose should be to enrich our documentation.  (Note that by this
> criterion, the recently added ECG signal would not qualify!)
> >
> > I'd add the criterion that we should *only* use any dataset in the docs.
> Hence there are zero internal imports, and the whole datasets submodule can
> then very simply be stripped for space-constrained usage scenarios. (in
> those cases a separate package would help even)
>
> I believe that one of the motivations for adding the ECG dataset was to
> make some of the scipy.signal unit tests more realistic. Is that something
> you'd like to forbid? On the one hand, if you're strapped for space, you
> probably want to remove the test suites as well. On the other hand, you do
> want to be able to test your stripped installation!
>

Hmm, tough question. Ideally I'd like to say yes; however, we do need test
data in some cases. In practice I think one would want to strip the test
suite anyway; scipy/special/tests/data/*.npz is over 1 MB already. So let's
say that importing from within tests is okay.

Ralf
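The resulting rule (library code never imports the datasets; tests and docs may) can be checked mechanically, e.g. in CI. A minimal sketch using the standard-library `ast` module; `forbidden_imports` is a hypothetical helper, treating any directory named `tests` as test code is an assumption, and relative imports are not resolved. Docstring examples are plain strings to `ast`, so documentation usage is exempt automatically.

```python
import ast
from pathlib import Path


def forbidden_imports(package_root, datasets_module, allowed_dirs=("tests",)):
    """Return sorted (relative_path, lineno) pairs where library code imports
    `datasets_module`. Files under any directory named in `allowed_dirs`
    are exempt. Relative imports (level > 0) are ignored.
    """
    root = Path(package_root)
    prefix = datasets_module + "."
    hits = []
    for path in root.rglob("*.py"):
        rel = path.relative_to(root)
        # Skip files that sit inside an allowed (test) directory.
        if any(part in allowed_dirs for part in rel.parts[:-1]):
            continue
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                # Catch both "from scipy.misc import x" and "from scipy import misc".
                names = [node.module] + [f"{node.module}.{a.name}" for a in node.names]
            else:
                continue
            if any(n == datasets_module or n.startswith(prefix) for n in names):
                hits.append((rel.as_posix(), node.lineno))
    return sorted(hits)
```

Calling `forbidden_imports("scipy", "scipy.misc")` from a source checkout (the module name again being an assumption) and failing the build on a non-empty result would keep the datasets strippable.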

  

