[SciPy-Dev] New subpackage: scipy.data

Ralf Gommers ralf.gommers at gmail.com
Sun Apr 29 01:58:44 EDT 2018

On Tue, Apr 3, 2018 at 1:06 AM, Daπid <davidmenhur at gmail.com> wrote:

> On 31 March 2018 at 02:17, Ralf Gommers <ralf.gommers at gmail.com> wrote:
>> On Fri, Mar 30, 2018 at 12:03 PM, Eric Larson <larson.eric.d at gmail.com>
>> wrote:
>>>> Top-level module for them alone sounds overkill, and I'm not sure if
>>>> discoverability alone is enough.
>>> Fine by me. And if we follow the idea that these should be added
>>> sparingly, we can maintain discoverability without it growing out of
>>> hand by populating the See Also sections of each function.
>> I agree with this; the 2 images and 1 ECG signal (to be added) that we
>> have don't justify a top-level module. We don't want to grow more than
>> the absolute minimum of datasets. The package is already very large, which
>> is problematic in certain cases. E.g. numpy + scipy still fits in the AWS
>> Lambda limit of 50 MB, but there's not much margin.
> The biggest subpackage is sparse, and there most of the space is taken by
> _sparsetools.cpython-35m-x86_64-linux-gnu.so. According to size -A -d, the
> biggest sections are debug. The same goes for the second biggest, special.
> Can it run without those sections? On preliminary checks, it seems that
> stripping .debug_info and .debug_loc trims the size down from 38 to 3.7 MB,
> and the test suite still passes.

Should work. That's a lot more gain than I'd realized. Given that we hardly
ever get useful gdb tracebacks, it may be worth considering doing that for

> If we really need to trim down the size for installing in things like
> Lambda, could we have a scipy-lite for production environments, that is the
> same as scipy but without unnecessary debug sections? I imagine tracebacks
> would not be as informative, but that shouldn't matter for production
> environments. My first thought was to remove docstrings, comments, tests,
> and data, but maybe they don't amount to enough to be worth the trouble.

Recipes for such things are floating around, and it makes sense to do that.
I'd rather not maintain an official scipy-lite package though; better to just
make choices within scipy that enable third parties to do that.


> On the topic at hand, I would agree to having a few, small datasets to
> showcase functionality. I think a few kilobytes can go a long way to show
> and benchmark. As far as I can see, a top level module is free: it wouldn't
> add any maintenance burden, and would make them easier to find.
> /David.