[SciPy-Dev] new tests in sparse

Ralf Gommers ralf.gommers at gmail.com
Mon Jun 3 14:39:01 EDT 2013


On Mon, Jun 3, 2013 at 6:04 PM, Blake Griffith
<blake.a.griffith at gmail.com>wrote:

>
>
>
> On Mon, Jun 3, 2013 at 1:20 AM, Ralf Gommers <ralf.gommers at gmail.com>wrote:
>
>>
>> On Mon, Jun 3, 2013 at 1:32 AM, Blake Griffith <
>> blake.a.griffith at gmail.com> wrote:
>>
>>> On Sun, Jun 2, 2013 at 3:02 PM, Ralf Gommers <ralf.gommers at gmail.com>wrote:
>>>
>>>>
>>>> On Fri, May 31, 2013 at 11:49 PM, Blake Griffith <
>>>> blake.a.griffith at gmail.com> wrote:
>>>>
>>>>> Hello scipy,
>>>>> I'm in the process of adding new tests for sparse matrices with
>>>>> dtype=bool, while I'm at it, I thought it would be a good idea to add tests
>>>>> for other dtypes. Currently all the tests are for int64 and float64.
>>>>>
>>>>> There is some canonical data defined in test_base.py. I use this to
>>>>> make data with all the supported dtypes. Then I change the tests that use
>>>>> this canonical data to iterate over this data with different dtypes
>>>>> (including bool).
>>>>>
>>>>> Does this sound ok? It will make running test_base.py a bit slower.
>>>>> You can see my work here:
>>>>>
>>>>
>>>> That makes sense to me. Depends on what you mean by "a bit" if it's a
>>>> problem that tests run slower. If the test adds more than a few hundred
>>>> milliseconds then you should think about what part of those tests have to
>>>> be labeled @slow.
>>>>
>>>> Ralf
>>>>
>>>
>>> I've been going through on a per test basis changing each test to test
>>> every dtype instead of just one, so each modified test takes ~14 times as
>>> long (14 for each supported dtype)... I think this will slow the test suite
>>> way too much.
>>>
>>
>> Right now test_base.py already takes ~7% of the time of scipy.test(), so
>> 14x increase of that file is too much. Maybe there's something that can be
>> done to make it run faster to start with. I'll try to have a look at what
>> you changed so far tonight.
>>
>> Ralf
>>
>>
>>
>>> Unfortunately, the way I am adding theses tests using @slow would
>>> prevent the whole test from being run. Ideally I would just decorate the
>>> new extra dtype tests with @slow and the test would still run for the
>>> standard int64/float64 dtypes. So I need to change the way I am doing this.
>>>
>>
> I think rewriting this test suite, with paramterized dtypes and data would
> be worth the effort I have a week before I start the next part of my GSoC
> timeline which I think is enough time. It also will make my life easier for
> the rest of the summer.
>
> Does this sound too ambitious to do in a week?
>

That doesn't sound too ambitious. I'm not sure yet it's necessary though.
What you added so for in your official-bool-support branch didn't cost a
large amount of time yet.

The main problem I see is that you're making the changes in _TestCommon,
which is inherited by tests for all 7 sparse matrix formats. So if you put
in a for-loop over all 15 supported dtypes in a test, you now have 7*15 =
105 versions of that test being executed. I don't think that's necessary
for a lot of operations - probably testing all dtypes only for CSR would
already help a lot.

Also, try to measure the impact. There are some nose plugins that are
perhaps better, but this is a good start:
    sparse.test(extra_argv=['--with-profile'])
You'll see that _TestCommon.setUp is called 613 times for example, and it
does more than it needs to.

I suggest to first get clear where exactly it would help to test all
dtypes, then estimate how much time it would take.

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scipy-dev/attachments/20130603/bbd1f8ef/attachment.html>


More information about the SciPy-Dev mailing list