[Pandas-dev] Unit test reorganization

Wed Jan 13 20:51:28 EST 2016

Another idea here I've been toying with to achieve better logical test
organization is to place all tests in the whole project under
pandas/tests. This way we can centralize all the tests relating to
some functional aspect of pandas in one place, rather than the status
quo where test code tends to be fairly close to its implementation
(but not always). A prime example of where I let this get disorganized
early on is the time series functionality, whose tests are somewhat
scattered across pandas/tests, pandas/tseries, etc. This way we can also collect
a single directory of "quarantined" pandas 0.x behavior that we are
contemplating changing in a 1.0 release.
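
As a rough way to see how much would have to move, here is a minimal
sketch that lists test modules living outside a central tests directory.
The helper name scattered_test_modules and the test_*.py naming pattern
are assumptions for illustration, not anything that exists in pandas.

```python
from pathlib import Path


def scattered_test_modules(package_root, central="tests"):
    """Return test modules that live outside package_root/central.

    Hypothetical helper: assumes test files are named test_*.py and that
    the central directory sits directly under the package root.
    """
    root = Path(package_root)
    central_dir = root / central
    found = []
    for path in root.rglob("test_*.py"):
        # Skip anything already under the central tests directory.
        if central_dir in path.parents:
            continue
        # Report paths relative to the package root, with forward slashes.
        found.append(path.relative_to(root).as_posix())
    return sorted(found)
```

Pointing it at the package directory of a checkout, e.g.
scattered_test_modules("pandas"), would give the list of files that
are candidates for relocation under pandas/tests.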

Thoughts on this, or other ideas for organizing the tests so that
refactoring and internal changes are easier to approach?

- Wes

On Wed, Jan 13, 2016 at 1:16 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
> OK, I got started with the biggest offender:
>
> https://github.com/pydata/pandas/pull/12032
>
> It would be great to take the same approach with the other large test
> modules, with a special eye for quarantining "leaky" internals code
> and segregating NumPy interoperability contracts. I didn't completely
> do this with test_frame.py but it's a good start.
>
> There's definitely plenty of code in the other top-level test modules
> that could nest under tests/frame or tests/series.
>
> - Wes
>
> On Mon, Jan 11, 2016 at 8:47 AM, Wes McKinney <wesmckinn at gmail.com> wrote:
>> On Sun, Jan 10, 2016 at 6:06 PM, Stephan Hoyer <shoyer at gmail.com> wrote:
>>> On Fri, Jan 8, 2016 at 5:34 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
>>>>
>>>> Big #1 question is, how strongly do you feel about *shipping* the test
>>>> suite in site-packages? Some other libraries with sprawling and
>>>> complex test suites have chosen not to ship them:
>>>> https://github.com/zzzeek/sqlalchemy
>>>
>>>
>>> I would prefer to include the test suite if possible, because the ability to
>>> type "nosetests pandas" makes it easy both for users to verify installations
>>> are working properly and for downstream distributors to identify and report
>>> bugs. The complete pandas test suite still runs in 20-30 minutes, so I think
>>> it's still fairly reasonable to use it for these purposes.
>>>
>>
>> Got it. I wasn't sure if this was something people still wanted to do
>> in practice with the burgeoning test suite.
>>
>>>>
>>>> Independently, I would support and help with starting a judicious
>>>> reorganization of the contents of pandas/tests. I'm thinking of something like:
>>>>
>>>> tests/
>>>>     dataframe/
>>>>     series/
>>>>     algorithms/
>>>>     internals/
>>>>     tseries/
>>>>
>>>> and so forth.
>>>
>>>
>>> This sounds like a great idea -- these files have really gotten out of
>>> control!
>>>
>>
>> Sounds good. I've been sorting through points of contact between
>> Series/DataFrame's implementation and internal matters (e.g. the
>> BlockManager) and figured it would be good to "quarantine" code that
>> makes assumptions about what's under the hood. I'll get the first
>> couple patches started and it can be a slow burn to break apart these
>> large files.
>>
>>> Cheers,
>>> Stephan