[Pandas-dev] Unit test reorganization

Wes McKinney wesmckinn at gmail.com
Mon Jan 25 11:47:42 EST 2016


Hi all,

As part of code cleanup and reorganization, how about we start creating
a "quarantine" of test code for functionality (like the Panel classes)
that we are contemplating deprecating and later removing in 1.0?
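
A minimal sketch of how such a quarantine could be planned, not an
agreed-on layout: the pandas/tests/quarantine directory, the
QUARANTINE_KEYWORDS tuple, and both helper functions are hypothetical
names introduced for illustration. It selects test modules by filename
keyword and proposes a new location for each, without touching the
filesystem.

```python
from pathlib import PurePosixPath

# Hypothetical: filename keywords for functionality that may be deprecated.
QUARANTINE_KEYWORDS = ("panel",)

# Hypothetical: directory that would hold the quarantined test modules.
QUARANTINE_DIR = "pandas/tests/quarantine"


def quarantine_target(test_path, quarantine_dir=QUARANTINE_DIR):
    """Return the proposed new path for a test module, or None to leave it."""
    path = PurePosixPath(test_path)
    name = path.name.lower()
    if any(key in name for key in QUARANTINE_KEYWORDS):
        return str(PurePosixPath(quarantine_dir) / path.name)
    return None


def plan_moves(test_paths):
    """Map each test module that should be quarantined to its proposed path."""
    moves = {}
    for test_path in test_paths:
        target = quarantine_target(test_path)
        if target is not None:
            moves[test_path] = target
    return moves
```

Given the example paths pandas/tests/test_panel.py and
pandas/tests/test_frame.py, plan_moves would propose moving only the
first. Matching on filenames is deliberately crude; tests for
quarantined behavior that live inside larger modules would still need
to be split out by hand.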

- Wes

On Wed, Jan 13, 2016 at 6:06 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
> On Wed, Jan 13, 2016 at 6:01 PM, Jeff Reback <jeffreback at gmail.com> wrote:
>> so I agree +1 on moving all to pandas/tests
>>
>> - the indexing tests, which *mostly* are in test_indexing.py, though quite a
>> few are in test_series/test_frame.py, should ideally be
>> merged into a set of tests/indexing
>>
>> - io tests could be left alone I think
>>
>
> Yeah, I think pandas/io/tests is the one definite exception where
> there isn't much benefit
>
>> - stats tests are *mostly* deprecated
>>
>> - since we are going to deprecate panel + nd soon, I think it makes sense
>> to move these tests & code to pandas/deprecated, to keep them separate
>>
>> - test_tslib.py should be integrated into tseries/test_timeseries.py
>>
>> - almost all of the Index tests are now in test_index (with the sub-classes
>> being somewhat generically tested), but the time-series ones
>> are in tseries/test_base, so these could be merged as well.
>>
>
> Yep, it specifically would be good to collect 100% of the index data
> structure machinery (including Datetime/Timedelta/PeriodIndex) in one
> place (same for axis indexing as you said, since it got pretty
> scattered)
>
>>
>> Jeff
>>
>>
>>
>>
>> On Wed, Jan 13, 2016 at 8:51 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
>>>
>>> Another idea here I've been toying with to achieve better logical test
>>> organization is to place all tests in the whole project under
>>> pandas/tests. This way we can centralize all the tests relating to
>>> some functional aspect of pandas in one place, rather than the status
>>> quo where test code tends to be fairly close to its implementation
>>> (but not always). A prime example of where I let this get disorganized
>>> early on is the time series functionality tests, which are somewhat
>>> scattered across pandas/tests, pandas/tseries, etc. This way we can also collect
>>> a single directory of "quarantined" pandas 0.x behavior that we are
>>> contemplating changing in a 1.0 release.
>>>
>>> Thoughts on this + other ideas on how to organize the tests to help
>>> mentally in approaching refactoring and internal changes?
>>>
>>> - Wes
>>>
>>> On Wed, Jan 13, 2016 at 1:16 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
>>> > OK, I got started with the biggest offender:
>>> >
>>> > https://github.com/pydata/pandas/pull/12032
>>> >
>>> > It would be great to take the same approach with the other large test
>>> > modules, with a special eye for quarantining "leaky" internals code
>>> > and segregating NumPy interoperability contracts. I didn't completely
>>> > do this with test_frame.py but it's a good start.
>>> >
>>> > There's definitely plenty of code in the other top level test modules
>>> > which may nest under tests/frame or tests/series
>>> >
>>> > - Wes
>>> >
>>> > On Mon, Jan 11, 2016 at 8:47 AM, Wes McKinney <wesmckinn at gmail.com> wrote:
>>> >> On Sun, Jan 10, 2016 at 6:06 PM, Stephan Hoyer <shoyer at gmail.com> wrote:
>>> >>> On Fri, Jan 8, 2016 at 5:34 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
>>> >>>>
>>> >>>> Big #1 question is, how strongly do you feel about *shipping* the
>>> >>>> test suite in site-packages? Some other libraries with sprawling
>>> >>>> and complex test suites have chosen not to ship them:
>>> >>>> https://github.com/zzzeek/sqlalchemy
>>> >>>
>>> >>>
>>> >>> I would prefer to include the test suite if possible, because the
>>> >>> ability to type "nosetests pandas" makes it easy both for users to
>>> >>> verify installations are working properly and for downstream
>>> >>> distributors to identify and report bugs. The complete pandas test
>>> >>> suite still runs in 20-30 minutes, so I think it's still fairly
>>> >>> reasonable to use it for these purposes.
>>> >>>
>>> >>
>>> >> Got it. I wasn't sure if this was something people still wanted to do
>>> >> in practice with the burgeoning test suite.
>>> >>
>>> >>>>
>>> >>>> Independently, I would support and help with starting a judicious
>>> >>>> reorganization of the contents of pandas/tests. So I'm thinking like
>>> >>>>
>>> >>>> tests/
>>> >>>>     dataframe/
>>> >>>>     series/
>>> >>>>     algorithms/
>>> >>>>     internals/
>>> >>>>     tseries/
>>> >>>>
>>> >>>> and so forth.
>>> >>>
>>> >>>
>>> >>> This sounds like a great idea -- these files have really gotten out of
>>> >>> control!
>>> >>>
>>> >>
>>> >> Sounds good. I've been sorting through points of contact between
>>> >> Series/DataFrame's implementation and internal matters (e.g. the
>>> >> BlockManager) and figured it would be good to "quarantine" code that
>>> >> makes assumptions about what's under the hood. I'll get the first
>>> >> couple patches started and it can be a slow burn to break apart these
>>> >> large files.
>>> >>
>>> >>> Cheers,
>>> >>> Stephan