[Pandas-dev] Unit test reorganization

Wes McKinney wesmckinn at gmail.com
Wed Jan 13 21:06:57 EST 2016


On Wed, Jan 13, 2016 at 6:01 PM, Jeff Reback <jeffreback at gmail.com> wrote:
> so I agree +1 on moving all to pandas/tests
>
> - the indexing tests, which *mostly* are in test_indexing.py, though quite a
> few are in test_series/test_frame.py, should ideally be
> merged into a set of tests/indexing
>
> - io tests could be left alone I think
>

Yeah, I think pandas/io/tests is the one definite exception where
there isn't much benefit

> - stats tests are *mostly* deprecated
>
> - since we are going to deprecate panel + nd soon, I think it makes sense
> to move these tests & code to pandas/deprecated, to keep them separate
>
> - test_tslib.py should be integrated into tseries/test_timeseries.py
>
> - almost all of the Index tests are now in test_index (with the subclasses
> being tested somewhat generically), but the time-series ones
> are in tseries/test_base, so these could be merged as well.
>

Yep, it specifically would be good to collect 100% of the index data
structure machinery (including Datetime/Timedelta/PeriodIndex) in one
place (same for axis indexing as you said, since it got pretty
scattered)
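
As a rough sketch of the consolidation discussed above, the relocation
could be planned with a small rule table. The rule table, the helper
name plan_move, the example paths, and target directory names such as
"indexes" are assumptions for illustration, not an agreed layout:

```python
from pathlib import PurePosixPath

# Hypothetical rules: (keyword in the module name, subdirectory of pandas/tests).
# Order matters: "indexing" must be checked before "index".
RULES = [
    ("indexing", "indexing"),
    ("index", "indexes"),
    ("tslib", "tseries"),
    ("timeseries", "tseries"),
]


def plan_move(old_path):
    """Return the proposed new location for a test module (sketch only)."""
    path = PurePosixPath(old_path)
    # pandas/io/tests is the stated exception and stays where it is.
    if "io" in path.parts:
        return old_path
    for keyword, subdir in RULES:
        if keyword in path.stem:
            return str(PurePosixPath("pandas", "tests", subdir, path.name))
    # No rule matched: move the module to the top level of pandas/tests.
    return str(PurePosixPath("pandas", "tests", path.name))


for old in ["pandas/tests/test_indexing.py", "pandas/io/tests/test_parsers.py"]:
    print(old, "->", plan_move(old))
```

This only computes target paths; actually moving the files (and keeping
their history) would be a separate step.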

>
> Jeff
>
>
>
>
> On Wed, Jan 13, 2016 at 8:51 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
>>
>> Another idea here I've been toying with to achieve better logical test
>> organization is to place all tests in the whole project under
>> pandas/tests. This way we can centralize all the tests relating to
>> some functional aspect of pandas in one place, rather than the status
>> quo where test code tends to be fairly close to its implementation
>> (but not always). A prime example of where I let this get disorganized
>> early on is that time series functionality tests are somewhat scattered
>> across pandas/tests, pandas/tseries, etc. This way we can also collect
>> a single directory of "quarantined" pandas 0.x behavior that we are
>> contemplating changing in a 1.0 release.
>>
>> Thoughts on this, plus other ideas for organizing the tests so that
>> refactoring and internal changes are easier to approach mentally?
>>
>> - Wes
>>
>> On Wed, Jan 13, 2016 at 1:16 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
>> > OK, I got started with the biggest offender:
>> >
>> > https://github.com/pydata/pandas/pull/12032
>> >
>> > It would be great to take the same approach with the other large test
>> > modules, with a special eye for quarantining "leaky" internals code
>> > and segregating NumPy interoperability contracts. I didn't completely
>> > do this with test_frame.py but it's a good start.
>> >
>> > There's definitely plenty of code in the other top level test modules
>> > which may nest under tests/frame or tests/series
>> >
>> > - Wes
>> >
>> > On Mon, Jan 11, 2016 at 8:47 AM, Wes McKinney <wesmckinn at gmail.com> wrote:
>> >> On Sun, Jan 10, 2016 at 6:06 PM, Stephan Hoyer <shoyer at gmail.com> wrote:
>> >>> On Fri, Jan 8, 2016 at 5:34 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
>> >>>>
>> >>>> Big #1 question is, how strongly do you feel about *shipping* the
>> >>>> test suite in site-packages? Some other libraries with sprawling and
>> >>>> complex test suites have chosen not to ship them:
>> >>>> https://github.com/zzzeek/sqlalchemy
>> >>>
>> >>>
>> >>> I would prefer to include the test suite if possible, because the
>> >>> ability to type "nosetests pandas" makes it easy both for users to
>> >>> verify installations are working properly and for downstream
>> >>> distributors to identify and report bugs. The complete pandas test
>> >>> suite still runs in 20-30 minutes, so I think it's still fairly
>> >>> reasonable to use it for these purposes.
>> >>>
>> >>
>> >> Got it. I wasn't sure if this was something people still wanted to do
>> >> in practice with the burgeoning test suite.
>> >>
>> >>>>
>> >>>> Independently, I would support and help with starting a judicious
>> >>>> reorganization of the contents of pandas/tests. So I'm thinking like
>> >>>>
>> >>>> tests/
>> >>>>     dataframe/
>> >>>>     series/
>> >>>>     algorithms/
>> >>>>     internals/
>> >>>>     tseries/
>> >>>>
>> >>>> and so forth.
>> >>>
>> >>>
>> >>> This sounds like a great idea -- these files have really gotten out of
>> >>> control!
>> >>>
>> >>
>> >> Sounds good. I've been sorting through points of contact between
>> >> Series/DataFrame's implementation and internal matters (e.g. the
>> >> BlockManager) and figured it would be good to "quarantine" code that
>> >> makes assumptions about what's under the hood. I'll get the first
>> >> couple patches started and it can be a slow burn to break apart these
>> >> large files.
>> >>
>> >>> Cheers,
>> >>> Stephan
>> _______________________________________________
>> Pandas-dev mailing list
>> Pandas-dev at python.org
>> https://mail.python.org/mailman/listinfo/pandas-dev
>
>

