From wesmckinn at gmail.com Fri Mar 15 02:21:25 2013 From: wesmckinn at gmail.com (Wes McKinney) Date: Thu, 14 Mar 2013 21:21:25 -0400 Subject: [Pandas-dev] Welcome Message-ID: I just had this new mailing list created for high level discussions around pandas development. Hard to believe we've made it this long without one. I'll make an announcement about the development mailing list on PyData. Thanks, Wes From wesmckinn at gmail.com Wed Mar 20 00:00:48 2013 From: wesmckinn at gmail.com (Wes McKinney) Date: Tue, 19 Mar 2013 19:00:48 -0400 Subject: [Pandas-dev] Managing the pandas firehose Message-ID: Hi all, Welcome to the new pandas developer list! I thought it would be good to have a place for higher level discussions about the project and other initiatives, so I made this. One note that I wanted to pass on as we move toward the 0.11 release and going forward-- if I could get your help classifying and categorizing incoming issues, that would be a big help of staying on top of things. What does this mean? - Incoming issues: mark milestone as next release (bugs and other "must fixes" or low hanging fruit), next next release at your discretion. "Someday" otherwise. On GitHub you can see there are 30-something issues that have no milestone-- in January there were over 100 and I had a "milestone classification" binge. Would be great to not be the only one =P - Pull requests: also mark with a milestone please! This helps keep track of what release pull requests were a part of later on. - Label accordingly-- you all have been doing a good job with this. Code review and pull requests: - For one or two commits that aren't likely to be controversial (e.g. Jeff has been doing a lot of little doc additions), I don't mind if you push directly to master. If you think having someone else (doesn't need to be me necessarily) sign off would be good, then leave until that happens. - I don't mind if you use the green button-- I waffle between regular merges and cherry-picks when the number of commits is small. My main concern with ongoing development is making sure that things don't fall through the cracks and that bugs that come into the issue tracker get promptly classified. Any other thoughts? At some point we'll have to think about release management-- I have been carrying that torch since pandas 0.1, but at some point maybe someone else will do it. Part of it relies on having access to a fully-equipped Windows VM with 32 and 64 bit versions across all Python versions-- I have a virtualbox image that should get hosted someplace that is not a physical box in my apartment at some point. - Wes From changshe at gmail.com Wed Mar 20 00:18:43 2013 From: changshe at gmail.com (Chang She) Date: Tue, 19 Mar 2013 16:18:43 -0700 Subject: [Pandas-dev] Managing the pandas firehose In-Reply-To: References: Message-ID: Just to tack on to this email, I've started talking to some folks about applying for a grant to fund pandas development for the next year or so and wanted to get your thoughts on hiring someone to spend substantial time on pandas. There are several big questions here: 1. What are the main things that need done in the next year? 2. What exactly would that person be responsible for? Would he/she be full-time or part-time? 3. How much money would that take? 4. What organization would the money be funneled through (needs to be a non-profit)? 5. What metrics can we track over the next year or so to show whether the grant was successful? 6. How/who do we hire? 
Some of the stuff that Wes outlined in his email can definitely fall on this hypothetical person. Since we're all volunteers, having someone hired to make sure things don't fall through the cracks would give us a peace of mind and save us some stress. In any case, your thoughts would be appreciated (alternative funding ideas are also very welcome!) On Mar 19, 2013, at 4:00 PM, Wes McKinney wrote: > Hi all, > > Welcome to the new pandas developer list! I thought it would be good > to have a place for higher level discussions about the project and > other initiatives, so I made this. > > One note that I wanted to pass on as we move toward the 0.11 release > and going forward-- if I could get your help classifying and > categorizing incoming issues, that would be a big help of staying on > top of things. What does this mean? > > - Incoming issues: mark milestone as next release (bugs and other > "must fixes" or low hanging fruit), next next release at your > discretion. "Someday" otherwise. On GitHub you can see there are > 30-something issues that have no milestone-- in January there were > over 100 and I had a "milestone classification" binge. Would be great > to not be the only one =P > - Pull requests: also mark with a milestone please! This helps keep > track of what release pull requests were a part of later on. > - Label accordingly-- you all have been doing a good job with this. > > Code review and pull requests: > - For one or two commits that aren't likely to be controversial (e.g. > Jeff has been doing a lot of little doc additions), I don't mind if > you push directly to master. If you think having someone else (doesn't > need to be me necessarily) sign off would be good, then leave until > that happens. > - I don't mind if you use the green button-- I waffle between regular > merges and cherry-picks when the number of commits is small. > > My main concern with ongoing development is making sure that things > don't fall through the cracks and that bugs that come into the issue > tracker get promptly classified. Any other thoughts? > > At some point we'll have to think about release management-- I have > been carrying that torch since pandas 0.1, but at some point maybe > someone else will do it. Part of it relies on having access to a > fully-equipped Windows VM with 32 and 64 bit versions across all > Python versions-- I have a virtualbox image that should get hosted > someplace that is not a physical box in my apartment at some point. > > - Wes > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev From jeffreback at gmail.com Wed Mar 20 00:42:24 2013 From: jeffreback at gmail.com (Jeff Reback) Date: Tue, 19 Mar 2013 19:42:24 -0400 Subject: [Pandas-dev] Managing the pandas firehose In-Reply-To: References: Message-ID: So from the developer page, here is the roadmap 1. DONE numpy.datetime64 integration, scikits.timeseries codebase integration. Substantially improved time series functionality*.* 2. Improved PyTables (HDF5) integration 3. Tools for working with data sets that do not fit into memory 4. Improved SQL / relational database tools 5. Better statistical graphics using matplotlib 6. Integration with D3.js 7. NDFrame data structure for arbitrarily high-dimensional labeled data 8. Extend GroupBy functionality to regular ndarrays, record arrays 9. Better support for NumPy dtype hierarchy without sacrificing usability 10. *DONE Add a Factor data type (in R parlance)* 11. 
Better support for integer NA values 12. (0.10) Better memory usage and performance when reading very large CSV files blue = done < 0.11 orange = 0.11 yellow = some support, more needed IMHO I think 8 is prob more trouble than its worth out-of-core (3) is very important 5,6 pretty useful 11 a toss-up, depends on if pandas waits for numpy support or roll your own any other items that should be on this list? On Tue, Mar 19, 2013 at 7:18 PM, Chang She wrote: > Just to tack on to this email, I've started talking to some folks about > applying for a grant to fund pandas development for the next year or so and > wanted to get your thoughts on hiring someone to spend substantial time on > pandas. > > There are several big questions here: > > 1. What are the main things that need done in the next year? > 2. What exactly would that person be responsible for? Would he/she be > full-time or part-time? > 3. How much money would that take? > 4. What organization would the money be funneled through (needs to be a > non-profit)? > 5. What metrics can we track over the next year or so to show whether the > grant was successful? > 6. How/who do we hire? > > > Some of the stuff that Wes outlined in his email can definitely fall on > this hypothetical person. Since we're all volunteers, having someone hired > to make sure things don't fall through the cracks would give us a peace of > mind and save us some stress. > > In any case, your thoughts would be appreciated (alternative funding ideas > are also very welcome!) > > > > On Mar 19, 2013, at 4:00 PM, Wes McKinney wrote: > > > Hi all, > > > > Welcome to the new pandas developer list! I thought it would be good > > to have a place for higher level discussions about the project and > > other initiatives, so I made this. > > > > One note that I wanted to pass on as we move toward the 0.11 release > > and going forward-- if I could get your help classifying and > > categorizing incoming issues, that would be a big help of staying on > > top of things. What does this mean? > > > > - Incoming issues: mark milestone as next release (bugs and other > > "must fixes" or low hanging fruit), next next release at your > > discretion. "Someday" otherwise. On GitHub you can see there are > > 30-something issues that have no milestone-- in January there were > > over 100 and I had a "milestone classification" binge. Would be great > > to not be the only one =P > > - Pull requests: also mark with a milestone please! This helps keep > > track of what release pull requests were a part of later on. > > - Label accordingly-- you all have been doing a good job with this. > > > > Code review and pull requests: > > - For one or two commits that aren't likely to be controversial (e.g. > > Jeff has been doing a lot of little doc additions), I don't mind if > > you push directly to master. If you think having someone else (doesn't > > need to be me necessarily) sign off would be good, then leave until > > that happens. > > - I don't mind if you use the green button-- I waffle between regular > > merges and cherry-picks when the number of commits is small. > > > > My main concern with ongoing development is making sure that things > > don't fall through the cracks and that bugs that come into the issue > > tracker get promptly classified. Any other thoughts? > > > > At some point we'll have to think about release management-- I have > > been carrying that torch since pandas 0.1, but at some point maybe > > someone else will do it. 
Part of it relies on having access to a > > fully-equipped Windows VM with 32 and 64 bit versions across all > > Python versions-- I have a virtualbox image that should get hosted > > someplace that is not a physical box in my apartment at some point. > > > > - Wes > > _______________________________________________ > > Pandas-dev mailing list > > Pandas-dev at python.org > > http://mail.python.org/mailman/listinfo/pandas-dev > > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From swlin at post.harvard.edu Wed Mar 20 06:01:35 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Wed, 20 Mar 2013 01:01:35 -0400 Subject: [Pandas-dev] #3089 [PERF: regression from 0.10.1] discussion Message-ID: As per the "we're getting too chatty on GitHub" comment, should we be moving extended issue discussion about bugs to this list whenever possible? I posted a few comments on #3089 just now but realized maybe starting an e-mail chain would be better.. Anyway, I'm looking into the issue, I suspect it's a corner case due to an array that's very large in one dimension but small in another, and possibly that there's compiler and architecture differences causing different results as well....Jeff, do you mind sending me your the output of "gcc -dumpmachine" and "gcc -dumpspecs" on the machine you ran vb_suite on? I'll set up a 64-bit dev machine going forward so I can test on both platforms. Thanks, Stephen From swlin at post.harvard.edu Wed Mar 20 06:25:08 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Wed, 20 Mar 2013 01:25:08 -0400 Subject: [Pandas-dev] #3089 [PERF: regression from 0.10.1] discussion In-Reply-To: References: Message-ID: Ahh! I figured it out...the platform issue is part of it, but mostly it's that two (independently tested) commits had a weird effect when merged. And the reason they did so is because this particular test turns out all of our reindexing tests are testing something very non-representative, because of the way they're constructed, so we're not really getting representative performance data unfortunately (it has to do with the DataFrame constructor and c-contiguity vs f-contiguity). We should probably write new tests to fix this issue. I'll write up a fuller explanation when I get a chance. Anyway, sorry for sending you on a git bisect goose chase, Jeff. Stephen On Wed, Mar 20, 2013 at 1:01 AM, Stephen Lin wrote: > As per the "we're getting too chatty on GitHub" comment, should we be > moving extended issue discussion about bugs to this list whenever > possible? > > I posted a few comments on #3089 just now but realized maybe starting > an e-mail chain would be better.. > > Anyway, I'm looking into the issue, I suspect it's a corner case due > to an array that's very large in one dimension but small in another, > and possibly that there's compiler and architecture differences > causing different results as well....Jeff, do you mind sending me your > the output of "gcc -dumpmachine" and "gcc -dumpspecs" on the machine > you ran vb_suite on? > > I'll set up a 64-bit dev machine going forward so I can test on both platforms. 
> > Thanks, > Stephen From jeffreback at gmail.com Wed Mar 20 11:03:17 2013 From: jeffreback at gmail.com (Jeff Reback) Date: Wed, 20 Mar 2013 06:03:17 -0400 Subject: [Pandas-dev] Managing the pandas firehose In-Reply-To: References: Message-ID: <2E6A1D4A-5883-4C82-895B-8B2BBAE7D2D2@gmail.com> it seems that a detailed roadmap (with links to issues) could be easily setup at https://github.com/pydata/pandas/wiki then link this back to the developer page has this been thought about already? I can be reached on my cell 917-971-6387 On Mar 19, 2013, at 7:18 PM, Chang She wrote: > Just to tack on to this email, I've started talking to some folks about applying for a grant to fund pandas development for the next year or so and wanted to get your thoughts on hiring someone to spend substantial time on pandas. > > There are several big questions here: > > 1. What are the main things that need done in the next year? > 2. What exactly would that person be responsible for? Would he/she be full-time or part-time? > 3. How much money would that take? > 4. What organization would the money be funneled through (needs to be a non-profit)? > 5. What metrics can we track over the next year or so to show whether the grant was successful? > 6. How/who do we hire? > > > Some of the stuff that Wes outlined in his email can definitely fall on this hypothetical person. Since we're all volunteers, having someone hired to make sure things don't fall through the cracks would give us a peace of mind and save us some stress. > > In any case, your thoughts would be appreciated (alternative funding ideas are also very welcome!) > > > > On Mar 19, 2013, at 4:00 PM, Wes McKinney wrote: > >> Hi all, >> >> Welcome to the new pandas developer list! I thought it would be good >> to have a place for higher level discussions about the project and >> other initiatives, so I made this. >> >> One note that I wanted to pass on as we move toward the 0.11 release >> and going forward-- if I could get your help classifying and >> categorizing incoming issues, that would be a big help of staying on >> top of things. What does this mean? >> >> - Incoming issues: mark milestone as next release (bugs and other >> "must fixes" or low hanging fruit), next next release at your >> discretion. "Someday" otherwise. On GitHub you can see there are >> 30-something issues that have no milestone-- in January there were >> over 100 and I had a "milestone classification" binge. Would be great >> to not be the only one =P >> - Pull requests: also mark with a milestone please! This helps keep >> track of what release pull requests were a part of later on. >> - Label accordingly-- you all have been doing a good job with this. >> >> Code review and pull requests: >> - For one or two commits that aren't likely to be controversial (e.g. >> Jeff has been doing a lot of little doc additions), I don't mind if >> you push directly to master. If you think having someone else (doesn't >> need to be me necessarily) sign off would be good, then leave until >> that happens. >> - I don't mind if you use the green button-- I waffle between regular >> merges and cherry-picks when the number of commits is small. >> >> My main concern with ongoing development is making sure that things >> don't fall through the cracks and that bugs that come into the issue >> tracker get promptly classified. Any other thoughts? 
>> >> At some point we'll have to think about release management-- I have >> been carrying that torch since pandas 0.1, but at some point maybe >> someone else will do it. Part of it relies on having access to a >> fully-equipped Windows VM with 32 and 64 bit versions across all >> Python versions-- I have a virtualbox image that should get hosted >> someplace that is not a physical box in my apartment at some point. >> >> - Wes >> _______________________________________________ >> Pandas-dev mailing list >> Pandas-dev at python.org >> http://mail.python.org/mailman/listinfo/pandas-dev > > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeffreback at gmail.com Wed Mar 20 11:14:21 2013 From: jeffreback at gmail.com (Jeff Reback) Date: Wed, 20 Mar 2013 06:14:21 -0400 Subject: [Pandas-dev] #3089 [PERF: regression from 0.10.1] discussion In-Reply-To: References: Message-ID: <8C98ED00-AC93-46E9-80DB-F6E7E8F85CFA@gmail.com> It was an academic exercise :) not that these are actual quotes.... "premature optimization is the root of all evil" "benchmarking to widgets just helps you make better widgets" On Mar 20, 2013, at 1:25 AM, Stephen Lin wrote: > Ahh! I figured it out...the platform issue is part of it, but mostly > it's that two (independently tested) commits had a weird effect when > merged. > > And the reason they did so is because this particular test turns out > all of our reindexing tests are testing something very > non-representative, because of the way they're constructed, so we're > not really getting representative performance data unfortunately (it > has to do with the DataFrame constructor and c-contiguity vs > f-contiguity). We should probably write new tests to fix this issue. > > I'll write up a fuller explanation when I get a chance. Anyway, sorry > for sending you on a git bisect goose chase, Jeff. > > Stephen > > On Wed, Mar 20, 2013 at 1:01 AM, Stephen Lin wrote: >> As per the "we're getting too chatty on GitHub" comment, should we be >> moving extended issue discussion about bugs to this list whenever >> possible? >> >> I posted a few comments on #3089 just now but realized maybe starting >> an e-mail chain would be better.. >> >> Anyway, I'm looking into the issue, I suspect it's a corner case due >> to an array that's very large in one dimension but small in another, >> and possibly that there's compiler and architecture differences >> causing different results as well....Jeff, do you mind sending me your >> the output of "gcc -dumpmachine" and "gcc -dumpspecs" on the machine >> you ran vb_suite on? >> >> I'll set up a 64-bit dev machine going forward so I can test on both platforms. >> >> Thanks, >> Stephen > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev From swlin at post.harvard.edu Wed Mar 20 19:24:24 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Wed, 20 Mar 2013 14:24:24 -0400 Subject: [Pandas-dev] #3089 [PERF: regression from 0.10.1] discussion In-Reply-To: References: Message-ID: OK, here goes, the issue is the following... The optimization is question optimizes to row-by-row or column-by-column copying for 2-d arrays when possible, namely when: 1. 
the input array (where the array in question is Block.values) is c-contiguous for takes along axis0 or f-contiguous for takes along axis1 of the array, and
2. the contiguity of the output array matches the contiguity of the input

Almost all the time, Block.values is stored c-contiguously, such that each row of the Block corresponds to a column of the DataFrame. So the optimization only really kicks in, effectively, when reindexing along the column axis of the DataFrame (i.e. axis 0 of the Block); it basically means we call memmove once per DataFrame column rather than iterating in a loop and copying elements. This is good because most sane DataFrame objects have more rows than columns, so we call memmove few times (i.e. once per column) for a large block of values (i.e. all rows for that column at a time), so any overhead from calling memmove will be outweighed by the benefit of a hand-optimized copy (which probably involves vectorization, alignment/cache optimization, loop unrolling, etc.)

C-contiguous blocks result from basically every Pandas operation that operates on blocks, with the only exceptions (as far as I can tell) being creating a DataFrame directly from a 2-d ndarray or creating the transpose of a homogeneous DataFrame (but not a heterogeneous one) without copying; this is basically an optimization to avoid creating the c-contiguous version of an array when the f-contiguous one is already available, but it's the exception rather than the rule, and pretty much any modification of the DataFrame will immediately require reallocation and copying to a new c-contiguous block.

Unfortunately many of the DataFrame tests, including the two in question here, are (for simplicity) only testing the case where homogeneous 2-d data is passed to the DataFrame, which results in (non-representative) f-contiguous blocks. An additional issue with this test is that it's creating a very long but thin array (10,000 long, 4 wide) and reindexing along the index dimension, so row-by-row (from the DataFrame perspective) copying is done over and over using memmove on 4-element arrays. Furthermore, the alignment and width in bytes of each 4-element array happens to be a convenient multiple of 128 bits, which is the multiple required for vectorized SIMD instructions, so it turns out the element-by-element copying is fairly efficient when such operations are available (as is guaranteed on x86-64, but not necessarily x86-32), and the call to memmove has more overhead than element-by-element copying.

So the issue is basically only happening because all the following are true:

1. The DataFrame is constructed directly from a 2-d homogeneous ndarray (which has the default c-contiguous layout, so the block becomes f-contiguous).
2. There has been no operation after construction of the DataFrame requiring reallocation of any sort (otherwise the block would become c-contiguous).
3. The reindexing is done on the index axis (otherwise no optimization would be triggered, since it requires the right axis/contiguity combination).
4. The DataFrame is long but thin (otherwise memmove would not be called repeatedly to do small copies).
5. The C compiler is not inlining memmove properly, for whatever reason, and
6. (possibly) The alignment/width of the data happens to be such that SIMD operations can be used directly, so the overhead of eliding the loop is not very great and is exceeded by the overhead of the memmove.
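A minimal sketch of the layouts in question (illustrative only, not from the original message; the ._data/Block attributes poked at below are pandas internals and version-dependent, and the expected output assumes a 2013-era pandas/NumPy):

    import numpy as np
    import pandas as pd

    arr = np.random.randn(10000, 4)   # C-contiguous ndarray: each 4-element row is one contiguous run
    df = pd.DataFrame(arr)            # constructed directly from a 2-d ndarray

    # The block stores the transpose of the frame's data; reusing the input
    # without copying leaves it F-contiguous with shape (4, 10000).
    blk = df._data.blocks[0].values
    print(blk.shape, blk.flags['C_CONTIGUOUS'], blk.flags['F_CONTIGUOUS'])
    # expected on a 2013-era pandas: (4, 10000) False True

    # Reindexing the frame's index is a take along axis 1 of this block, so each
    # contiguous run copied per selected index is only 4 floats (32 bytes).
    # numpy's take is used here just to make the shapes concrete; pandas uses
    # its own Cython take routines for the actual reindex.
    idx = np.random.permutation(10000)
    sub = blk.take(idx, axis=1)
    print(sub.shape)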
To be honest, it's common C practice to call memmove/memcpy (the performance of the two doesn't really differ from my testing in this case) even for very small arrays and to assume that the implementation is sane enough to inline it and do the right thing either way, so I'm really surprised about #5: I would not have thought it to be an issue with a modern compiler, since calling memcpy can't do anything but provide the compiler more, not less, information about your intentions (and the overhead of the memmove aliasing check is not significant here).

Anyway, so it's a corner case, and I didn't catch it originally because I tested independently the effect of 1) allocating the output array to be f-contiguous instead of c-contiguous by default when the input array is f-contiguous and 2) converting loops into memmove when possible, both of which have a positive performance effect independently but combine to adversely affect these two tests.

I can revert the change that "allocates the output array to be f-contiguous instead of c-contiguous by default when the input array is f-contiguous", meaning that this optimization will almost never be triggered for an f-contiguous input array (unless the caller explicitly provides an output array as f-contiguous), but I'd rather not, because the optimization is actually kind of useful in less degenerate cases when you want to quickly produce a reindexed version of an f-contiguous array, for whatever reason, even though those cases are rarer.

So I think what I'm going to do instead, to avoid the degenerate case above, is to trigger the optimization only when the take operation is done along the shorter of the two dimensions (i.e. so the copied dimension is the longer of the two): that will definitely fix this test (since it'll avoid this optimization completely), but I suppose there might be other degenerate cases I haven't thought about. I'll submit a PR later today for this, if no one finds any objection to the idea.

However, I think it might be skewing our performance results to be testing DataFrame objects constructed from 2-d ndarrays, since they're not representative; in addition to the issue above, it means that many tests are actually incorporating the cost of converting an f-contiguous array into a c-contiguous array on top of what they're actually trying to test. Two possible solutions are:

1. Change the DataFrame constructor (and possibly DataFrame.T) to normalize all blocks as c-contiguous.
2. Leave the DataFrame constructor as-is but either change existing tests to exercise the more common use case (c-contiguous blocks) or add them in addition to the current ones.

I think #2 is probably best, since #1 will have a performance impact for the use cases (however rare) where an entire workflow can avoid triggering conversion from f-contiguous blocks to c-contiguous blocks.

Let me know what you all think,
Stephen

On Wed, Mar 20, 2013 at 1:25 AM, Stephen Lin wrote:
> Ahh! I figured it out...the platform issue is part of it, but mostly
> it's that two (independently tested) commits had a weird effect when
> merged.
>
> And the reason they did so is because this particular test turns out
> all of our reindexing tests are testing something very
> non-representative, because of the way they're constructed, so we're
> not really getting representative performance data unfortunately (it
> has to do with the DataFrame constructor and c-contiguity vs
> f-contiguity). We should probably write new tests to fix this issue.
> > I'll write up a fuller explanation when I get a chance. Anyway, sorry > for sending you on a git bisect goose chase, Jeff. > > Stephen > > On Wed, Mar 20, 2013 at 1:01 AM, Stephen Lin wrote: >> As per the "we're getting too chatty on GitHub" comment, should we be >> moving extended issue discussion about bugs to this list whenever >> possible? >> >> I posted a few comments on #3089 just now but realized maybe starting >> an e-mail chain would be better.. >> >> Anyway, I'm looking into the issue, I suspect it's a corner case due >> to an array that's very large in one dimension but small in another, >> and possibly that there's compiler and architecture differences >> causing different results as well....Jeff, do you mind sending me your >> the output of "gcc -dumpmachine" and "gcc -dumpspecs" on the machine >> you ran vb_suite on? >> >> I'll set up a 64-bit dev machine going forward so I can test on both platforms. >> >> Thanks, >> Stephen From swlin at post.harvard.edu Wed Mar 20 19:46:25 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Wed, 20 Mar 2013 14:46:25 -0400 Subject: [Pandas-dev] #3089 [PERF: regression from 0.10.1] discussion In-Reply-To: References: Message-ID: p.s. also, "triggering the optimization only when the take operation is done along the shorter of the two dimensions" is probably more restrictive than it has to be, but I'm not comfortable hardcoding a lower-limit size for calling memmove (I searched for guidance on setting such a limit appropriately online, but couldn't find any: I think the presumption is usually that it doesn't matter if the compiler does the right thing) On Wed, Mar 20, 2013 at 1:25 AM, Stephen Lin wrote: > Ahh! I figured it out...the platform issue is part of it, but mostly > it's that two (independently tested) commits had a weird effect when > merged. > > And the reason they did so is because this particular test turns out > all of our reindexing tests are testing something very > non-representative, because of the way they're constructed, so we're > not really getting representative performance data unfortunately (it > has to do with the DataFrame constructor and c-contiguity vs > f-contiguity). We should probably write new tests to fix this issue. > > I'll write up a fuller explanation when I get a chance. Anyway, sorry > for sending you on a git bisect goose chase, Jeff. > > Stephen > > On Wed, Mar 20, 2013 at 1:01 AM, Stephen Lin wrote: >> As per the "we're getting too chatty on GitHub" comment, should we be >> moving extended issue discussion about bugs to this list whenever >> possible? >> >> I posted a few comments on #3089 just now but realized maybe starting >> an e-mail chain would be better.. >> >> Anyway, I'm looking into the issue, I suspect it's a corner case due >> to an array that's very large in one dimension but small in another, >> and possibly that there's compiler and architecture differences >> causing different results as well....Jeff, do you mind sending me your >> the output of "gcc -dumpmachine" and "gcc -dumpspecs" on the machine >> you ran vb_suite on? >> >> I'll set up a 64-bit dev machine going forward so I can test on both platforms. >> >> Thanks, >> Stephen From jeffreback at gmail.com Wed Mar 20 19:56:17 2013 From: jeffreback at gmail.com (Jeff Reback) Date: Wed, 20 Mar 2013 14:56:17 -0400 Subject: [Pandas-dev] #3089 [PERF: regression from 0.10.1] discussion In-Reply-To: References: Message-ID: awesome explanation Stephen! 
I'd vote for #2 essentially create a testing constructor (kind of like y-p's mkdf), but creates only a numpy random array, that by default is c-continguous (with option for f ), and then use that where we have (EVERYWHERE)! np.random.randn....... and second I guess if it helps, look at the c/f contiguous ness of the ops where appropriate... my 2c On Wed, Mar 20, 2013 at 2:24 PM, Stephen Lin wrote: > OK, here goes, the issue is the following... > > The optimization is question optimizes to row-by-row or > column-by-column copying for 2-d arrays when possible, namely when: > > 1. the input array (where the array in question is Block.values) is > c-contiguous for takes along axis0 or f-contiguous for takes along > axis1 of the array, and > 2. the contiguity of the output array matches the contiguity of the input > > Almost all the time, Block.values is stored c-contiguously, such that > each row of the Block corresponds to a column of the DataFrame. So the > optimization only really kicks in, effectively, when reindexing along > the column axis of the DataFrame (i.e. axis 0 of the Block); it > basically means we call memmove once per DataFrame column rather than > iterating in a loop and copying elements. This is good because most > sane DataFrame objects are have more rows than columns, so we call > memmove few times (i.e. once per column) for a large block of values > (i.e. all rows for that column at a time), so any overhead from > calling memmove will be outweighed by the benefit of a hand optimized > copy (which probably involves vectorization, alignment/cache > optimization, loop unrolling, etc.) > > C-contiguous blocks result from basically every Pandas operation that > operates on blocks, with the only exceptions of (as far as I can tell) > creating a DataFrame directly from a 2-d ndarray or creating the > transpose of a homogenous DataFrame (but not a heterogenous one) > without copying; this is basically an optimization to avoid creating > the c-contigous version of an array when the f-contiguous one is > already available, but it's the exception rather than the rule and > pretty any modification of the DataFrame will immediately require > reallocation and copying to a new c-contiguous block. > > Unfortunately many of the DataFrame tests, including the two in > question here, are (for simplicity) only testing the case where a > homogenous 2-d data is passed to the DataFrame, which results in > (non-representative) f-contiguous blocks. An additional issue with > this test is that it's creating a very long but thin array (10,000 > long, 4 wide) and reindexing along the index dimension, so row-by-row > (from the DataFrame perspective) copying is done over and over using > memmove on 4 element arrays. Furthermore, the alignment and width in > bytes of each 4 element array happens to be a convenient multiple of > 128bits, which is the multiple required for vectorized SIMD > instructions, so it turns out the element-by-element copying is fairly > efficient when such operations are available (as is guaranteed on > x86-64, but not necessarily x86-32), and the call to memmove has more > overhead than element-by-element copying. > > So the issue is basically only happening because all the following are > true: > > 1. The DataFrame is constructed directly by a 2-d homogenous ndarray > (which has the default c-contiguous continuity, so the block becomes > f-contiguous). > 2. 
There has been no operation after construction of the DataFrame > requiring reallocation of any sort (otherwise the block would become > c-contiguous). > 3. The reindexing is done on the index axis (otherwise no optimization > would be triggered, since it requires the right axis/contiguity > combination). > 4. The DataFrame is long but thin (otherwise memmove would not be > called repeatedly to do small copies). > 5. The C compiler is not inlining memmove properly, for whatever reason, > and > 6. (possibly) The alignment/width of the data happens to be such that > SIMD operations can be used directly, so the overhead of the eliding > the loop is not very great and exceeded by the overhead of the > memmove. > > To be honest, it's common C practice to call memmove/memcpy (the > performance of the two don't really differ from my testing in this > case) even for very small arrays and assuming that the implementation > is sane enough to inline it and do the right thing either way, so I'm > really surprised about #5: I would not have thought it to be an issue > with a modern compiler, since calling memcpy can't do anything but > provide the compiler more, not less, information about your intentions > (and the overhead of the memmove aliasing check is not significant > here). > > Anyway, so it's a corner case, and I didn't catch it originally > because I tested independently the effect of 1) allocates the output > array to be f-contiguous instead of c-contiguous by default when the > input array is f-contiguous and 2) converting loops into memmove when > possible, both of which have a positive performance effect > independently but combine to adversely affect these two tests. > > I can revert the change that "allocates the output array to be > f-contiguous instead of c-contiguous by default when the input array > is f-contiguous", meaning that this optimization will almost never be > triggered for an f-contiguous input array (unless the caller > explicitly provides an output array as f-contiguous), but I'd rather > not because the optimization is actually kind of useful in less > degenerate cases when you want to quickly produce a reindexed version > of a f-contiguous array, for whatever reason, even though the cases > are rarer. > > So I think what I'm going to do instead, to avoid the degenerate case > above, is to trigger the optimization only when the take operation is > done along the shorter of the two dimensions (i.e. so the copied > dimension is the longer of the two): that will definitely fix this > test (since it'll avoid this optimization completely) but I suppose > there might be other degenerate cases I haven't thought about it. I'll > submit a PR later today for this, if no one finds any objection to the > idea. > > However, I think it might be skewed our performance results to be > testing DataFrame objects constructed by 2-d ndarrays, since they're > not representative; in addition to the issue above, it means that many > tests are actually incorporating the cost of converting an > f-contiguous array into a c-contiguous array on top of what they're > actually trying to test. Two possible solutions are: > > 1. Change DataFrame constructor (and possibly DataFrame.T) to > normalize all blocks as c-contiguous. > 2. Leave DataFrame constructor as-is but either change existing tests > to exercise the more common use case (c-contiguous blocks) or add them > in addition to the current ones. 
> > I think #2 is probably best, since #1 will have a performance impact > for the use cases (however rare) where an entire workflow can avoid > triggering conversion from f-contiguous blocks to c-contiguous blocks. > > Let me know what you all think, > Stephen > > On Wed, Mar 20, 2013 at 1:25 AM, Stephen Lin > wrote: > > Ahh! I figured it out...the platform issue is part of it, but mostly > > it's that two (independently tested) commits had a weird effect when > > merged. > > > > And the reason they did so is because this particular test turns out > > all of our reindexing tests are testing something very > > non-representative, because of the way they're constructed, so we're > > not really getting representative performance data unfortunately (it > > has to do with the DataFrame constructor and c-contiguity vs > > f-contiguity). We should probably write new tests to fix this issue. > > > > I'll write up a fuller explanation when I get a chance. Anyway, sorry > > for sending you on a git bisect goose chase, Jeff. > > > > Stephen > > > > On Wed, Mar 20, 2013 at 1:01 AM, Stephen Lin > wrote: > >> As per the "we're getting too chatty on GitHub" comment, should we be > >> moving extended issue discussion about bugs to this list whenever > >> possible? > >> > >> I posted a few comments on #3089 just now but realized maybe starting > >> an e-mail chain would be better.. > >> > >> Anyway, I'm looking into the issue, I suspect it's a corner case due > >> to an array that's very large in one dimension but small in another, > >> and possibly that there's compiler and architecture differences > >> causing different results as well....Jeff, do you mind sending me your > >> the output of "gcc -dumpmachine" and "gcc -dumpspecs" on the machine > >> you ran vb_suite on? > >> > >> I'll set up a 64-bit dev machine going forward so I can test on both > platforms. > >> > >> Thanks, > >> Stephen > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From swlin at post.harvard.edu Wed Mar 20 20:56:35 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Wed, 20 Mar 2013 15:56:35 -0400 Subject: [Pandas-dev] #3089 [PERF: regression from 0.10.1] discussion In-Reply-To: References: Message-ID: Thanks Jeff! So ignoring the testing methodology issue for now, I've done the small fix suggested but apparently it *is* too restrictive because it negatively affects two other tests that were previously improved (so the two "degenerate" tests improved 25% by adding the restriction while these two tests regressed 25%). I will do some more testing to see if I can find a justifiable way of avoiding this degenerate case, (hopefully) without hardcoding a magic number... (But maybe we should just not bother with this degenerate case anyway, perhaps? I'm a fan of making all improvements monotonic, so I'd prefer not to have to regress this case even if it's degenerate, but I don't know yet how reliably I can do that for situations and all processor/compiler/OS combinations...) 
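For reference, the restriction being discussed boils down to a guard along these lines; this is a rough sketch with illustrative names, not the actual Cython take routine in pandas, which also has to account for dtype, strides, and the output array's contiguity:

    def use_memmove(values, take_axis):
        # Hypothetical guard: each memmove copies one contiguous run along the
        # non-take axis, so only elide the element-by-element loop when that
        # run spans the longer of the two dimensions and the per-call overhead
        # is amortized over many elements; otherwise fall back to the loop.
        run_length = values.shape[1 - take_axis]   # elements copied per memmove
        take_length = values.shape[take_axis]      # dimension being taken along
        return run_length >= take_length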
Also, Jeff, I reviewed my vbenches vs the ones you published on GitHub for this issue, and I think the reason that some of my larger performance impacts are not shown in your results is because of the vectorization issue (you ARE on 64-bit, right?)...I'm not 100% sure but I really think it's likely that it's because x86-64 allows more vectorization optimizations even without memmove, so the effect of this optimization is not that great. However, there's plenty of people still using 32-bit OSes (I have a 64-bit machine but just never bothered to install 64-bit Ubuntu), so it's definitely worthwhile still to do this. In any case, I believe that VC++9 (i.e. 2008) (which still hosts the pre-built binary windows build still, I think? correct me if I'm wrong) does rather poorly on vectorization, even when it's allowed. Worse, though, it's usually not allowed because Windows 32-bit builds generally have to assume lowest-common-denominator hardware (SSE, which is from Pentium III, and SSE2, from Pentium IV, only became a requirements to install Windows with Windows *8* :D) since they are not compiled on the user machine. (You can only avoid this by abandoning compatibility with older machines or going through hoops to detect CPUID at runtime and modifying program behavior accordingly, which I don't think Cython does.) Anyway, I'll fill in with more info when I have some. Stephen On Wed, Mar 20, 2013 at 2:56 PM, Jeff Reback wrote: > awesome explanation Stephen! > > I'd vote for #2 > > essentially create a testing constructor (kind of like y-p's mkdf), > but creates only a numpy random array, that by default is c-continguous > (with option for f ), and then use that where we have (EVERYWHERE)! > np.random.randn....... > > and second I guess if it helps, look at the c/f contiguous ness > of the ops where appropriate... > > my 2c > > > > > On Wed, Mar 20, 2013 at 2:24 PM, Stephen Lin wrote: >> >> OK, here goes, the issue is the following... >> >> The optimization is question optimizes to row-by-row or >> column-by-column copying for 2-d arrays when possible, namely when: >> >> 1. the input array (where the array in question is Block.values) is >> c-contiguous for takes along axis0 or f-contiguous for takes along >> axis1 of the array, and >> 2. the contiguity of the output array matches the contiguity of the input >> >> Almost all the time, Block.values is stored c-contiguously, such that >> each row of the Block corresponds to a column of the DataFrame. So the >> optimization only really kicks in, effectively, when reindexing along >> the column axis of the DataFrame (i.e. axis 0 of the Block); it >> basically means we call memmove once per DataFrame column rather than >> iterating in a loop and copying elements. This is good because most >> sane DataFrame objects are have more rows than columns, so we call >> memmove few times (i.e. once per column) for a large block of values >> (i.e. all rows for that column at a time), so any overhead from >> calling memmove will be outweighed by the benefit of a hand optimized >> copy (which probably involves vectorization, alignment/cache >> optimization, loop unrolling, etc.) 
>> >> C-contiguous blocks result from basically every Pandas operation that >> operates on blocks, with the only exceptions of (as far as I can tell) >> creating a DataFrame directly from a 2-d ndarray or creating the >> transpose of a homogenous DataFrame (but not a heterogenous one) >> without copying; this is basically an optimization to avoid creating >> the c-contigous version of an array when the f-contiguous one is >> already available, but it's the exception rather than the rule and >> pretty any modification of the DataFrame will immediately require >> reallocation and copying to a new c-contiguous block. >> >> Unfortunately many of the DataFrame tests, including the two in >> question here, are (for simplicity) only testing the case where a >> homogenous 2-d data is passed to the DataFrame, which results in >> (non-representative) f-contiguous blocks. An additional issue with >> this test is that it's creating a very long but thin array (10,000 >> long, 4 wide) and reindexing along the index dimension, so row-by-row >> (from the DataFrame perspective) copying is done over and over using >> memmove on 4 element arrays. Furthermore, the alignment and width in >> bytes of each 4 element array happens to be a convenient multiple of >> 128bits, which is the multiple required for vectorized SIMD >> instructions, so it turns out the element-by-element copying is fairly >> efficient when such operations are available (as is guaranteed on >> x86-64, but not necessarily x86-32), and the call to memmove has more >> overhead than element-by-element copying. >> >> So the issue is basically only happening because all the following are >> true: >> >> 1. The DataFrame is constructed directly by a 2-d homogenous ndarray >> (which has the default c-contiguous continuity, so the block becomes >> f-contiguous). >> 2. There has been no operation after construction of the DataFrame >> requiring reallocation of any sort (otherwise the block would become >> c-contiguous). >> 3. The reindexing is done on the index axis (otherwise no optimization >> would be triggered, since it requires the right axis/contiguity >> combination). >> 4. The DataFrame is long but thin (otherwise memmove would not be >> called repeatedly to do small copies). >> 5. The C compiler is not inlining memmove properly, for whatever reason, >> and >> 6. (possibly) The alignment/width of the data happens to be such that >> SIMD operations can be used directly, so the overhead of the eliding >> the loop is not very great and exceeded by the overhead of the >> memmove. >> >> To be honest, it's common C practice to call memmove/memcpy (the >> performance of the two don't really differ from my testing in this >> case) even for very small arrays and assuming that the implementation >> is sane enough to inline it and do the right thing either way, so I'm >> really surprised about #5: I would not have thought it to be an issue >> with a modern compiler, since calling memcpy can't do anything but >> provide the compiler more, not less, information about your intentions >> (and the overhead of the memmove aliasing check is not significant >> here). 
>> >> Anyway, so it's a corner case, and I didn't catch it originally >> because I tested independently the effect of 1) allocates the output >> array to be f-contiguous instead of c-contiguous by default when the >> input array is f-contiguous and 2) converting loops into memmove when >> possible, both of which have a positive performance effect >> independently but combine to adversely affect these two tests. >> >> I can revert the change that "allocates the output array to be >> f-contiguous instead of c-contiguous by default when the input array >> is f-contiguous", meaning that this optimization will almost never be >> triggered for an f-contiguous input array (unless the caller >> explicitly provides an output array as f-contiguous), but I'd rather >> not because the optimization is actually kind of useful in less >> degenerate cases when you want to quickly produce a reindexed version >> of a f-contiguous array, for whatever reason, even though the cases >> are rarer. >> >> So I think what I'm going to do instead, to avoid the degenerate case >> above, is to trigger the optimization only when the take operation is >> done along the shorter of the two dimensions (i.e. so the copied >> dimension is the longer of the two): that will definitely fix this >> test (since it'll avoid this optimization completely) but I suppose >> there might be other degenerate cases I haven't thought about it. I'll >> submit a PR later today for this, if no one finds any objection to the >> idea. >> >> However, I think it might be skewed our performance results to be >> testing DataFrame objects constructed by 2-d ndarrays, since they're >> not representative; in addition to the issue above, it means that many >> tests are actually incorporating the cost of converting an >> f-contiguous array into a c-contiguous array on top of what they're >> actually trying to test. Two possible solutions are: >> >> 1. Change DataFrame constructor (and possibly DataFrame.T) to >> normalize all blocks as c-contiguous. >> 2. Leave DataFrame constructor as-is but either change existing tests >> to exercise the more common use case (c-contiguous blocks) or add them >> in addition to the current ones. >> >> I think #2 is probably best, since #1 will have a performance impact >> for the use cases (however rare) where an entire workflow can avoid >> triggering conversion from f-contiguous blocks to c-contiguous blocks. >> >> Let me know what you all think, >> Stephen >> >> On Wed, Mar 20, 2013 at 1:25 AM, Stephen Lin >> wrote: >> > Ahh! I figured it out...the platform issue is part of it, but mostly >> > it's that two (independently tested) commits had a weird effect when >> > merged. >> > >> > And the reason they did so is because this particular test turns out >> > all of our reindexing tests are testing something very >> > non-representative, because of the way they're constructed, so we're >> > not really getting representative performance data unfortunately (it >> > has to do with the DataFrame constructor and c-contiguity vs >> > f-contiguity). We should probably write new tests to fix this issue. >> > >> > I'll write up a fuller explanation when I get a chance. Anyway, sorry >> > for sending you on a git bisect goose chase, Jeff. >> > >> > Stephen >> > >> > On Wed, Mar 20, 2013 at 1:01 AM, Stephen Lin >> > wrote: >> >> As per the "we're getting too chatty on GitHub" comment, should we be >> >> moving extended issue discussion about bugs to this list whenever >> >> possible? 
>> >> I posted a few comments on #3089 just now but realized maybe starting
>> an e-mail chain would be better..
>>
>> Anyway, I'm looking into the issue, I suspect it's a corner case due
>> to an array that's very large in one dimension but small in another,
>> and possibly that there's compiler and architecture differences
>> causing different results as well....Jeff, do you mind sending me your
>> the output of "gcc -dumpmachine" and "gcc -dumpspecs" on the machine
>> you ran vb_suite on?
>>
>> I'll set up a 64-bit dev machine going forward so I can test on both
>> platforms.
>>
>> Thanks,
>> Stephen
>> _______________________________________________
>> Pandas-dev mailing list
>> Pandas-dev at python.org
>> http://mail.python.org/mailman/listinfo/pandas-dev
>
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> http://mail.python.org/mailman/listinfo/pandas-dev
>

From yoval at gmx.com Wed Mar 20 21:05:14 2013 From: yoval at gmx.com (yoval p.) Date: Wed, 20 Mar 2013 21:05:14 +0100 Subject: [Pandas-dev] Fast py2/py3 testing, fast vbench Message-ID: <20130320200515.28610@gmx.com>

I've made some improvements to the tooling we have for development; just making sure everyone is aware of what's available.

- Closed GH3099, caching Cython build artifacts when running setup.py.
- setup.py now checks for the BUILD_CACHE_DIR envar, so you can enable caching without touching the source code.
- Once enabled, with a warm cache, testing py26/27/32/33 takes only a couple of minutes on a quad-core machine, compared with Travis's ~15.
- If caching is enabled (for future commits, the envar is sufficient), test_perf.sh will run much faster.
- I've added an option to filter vbenches by regex when running test_perf.sh.

Quick iteration makes everything easier; I hope these changes do that. Here's an example of all of the above, comparing two adjacent commits on a reduced set of vbenches in 1 min flat:

$ export BUILD_CACHE_DIR="/tmp/.pandas_build_cache/"
$ time ./test_perf.sh -b 18c7e6c -t 18c7e6c^ -r reindex
...
Results:
                                  t_head  t_baseline   ratio
name
dataframe_reindex                 0.3726      0.3726  1.0000
reindex_fillna_backfill_float32   0.0961      0.0961  1.0000
reindex_fillna_pad_float32        0.0959      0.0959  1.0000
frame_reindex_upcast             17.7334     17.7334  1.0000
reindex_daterange_backfill        0.1649      0.1649  1.0000
reindex_fillna_pad                0.1052      0.1052  1.0000
reindex_daterange_pad             0.1757      0.1757  1.0000
reindex_frame_level_align         1.0109      1.0109  1.0000
reindex_fillna_backfill           0.1035      0.1035  1.0000
reindex_frame_level_reindex       0.9586      0.9586  1.0000
frame_reindex_columns             0.3101      0.3101  1.0000
reindex_multiindex                1.1427      1.1427  1.0000

Columns: test_name | target_duration [ms] | baseline_duration [ms] | ratio
- a ratio of 1.30 means the target commit is 30% slower than the baseline.

Target [18c7e6c] : BLD: check for BUILD_CACHE_DIR envar in setup.py
Baseline [18c7e6c] : BLD: check for BUILD_CACHE_DIR envar in setup.py

*** Results were also written to the logfile at '/home/user1/src/pandas/vb_suite.log'

real 0m58.561s
user 0m52.699s
sys 0m1.645s
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From swlin at post.harvard.edu Wed Mar 20 21:07:15 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Wed, 20 Mar 2013 16:07:15 -0400 Subject: [Pandas-dev] #3089 [PERF: regression from 0.10.1] discussion In-Reply-To: <25AE4CC2-1087-4332-92F2-CD11B8080F03@yahoo.com> References: <25AE4CC2-1087-4332-92F2-CD11B8080F03@yahoo.com> Message-ID: So we can just ignore this and write new tests, if everyone thinks that's appropriate. Anyone else have thoughts? I'd prefer monotonic improvement myself, honestly...but it might not be a reasonable ideal to strive for in this case. Stephen On Wed, Mar 20, 2013 at 4:02 PM, Jeff Reback wrote: > I am on 64bit Linux (I use windows too, but try to avoid whenever possible!) > > I agree with your assessment wrt 32/64 > and perf - > > I am not sure that these corner cases r that big a deal, more important I think is that we test the most common cases for perf > > > On Mar 20, 2013, at 3:56 PM, Stephen Lin wrote: > >> Thanks Jeff! >> >> So ignoring the testing methodology issue for now, I've done the small >> fix suggested but apparently it *is* too restrictive because it >> negatively affects two other tests that were previously improved (so >> the two "degenerate" tests improved 25% by adding the restriction >> while these two tests regressed 25%). I will do some more testing to >> see if I can find a justifiable way of avoiding this degenerate case, >> (hopefully) without hardcoding a magic number... (But maybe we should >> just not bother with this degenerate case anyway, perhaps? I'm a fan >> of making all improvements monotonic, so I'd prefer not to have to >> regress this case even if it's degenerate, but I don't know yet how >> reliably I can do that for situations and all processor/compiler/OS >> combinations...) >> >> Also, Jeff, I reviewed my vbenches vs the ones you published on GitHub >> for this issue, and I think the reason that some of my larger >> performance impacts are not shown in your results is because of the >> vectorization issue (you ARE on 64-bit, right?)...I'm not 100% sure >> but I really think it's likely that it's because x86-64 allows more >> vectorization optimizations even without memmove, so the effect of >> this optimization is not that great. However, there's plenty of people >> still using 32-bit OSes (I have a 64-bit machine but just never >> bothered to install 64-bit Ubuntu), so it's definitely worthwhile >> still to do this. >> >> In any case, I believe that VC++9 (i.e. 2008) (which still hosts the >> pre-built binary windows build still, I think? correct me if I'm >> wrong) does rather poorly on vectorization, even when it's allowed. >> Worse, though, it's usually not allowed because Windows 32-bit builds >> generally have to assume lowest-common-denominator hardware (SSE, >> which is from Pentium III, and SSE2, from Pentium IV, only became a >> requirements to install Windows with Windows *8* :D) since they are >> not compiled on the user machine. (You can only avoid this by >> abandoning compatibility with older machines or going through hoops to >> detect CPUID at runtime and modifying program behavior accordingly, >> which I don't think Cython does.) >> >> Anyway, I'll fill in with more info when I have some. >> >> Stephen >> >> On Wed, Mar 20, 2013 at 2:56 PM, Jeff Reback wrote: >>> awesome explanation Stephen! 
>>> >>> I'd vote for #2 >>> >>> essentially create a testing constructor (kind of like y-p's mkdf), >>> but creates only a numpy random array, that by default is c-continguous >>> (with option for f ), and then use that where we have (EVERYWHERE)! >>> np.random.randn....... >>> >>> and second I guess if it helps, look at the c/f contiguous ness >>> of the ops where appropriate... >>> >>> my 2c >>> >>> >>> >>> >>> On Wed, Mar 20, 2013 at 2:24 PM, Stephen Lin wrote: >>>> >>>> OK, here goes, the issue is the following... >>>> >>>> The optimization is question optimizes to row-by-row or >>>> column-by-column copying for 2-d arrays when possible, namely when: >>>> >>>> 1. the input array (where the array in question is Block.values) is >>>> c-contiguous for takes along axis0 or f-contiguous for takes along >>>> axis1 of the array, and >>>> 2. the contiguity of the output array matches the contiguity of the input >>>> >>>> Almost all the time, Block.values is stored c-contiguously, such that >>>> each row of the Block corresponds to a column of the DataFrame. So the >>>> optimization only really kicks in, effectively, when reindexing along >>>> the column axis of the DataFrame (i.e. axis 0 of the Block); it >>>> basically means we call memmove once per DataFrame column rather than >>>> iterating in a loop and copying elements. This is good because most >>>> sane DataFrame objects are have more rows than columns, so we call >>>> memmove few times (i.e. once per column) for a large block of values >>>> (i.e. all rows for that column at a time), so any overhead from >>>> calling memmove will be outweighed by the benefit of a hand optimized >>>> copy (which probably involves vectorization, alignment/cache >>>> optimization, loop unrolling, etc.) >>>> >>>> C-contiguous blocks result from basically every Pandas operation that >>>> operates on blocks, with the only exceptions of (as far as I can tell) >>>> creating a DataFrame directly from a 2-d ndarray or creating the >>>> transpose of a homogenous DataFrame (but not a heterogenous one) >>>> without copying; this is basically an optimization to avoid creating >>>> the c-contigous version of an array when the f-contiguous one is >>>> already available, but it's the exception rather than the rule and >>>> pretty any modification of the DataFrame will immediately require >>>> reallocation and copying to a new c-contiguous block. >>>> >>>> Unfortunately many of the DataFrame tests, including the two in >>>> question here, are (for simplicity) only testing the case where a >>>> homogenous 2-d data is passed to the DataFrame, which results in >>>> (non-representative) f-contiguous blocks. An additional issue with >>>> this test is that it's creating a very long but thin array (10,000 >>>> long, 4 wide) and reindexing along the index dimension, so row-by-row >>>> (from the DataFrame perspective) copying is done over and over using >>>> memmove on 4 element arrays. Furthermore, the alignment and width in >>>> bytes of each 4 element array happens to be a convenient multiple of >>>> 128bits, which is the multiple required for vectorized SIMD >>>> instructions, so it turns out the element-by-element copying is fairly >>>> efficient when such operations are available (as is guaranteed on >>>> x86-64, but not necessarily x86-32), and the call to memmove has more >>>> overhead than element-by-element copying. >>>> >>>> So the issue is basically only happening because all the following are >>>> true: >>>> >>>> 1. 
The DataFrame is constructed directly by a 2-d homogenous ndarray >>>> (which has the default c-contiguous continuity, so the block becomes >>>> f-contiguous). >>>> 2. There has been no operation after construction of the DataFrame >>>> requiring reallocation of any sort (otherwise the block would become >>>> c-contiguous). >>>> 3. The reindexing is done on the index axis (otherwise no optimization >>>> would be triggered, since it requires the right axis/contiguity >>>> combination). >>>> 4. The DataFrame is long but thin (otherwise memmove would not be >>>> called repeatedly to do small copies). >>>> 5. The C compiler is not inlining memmove properly, for whatever reason, >>>> and >>>> 6. (possibly) The alignment/width of the data happens to be such that >>>> SIMD operations can be used directly, so the overhead of the eliding >>>> the loop is not very great and exceeded by the overhead of the >>>> memmove. >>>> >>>> To be honest, it's common C practice to call memmove/memcpy (the >>>> performance of the two don't really differ from my testing in this >>>> case) even for very small arrays and assuming that the implementation >>>> is sane enough to inline it and do the right thing either way, so I'm >>>> really surprised about #5: I would not have thought it to be an issue >>>> with a modern compiler, since calling memcpy can't do anything but >>>> provide the compiler more, not less, information about your intentions >>>> (and the overhead of the memmove aliasing check is not significant >>>> here). >>>> >>>> Anyway, so it's a corner case, and I didn't catch it originally >>>> because I tested independently the effect of 1) allocates the output >>>> array to be f-contiguous instead of c-contiguous by default when the >>>> input array is f-contiguous and 2) converting loops into memmove when >>>> possible, both of which have a positive performance effect >>>> independently but combine to adversely affect these two tests. >>>> >>>> I can revert the change that "allocates the output array to be >>>> f-contiguous instead of c-contiguous by default when the input array >>>> is f-contiguous", meaning that this optimization will almost never be >>>> triggered for an f-contiguous input array (unless the caller >>>> explicitly provides an output array as f-contiguous), but I'd rather >>>> not because the optimization is actually kind of useful in less >>>> degenerate cases when you want to quickly produce a reindexed version >>>> of a f-contiguous array, for whatever reason, even though the cases >>>> are rarer. >>>> >>>> So I think what I'm going to do instead, to avoid the degenerate case >>>> above, is to trigger the optimization only when the take operation is >>>> done along the shorter of the two dimensions (i.e. so the copied >>>> dimension is the longer of the two): that will definitely fix this >>>> test (since it'll avoid this optimization completely) but I suppose >>>> there might be other degenerate cases I haven't thought about it. I'll >>>> submit a PR later today for this, if no one finds any objection to the >>>> idea. >>>> >>>> However, I think it might be skewed our performance results to be >>>> testing DataFrame objects constructed by 2-d ndarrays, since they're >>>> not representative; in addition to the issue above, it means that many >>>> tests are actually incorporating the cost of converting an >>>> f-contiguous array into a c-contiguous array on top of what they're >>>> actually trying to test. Two possible solutions are: >>>> >>>> 1. 
Change DataFrame constructor (and possibly DataFrame.T) to >>>> normalize all blocks as c-contiguous. >>>> 2. Leave DataFrame constructor as-is but either change existing tests >>>> to exercise the more common use case (c-contiguous blocks) or add them >>>> in addition to the current ones. >>>> >>>> I think #2 is probably best, since #1 will have a performance impact >>>> for the use cases (however rare) where an entire workflow can avoid >>>> triggering conversion from f-contiguous blocks to c-contiguous blocks. >>>> >>>> Let me know what you all think, >>>> Stephen >>>> >>>> On Wed, Mar 20, 2013 at 1:25 AM, Stephen Lin >>>> wrote: >>>>> Ahh! I figured it out...the platform issue is part of it, but mostly >>>>> it's that two (independently tested) commits had a weird effect when >>>>> merged. >>>>> >>>>> And the reason they did so is because this particular test turns out >>>>> all of our reindexing tests are testing something very >>>>> non-representative, because of the way they're constructed, so we're >>>>> not really getting representative performance data unfortunately (it >>>>> has to do with the DataFrame constructor and c-contiguity vs >>>>> f-contiguity). We should probably write new tests to fix this issue. >>>>> >>>>> I'll write up a fuller explanation when I get a chance. Anyway, sorry >>>>> for sending you on a git bisect goose chase, Jeff. >>>>> >>>>> Stephen >>>>> >>>>> On Wed, Mar 20, 2013 at 1:01 AM, Stephen Lin >>>>> wrote: >>>>>> As per the "we're getting too chatty on GitHub" comment, should we be >>>>>> moving extended issue discussion about bugs to this list whenever >>>>>> possible? >>>>>> >>>>>> I posted a few comments on #3089 just now but realized maybe starting >>>>>> an e-mail chain would be better.. >>>>>> >>>>>> Anyway, I'm looking into the issue, I suspect it's a corner case due >>>>>> to an array that's very large in one dimension but small in another, >>>>>> and possibly that there's compiler and architecture differences >>>>>> causing different results as well....Jeff, do you mind sending me your >>>>>> the output of "gcc -dumpmachine" and "gcc -dumpspecs" on the machine >>>>>> you ran vb_suite on? >>>>>> >>>>>> I'll set up a 64-bit dev machine going forward so I can test on both >>>>>> platforms. >>>>>> >>>>>> Thanks, >>>>>> Stephen >>>> _______________________________________________ >>>> Pandas-dev mailing list >>>> Pandas-dev at python.org >>>> http://mail.python.org/mailman/listinfo/pandas-dev >>> >>> >>> >>> _______________________________________________ >>> Pandas-dev mailing list >>> Pandas-dev at python.org >>> http://mail.python.org/mailman/listinfo/pandas-dev >>> From swlin at post.harvard.edu Wed Mar 20 21:12:19 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Wed, 20 Mar 2013 16:12:19 -0400 Subject: [Pandas-dev] #3089 [PERF: regression from 0.10.1] discussion In-Reply-To: References: <25AE4CC2-1087-4332-92F2-CD11B8080F03@yahoo.com> Message-ID: (also, just one more thing to note...native numpy take operations ARE using memmove, so it's possible that prior to using memmove ourselves, we were actually performing more poorly than numpy with our Cython takes in some cases, if the host platform/compiler/OS optimizes memmove much better than normal array ops...which is not out of the question for Windows builds) On Wed, Mar 20, 2013 at 4:07 PM, Stephen Lin wrote: > So we can just ignore this and write new tests, if everyone thinks > that's appropriate. Anyone else have thoughts? 
> > I'd prefer monotonic improvement myself, honestly...but it might not > be a reasonable ideal to strive for in this case. > > Stephen > > On Wed, Mar 20, 2013 at 4:02 PM, Jeff Reback wrote: >> I am on 64bit Linux (I use windows too, but try to avoid whenever possible!) >> >> I agree with your assessment wrt 32/64 >> and perf - >> >> I am not sure that these corner cases r that big a deal, more important I think is that we test the most common cases for perf >> >> >> On Mar 20, 2013, at 3:56 PM, Stephen Lin wrote: >> >>> Thanks Jeff! >>> >>> So ignoring the testing methodology issue for now, I've done the small >>> fix suggested but apparently it *is* too restrictive because it >>> negatively affects two other tests that were previously improved (so >>> the two "degenerate" tests improved 25% by adding the restriction >>> while these two tests regressed 25%). I will do some more testing to >>> see if I can find a justifiable way of avoiding this degenerate case, >>> (hopefully) without hardcoding a magic number... (But maybe we should >>> just not bother with this degenerate case anyway, perhaps? I'm a fan >>> of making all improvements monotonic, so I'd prefer not to have to >>> regress this case even if it's degenerate, but I don't know yet how >>> reliably I can do that for situations and all processor/compiler/OS >>> combinations...) >>> >>> Also, Jeff, I reviewed my vbenches vs the ones you published on GitHub >>> for this issue, and I think the reason that some of my larger >>> performance impacts are not shown in your results is because of the >>> vectorization issue (you ARE on 64-bit, right?)...I'm not 100% sure >>> but I really think it's likely that it's because x86-64 allows more >>> vectorization optimizations even without memmove, so the effect of >>> this optimization is not that great. However, there's plenty of people >>> still using 32-bit OSes (I have a 64-bit machine but just never >>> bothered to install 64-bit Ubuntu), so it's definitely worthwhile >>> still to do this. >>> >>> In any case, I believe that VC++9 (i.e. 2008) (which still hosts the >>> pre-built binary windows build still, I think? correct me if I'm >>> wrong) does rather poorly on vectorization, even when it's allowed. >>> Worse, though, it's usually not allowed because Windows 32-bit builds >>> generally have to assume lowest-common-denominator hardware (SSE, >>> which is from Pentium III, and SSE2, from Pentium IV, only became a >>> requirements to install Windows with Windows *8* :D) since they are >>> not compiled on the user machine. (You can only avoid this by >>> abandoning compatibility with older machines or going through hoops to >>> detect CPUID at runtime and modifying program behavior accordingly, >>> which I don't think Cython does.) >>> >>> Anyway, I'll fill in with more info when I have some. >>> >>> Stephen >>> >>> On Wed, Mar 20, 2013 at 2:56 PM, Jeff Reback wrote: >>>> awesome explanation Stephen! >>>> >>>> I'd vote for #2 >>>> >>>> essentially create a testing constructor (kind of like y-p's mkdf), >>>> but creates only a numpy random array, that by default is c-continguous >>>> (with option for f ), and then use that where we have (EVERYWHERE)! >>>> np.random.randn....... >>>> >>>> and second I guess if it helps, look at the c/f contiguous ness >>>> of the ops where appropriate... >>>> >>>> my 2c >>>> >>>> >>>> >>>> >>>> On Wed, Mar 20, 2013 at 2:24 PM, Stephen Lin wrote: >>>>> >>>>> OK, here goes, the issue is the following... 
>>>>> >>>>> The optimization is question optimizes to row-by-row or >>>>> column-by-column copying for 2-d arrays when possible, namely when: >>>>> >>>>> 1. the input array (where the array in question is Block.values) is >>>>> c-contiguous for takes along axis0 or f-contiguous for takes along >>>>> axis1 of the array, and >>>>> 2. the contiguity of the output array matches the contiguity of the input >>>>> >>>>> Almost all the time, Block.values is stored c-contiguously, such that >>>>> each row of the Block corresponds to a column of the DataFrame. So the >>>>> optimization only really kicks in, effectively, when reindexing along >>>>> the column axis of the DataFrame (i.e. axis 0 of the Block); it >>>>> basically means we call memmove once per DataFrame column rather than >>>>> iterating in a loop and copying elements. This is good because most >>>>> sane DataFrame objects are have more rows than columns, so we call >>>>> memmove few times (i.e. once per column) for a large block of values >>>>> (i.e. all rows for that column at a time), so any overhead from >>>>> calling memmove will be outweighed by the benefit of a hand optimized >>>>> copy (which probably involves vectorization, alignment/cache >>>>> optimization, loop unrolling, etc.) >>>>> >>>>> C-contiguous blocks result from basically every Pandas operation that >>>>> operates on blocks, with the only exceptions of (as far as I can tell) >>>>> creating a DataFrame directly from a 2-d ndarray or creating the >>>>> transpose of a homogenous DataFrame (but not a heterogenous one) >>>>> without copying; this is basically an optimization to avoid creating >>>>> the c-contigous version of an array when the f-contiguous one is >>>>> already available, but it's the exception rather than the rule and >>>>> pretty any modification of the DataFrame will immediately require >>>>> reallocation and copying to a new c-contiguous block. >>>>> >>>>> Unfortunately many of the DataFrame tests, including the two in >>>>> question here, are (for simplicity) only testing the case where a >>>>> homogenous 2-d data is passed to the DataFrame, which results in >>>>> (non-representative) f-contiguous blocks. An additional issue with >>>>> this test is that it's creating a very long but thin array (10,000 >>>>> long, 4 wide) and reindexing along the index dimension, so row-by-row >>>>> (from the DataFrame perspective) copying is done over and over using >>>>> memmove on 4 element arrays. Furthermore, the alignment and width in >>>>> bytes of each 4 element array happens to be a convenient multiple of >>>>> 128bits, which is the multiple required for vectorized SIMD >>>>> instructions, so it turns out the element-by-element copying is fairly >>>>> efficient when such operations are available (as is guaranteed on >>>>> x86-64, but not necessarily x86-32), and the call to memmove has more >>>>> overhead than element-by-element copying. >>>>> >>>>> So the issue is basically only happening because all the following are >>>>> true: >>>>> >>>>> 1. The DataFrame is constructed directly by a 2-d homogenous ndarray >>>>> (which has the default c-contiguous continuity, so the block becomes >>>>> f-contiguous). >>>>> 2. There has been no operation after construction of the DataFrame >>>>> requiring reallocation of any sort (otherwise the block would become >>>>> c-contiguous). >>>>> 3. The reindexing is done on the index axis (otherwise no optimization >>>>> would be triggered, since it requires the right axis/contiguity >>>>> combination). >>>>> 4. 
The DataFrame is long but thin (otherwise memmove would not be >>>>> called repeatedly to do small copies). >>>>> 5. The C compiler is not inlining memmove properly, for whatever reason, >>>>> and >>>>> 6. (possibly) The alignment/width of the data happens to be such that >>>>> SIMD operations can be used directly, so the overhead of the eliding >>>>> the loop is not very great and exceeded by the overhead of the >>>>> memmove. >>>>> >>>>> To be honest, it's common C practice to call memmove/memcpy (the >>>>> performance of the two don't really differ from my testing in this >>>>> case) even for very small arrays and assuming that the implementation >>>>> is sane enough to inline it and do the right thing either way, so I'm >>>>> really surprised about #5: I would not have thought it to be an issue >>>>> with a modern compiler, since calling memcpy can't do anything but >>>>> provide the compiler more, not less, information about your intentions >>>>> (and the overhead of the memmove aliasing check is not significant >>>>> here). >>>>> >>>>> Anyway, so it's a corner case, and I didn't catch it originally >>>>> because I tested independently the effect of 1) allocates the output >>>>> array to be f-contiguous instead of c-contiguous by default when the >>>>> input array is f-contiguous and 2) converting loops into memmove when >>>>> possible, both of which have a positive performance effect >>>>> independently but combine to adversely affect these two tests. >>>>> >>>>> I can revert the change that "allocates the output array to be >>>>> f-contiguous instead of c-contiguous by default when the input array >>>>> is f-contiguous", meaning that this optimization will almost never be >>>>> triggered for an f-contiguous input array (unless the caller >>>>> explicitly provides an output array as f-contiguous), but I'd rather >>>>> not because the optimization is actually kind of useful in less >>>>> degenerate cases when you want to quickly produce a reindexed version >>>>> of a f-contiguous array, for whatever reason, even though the cases >>>>> are rarer. >>>>> >>>>> So I think what I'm going to do instead, to avoid the degenerate case >>>>> above, is to trigger the optimization only when the take operation is >>>>> done along the shorter of the two dimensions (i.e. so the copied >>>>> dimension is the longer of the two): that will definitely fix this >>>>> test (since it'll avoid this optimization completely) but I suppose >>>>> there might be other degenerate cases I haven't thought about it. I'll >>>>> submit a PR later today for this, if no one finds any objection to the >>>>> idea. >>>>> >>>>> However, I think it might be skewed our performance results to be >>>>> testing DataFrame objects constructed by 2-d ndarrays, since they're >>>>> not representative; in addition to the issue above, it means that many >>>>> tests are actually incorporating the cost of converting an >>>>> f-contiguous array into a c-contiguous array on top of what they're >>>>> actually trying to test. Two possible solutions are: >>>>> >>>>> 1. Change DataFrame constructor (and possibly DataFrame.T) to >>>>> normalize all blocks as c-contiguous. >>>>> 2. Leave DataFrame constructor as-is but either change existing tests >>>>> to exercise the more common use case (c-contiguous blocks) or add them >>>>> in addition to the current ones. 
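[As an aside, a quick way to see the two block layouts being discussed, plus a sketch of the kind of test constructor option #2 would want; the `_data.blocks` peek relies on pandas-internal attributes of this era and the `rand_frame` helper name is made up, so treat this as illustrative only:]

```python
import numpy as np
import pandas as pd

# Degenerate benchmark case: long, thin, built straight from a 2-d ndarray,
# so the single block is the (no-copy) F-contiguous transpose of the input.
df_f = pd.DataFrame(np.random.randn(10000, 4))
print(df_f._data.blocks[0].values.flags['F_CONTIGUOUS'])   # True

# Most real workflows reallocate at least once; an explicit copy is one cheap
# way to get the representative C-contiguous block layout.
df_c = df_f.copy()
print(df_c._data.blocks[0].values.flags['C_CONTIGUOUS'])   # True

def rand_frame(nrows, ncols, c_contiguous=True):
    """Hypothetical test constructor: random data, C-contiguous block by
    default, F-contiguous on request for the rarer no-copy cases."""
    df = pd.DataFrame(np.random.randn(nrows, ncols))
    return df.copy() if c_contiguous else df
```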
>>>>> >>>>> I think #2 is probably best, since #1 will have a performance impact >>>>> for the use cases (however rare) where an entire workflow can avoid >>>>> triggering conversion from f-contiguous blocks to c-contiguous blocks. >>>>> >>>>> Let me know what you all think, >>>>> Stephen >>>>> >>>>> On Wed, Mar 20, 2013 at 1:25 AM, Stephen Lin >>>>> wrote: >>>>>> Ahh! I figured it out...the platform issue is part of it, but mostly >>>>>> it's that two (independently tested) commits had a weird effect when >>>>>> merged. >>>>>> >>>>>> And the reason they did so is because this particular test turns out >>>>>> all of our reindexing tests are testing something very >>>>>> non-representative, because of the way they're constructed, so we're >>>>>> not really getting representative performance data unfortunately (it >>>>>> has to do with the DataFrame constructor and c-contiguity vs >>>>>> f-contiguity). We should probably write new tests to fix this issue. >>>>>> >>>>>> I'll write up a fuller explanation when I get a chance. Anyway, sorry >>>>>> for sending you on a git bisect goose chase, Jeff. >>>>>> >>>>>> Stephen >>>>>> >>>>>> On Wed, Mar 20, 2013 at 1:01 AM, Stephen Lin >>>>>> wrote: >>>>>>> As per the "we're getting too chatty on GitHub" comment, should we be >>>>>>> moving extended issue discussion about bugs to this list whenever >>>>>>> possible? >>>>>>> >>>>>>> I posted a few comments on #3089 just now but realized maybe starting >>>>>>> an e-mail chain would be better.. >>>>>>> >>>>>>> Anyway, I'm looking into the issue, I suspect it's a corner case due >>>>>>> to an array that's very large in one dimension but small in another, >>>>>>> and possibly that there's compiler and architecture differences >>>>>>> causing different results as well....Jeff, do you mind sending me your >>>>>>> the output of "gcc -dumpmachine" and "gcc -dumpspecs" on the machine >>>>>>> you ran vb_suite on? >>>>>>> >>>>>>> I'll set up a 64-bit dev machine going forward so I can test on both >>>>>>> platforms. >>>>>>> >>>>>>> Thanks, >>>>>>> Stephen >>>>> _______________________________________________ >>>>> Pandas-dev mailing list >>>>> Pandas-dev at python.org >>>>> http://mail.python.org/mailman/listinfo/pandas-dev >>>> >>>> >>>> >>>> _______________________________________________ >>>> Pandas-dev mailing list >>>> Pandas-dev at python.org >>>> http://mail.python.org/mailman/listinfo/pandas-dev >>>> From jreback at yahoo.com Wed Mar 20 21:02:39 2013 From: jreback at yahoo.com (Jeff Reback) Date: Wed, 20 Mar 2013 16:02:39 -0400 Subject: [Pandas-dev] #3089 [PERF: regression from 0.10.1] discussion In-Reply-To: References: Message-ID: <25AE4CC2-1087-4332-92F2-CD11B8080F03@yahoo.com> I am on 64bit Linux (I use windows too, but try to avoid whenever possible!) I agree with your assessment wrt 32/64 and perf - I am not sure that these corner cases r that big a deal, more important I think is that we test the most common cases for perf On Mar 20, 2013, at 3:56 PM, Stephen Lin wrote: > Thanks Jeff! > > So ignoring the testing methodology issue for now, I've done the small > fix suggested but apparently it *is* too restrictive because it > negatively affects two other tests that were previously improved (so > the two "degenerate" tests improved 25% by adding the restriction > while these two tests regressed 25%). I will do some more testing to > see if I can find a justifiable way of avoiding this degenerate case, > (hopefully) without hardcoding a magic number... 
(But maybe we should > just not bother with this degenerate case anyway, perhaps? I'm a fan > of making all improvements monotonic, so I'd prefer not to have to > regress this case even if it's degenerate, but I don't know yet how > reliably I can do that for situations and all processor/compiler/OS > combinations...) > > Also, Jeff, I reviewed my vbenches vs the ones you published on GitHub > for this issue, and I think the reason that some of my larger > performance impacts are not shown in your results is because of the > vectorization issue (you ARE on 64-bit, right?)...I'm not 100% sure > but I really think it's likely that it's because x86-64 allows more > vectorization optimizations even without memmove, so the effect of > this optimization is not that great. However, there's plenty of people > still using 32-bit OSes (I have a 64-bit machine but just never > bothered to install 64-bit Ubuntu), so it's definitely worthwhile > still to do this. > > In any case, I believe that VC++9 (i.e. 2008) (which still hosts the > pre-built binary windows build still, I think? correct me if I'm > wrong) does rather poorly on vectorization, even when it's allowed. > Worse, though, it's usually not allowed because Windows 32-bit builds > generally have to assume lowest-common-denominator hardware (SSE, > which is from Pentium III, and SSE2, from Pentium IV, only became a > requirements to install Windows with Windows *8* :D) since they are > not compiled on the user machine. (You can only avoid this by > abandoning compatibility with older machines or going through hoops to > detect CPUID at runtime and modifying program behavior accordingly, > which I don't think Cython does.) > > Anyway, I'll fill in with more info when I have some. > > Stephen > > On Wed, Mar 20, 2013 at 2:56 PM, Jeff Reback wrote: >> awesome explanation Stephen! >> >> I'd vote for #2 >> >> essentially create a testing constructor (kind of like y-p's mkdf), >> but creates only a numpy random array, that by default is c-continguous >> (with option for f ), and then use that where we have (EVERYWHERE)! >> np.random.randn....... >> >> and second I guess if it helps, look at the c/f contiguous ness >> of the ops where appropriate... >> >> my 2c >> >> >> >> >> On Wed, Mar 20, 2013 at 2:24 PM, Stephen Lin wrote: >>> >>> OK, here goes, the issue is the following... >>> >>> The optimization is question optimizes to row-by-row or >>> column-by-column copying for 2-d arrays when possible, namely when: >>> >>> 1. the input array (where the array in question is Block.values) is >>> c-contiguous for takes along axis0 or f-contiguous for takes along >>> axis1 of the array, and >>> 2. the contiguity of the output array matches the contiguity of the input >>> >>> Almost all the time, Block.values is stored c-contiguously, such that >>> each row of the Block corresponds to a column of the DataFrame. So the >>> optimization only really kicks in, effectively, when reindexing along >>> the column axis of the DataFrame (i.e. axis 0 of the Block); it >>> basically means we call memmove once per DataFrame column rather than >>> iterating in a loop and copying elements. This is good because most >>> sane DataFrame objects are have more rows than columns, so we call >>> memmove few times (i.e. once per column) for a large block of values >>> (i.e. 
all rows for that column at a time), so any overhead from >>> calling memmove will be outweighed by the benefit of a hand optimized >>> copy (which probably involves vectorization, alignment/cache >>> optimization, loop unrolling, etc.) >>> >>> C-contiguous blocks result from basically every Pandas operation that >>> operates on blocks, with the only exceptions of (as far as I can tell) >>> creating a DataFrame directly from a 2-d ndarray or creating the >>> transpose of a homogenous DataFrame (but not a heterogenous one) >>> without copying; this is basically an optimization to avoid creating >>> the c-contigous version of an array when the f-contiguous one is >>> already available, but it's the exception rather than the rule and >>> pretty any modification of the DataFrame will immediately require >>> reallocation and copying to a new c-contiguous block. >>> >>> Unfortunately many of the DataFrame tests, including the two in >>> question here, are (for simplicity) only testing the case where a >>> homogenous 2-d data is passed to the DataFrame, which results in >>> (non-representative) f-contiguous blocks. An additional issue with >>> this test is that it's creating a very long but thin array (10,000 >>> long, 4 wide) and reindexing along the index dimension, so row-by-row >>> (from the DataFrame perspective) copying is done over and over using >>> memmove on 4 element arrays. Furthermore, the alignment and width in >>> bytes of each 4 element array happens to be a convenient multiple of >>> 128bits, which is the multiple required for vectorized SIMD >>> instructions, so it turns out the element-by-element copying is fairly >>> efficient when such operations are available (as is guaranteed on >>> x86-64, but not necessarily x86-32), and the call to memmove has more >>> overhead than element-by-element copying. >>> >>> So the issue is basically only happening because all the following are >>> true: >>> >>> 1. The DataFrame is constructed directly by a 2-d homogenous ndarray >>> (which has the default c-contiguous continuity, so the block becomes >>> f-contiguous). >>> 2. There has been no operation after construction of the DataFrame >>> requiring reallocation of any sort (otherwise the block would become >>> c-contiguous). >>> 3. The reindexing is done on the index axis (otherwise no optimization >>> would be triggered, since it requires the right axis/contiguity >>> combination). >>> 4. The DataFrame is long but thin (otherwise memmove would not be >>> called repeatedly to do small copies). >>> 5. The C compiler is not inlining memmove properly, for whatever reason, >>> and >>> 6. (possibly) The alignment/width of the data happens to be such that >>> SIMD operations can be used directly, so the overhead of the eliding >>> the loop is not very great and exceeded by the overhead of the >>> memmove. >>> >>> To be honest, it's common C practice to call memmove/memcpy (the >>> performance of the two don't really differ from my testing in this >>> case) even for very small arrays and assuming that the implementation >>> is sane enough to inline it and do the right thing either way, so I'm >>> really surprised about #5: I would not have thought it to be an issue >>> with a modern compiler, since calling memcpy can't do anything but >>> provide the compiler more, not less, information about your intentions >>> (and the overhead of the memmove aliasing check is not significant >>> here). 
>>> >>> Anyway, so it's a corner case, and I didn't catch it originally >>> because I tested independently the effect of 1) allocates the output >>> array to be f-contiguous instead of c-contiguous by default when the >>> input array is f-contiguous and 2) converting loops into memmove when >>> possible, both of which have a positive performance effect >>> independently but combine to adversely affect these two tests. >>> >>> I can revert the change that "allocates the output array to be >>> f-contiguous instead of c-contiguous by default when the input array >>> is f-contiguous", meaning that this optimization will almost never be >>> triggered for an f-contiguous input array (unless the caller >>> explicitly provides an output array as f-contiguous), but I'd rather >>> not because the optimization is actually kind of useful in less >>> degenerate cases when you want to quickly produce a reindexed version >>> of a f-contiguous array, for whatever reason, even though the cases >>> are rarer. >>> >>> So I think what I'm going to do instead, to avoid the degenerate case >>> above, is to trigger the optimization only when the take operation is >>> done along the shorter of the two dimensions (i.e. so the copied >>> dimension is the longer of the two): that will definitely fix this >>> test (since it'll avoid this optimization completely) but I suppose >>> there might be other degenerate cases I haven't thought about it. I'll >>> submit a PR later today for this, if no one finds any objection to the >>> idea. >>> >>> However, I think it might be skewed our performance results to be >>> testing DataFrame objects constructed by 2-d ndarrays, since they're >>> not representative; in addition to the issue above, it means that many >>> tests are actually incorporating the cost of converting an >>> f-contiguous array into a c-contiguous array on top of what they're >>> actually trying to test. Two possible solutions are: >>> >>> 1. Change DataFrame constructor (and possibly DataFrame.T) to >>> normalize all blocks as c-contiguous. >>> 2. Leave DataFrame constructor as-is but either change existing tests >>> to exercise the more common use case (c-contiguous blocks) or add them >>> in addition to the current ones. >>> >>> I think #2 is probably best, since #1 will have a performance impact >>> for the use cases (however rare) where an entire workflow can avoid >>> triggering conversion from f-contiguous blocks to c-contiguous blocks. >>> >>> Let me know what you all think, >>> Stephen >>> >>> On Wed, Mar 20, 2013 at 1:25 AM, Stephen Lin >>> wrote: >>>> Ahh! I figured it out...the platform issue is part of it, but mostly >>>> it's that two (independently tested) commits had a weird effect when >>>> merged. >>>> >>>> And the reason they did so is because this particular test turns out >>>> all of our reindexing tests are testing something very >>>> non-representative, because of the way they're constructed, so we're >>>> not really getting representative performance data unfortunately (it >>>> has to do with the DataFrame constructor and c-contiguity vs >>>> f-contiguity). We should probably write new tests to fix this issue. >>>> >>>> I'll write up a fuller explanation when I get a chance. Anyway, sorry >>>> for sending you on a git bisect goose chase, Jeff. 
>>>> >>>> Stephen >>>> >>>> On Wed, Mar 20, 2013 at 1:01 AM, Stephen Lin >>>> wrote: >>>>> As per the "we're getting too chatty on GitHub" comment, should we be >>>>> moving extended issue discussion about bugs to this list whenever >>>>> possible? >>>>> >>>>> I posted a few comments on #3089 just now but realized maybe starting >>>>> an e-mail chain would be better.. >>>>> >>>>> Anyway, I'm looking into the issue, I suspect it's a corner case due >>>>> to an array that's very large in one dimension but small in another, >>>>> and possibly that there's compiler and architecture differences >>>>> causing different results as well....Jeff, do you mind sending me your >>>>> the output of "gcc -dumpmachine" and "gcc -dumpspecs" on the machine >>>>> you ran vb_suite on? >>>>> >>>>> I'll set up a 64-bit dev machine going forward so I can test on both >>>>> platforms. >>>>> >>>>> Thanks, >>>>> Stephen >>> _______________________________________________ >>> Pandas-dev mailing list >>> Pandas-dev at python.org >>> http://mail.python.org/mailman/listinfo/pandas-dev >> >> >> >> _______________________________________________ >> Pandas-dev mailing list >> Pandas-dev at python.org >> http://mail.python.org/mailman/listinfo/pandas-dev >> From yoval at gmx.com Sat Mar 23 00:44:20 2013 From: yoval at gmx.com (yoval p.) Date: Sat, 23 Mar 2013 00:44:20 +0100 Subject: [Pandas-dev] I fought travis and won (sort of). Message-ID: <20130322234420.28600@gmx.com> Hi guys, I've been frustrated with the turn-around time on travis, as it's become less a "CI" service then a "ATFMLI" (about 25 minutes later integration). Much of that build time is taken up by cythonizing and compilation even though the majority of PRs don't touch the cython code, and that's all wasted work. I hacked out a POC using network storage to cache build results bringing a complete run down to about ~8 minutes. Admittedly not amazing, but 2.5X-3X all the same. If any of you have an S3 API key you're willing to throw my way, I can set this up so anyone can opt in, via a magic incantation included in the commit message. Cheers, yoval -------------- next part -------------- An HTML attachment was scrubbed... URL: From jreback at yahoo.com Sat Mar 23 01:56:12 2013 From: jreback at yahoo.com (Jeff Reback) Date: Fri, 22 Mar 2013 20:56:12 -0400 Subject: [Pandas-dev] docs & builds Message-ID: <0F58569D-CD9B-4ECE-99B5-D83B7FA041F2@yahoo.com> Wes/Chang not sure exactly how the doc builds happen, though I usually see updated by 5pm est, working? also windows dev builds stopped as of 3/14 thanks Jeff I can be reached on my cell 917-971-6387 From wesmckinn at gmail.com Sat Mar 23 17:01:07 2013 From: wesmckinn at gmail.com (Wes McKinney) Date: Sat, 23 Mar 2013 12:01:07 -0400 Subject: [Pandas-dev] docs & builds In-Reply-To: <0F58569D-CD9B-4ECE-99B5-D83B7FA041F2@yahoo.com> References: <0F58569D-CD9B-4ECE-99B5-D83B7FA041F2@yahoo.com> Message-ID: On Fri, Mar 22, 2013 at 8:56 PM, Jeff Reback wrote: > Wes/Chang > > not sure exactly how the doc builds happen, though I usually see updated by 5pm est, working? > > also windows dev builds stopped as of 3/14 > > thanks > > Jeff > > > I can be reached on my cell 917-971-6387 > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev Windows dev builds are updated. I had turned off the VM before I left for PyCon fully intending to run it on my laptop but forgot to do so. 
It will be offline a couple days next week due to moving. - Wes From changshe at gmail.com Sat Mar 23 18:08:04 2013 From: changshe at gmail.com (Chang She) Date: Sat, 23 Mar 2013 10:08:04 -0700 Subject: [Pandas-dev] docs & builds In-Reply-To: References: <0F58569D-CD9B-4ECE-99B5-D83B7FA041F2@yahoo.com> Message-ID: <2FB2162E-D142-4CFF-9329-9C12918EA8C7@gmail.com> The docs built are kicked off everyday on the same machine that was running the VM, but thank god we're not building the docs in a windows environment :) On Mar 23, 2013, at 9:01 AM, Wes McKinney wrote: > On Fri, Mar 22, 2013 at 8:56 PM, Jeff Reback wrote: >> Wes/Chang >> >> not sure exactly how the doc builds happen, though I usually see updated by 5pm est, working? >> >> also windows dev builds stopped as of 3/14 >> >> thanks >> >> Jeff >> >> >> I can be reached on my cell 917-971-6387 >> _______________________________________________ >> Pandas-dev mailing list >> Pandas-dev at python.org >> http://mail.python.org/mailman/listinfo/pandas-dev > > Windows dev builds are updated. I had turned off the VM before I left > for PyCon fully intending to run it on my laptop but forgot to do so. > It will be offline a couple days next week due to moving. > > - Wes > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev From jeffreback at gmail.com Sun Mar 24 03:28:56 2013 From: jeffreback at gmail.com (Jeff Reback) Date: Sat, 23 Mar 2013 22:28:56 -0400 Subject: [Pandas-dev] docs & builds In-Reply-To: <2FB2162E-D142-4CFF-9329-9C12918EA8C7@gmail.com> References: <0F58569D-CD9B-4ECE-99B5-D83B7FA041F2@yahoo.com> <2FB2162E-D142-4CFF-9329-9C12918EA8C7@gmail.com> Message-ID: <1DBC5FFC-0D41-4241-B5F6-6F6AA1542588@gmail.com> looks like they r updated thxs! I can be reached on my cell 917-971-6387 On Mar 23, 2013, at 1:08 PM, Chang She wrote: > The docs built are kicked off everyday on the same machine that was running the VM, but thank god we're not building the docs in a windows environment :) > > On Mar 23, 2013, at 9:01 AM, Wes McKinney wrote: > >> On Fri, Mar 22, 2013 at 8:56 PM, Jeff Reback wrote: >>> Wes/Chang >>> >>> not sure exactly how the doc builds happen, though I usually see updated by 5pm est, working? >>> >>> also windows dev builds stopped as of 3/14 >>> >>> thanks >>> >>> Jeff >>> >>> >>> I can be reached on my cell 917-971-6387 >>> _______________________________________________ >>> Pandas-dev mailing list >>> Pandas-dev at python.org >>> http://mail.python.org/mailman/listinfo/pandas-dev >> >> Windows dev builds are updated. I had turned off the VM before I left >> for PyCon fully intending to run it on my laptop but forgot to do so. >> It will be offline a couple days next week due to moving. >> >> - Wes >> _______________________________________________ >> Pandas-dev mailing list >> Pandas-dev at python.org >> http://mail.python.org/mailman/listinfo/pandas-dev > > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev From wesmckinn at gmail.com Tue Mar 26 02:58:59 2013 From: wesmckinn at gmail.com (Wes McKinney) Date: Mon, 25 Mar 2013 21:58:59 -0400 Subject: [Pandas-dev] Fast py2/py3 testing, fast vbench In-Reply-To: <20130320200515.28610@gmx.com> References: <20130320200515.28610@gmx.com> Message-ID: On Wed, Mar 20, 2013 at 4:05 PM, yoval p. 
wrote: > I've made some improvement to the tooling we have > for development, just making sure everyone is > aware of what's available. > > - closed GH3099, caching cython build artifacts > when running setup.py. > - setup.py now checks for BUILD_CACHE_DIR envar > so you can enable it without touch the source code > - Once enabled, with a warm cache testing py26/27/32/33 > takes only a couple of minutes compares with travis' ~15 > on a quad core machine > - if caching is enabled (for future commits, the envar is sufficiant) > test_perf.sh will run much faster. > - i've added an option to filter vbench by regex when running > test_perf.sh. > > Quick iteration makes everything easier, I hope these > changes do that. > > Here's an example of all of the above, comparing two adjacent > commits on a reduced set of vbenches in 1min flat: > > ? export BUILD_CACHE_DIR="/tmp/.pandas_build_cache/" > ? time ./test_perf.sh -b 18c7e6c -t 18c7e6c^ -r reindex > ... > Results: > t_head t_baseline ratio > name > dataframe_reindex 0.3726 0.3726 1.0000 > reindex_fillna_backfill_float32 0.0961 0.0961 1.0000 > reindex_fillna_pad_float32 0.0959 0.0959 1.0000 > frame_reindex_upcast 17.7334 17.7334 1.0000 > reindex_daterange_backfill 0.1649 0.1649 1.0000 > reindex_fillna_pad 0.1052 0.1052 1.0000 > reindex_daterange_pad 0.1757 0.1757 1.0000 > reindex_frame_level_align 1.0109 1.0109 1.0000 > reindex_fillna_backfill 0.1035 0.1035 1.0000 > reindex_frame_level_reindex 0.9586 0.9586 1.0000 > frame_reindex_columns 0.3101 0.3101 1.0000 > reindex_multiindex 1.1427 1.1427 1.0000 > > Columns: test_name | target_duration [ms] | baseline_duration [ms] | ratio > > - a Ratio of 1.30 means the target commit is 30% slower then the baseline. > > Target [18c7e6c] : BLD: check for BUILD_CACHE_DIR envar in setup.py > Baseline [18c7e6c] : BLD: check for BUILD_CACHE_DIR envar in setup.py > > > *** Results were also written to the logfile at > '/home/user1/src/pandas/vb_suite.log' > > > real 0m58.561s > user 0m52.699s > sys 0m1.645s > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev > This is really great. Caching of builds is a no-brainer and the vbench suite has gotten quite large (surprised it's not more popular? we are avant garde). Thanks y-p! From swlin at post.harvard.edu Wed Mar 27 01:39:30 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Tue, 26 Mar 2013 20:39:30 -0400 Subject: [Pandas-dev] Small question about vb_suite Message-ID: Hey guys, Just curious, Is there a convenient way to run just a particular set of benchmarks using one's locally checked out copy, right than going through the whole rigamorale of checking out two commits and comparing them? I'm modifying some tests to see if I can improve their stability, and I just want to check for syntax errors and such quickly... Stephen From wesmckinn at gmail.com Wed Mar 27 02:12:26 2013 From: wesmckinn at gmail.com (Wes McKinney) Date: Tue, 26 Mar 2013 21:12:26 -0400 Subject: [Pandas-dev] Small question about vb_suite In-Reply-To: References: Message-ID: On Tue, Mar 26, 2013 at 8:39 PM, Stephen Lin wrote: > Hey guys, > > Just curious, Is there a convenient way to run just a particular set > of benchmarks using one's locally checked out copy, right than going > through the whole rigamorale of checking out two commits and comparing > them? 
I'm modifying some tests to see if I can improve their > stability, and I just want to check for syntax errors and such > quickly... > > Stephen > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev The Benchmark objects have a "run" method: In [2]: import reindex In [3]: reindex.reindex_fillna_pad.run() Out[3]: {'loops': 1000, 'repeat': 3, 'succeeded': True, 'timing': 0.12795305252075195, 'units': 'ms'} Make a list of benchmarks of interest, any, with a little getattr action, you should be in business. - Wes From yoval at gmx.com Wed Mar 27 11:48:49 2013 From: yoval at gmx.com (yoval p.) Date: Wed, 27 Mar 2013 11:48:49 +0100 Subject: [Pandas-dev] Small question about vb_suite Message-ID: <20130327104850.37450@gmx.com> ----- Original Message ----- From: Stephen Lin Sent: 03/27/13 02:39 AM To: pandas-dev at python.org Subject: [Pandas-dev] Small question about vb_suite Hey guys, Just curious, Is there a convenient way to run just a particular set of benchmarks using one's locally checked out copy, right than going through the whole rigamorale of checking out two commits and comparing them? I'm modifying some tests to see if I can improve their stability, and I just want to check for syntax errors and such quickly... Stephen _______________________________________________ Pandas-dev mailing list Pandas-dev at python.org http://mail.python.org/mailman/listinfo/pandas-dev That would be vb_suite/perf_HEAD, which really should be part of test_perf. ...So now it is. ``` test_perf -H -r frame_.+ ``` Yoval -------------- next part -------------- An HTML attachment was scrubbed... URL:
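[For reference, a minimal sketch of the getattr approach Wes describes above, run from inside a local checkout's vb_suite directory; the benchmark names are taken from the reindex table earlier in the thread and the selection is illustrative:]

```python
# Run a hand-picked subset of vbench benchmarks against the local build,
# skipping the full two-commit comparison that test_perf.sh performs.
import reindex

names = ['dataframe_reindex', 'reindex_fillna_pad', 'reindex_fillna_backfill']
for name in names:
    bench = getattr(reindex, name)
    result = bench.run()   # dict with 'timing', 'units', 'succeeded', ...
    print('%s: %.4f %s' % (name, result['timing'], result['units']))
```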