From tom.augspurger88 at gmail.com Wed Sep 6 08:17:40 2017
From: tom.augspurger88 at gmail.com (Tom Augspurger)
Date: Wed, 6 Sep 2017 07:17:40 -0500
Subject: [Pandas-dev] Benchmark updates
In-Reply-To:
References:
Message-ID:

I added a page to the wiki:
https://github.com/pandas-dev/pandas/wiki/Benchmark-Machine

In theory, bootstrapping a new machine is as simple as running
"ansible-playbook tests/full.yml", but I've probably made some changes
manually that aren't in the playbook.

Agreed that we should get the system to be as stable as possible.
https://haypo.github.io/category/benchmark.html has some useful
information, I think, starting with
https://haypo.github.io/journey-to-stable-benchmark-system.html

Tom

On Tue, Aug 29, 2017 at 2:58 PM, Wes McKinney wrote:

> hey Tom,
>
> This is really great. Any chance we can create a wiki or README about
> configuration in the event that the Airflow config needs to be
> recreated or changed?
>
> Thanks again for setting this up, and keeping my coat closet (where
> the machine is located) toasty.
>
> As one minor thing with the benchmarking, I've noticed that default
> Linux configs can be a little bit aggressive about throttling the CPU
> frequency. This can be edited in the cpufrequtils script, but at least
> on my laptop and desktop (Ubuntu 14.04) I find myself having to run
> "/etc/init.d/cpufrequtils restart" to get it to disable frequency
> scaling. This should probably happen at boot time, but I'm not sure
> yet how to do it. So we might want to document this so that we are
> getting the best quality performance data out of the machine.
>
> - Wes
>
> On Sun, Aug 20, 2017 at 8:21 AM, Tom Augspurger wrote:
> > Hi everyone,
> >
> > I did some work on the benchmark running & publishing yesterday. The
> > results are now hosted at http://pandas.pydata.org/speed/, so pandas'
> > are at http://pandas.pydata.org/speed/pandas.
> > An RSS feed of regressions is available at
> > http://pandas.pydata.org/speed/pandas/regressions.xml. I plan to track
> > that and manually open issues if they seem legitimate.
> >
> > The runs are now triggered and monitored by Apache Airflow (instead of
> > the cron job I had set up). This gives us a nice dashboard with the
> > ability to view logs and see when benchmarks fail (and eventually
> > email alerts). The dashboard isn't exposed publicly. If you have SSH
> > access to the benchmark server, you can create a tunnel to port 8080
> >
> > ssh -L 8080:localhost:8080 pandas at panda.likescandy.com
> >
> > Tom
> >
> > _______________________________________________
> > Pandas-dev mailing list
> > Pandas-dev at python.org
> > https://mail.python.org/mailman/listinfo/pandas-dev
> >

From wesmckinn at gmail.com Wed Sep 6 08:20:59 2017
From: wesmckinn at gmail.com (Wes McKinney)
Date: Wed, 6 Sep 2017 08:20:59 -0400
Subject: [Pandas-dev] Benchmark updates
In-Reply-To:
References:
Message-ID:

Thanks! It can be hard to avoid "snowflake" setups, but when possible
it's very nice. Conceivably in the future we could have other benchmark
machines.
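
The thread raises CPU frequency scaling as a source of noisy benchmark
data but does not say how to confirm the setting took effect after a
reboot. A minimal sketch of such a check follows. It assumes a Linux
machine that exposes the cpufreq interface under
/sys/devices/system/cpu, and it assumes "performance" is the governor
you want; neither detail is stated in the messages above, and the
function names are hypothetical.

```python
from pathlib import Path


def read_governors(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU that exposes cpufreq.

    sysfs_root is a parameter only so the function can be pointed at a
    fake directory tree; the default is the usual Linux location.
    """
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(sysfs_root).glob(pattern)):
        cpu_name = gov_file.parent.parent.name  # e.g. "cpu0"
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


def cpus_not_using(governors, expected="performance"):
    """List CPUs whose governor differs from the expected one.

    "performance" is an assumption about the desired setting, not
    something the thread specifies.
    """
    return sorted(cpu for cpu, gov in governors.items() if gov != expected)


if __name__ == "__main__":
    current = read_governors()
    if not current:
        print("no cpufreq information found")
    for cpu in cpus_not_using(current):
        print(f"{cpu}: governor is {current[cpu]!r}")
```

Run before a benchmark job, a non-empty result from cpus_not_using()
would be a hint that frequency scaling is still active and the
cpufrequtils restart mentioned above may be needed.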