From tom.augspurger88 at gmail.com  Fri Aug 18 06:57:29 2017
From: tom.augspurger88 at gmail.com (Tom Augspurger)
Date: Fri, 18 Aug 2017 05:57:29 -0500
Subject: [Pandas-dev] August 2017 Developer Meeting
Message-ID:

Hi all,

We're holding a developer meeting at 2:00 PM EDT / 6:00 PM UTC today.

You're welcome to join at
https://plus.google.com/hangouts/_/calendar/dG9tLmF1Z3NwdXJnZXI4OEBnbWFpbC5jb20.6i17sn8l2g2js2tog1q8cgrqts?authuser=0
or view the minutes at
https://docs.google.com/document/d/1tGbTiYORHiSPgVMXawiweGJlBw5dOkVJLY-licoBmBU/edit

Tom
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tom.augspurger88 at gmail.com  Sun Aug 20 08:21:12 2017
From: tom.augspurger88 at gmail.com (Tom Augspurger)
Date: Sun, 20 Aug 2017 07:21:12 -0500
Subject: [Pandas-dev] Benchmark updates
Message-ID:

Hi everyone,

I did some work on the benchmark running & publishing yesterday. The
results are now hosted at http://pandas.pydata.org/speed/, so pandas'
benchmarks are at http://pandas.pydata.org/speed/pandas. An RSS feed of
regressions is available at
http://pandas.pydata.org/speed/pandas/regressions.xml. I plan to track
that and manually open issues if they seem legitimate.

The runs are now triggered and monitored by Apache Airflow (instead of
the cron job I had set up). This gives us a nice dashboard with the
ability to view logs and see when benchmarks fail (and, eventually,
email alerts). The dashboard isn't exposed publicly. If you have SSH
access to the benchmark server, you can create a tunnel to port 8080:

    ssh -L 8080:localhost:8080 pandas at panda.likescandy.com

Tom

From wesmckinn at gmail.com  Tue Aug 29 15:58:59 2017
From: wesmckinn at gmail.com (Wes McKinney)
Date: Tue, 29 Aug 2017 15:58:59 -0400
Subject: [Pandas-dev] Benchmark updates
In-Reply-To:
References:
Message-ID:

hey Tom,

This is really great.
Any chance we can create a wiki or README about the configuration, in
the event that the Airflow config needs to be recreated or changed?
Thanks again for setting this up, and for keeping my coat closet (where
the machine is located) toasty.

One minor thing with the benchmarking: I've noticed that default Linux
configs can be a little aggressive about throttling the CPU frequency.
This can be edited in the cpufrequtils script, but at least on my laptop
and desktop (Ubuntu 14.04) I find myself having to run
"/etc/init.d/cpufrequtils restart" to get it to disable frequency
scaling. This should probably happen at boot time, but I'm not sure yet
how to do that. We might want to document this so that we get the
best-quality performance data out of the machine.

- Wes

On Sun, Aug 20, 2017 at 8:21 AM, Tom Augspurger wrote:
> Hi everyone,
>
> I did some work on the benchmark running & publishing yesterday. The
> results are now hosted at http://pandas.pydata.org/speed/, so pandas'
> benchmarks are at http://pandas.pydata.org/speed/pandas.
> An RSS feed of regressions is available at
> http://pandas.pydata.org/speed/pandas/regressions.xml. I plan to track
> that and manually open issues if they seem legitimate.
>
> The runs are now triggered and monitored by Apache Airflow (instead of
> the cron job I had set up). This gives us a nice dashboard with the
> ability to view logs and see when benchmarks fail (and, eventually,
> email alerts). The dashboard isn't exposed publicly. If you have SSH
> access to the benchmark server, you can create a tunnel to port 8080:
>
>     ssh -L 8080:localhost:8080 pandas at panda.likescandy.com
>
> Tom
>
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>

From nesdis at gmail.com  Thu Aug 31 09:12:19 2017
From: nesdis at gmail.com (Siddarth Sen)
Date: Thu, 31 Aug 2017 13:12:19 -0000
Subject: [Pandas-dev] Make the underlying data structure of a sparse DataFrame a sparse matrix instead of sparse Series
Message-ID:

Hi,

I would like to consider the option of converting the underlying
structure of a sparse DataFrame to a sparse matrix instead of multiple
sparse Series, in the case where all the columns of the DataFrame have
the same dtype. This would make row/column slicing of the DataFrame much
faster than it currently is.
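The layout difference behind this proposal can be sketched outside of pandas entirely. The snippet below is a hypothetical illustration (not pandas' actual internals), assuming scipy and numpy are available: it contrasts column-oriented sparse storage (one sparse vector per column, roughly analogous to a DataFrame backed by sparse Series) with a single scipy.sparse CSR matrix for the whole same-dtype block, where extracting a row is one indexing operation rather than a loop over every column.

```python
import numpy as np
from scipy import sparse

# A mostly-zero dataset with a single dtype across all columns.
rng = np.random.RandomState(0)
dense = rng.rand(1000, 50)
dense[dense < 0.99] = 0.0  # roughly 99% of entries are zero

# Column-oriented storage: one sparse column vector per column,
# loosely analogous to a DataFrame of sparse Series.
columns = [sparse.csc_matrix(dense[:, [j]]) for j in range(dense.shape[1])]

# Matrix-oriented storage: one CSR matrix holding the whole block.
mat = sparse.csr_matrix(dense)

i = 123

# Slicing row i from the per-column layout has to touch every column
# object in a Python-level loop...
row_from_columns = np.hstack([col[i].toarray().ravel() for col in columns])

# ...while CSR row slicing is a single indexing operation on one object.
row_from_matrix = mat[i].toarray().ravel()

assert np.array_equal(row_from_columns, row_from_matrix)
```

Both layouts yield the same row values; the difference is that the per-column path pays Python-loop and per-object overhead proportional to the number of columns, which is the slicing cost the proposal aims to avoid when all columns share a dtype.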