From stephenwlin at gmail.com Wed Apr 3 22:14:01 2013 From: stephenwlin at gmail.com (Stephen Lin) Date: Wed, 3 Apr 2013 16:14:01 -0400 Subject: [Pandas-dev] Script to set up fresh Python 2.7/3.2 virtual build environments on Ubuntu (tested on 12.04.2 LTS) Message-ID: Hi all, I recently set up a new development machine and decided to write all the steps down as a script to make it easier to reproduce later. I've decided to contribute it to the list in case anyone else finds it helpful. It's tested to work unattended on fresh Ubuntu 12.04.2 LTS installs (32-bit and 64-bit), and should be useful for other distributions as well with some modifications. There are two optional parameters: first the username for the pandas fork to check out, and second the username for the vbench fork to check out; pydata versions are used if the latter or both are omitted. Also, it sets up .env2.7 and .env3.2 virtualenv directories as subdirectories of the checked-out pandas directory and installs necessary dependencies into them in the correct order (which should all succeed since the necessary system-wide C libraries are first installed via apt-get). Some optional dependencies are omitted in the 3.2 environment since they don't seem to build correctly and I didn't spend much time trying to fix them. There's basically no error checking, but it shouldn't do anything that could clobber anything in case any intermediate step fails: it's probably best to run it with stdout and stderr both tee'ed to a log file just in case, though, and/or to add more checks if you're not using it directly on 12.04.2 LTS. Stephen ---------- Forwarded message ---------- From: Stephen Lin Date: Wed, Apr 3, 2013 at 4:02 PM Subject: To: stephenwlin at gmail.com -------------- next part -------------- A non-text attachment was scrubbed... Name: checkout-pandas-script Type: application/octet-stream Size: 1814 bytes Desc: not available URL:
From wesmckinn at gmail.com Sat Apr 6 02:16:08 2013 From: wesmckinn at gmail.com (Wes McKinney) Date: Fri, 5 Apr 2013 17:16:08 -0700 Subject: [Pandas-dev] Script to set up fresh Python 2.7/3.2 virtual build environments on Ubuntu (tested on 12.04.2 LTS) In-Reply-To: References: Message-ID: On Wed, Apr 3, 2013 at 1:14 PM, Stephen Lin wrote: > Hi all, > > I recently set up a new development machine and decided to write all > the steps down as a script to make it easier to reproduce > later. > > I've decided to contribute it to the list in case anyone else finds it > helpful. It's tested to work unattended on fresh Ubuntu 12.04.2 LTS > installs (32-bit and 64-bit), and should be useful for other > distributions as well with some modifications. > > There are two optional parameters: first the username for the pandas > fork to check out, and second the username for the vbench fork to > check out; pydata versions are used if the latter or both are omitted. > > Also, it sets up .env2.7 and .env3.2 virtualenv directories as > subdirectories of the checked-out pandas directory and installs > necessary dependencies into them in the correct order (which should > all succeed since the necessary system-wide C libraries are first > installed via apt-get). Some optional dependencies are omitted in the > 3.2 environment since they don't seem to build correctly and I didn't > spend much time trying to fix them.
> > There's basically no error checking, but it shouldn't do anything that > could clobber anything in case any intermediate step fails: it's probably > best to run it with stdout and stderr both tee'ed to a log file just > in case, though, and/or to add more checks if you're not using it directly on > 12.04.2 LTS. > > Stephen > > ---------- Forwarded message ---------- > From: Stephen Lin > Date: Wed, Apr 3, 2013 at 4:02 PM > Subject: > To: stephenwlin at gmail.com > > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev > > Maybe put this on the pandas github wiki? -------------- next part -------------- An HTML attachment was scrubbed... URL:
From stephenwlin at gmail.com Sat Apr 6 05:28:12 2013 From: stephenwlin at gmail.com (Stephen Lin) Date: Fri, 5 Apr 2013 23:28:12 -0400 Subject: [Pandas-dev] Script to set up fresh Python 2.7/3.2 virtual build environments on Ubuntu (tested on 12.04.2 LTS) In-Reply-To: References: Message-ID: hmm, ok, maybe after cleaning it up a bit and making it more robust / customizable; possibly I can support other freshly installed distributions as well On Fri, Apr 5, 2013 at 8:16 PM, Wes McKinney wrote: > > > > On Wed, Apr 3, 2013 at 1:14 PM, Stephen Lin wrote: >> >> Hi all, >> >> I recently set up a new development machine and decided to write all >> the steps down as a script to make it easier to reproduce >> later. >> >> I've decided to contribute it to the list in case anyone else finds it >> helpful. It's tested to work unattended on fresh Ubuntu 12.04.2 LTS >> installs (32-bit and 64-bit), and should be useful for other >> distributions as well with some modifications. >> >> There are two optional parameters: first the username for the pandas >> fork to check out, and second the username for the vbench fork to >> check out; pydata versions are used if the latter or both are omitted. >> >> Also, it sets up .env2.7 and .env3.2 virtualenv directories as >> subdirectories of the checked-out pandas directory and installs >> necessary dependencies into them in the correct order (which should >> all succeed since the necessary system-wide C libraries are first >> installed via apt-get). Some optional dependencies are omitted in the >> 3.2 environment since they don't seem to build correctly and I didn't >> spend much time trying to fix them. >> >> There's basically no error checking, but it shouldn't do anything that >> could clobber anything in case any intermediate step fails: it's probably >> best to run it with stdout and stderr both tee'ed to a log file just >> in case, though, and/or to add more checks if you're not using it directly on >> 12.04.2 LTS. >> >> Stephen >> >> ---------- Forwarded message ---------- >> From: Stephen Lin >> Date: Wed, Apr 3, 2013 at 4:02 PM >> Subject: >> To: stephenwlin at gmail.com >> >> _______________________________________________ >> Pandas-dev mailing list >> Pandas-dev at python.org >> http://mail.python.org/mailman/listinfo/pandas-dev >> > > > Maybe put this on the pandas github wiki?
From swlin at post.harvard.edu Sat Apr 6 06:07:22 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Sat, 6 Apr 2013 00:07:22 -0400 Subject: [Pandas-dev] Anyone tested pandas on ARM ever? Message-ID: Just curious -- I am setting up an ARM environment for an ARM-specific llvm/clang patch I'm working on. Might be interesting to port pandas over, too, if it doesn't work already...the architecture is getting more and more important these days.
Stephen
From stephenwlin at gmail.com Sun Apr 7 01:46:28 2013 From: stephenwlin at gmail.com (Stephen Lin) Date: Sat, 6 Apr 2013 19:46:28 -0400 Subject: [Pandas-dev] Script to set up fresh Python 2.7/3.2 virtual build environments on Ubuntu (tested on 12.04.2 LTS) In-Reply-To: References: Message-ID: Hmm, actually, would this be something that would be ok to check into the main tree if it were made more robust and able to support more distributions/versions? I'm doing a lot of cross-platform testing for other stuff anyway, so I don't mind working on that if there's any demand. (I know that other people use Enthought for this sort of thing, but I didn't have much luck with that last time I tried it...lots of version mismatch issues and no easy way of downgrading packages as necessary to fix them.) Assuming this is a reasonable idea, anyone have any requests for supported distributions/versions? Other than RHEL (which costs money), the cloud hosting I use for this sort of thing (Rackspace) supports the following distributions:
Arch 2012.08
CentOS 5.6
CentOS 5.8
CentOS 6.0
CentOS 6.2
CentOS 6.3
Debian 6 (Squeeze)
Fedora 16 (Verne)
Fedora 17 (Beefy Miracle)
FreeBSD 9
Gentoo 12.3
openSUSE 12.1
Ubuntu 10.04 LTS (Lucid Lynx)
Ubuntu 11.04 (Natty Narwhal)
Ubuntu 11.10 (Oneiric Ocelot)
Ubuntu 12.04 LTS (Precise Pangolin)
Ubuntu 12.10 (Quantal Quetzal)
I can try initially covering the most recent Debian, Ubuntu, Fedora, and CentOS versions. Anything else worth prioritizing? On Fri, Apr 5, 2013 at 11:28 PM, Stephen Lin wrote: > hmm, ok, maybe after cleaning it up a bit and making it more robust / > customizable; possibly I can support other freshly installed > distributions as well > > On Fri, Apr 5, 2013 at 8:16 PM, Wes McKinney wrote: >> >> >> >> On Wed, Apr 3, 2013 at 1:14 PM, Stephen Lin wrote: >>> >>> Hi all, >>> >>> I recently set up a new development machine and decided to write all >>> the steps down as a script to make it easier to reproduce >>> later. >>> >>> I've decided to contribute it to the list in case anyone else finds it >>> helpful. It's tested to work unattended on fresh Ubuntu 12.04.2 LTS >>> installs (32-bit and 64-bit), and should be useful for other >>> distributions as well with some modifications. >>> >>> There are two optional parameters: first the username for the pandas >>> fork to check out, and second the username for the vbench fork to >>> check out; pydata versions are used if the latter or both are omitted. >>> >>> Also, it sets up .env2.7 and .env3.2 virtualenv directories as >>> subdirectories of the checked-out pandas directory and installs >>> necessary dependencies into them in the correct order (which should >>> all succeed since the necessary system-wide C libraries are first >>> installed via apt-get). Some optional dependencies are omitted in the >>> 3.2 environment since they don't seem to build correctly and I didn't >>> spend much time trying to fix them. >>> >>> There's basically no error checking, but it shouldn't do anything that >>> could clobber anything in case any intermediate step fails: it's probably >>> best to run it with stdout and stderr both tee'ed to a log file just >>> in case, though, and/or to add more checks if you're not using it directly on >>> 12.04.2 LTS.
>>> >>> Stephen >>> >>> ---------- Forwarded message ---------- >>> From: Stephen Lin >>> Date: Wed, Apr 3, 2013 at 4:02 PM >>> Subject: >>> To: stephenwlin at gmail.com >>> >>> _______________________________________________ >>> Pandas-dev mailing list >>> Pandas-dev at python.org >>> http://mail.python.org/mailman/listinfo/pandas-dev >>> >> >> >> Maybe put this on the pandas github wiki?
From stephenwlin at gmail.com Sun Apr 7 03:06:11 2013 From: stephenwlin at gmail.com (Stephen Lin) Date: Sat, 6 Apr 2013 21:06:11 -0400 Subject: [Pandas-dev] Script to set up fresh Python 2.7/3.2 virtual build environments on Ubuntu (tested on 12.04.2 LTS) In-Reply-To: References: Message-ID: (hmm, just realized what I tried using last time was Anaconda CE, rather than Enthought, actually) > (I know that other people use Enthought for this sort of > thing, but I didn't have much luck with that last time I tried > it...lots of version mismatch issues and no easy way of downgrading > packages as necessary to fix them.)
From swlin at post.harvard.edu Sun Apr 7 04:44:25 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Sat, 6 Apr 2013 22:44:25 -0400 Subject: [Pandas-dev] Anyone tested pandas on ARM ever? In-Reply-To: References: Message-ID: OK, happy to report that pandas master and all dependencies (hard and soft) build fine on ARM :) Tests are too slow to run within QEMU though... Will test on a Raspberry Pi when mine gets delivered, at some point. On Sat, Apr 6, 2013 at 12:07 AM, Stephen Lin wrote: > Just curious -- I am setting up an ARM environment for an ARM-specific > llvm/clang patch I'm working on. Might be interesting to port pandas > over, too, if it doesn't work already...the architecture is getting > more and more important these days. > > Stephen
From jreback at yahoo.com Sun Apr 7 16:56:57 2013 From: jreback at yahoo.com (Jeff Reback) Date: Sun, 7 Apr 2013 10:56:57 -0400 Subject: [Pandas-dev] 0.11 timetable Message-ID: Wes any thoughts on timetable? a number of issues left but most can prob be bumped to 0.12?
From wesmckinn at gmail.com Sun Apr 7 18:55:22 2013 From: wesmckinn at gmail.com (Wes McKinney) Date: Sun, 7 Apr 2013 09:55:22 -0700 Subject: [Pandas-dev] 0.11 timetable In-Reply-To: References: Message-ID: On Sun, Apr 7, 2013 at 7:56 AM, Jeff Reback wrote: > Wes > > any thoughts on timetable? > > a number of issues left but most can prob be bumped to 0.12? > > > > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev > I'll look through open issues and PRs today and clean things up. I think we should be able to cut the release this week. I also have my Jenkins box again so I'll get that up and running for the windows builds. - Wes -------------- next part -------------- An HTML attachment was scrubbed... URL:
From jreback at yahoo.com Sun Apr 7 19:00:16 2013 From: jreback at yahoo.com (Jeff Reback) Date: Sun, 7 Apr 2013 13:00:16 -0400 Subject: [Pandas-dev] 0.11 timetable In-Reply-To: References: Message-ID: <2B3EC9CB-3843-41AF-B0E1-D9593EE6C249@yahoo.com> since this is a big release maybe do a 0.11.0rc1 ? just in case something crops up? I can be reached on my cell 917-971-6387 On Apr 7, 2013, at 12:55 PM, Wes McKinney wrote: > > > > On Sun, Apr 7, 2013 at 7:56 AM, Jeff Reback wrote: >> Wes >> >> any thoughts on timetable? >> >> a number of issues left but most can prob be bumped to 0.12?
>> >> >> >> _______________________________________________ >> Pandas-dev mailing list >> Pandas-dev at python.org >> http://mail.python.org/mailman/listinfo/pandas-dev > > > I'll look through open issues and PRs today and clean things up. I think > we should be able to cut the release this week. I also have my Jenkins > box again so I'll get that up and running for the windows builds. > > - Wes -------------- next part -------------- An HTML attachment was scrubbed... URL:
From wesmckinn at gmail.com Sun Apr 7 20:58:20 2013 From: wesmckinn at gmail.com (Wes McKinney) Date: Sun, 7 Apr 2013 11:58:20 -0700 Subject: [Pandas-dev] Anyone tested pandas on ARM ever? In-Reply-To: References: Message-ID: On Sat, Apr 6, 2013 at 7:44 PM, Stephen Lin wrote: > OK, happy to report that pandas master and all dependencies (hard and > soft) build fine on ARM :) > Tests are too slow to run within QEMU though... > Will test on a Raspberry Pi when mine gets delivered, at some point. > > On Sat, Apr 6, 2013 at 12:07 AM, Stephen Lin > wrote: > > Just curious -- I am setting up an ARM environment for an ARM-specific > > llvm/clang patch I'm working on. Might be interesting to port pandas > > over, too, if it doesn't work already...the architecture is getting > > more and more important these days. > > > > Stephen > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev > Count on pandas taking a couple hours to compile on the raspi =) -------------- next part -------------- An HTML attachment was scrubbed... URL:
From wesmckinn at gmail.com Sun Apr 7 20:58:48 2013 From: wesmckinn at gmail.com (Wes McKinney) Date: Sun, 7 Apr 2013 11:58:48 -0700 Subject: [Pandas-dev] 0.11 timetable In-Reply-To: <2B3EC9CB-3843-41AF-B0E1-D9593EE6C249@yahoo.com> References: <2B3EC9CB-3843-41AF-B0E1-D9593EE6C249@yahoo.com> Message-ID: On Sun, Apr 7, 2013 at 10:00 AM, Jeff Reback wrote: > since this is a big release maybe do > a 0.11.0rc1 ? > > just in case something crops up? > > I can be reached on my cell 917-971-6387 > > On Apr 7, 2013, at 12:55 PM, Wes McKinney wrote: > > > > > On Sun, Apr 7, 2013 at 7:56 AM, Jeff Reback wrote: > >> Wes >> >> any thoughts on timetable? >> >> a number of issues left but most can prob be bumped to 0.12? >> >> >> >> _______________________________________________ >> Pandas-dev mailing list >> Pandas-dev at python.org >> http://mail.python.org/mailman/listinfo/pandas-dev >> > > > I'll look through open issues and PRs today and clean things up. I think > we should be able to cut the release this week. I also have my Jenkins > box again so I'll get that up and running for the windows builds. > > - Wes > > Yeah that's my thinking. Maybe make the RC tomorrow or Tuesday with release a week later ish -------------- next part -------------- An HTML attachment was scrubbed... URL:
From yoval at gmx.com Sun Apr 7 21:20:21 2013 From: yoval at gmx.com (yoval p.) Date: Sun, 07 Apr 2013 21:20:21 +0200 Subject: [Pandas-dev] On merging new APIs into pandas Message-ID: <20130407192021.210360@gmx.com> Hi, From time to time, new ideas for user APIs pop up on the issue tracker. Some are wacky and experimental, some are a bad idea, and some are genuinely useful to users. Adding user APIs is possibly the most sensitive type of change because you only get one shot at getting it right before it becomes legacy you have to support in existing code, so getting a form/functionality/signature right is a real concern.
Also, whenever you add a new one, you "burn" a verb, since you can't repurpose its meaning down the line without unacceptable confusion, so if you're targeting a "big one" like "choose" or "pick" or "grep", you had better be damn sure you got it right. Because of these reasons, I'm obviously hesitant to introduce new APIs without a reasonable amount of discussion and consensus, and to be honest, an ok from wes. OTOH, I think there are good ideas stagnating on the issue tracker because we don't have an accepted way (ad-hoc, light, but known and sanctioned by wes) for making these types of "crucial" changes. I'd like a couple of things to happen: 1) we should introduce a "sandbox" formalism, for shipping experimental new features in a release while retaining the freedom to make breaking changes when/if they are rolled into the "official" API; this could be backed by a pd.option.sandbox.enable_foo = True mechanism 2) Would be glad to hear from wes on what he considers an acceptable way to get new APIs in. Cheers, Yoval -------------- next part -------------- An HTML attachment was scrubbed... URL:
From swlin at post.harvard.edu Sun Apr 7 22:32:22 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Sun, 7 Apr 2013 16:32:22 -0400 Subject: [Pandas-dev] Anyone tested pandas on ARM ever? In-Reply-To: References: Message-ID: better than 12 hours or so within QEMU :D On Sun, Apr 7, 2013 at 2:58 PM, Wes McKinney wrote: > > > > On Sat, Apr 6, 2013 at 7:44 PM, Stephen Lin wrote: >> >> OK, happy to report that pandas master and all dependencies (hard and >> soft) build fine on ARM :) >> Tests are too slow to run within QEMU though... >> Will test on a Raspberry Pi when mine gets delivered, at some point. >> >> On Sat, Apr 6, 2013 at 12:07 AM, Stephen Lin >> wrote: >> > Just curious -- I am setting up an ARM environment for an ARM-specific >> > llvm/clang patch I'm working on. Might be interesting to port pandas >> > over, too, if it doesn't work already...the architecture is getting >> > more and more important these days. >> > >> > Stephen >> _______________________________________________ >> Pandas-dev mailing list >> Pandas-dev at python.org >> http://mail.python.org/mailman/listinfo/pandas-dev > > > > Count on pandas taking a couple hours to compile on the raspi =) > > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev >
From wesmckinn at gmail.com Mon Apr 8 05:11:22 2013 From: wesmckinn at gmail.com (Wes McKinney) Date: Sun, 7 Apr 2013 20:11:22 -0700 Subject: [Pandas-dev] On merging new APIs into pandas In-Reply-To: <20130407192021.210360@gmx.com> References: <20130407192021.210360@gmx.com> Message-ID: On Sun, Apr 7, 2013 at 12:20 PM, yoval p. wrote: > Hi, > > From time to time, new ideas for user APIs pop up on the issue > tracker. Some are wacky and experimental, some are a bad idea, > and some are genuinely useful to users. > > Adding user APIs is possibly the most sensitive type of change > because you only get one shot at getting it right before it > becomes legacy you have to support in existing code, so getting > a form/functionality/signature right is a real concern. Also, whenever > you add a new one, you "burn" a verb, since you can't repurpose > its meaning down the line without unacceptable confusion, > so if you're targeting a "big one" like "choose" or "pick" > or "grep", you had better be damn sure you got it right.
> > Because of these reasons, I'm obviously hesitant to introduce > new APIs without a reasonable amount of discussion and consensus, > and to be honest, an ok from wes. > OTOH, I think there are good ideas stagnating on the issue tracker > because we don't have an accepted way (ad-hoc, light, but known > and sanctioned by wes) for making these types of "crucial" changes. > > I'd like a couple of things to happen: > 1) we should introduce a "sandbox" formalism, for shipping > experimental new features in a release while retaining the freedom to > make breaking changes when/if they are rolled into > the "official" API; this could be backed by a pd.option.sandbox.enable_foo > = True > mechanism > 2) Would be glad to hear from wes on what he considers an acceptable > way to get new APIs in. > > Cheers, > Yoval > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev > > I don't have too many rigid thoughts about this. Placing code in the sandbox until we can develop consensus about the API makes sense (I like the experimental configuration option to the extent that it isn't onerous for you the developer). I guess I've sort of had free rein to add APIs as I please for a long time and now we should be a bit more conservative about API sprawl (here you've been stuck maintaining all my APIs!). Beyond the typical bikeshed discussions around argument names and default options, I'm comfortable with you guys practicing common sense so let's just see how it goes -- for new APIs the pull request initially is the right place to hash things out. - Wes -------------- next part -------------- An HTML attachment was scrubbed... URL:
From wesmckinn at gmail.com Fri Apr 12 20:08:54 2013 From: wesmckinn at gmail.com (Wes McKinney) Date: Fri, 12 Apr 2013 11:08:54 -0700 Subject: [Pandas-dev] Cutting an 0.11 RC today? Message-ID: Any objections? I'll wait a few hours for some of the perf issues to shake out. - Wes -------------- next part -------------- An HTML attachment was scrubbed... URL:
From jeffreback at gmail.com Fri Apr 12 23:16:38 2013 From: jeffreback at gmail.com (Jeff Reback) Date: Fri, 12 Apr 2013 17:16:38 -0400 Subject: [Pandas-dev] Cutting an 0.11 RC today? In-Reply-To: References: Message-ID: <36A43C57-670B-4CB0-BA11-615258DF6BC6@gmail.com> good to go I think; y-p and lodagro are chasing the last open perf issue I can be reached on my cell 917-971-6387 On Apr 12, 2013, at 2:08 PM, Wes McKinney wrote: > Any objections? I'll wait a few hours for some of the perf issues to shake out. > > - Wes > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev
From jreback at yahoo.com Sun Apr 14 23:20:29 2013 From: jreback at yahoo.com (Jeff Reback) Date: Sun, 14 Apr 2013 17:20:29 -0400 Subject: [Pandas-dev] Fwd: [Pytables-users] ANN: numexpr 2.1 RC1 available! References: <516B0F53.7070309@gmail.com> Message-ID: <32EA0071-168A-4885-BE0C-B37379B5D312@yahoo.com> when this is final will want to pick this up in the 3.2 full dep Travis build (if it's not already) Begin forwarded message: > From: Francesc Alted > Date: April 14, 2013, 4:19:31 PM EDT > To: numexpr at googlegroups.com > Cc: Discussion list for PyTables > Subject: [Pytables-users] ANN: numexpr 2.1 RC1 available!
> Reply-To: Discussion list for PyTables > > ============================ > Announcing Numexpr 2.1RC1 > ============================ > > Numexpr is a fast numerical expression evaluator for NumPy. With it, > expressions that operate on arrays (like "3*a+4*b") are accelerated > and use less memory than doing the same calculation in Python. > > It sports multi-threaded capabilities, as well as support for Intel's > VML library, which allows for squeezing the last drop of performance > out of your multi-core processors. > > What's new > ========== > > This version adds compatibility for Python 3. A bunch of thanks to > Antonio Valentino for his excellent work on this. I apologize for taking > so long to release his contributions. > > In case you want to know in more detail what has changed in this > version, see: > > http://code.google.com/p/numexpr/wiki/ReleaseNotes > > or have a look at RELEASE_NOTES.txt in the tarball. > > Where can I find Numexpr? > ========================= > > The project is hosted at Google code in: > > http://code.google.com/p/numexpr/ > > This is release candidate 1, so it will not be available on the PyPI > repository. I'll post it there when the final version is released. > > Share your experience > ===================== > > Let us know of any bugs, suggestions, gripes, kudos, etc. you may > have. > > > Enjoy! > > -- > Francesc Alted > > _______________________________________________ > Pytables-users mailing list > Pytables-users at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/pytables-users -------------- next part -------------- An HTML attachment was scrubbed... URL:
From yoval at gmx.com Mon Apr 15 06:45:52 2013 From: yoval at gmx.com (yoval p.) Date: Mon, 15 Apr 2013 07:45:52 +0300 Subject: [Pandas-dev] Fwd: [Pytables-users] ANN: numexpr 2.1 RC1 available! In-Reply-To: <32EA0071-168A-4885-BE0C-B37379B5D312@yahoo.com> References: <516B0F53.7070309@gmail.com> <32EA0071-168A-4885-BE0C-B37379B5D312@yahoo.com> Message-ID: <516B8600.3050204@gmx.com> The Travis-ci setup just uses pip to install the latest version, so it should be picked up automatically when the new version is available on pypi. It worked fine for the recent release of mpl 1.2.1. Someday, it would be nice to do a full build matrix across supported dep versions too, but we'd need more CI muscle than travis currently provides. On 04/15/2013 12:20 AM, Jeff Reback wrote: > when this is final will want to pick this up in the 3.2 full dep Travis > build (if it's not already)
From yoval at gmx.com Wed Apr 17 16:43:08 2013 From: yoval at gmx.com (yoval p.) Date: Wed, 17 Apr 2013 17:43:08 +0300 Subject: [Pandas-dev] Congratulations! You may have already saved 8 minutes per travis build. Message-ID: <516EB4FC.4020201@gmx.com> Hi, I just merged PR #3383, which allows whitelisted forks to use a network cache server for speeding up travis. All commit-bit-bearing persons have had their forks whitelisted. The upstream pydata repo is also whitelisted, naturally.
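(As an aside, the build-side gate can be tiny -- the following is a hypothetical Python sketch of the idea, not the actual PR #3383 code, keying off the magic word described in the next paragraph:)

import subprocess

# Hypothetical sketch: enable the network cache only when the last
# commit message opts in via the magic word.
msg = subprocess.check_output(['git', 'log', '-1', '--pretty=%B'])
use_cache = b'PLEASE_TRAVIS_FASTER' in msg or b'PTF' in msg
print('using cache server' if use_cache else 'building without cache')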
Proving that often all you really need to do is ask nicely: if you include the magic incantation "PLEASE_TRAVIS_FASTER" (just "PTF" also works) anywhere in the commit message, the travis build time should drop significantly, to under 4 minutes in the common case. Example: https://travis-ci.org/y-p/pandas/builds/6416089 Cheers, y-p
From wesmckinn at gmail.com Thu Apr 18 00:46:26 2013 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 17 Apr 2013 15:46:26 -0700 Subject: [Pandas-dev] Congratulations! You may have already saved 8 minutes per travis build. In-Reply-To: <516EB4FC.4020201@gmx.com> References: <516EB4FC.4020201@gmx.com> Message-ID: On Wed, Apr 17, 2013 at 7:43 AM, yoval p. wrote: > Hi, > > I just merged PR #3383, which allows whitelisted forks > to use a network cache server for speeding up travis. > All commit-bit-bearing persons have had their forks whitelisted. > The upstream pydata repo is also whitelisted, naturally. > > Proving that often all you really need to do is ask nicely: > if you include the magic incantation "PLEASE_TRAVIS_FASTER" > (just "PTF" also works) anywhere in the commit message, the > travis build time should drop significantly, to under 4 minutes > in the common case. > > Example: > https://travis-ci.org/y-p/pandas/builds/6416089 > > Cheers, > y-p > > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev Outstanding work! Faster builds --> get moar work done. - Wes
From jeffreback at gmail.com Mon Apr 22 00:01:33 2013 From: jeffreback at gmail.com (Jeff Reback) Date: Sun, 21 Apr 2013 18:01:33 -0400 Subject: [Pandas-dev] pickle is evil Message-ID: I thought I'd share a particularly evil pickle issue. In my refactor of Series to not subclass ndarray, the new pickling tests were breaking. No surprise, because I changed __getstate__ to pickle via the BlockManager. In order to ensure compat I thought I could just fix __setstate__ and figure out what to do based on the return state (e.g. the len of the state returned as a tuple or dict or whatever). But no...apparently the reconstruction algorithm takes the class name that it sees and tries to create it w/o using __new__ (or anything else that you can intercept); it uses a builtin function called _reconstruct (which I can't figure out how to override at all; it must be C-only code). And then numpy gets ahold of it (as it's an extension type), and complains because the class I am trying to instantiate actually isn't a subclass of ndarray (which it presupposes). So, a bit hacky, but using a custom unpickler and matching on a compatibility class (that subclasses ndarray), allows me to return the correct class. The good thing here is that this whole routine isn't even called unless there is a TypeError on the original unpickle whoosh!
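(To see the failure mode in isolation, here's a minimal, self-contained sketch -- OldStyle is a hypothetical stand-in for the old ndarray-backed Series, not actual pandas code:)

import pickle
import numpy as np

class OldStyle(np.ndarray):   # stand-in for the old Series, an ndarray subclass
    pass

# ndarray.__reduce__ records numpy's _reconstruct as the creation function
buf = pickle.dumps(np.arange(3).view(OldStyle))

class OldStyle(object):       # rebind the name, as the refactor effectively does
    pass

try:
    pickle.loads(buf)         # _reconstruct receives the rebound class...
except TypeError as exc:
    print(exc)                # ...and rejects it: not a subtype of ndarray

The module below works around exactly this by hooking the unpickler's REDUCE step (the 'R' opcode) and swapping in the right class before numpy ever sees it.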
--------
# new module: compat/unpickle_compat.py

import numpy as np
import pandas
from pandas.core.series import Series
from pandas.sparse.series import SparseSeries
import pickle

class Unpickler(pickle.Unpickler):
    pass

def load_reduce(self):
    stack = self.stack
    args = stack.pop()
    func = stack[-1]
    if type(args[0]) is type:
        n = args[0].__name__
        if n == 'DeprecatedSeries':
            stack[-1] = object.__new__(Series)
            return
        elif n == 'DeprecatedSparseSeries':
            stack[-1] = object.__new__(SparseSeries)
            return

    value = func(*args)
    stack[-1] = value

# 'R' is pickle's REDUCE opcode
Unpickler.dispatch['R'] = load_reduce

def load(file):
    # try to load a compatibility pickle
    # fake the old class hierarchy
    # if it works, then return the new type objects

    try:
        pandas.core.series.Series = DeprecatedSeries
        pandas.sparse.series.SparseSeries = DeprecatedSparseSeries
        with open(file, 'rb') as fh:
            return Unpickler(fh).load()
    except:
        raise
    finally:
        pandas.core.series.Series = Series
        pandas.sparse.series.SparseSeries = SparseSeries

class DeprecatedSeries(Series, np.ndarray):
    pass

class DeprecatedSparseSeries(DeprecatedSeries):
    pass

From wesmckinn at gmail.com Mon Apr 22 03:01:40 2013 From: wesmckinn at gmail.com (Wes McKinney) Date: Sun, 21 Apr 2013 18:01:40 -0700 Subject: [Pandas-dev] pickle is evil In-Reply-To: References: Message-ID: On Sun, Apr 21, 2013 at 3:01 PM, Jeff Reback wrote: > I thought I'd share a particularly evil pickle issue. In my refactor of > Series to not subclass ndarray, the new pickling tests were breaking. No > surprise, > because I changed __getstate__ to pickle via the BlockManager. In order to > ensure compat I thought I could just fix __setstate__ and figure out what to > do > based on the return state (e.g. the len of the state returned as a tuple or > dict or whatever). > > But no...apparently the reconstruction algorithm takes the class name that > it sees and tries to create it w/o using __new__ (or anything else that you > can intercept); > it uses a builtin function called _reconstruct (which I > can't figure out how to override at all; it must be C-only code). > > And then numpy gets ahold of it (as it's an extension type), and complains > because the class I am trying to instantiate actually isn't a subclass of > ndarray > (which it presupposes). > > So, a bit hacky, but using a custom unpickler and matching on a > compatibility class (that subclasses ndarray), allows me to return the > correct class. > > The good thing here is that this whole routine isn't even called unless > there is a TypeError on the original unpickle > > whoosh!
> > -------- > # new module: compat/unpickle_compat.py > > import numpy as np > import pandas > from pandas.core.series import Series > from pandas.sparse.series import SparseSeries > import pickle > > class Unpickler(pickle.Unpickler): > pass > > def load_reduce(self): > stack = self.stack > args = stack.pop() > func = stack[-1] > if type(args[0]) is type: > n = args[0].__name__ > if n == 'DeprecatedSeries': > stack[-1] = object.__new__(Series) > return > elif n == 'DeprecatedSparseSeries': > stack[-1] = object.__new__(SparseSeries) > return > > value = func(*args) > stack[-1] = value > > Unpickler.dispatch['R'] = load_reduce > > def load(file): > # try to load a compatibility pickle > # fake the old class hierarchy > # if it works, then return the new type objects > > try: > pandas.core.series.Series = DeprecatedSeries > pandas.sparse.series.SparseSeries = DeprecatedSparseSeries > with open(file,'rb') as fh: > return Unpickler(fh).load() > except: > raise > finally: > pandas.core.series.Series = Series > pandas.sparse.series.SparseSeries = SparseSeries > > class DeprecatedSeries(Series, np.ndarray): > pass > > class DeprecatedSparseSeries(DeprecatedSeries): > pass > > > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev > Yes, pickle is evil. Will this fix affect pickle.loads/pickle.dumps? I would prefer to get a msgpack or Avro-based serialization format for Series or DataFrame sorted out before we start gutting the internals of the objects. - Wes
From jeffreback at gmail.com Mon Apr 22 03:12:46 2013 From: jeffreback at gmail.com (Jeff Reback) Date: Sun, 21 Apr 2013 21:12:46 -0400 Subject: [Pandas-dev] pickle is evil In-Reply-To: References: Message-ID: <6B89BC1B-6D3C-4B61-A353-7366991D7D02@gmail.com> avro (better choice than msgpack I think) will be a very straightforward add-on; the format should prob be done independently of internals anyhow at the price of a bit more code, or could store block managers and be somewhat code simpler On Apr 21, 2013, at 9:01 PM, Wes McKinney wrote: > On Sun, Apr 21, 2013 at 3:01 PM, Jeff Reback wrote: >> I thought I'd share a particularly evil pickle issue. In my refactor of >> Series to not subclass ndarray, the new pickling tests were breaking. No >> surprise, >> because I changed __getstate__ to pickle via the BlockManager. In order to >> ensure compat I thought I could just fix __setstate__ and figure out what to >> do >> based on the return state (e.g. the len of the state returned as a tuple or >> dict or whatever). >> >> But no...apparently the reconstruction algorithm takes the class name that >> it sees and tries to create it w/o using __new__ (or anything else that you >> can intercept); >> it uses a builtin function called _reconstruct (which I >> can't figure out how to override at all; it must be C-only code).
>> >> And then numpy gets ahold of it (as it's an extension type), and complains >> because the class I am trying to instantiate actually isn't a subclass of >> ndarray >> (which it presupposes). >> >> So, a bit hacky, but using a custom unpickler and matching on a >> compatibility class (that subclasses ndarray), allows me to return the >> correct class. >> >> The good thing here is that this whole routine isn't even called unless >> there is a TypeError on the original unpickle >> >> whoosh! >> >> -------- >> # new module: compat/unpickle_compat.py >> >> import numpy as np >> import pandas >> from pandas.core.series import Series >> from pandas.sparse.series import SparseSeries >> import pickle >> >> class Unpickler(pickle.Unpickler): >> pass >> >> def load_reduce(self): >> stack = self.stack >> args = stack.pop() >> func = stack[-1] >> if type(args[0]) is type: >> n = args[0].__name__ >> if n == 'DeprecatedSeries': >> stack[-1] = object.__new__(Series) >> return >> elif n == 'DeprecatedSparseSeries': >> stack[-1] = object.__new__(SparseSeries) >> return >> >> value = func(*args) >> stack[-1] = value >> >> Unpickler.dispatch['R'] = load_reduce >> >> def load(file): >> # try to load a compatibility pickle >> # fake the old class hierarchy >> # if it works, then return the new type objects >> >> try: >> pandas.core.series.Series = DeprecatedSeries >> pandas.sparse.series.SparseSeries = DeprecatedSparseSeries >> with open(file,'rb') as fh: >> return Unpickler(fh).load() >> except: >> raise >> finally: >> pandas.core.series.Series = Series >> pandas.sparse.series.SparseSeries = SparseSeries >> >> class DeprecatedSeries(Series, np.ndarray): >> pass >> >> class DeprecatedSparseSeries(DeprecatedSeries): >> pass >> >> >> _______________________________________________ >> Pandas-dev mailing list >> Pandas-dev at python.org >> http://mail.python.org/mailman/listinfo/pandas-dev > > Yes, pickle is evil. Will this fix affect pickle.loads/pickle.dumps? I > would prefer to get a msgpack or Avro-based serialization format for > Series or DataFrame sorted out before we start gutting the internals > of the objects. > > - Wes > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev
From jeffreback at gmail.com Mon Apr 22 03:19:48 2013 From: jeffreback at gmail.com (Jeff Reback) Date: Sun, 21 Apr 2013 21:19:48 -0400 Subject: [Pandas-dev] pickle is evil In-Reply-To: <6B89BC1B-6D3C-4B61-A353-7366991D7D02@gmail.com> References: <6B89BC1B-6D3C-4B61-A353-7366991D7D02@gmail.com> Message-ID: I realized I didn't answer your question; this just catches on pickle.load:

try:
    pickle.load
except (TypeError):
    pickle_compat.load
except:
    if not PY3:
        raise
    # try to unpickle with an encoding here

On Apr 21, 2013, at 9:12 PM, Jeff Reback wrote: > avro (better choice than msgpack I think) > will be a very straightforward add-on; > > the format should prob be done independently of internals anyhow at the price of a bit more code, or could store block managers and be somewhat code simpler > > > > On Apr 21, 2013, at 9:01 PM, Wes McKinney wrote: > >> On Sun, Apr 21, 2013 at 3:01 PM, Jeff Reback wrote: >>> I thought I'd share a particularly evil pickle issue. In my refactor of >>> Series to not subclass ndarray, the new pickling tests were breaking. No >>> surprise, >>> because I changed __getstate__ to pickle via the BlockManager. In order to >>> ensure compat I thought I could just fix __setstate__ and figure out what to >>> do >>> based on the return state (e.g. the len of the state returned as a tuple or >>> dict or whatever). >>> >>> But no...apparently the reconstruction algorithm takes the class name that >>> it sees and tries to create it w/o using __new__ (or anything else that you >>> can intercept); >>> it uses a builtin function called _reconstruct (which I >>> can't figure out how to override at all; it must be C-only code).
>>> >>> And then numpy gets ahold of it (as it's an extension type), and complains >>> because the class I am trying to instantiate actually isn't a subclass of >>> ndarray >>> (which it presupposes). >>> >>> So, a bit hacky, but using a custom unpickler and matching on a >>> compatibility class (that subclasses ndarray), allows me to return the >>> correct class. >>> >>> The good thing here is that this whole routine isn't even called unless >>> there is a TypeError on the original unpickle >>> >>> whoosh! >>> >>> -------- >>> # new module: compat/unpickle_compat.py >>> >>> import numpy as np >>> import pandas >>> from pandas.core.series import Series >>> from pandas.sparse.series import SparseSeries >>> import pickle >>> >>> class Unpickler(pickle.Unpickler): >>> pass >>> >>> def load_reduce(self): >>> stack = self.stack >>> args = stack.pop() >>> func = stack[-1] >>> if type(args[0]) is type: >>> n = args[0].__name__ >>> if n == 'DeprecatedSeries': >>> stack[-1] = object.__new__(Series) >>> return >>> elif n == 'DeprecatedSparseSeries': >>> stack[-1] = object.__new__(SparseSeries) >>> return >>> >>> value = func(*args) >>> stack[-1] = value >>> >>> Unpickler.dispatch['R'] = load_reduce >>> >>> def load(file): >>> # try to load a compatibility pickle >>> # fake the old class hierarchy >>> # if it works, then return the new type objects >>> >>> try: >>> pandas.core.series.Series = DeprecatedSeries >>> pandas.sparse.series.SparseSeries = DeprecatedSparseSeries >>> with open(file,'rb') as fh: >>> return Unpickler(fh).load() >>> except: >>> raise >>> finally: >>> pandas.core.series.Series = Series >>> pandas.sparse.series.SparseSeries = SparseSeries >>> >>> class DeprecatedSeries(Series, np.ndarray): >>> pass >>> >>> class DeprecatedSparseSeries(DeprecatedSeries): >>> pass >>> >>> >>> _______________________________________________ >>> Pandas-dev mailing list >>> Pandas-dev at python.org >>> http://mail.python.org/mailman/listinfo/pandas-dev >> >> Yes, pickle is evil. Will this fix affect pickle.loads/pickle.dumps? I >> would prefer to get a msgpack or Avro-based serialization format for >> Series or DataFrame sorted out before we start gutting the internals >> of the objects. >> >> - Wes >> _______________________________________________ >> Pandas-dev mailing list >> Pandas-dev at python.org >> http://mail.python.org/mailman/listinfo/pandas-dev
From wesmckinn at gmail.com Mon Apr 22 07:51:44 2013 From: wesmckinn at gmail.com (Wes McKinney) Date: Sun, 21 Apr 2013 22:51:44 -0700 Subject: [Pandas-dev] 0.11 highlights? Message-ID: Could I get a 2-paragraph summary of 0.11 highlights for the release e-mail? I'm going to make sure Windows is good, take a look at the two parser issues, and then look to cut the release in the next 48 hours. Any pressing concerns? thanks, Wes
From jeffreback at gmail.com Mon Apr 22 13:52:34 2013 From: jeffreback at gmail.com (Jeff Reback) Date: Mon, 22 Apr 2013 07:52:34 -0400 Subject: [Pandas-dev] 0.11 highlights? In-Reply-To: References: Message-ID: <5021C3A4-F6BB-425F-92C1-28D7370B6A46@gmail.com> can include html links but no rst right? I can be reached on my cell 917-971-6387 On Apr 22, 2013, at 1:51 AM, Wes McKinney wrote: > Could I get a 2-paragraph summary of 0.11 highlights for the release > e-mail? I'm going to make sure Windows is good, take a look at the two > parser issues, and then look to cut the release in the next 48 hours. > Any pressing concerns?
> > thanks, > Wes > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev
From yoval at gmx.com Mon Apr 22 14:18:29 2013 From: yoval at gmx.com (yoval p.) Date: Mon, 22 Apr 2013 15:18:29 +0300 Subject: [Pandas-dev] 0.11 highlights? In-Reply-To: References: Message-ID: <51752A95.4040205@gmx.com> Totally biased, user-visible list:
- My favorite for the release, the new slicing methods .i/loc, i/at.
- Large boost to csv export perf.
- Fixed color cycling in plots (thanks @lesteve).
- df slicing more consistent with timeseries (df['2001']).
- Added the missing one-liner comfort floated on the ML (#3275): index.to_series().
- Docs theme change and new opt-in mpl style.
- Jeff's cookbook.
I'll put the repr() saga to bed today. yp On 04/22/2013 08:51 AM, Wes McKinney wrote: > Could I get a 2-paragraph summary of 0.11 highlights for the release > e-mail? I'm going to make sure Windows is good, take a look at the two > parser issues, and then look to cut the release in the next 48 hours. > Any pressing concerns? > > thanks, > Wes > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev >
From jeffreback at gmail.com Mon Apr 22 14:34:48 2013 From: jeffreback at gmail.com (Jeff Reback) Date: Mon, 22 Apr 2013 08:34:48 -0400 Subject: [Pandas-dev] 0.11 highlights? In-Reply-To: References: Message-ID: Biased too :)
- New selection methods .i/loc, i/at
- Dtype propagation and coexistence
- Improved performance of df.to_csv()
- Numexpr integration to accelerate operator evaluation
- Improved timedelta operations
- 10 Min to Pandas - New user introduction
On Mon, Apr 22, 2013 at 1:51 AM, Wes McKinney wrote: > Could I get a 2-paragraph summary of 0.11 highlights for the release > e-mail? I'm going to make sure Windows is good, take a look at the two > parser issues, and then look to cut the release in the next 48 hours. > Any pressing concerns? > > thanks, > Wes > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL:
From yoval at gmx.com Mon Apr 22 17:22:35 2013 From: yoval at gmx.com (yoval p.) Date: Mon, 22 Apr 2013 18:22:35 +0300 Subject: [Pandas-dev] 0.11 highlights? In-Reply-To: References: Message-ID: <517555BB.4040406@gmx.com> s***, Totally left out the numexpr integration. big win! :) On 04/22/2013 03:34 PM, Jeff Reback wrote: > Biased too :) > > - New selection methods .i/loc, i/at > - Dtype propagation and coexistence > - Improved performance of df.to_csv() > - Numexpr integration to accelerate operator evaluation > - Improved timedelta operations > - 10 Min to Pandas - New user introduction > > > On Mon, Apr 22, 2013 at 1:51 AM, Wes McKinney > > wrote: > > Could I get a 2-paragraph summary of 0.11 highlights for the release > e-mail? I'm going to make sure Windows is good, take a look at the two > parser issues, and then look to cut the release in the next 48 hours. > Any pressing concerns?
> > thanks, > Wes > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev > > > > > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev >
From jreback at yahoo.com Mon Apr 22 18:14:55 2013 From: jreback at yahoo.com (Jeff Reback) Date: Mon, 22 Apr 2013 12:14:55 -0400 Subject: [Pandas-dev] docs Message-ID: <56FF2DAE-CB33-4499-9902-26A4100E3724@yahoo.com> Wes after release is it possible to do a doc update at some point? (ok the released docs) if so just merge into master and then you build docs and update? (like u do now for dev) or is this more trouble than worth? I just have 1 specific in mind, but won't get to it for a bit
From changshe at gmail.com Mon Apr 22 20:34:40 2013 From: changshe at gmail.com (Chang She) Date: Mon, 22 Apr 2013 11:34:40 -0700 Subject: [Pandas-dev] docs In-Reply-To: <56FF2DAE-CB33-4499-9902-26A4100E3724@yahoo.com> References: <56FF2DAE-CB33-4499-9902-26A4100E3724@yahoo.com> Message-ID: we can update the released docs whenever. It's not that big of a hassle. They just live on a different directory on the pydata server. On Mon, Apr 22, 2013 at 9:14 AM, Jeff Reback wrote: > Wes > > after release is it possible to do a doc update at some point? (ok the released docs) > > if so just merge into master and then you build docs and update? (like u do now for dev) > > or is this more trouble than worth? > > I just have 1 specific in mind, but won't get to it for a bit > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev
From jeffreback at gmail.com Mon Apr 22 21:05:15 2013 From: jeffreback at gmail.com (Jeff Reback) Date: Mon, 22 Apr 2013 15:05:15 -0400 Subject: [Pandas-dev] docs In-Reply-To: References: <56FF2DAE-CB33-4499-9902-26A4100E3724@yahoo.com> Message-ID: great thxs I can be reached on my cell 917-971-6387 On Apr 22, 2013, at 2:34 PM, Chang She wrote: > we can update the released docs whenever. It's not that big of a > hassle. They just live on a different directory on the pydata server. > > On Mon, Apr 22, 2013 at 9:14 AM, Jeff Reback wrote: >> Wes >> >> after release is it possible to do a doc update at some point? (ok the released docs) >> >> if so just merge into master and then you build docs and update? (like u do now for dev) >> >> or is this more trouble than worth? >> >> I just have 1 specific in mind, but won't get to it for a bit >> _______________________________________________ >> Pandas-dev mailing list >> Pandas-dev at python.org >> http://mail.python.org/mailman/listinfo/pandas-dev > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev
From wesmckinn at gmail.com Tue Apr 23 03:23:13 2013 From: wesmckinn at gmail.com (Wes McKinney) Date: Mon, 22 Apr 2013 18:23:13 -0700 Subject: [Pandas-dev] 0.11 highlights? In-Reply-To: <517555BB.4040406@gmx.com> References: <517555BB.4040406@gmx.com> Message-ID: For some reason when I build the docs the ToC is not visible on the LHS. I think it's a Sphinx 1.2b1 bug. Not sure why a beta has been uploaded to PyPI. Rolled back to sphinx 1.1.3 and all is good now. On Mon, Apr 22, 2013 at 8:22 AM, yoval p. wrote: > s***, Totally left out the numexpr integration. big win! > :) > > On 04/22/2013 03:34 PM, Jeff Reback wrote: >> Biased too :) >> >> - New selection methods .i/loc, i/at >> - Dtype propagation and coexistence >> - Improved performance of df.to_csv() >> - Numexpr integration to accelerate operator evaluation >> - Improved timedelta operations >> - 10 Min to Pandas - New user introduction >> >> >> On Mon, Apr 22, 2013 at 1:51 AM, Wes McKinney >> > wrote: >> >> Could I get a 2-paragraph summary of 0.11 highlights for the release >> e-mail? I'm going to make sure Windows is good, take a look at the two >> parser issues, and then look to cut the release in the next 48 hours. >> Any pressing concerns? >> >> thanks, >> Wes >> _______________________________________________ >> Pandas-dev mailing list >> Pandas-dev at python.org >> http://mail.python.org/mailman/listinfo/pandas-dev >> > > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev
From yoval at gmx.com Tue Apr 23 03:33:07 2013 From: yoval at gmx.com (yoval p.) Date: Tue, 23 Apr 2013 04:33:07 +0300 Subject: [Pandas-dev] 0.11 highlights? In-Reply-To: References: <517555BB.4040406@gmx.com> Message-ID: <5175E4D3.9070408@gmx.com> Major version bump, possibly breaking changes to base template. I'll take a look, but nothing too urgent I think. On 04/23/2013 04:23 AM, Wes McKinney wrote: > For some reason when I build the docs the ToC is not visible on the > LHS. I think it's a Sphinx 1.2b1 bug. Not sure why a beta has been > uploaded to PyPI. > > Rolled back to sphinx 1.1.3 and all is good now. > > On Mon, Apr 22, 2013 at 8:22 AM, yoval p. wrote: >> s***, Totally left out the numexpr integration. big win! >> :) >> >> On 04/22/2013 03:34 PM, Jeff Reback wrote: >>> Biased too :) >>> >>> - New selection methods .i/loc, i/at >>> - Dtype propagation and coexistence >>> - Improved performance of df.to_csv() >>> - Numexpr integration to accelerate operator evaluation >>> - Improved timedelta operations >>> - 10 Min to Pandas - New user introduction >>> >>> >>> On Mon, Apr 22, 2013 at 1:51 AM, Wes McKinney >>> > wrote: >>> >>> Could I get a 2-paragraph summary of 0.11 highlights for the release >>> e-mail?
I'm going to make sure Windows is good, take a look at the two >>> parser issues, and then look to cut the release in the next 48 hours. >>> Any pressing concerns? >>> >>> thanks, >>> Wes >>> _______________________________________________ >>> Pandas-dev mailing list >>> Pandas-dev at python.org >>> http://mail.python.org/mailman/listinfo/pandas-dev >> >> _______________________________________________ >> Pandas-dev mailing list >> Pandas-dev at python.org >> http://mail.python.org/mailman/listinfo/pandas-dev
From jreback at yahoo.com Tue Apr 23 04:40:04 2013 From: jreback at yahoo.com (Jeff Reback) Date: Mon, 22 Apr 2013 22:40:04 -0400 Subject: [Pandas-dev] 0.10, 0.10.1 on pydata page Message-ID: <855C601E-D301-4C88-BF62-B1E858CED45E@yahoo.com> prob should update the website whenever u have a chance for these versions
From wesmckinn at gmail.com Tue Apr 23 04:46:40 2013 From: wesmckinn at gmail.com (Wes McKinney) Date: Mon, 22 Apr 2013 19:46:40 -0700 Subject: [Pandas-dev] 0.10, 0.10.1 on pydata page In-Reply-To: <855C601E-D301-4C88-BF62-B1E858CED45E@yahoo.com> References: <855C601E-D301-4C88-BF62-B1E858CED45E@yahoo.com> Message-ID: On Mon, Apr 22, 2013 at 7:40 PM, Jeff Reback wrote: > prob should update the website whenever u have a chance for these versions > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev Doing this now
From wesmckinn at gmail.com Tue Apr 23 20:52:46 2013 From: wesmckinn at gmail.com (Wes McKinney) Date: Tue, 23 Apr 2013 11:52:46 -0700 Subject: [Pandas-dev] pickle is evil In-Reply-To: References: <6B89BC1B-6D3C-4B61-A353-7366991D7D02@gmail.com> Message-ID: On Sun, Apr 21, 2013 at 6:19 PM, Jeff Reback wrote: > I realized I didn't answer your question > > this just catches on pickle.load > > try: > pickle.load > except (TypeError): > pickle_compat.load > except: > if not PY3: > raise > # try to unpickle with an encoding here > > On Apr 21, 2013, at 9:12 PM, Jeff Reback wrote: > >> avro (better choice than msgpack I think) >> will be a very straightforward add-on; >> >> the format should prob be done independently of internals anyhow at the price of a bit more code, or could store block managers and be somewhat code simpler >> >> >> >> On Apr 21, 2013, at 9:01 PM, Wes McKinney wrote: >> >>> On Sun, Apr 21, 2013 at 3:01 PM, Jeff Reback wrote: >>>> I thought I'd share a particularly evil pickle issue. In my refactor of >>>> Series to not subclass ndarray, the new pickling tests were breaking. No >>>> surprise, >>>> because I changed __getstate__ to pickle via the BlockManager. In order to >>>> ensure compat I thought I could just fix __setstate__ and figure out what to >>>> do >>>> based on the return state (e.g. the len of the state returned as a tuple or >>>> dict or whatever). >>>> >>>> But no...apparently the reconstruction algorithm takes the class name that >>>> it sees and tries to create it w/o using __new__ (or anything else that you >>>> can intercept); >>>> it uses a builtin function called _reconstruct (which I >>>> can't figure out how to override at all; it must be C-only code).
>>>> >>>> And then numpy gets ahold of it (as it's an extension type), and complains >>>> because the class I am trying to instantiate actually isn't a sub-class of >>>> ndarray >>>> (which it pre-supposes). >>>> >>>> So, a bit hacky, but using a custom unpickler, then matching on a >>>> compatibility class (that sub-classes from ndarray), allows me to return the >>>> correct class. >>>> >>>> The good thing here is that this whole routine isn't even called unless >>>> there is a TypeError on the original unpickle >>>> >>>> whoosh! >>>>
>>>> --------
>>>> # new module: compat/unpickle_compat.py
>>>>
>>>> import numpy as np
>>>> import pandas
>>>> from pandas.core.series import Series
>>>> from pandas.sparse.series import SparseSeries
>>>> import pickle
>>>>
>>>> class Unpickler(pickle.Unpickler):
>>>>     pass
>>>>
>>>> def load_reduce(self):
>>>>     # REDUCE handler: when the stream is about to rebuild one of the
>>>>     # faked legacy classes, swap in a bare instance of the new class
>>>>     stack = self.stack
>>>>     args = stack.pop()
>>>>     func = stack[-1]
>>>>     if type(args[0]) is type:
>>>>         n = args[0].__name__
>>>>         if n == 'DeprecatedSeries':
>>>>             stack[-1] = object.__new__(Series)
>>>>             return
>>>>         elif n == 'DeprecatedSparseSeries':
>>>>             stack[-1] = object.__new__(SparseSeries)
>>>>             return
>>>>
>>>>     value = func(*args)
>>>>     stack[-1] = value
>>>>
>>>> # copy the dispatch table first so we don't mutate the shared
>>>> # pickle.Unpickler.dispatch behind everyone's back
>>>> Unpickler.dispatch = dict(Unpickler.dispatch)
>>>> Unpickler.dispatch['R'] = load_reduce  # 'R' is the REDUCE opcode
>>>>
>>>> def load(file):
>>>>     # try to load a compatibility pickle
>>>>     # fake the old class hierarchy
>>>>     # if it works, then return the new type objects
>>>>
>>>>     try:
>>>>         pandas.core.series.Series = DeprecatedSeries
>>>>         pandas.sparse.series.SparseSeries = DeprecatedSparseSeries
>>>>         with open(file, 'rb') as fh:
>>>>             return Unpickler(fh).load()
>>>>     except:
>>>>         raise
>>>>     finally:
>>>>         pandas.core.series.Series = Series
>>>>         pandas.sparse.series.SparseSeries = SparseSeries
>>>>
>>>> class DeprecatedSeries(Series, np.ndarray):
>>>>     pass
>>>>
>>>> class DeprecatedSparseSeries(DeprecatedSeries):
>>>>     pass
>>>>
>>>> _______________________________________________ >>>> Pandas-dev mailing list >>>> Pandas-dev at python.org >>>> http://mail.python.org/mailman/listinfo/pandas-dev >>> >>> Yes, pickle is evil. Will this fix affect pickle.loads/pickle.dumps? I >>> would prefer to get a msgpack or Avro-based serialization format for >>> Series or DataFrame sorted out before we start gutting the internals >>> of the objects. >>> >>> - Wes >>> _______________________________________________ >>> Pandas-dev mailing list >>> Pandas-dev at python.org >>> http://mail.python.org/mailman/listinfo/pandas-dev

The Deprecated hack we have to be careful with, as there could be threading issues. Oh boy. I'm not sure how much I want to support legacy pickles anyway; it would be better to have a release of pandas that enables pickle -> avro/msgpack serialized form so that people can migrate all their pickle data to that format. Then we can feel free to break all the pickles, or at least versioning of serialized data becomes easier (when pickling/unpickling, we just pack the serialized bytes into the pickle, and that becomes something we can always unserialize).

Sigh, it's 2013 and I've been talking about fixing the pickle/serialization problem since 2011, actually even earlier I think. Weekend project one of these days.

- Wes

From wesmckinn at gmail.com Tue Apr 23 21:08:46 2013 From: wesmckinn at gmail.com (Wes McKinney) Date: Tue, 23 Apr 2013 12:08:46 -0700 Subject: [Pandas-dev] 0.11 highlights? In-Reply-To: <5175E4D3.9070408@gmx.com> References: <517555BB.4040406@gmx.com> <5175E4D3.9070408@gmx.com> Message-ID: On Mon, Apr 22, 2013 at 6:33 PM, yoval p.
wrote: > Major version bump, possibly breaking changes to base template. > I'll take a look, but nothing too urgent I think. > > On 04/23/2013 04:23 AM, Wes McKinney wrote: >> For some reason when I build the docs the ToC is not visible on the >> LHS. I think it's a Sphinx 1.2b1 bug. Not sure why a beta has been >> uploaded to PyPI. >> >> Rolled back to sphinx 1.1.3 and all is good now. >> >> On Mon, Apr 22, 2013 at 8:22 AM, yoval p. wrote: >>> s***, Totally left out the numexpr integration. big win! >>> :) >>> >>> On 04/22/2013 03:34 PM, Jeff Reback wrote: >>>> Biased too :) >>>> >>>> - New selection methods .i/loc, i/at >>>> - Dtype propagation and coexistence >>>> - Improved performance of df.to_csv() >>>> - Numexpr integration to accelerate operator evaluation >>>> - Improved timedelta operations >>>> - 10 Min to Pandas - New user introduction >>>> >>>> >>>> On Mon, Apr 22, 2013 at 1:51 AM, Wes McKinney >>> > wrote: >>>> >>>> Could I get a 2-paragraph summary of 0.11 highlights for the release >>>> e-mail? I'm going to make sure Windows is good, take a look at the two >>>> parser issues, and then look to cut the release in the next 48 hours. >>>> Any pressing concerns? >>>> >>>> thanks, >>>> Wes >>>> _______________________________________________ >>>> Pandas-dev mailing list >>>> Pandas-dev at python.org >>>> http://mail.python.org/mailman/listinfo/pandas-dev >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Pandas-dev mailing list >>>> Pandas-dev at python.org >>>> http://mail.python.org/mailman/listinfo/pandas-dev >>>> >>> >>> _______________________________________________ >>> Pandas-dev mailing list >>> Pandas-dev at python.org >>> http://mail.python.org/mailman/listinfo/pandas-dev >> > > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev Alright release is out to the world. Thanks for all your hard work

From wesmckinn at gmail.com Fri Apr 26 03:52:06 2013 From: wesmckinn at gmail.com (Wes McKinney) Date: Thu, 25 Apr 2013 18:52:06 -0700 Subject: [Pandas-dev] 0.11.x maintenance Message-ID: hey folks, what is the story with 0.12.x? Are we breaking any APIs yet? If so we should probably plan for a 0.11.1 that is *bugfix *only and be somewhat diligent about backporting critical bugfixes. Let me know what you think. - Wes

From yoval at gmx.com Fri Apr 26 14:37:06 2013 From: yoval at gmx.com (yoval p.) Date: Fri, 26 Apr 2013 15:37:06 +0300 Subject: [Pandas-dev] 0.11.x maintenance In-Reply-To: References: Message-ID: <517A74F2.3020607@gmx.com> The story is that it was unclear if a bugfix release is always done, and there is still no milestone for 0.11.1, which is how we're managing issues right now. confusion ensued. So we're doing a 0.11.1 - Obviously a good idea, already some serious bugs fixed. Breaking changes: There are basically 2 breaking changes, requiring xlrd>=0.9.0 and reworking the repr for various things (Timestamp, *Index() now produce valid python code). Let me know what you want rolled back and I'll deal with it. On the release engineering side of things: I think the wait for 0.11 to be released seriously compromised the development momentum. We did very well scheduling things for 0.12, but then things just got stuck waiting for 0.11rc1 then final and now 0.11.1, I think it's costing pandas too much to do things this way, there's obviously so much just waiting to be done, and PRs are for review, not for stagnation.
As a special concern, this sort of thing reduces jeff's output to a meager 20 PRs per day which is just net loss as far as pandas is concerned. OTOH maintaining multiple maintenance branches is extra work and makes the history more complex. There are alternative git workflows (gitflow, github are the usual fare), would be glad to hear opinions on what would work best. Please create a GH milestone for 0.11.1, and set the time-frame for it. Let me know if you want something rolled back for 0.11.1. Yoval On 04/26/2013 04:52 AM, Wes McKinney wrote: > hey folks, > > what is the story with 0.12.x? Are we breaking any APIs yet? If so we > should probably plan for a 0.11.1 that is *bugfix *only and be > somewhat diligent about backporting critical bugfixes. Let me know > what you think. > > - Wes > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev >

From jeffreback at gmail.com Fri Apr 26 14:48:19 2013 From: jeffreback at gmail.com (Jeff Reback) Date: Fri, 26 Apr 2013 08:48:19 -0400 Subject: [Pandas-dev] 0.11.x maintenance In-Reply-To: <517A74F2.3020607@gmx.com> References: <517A74F2.3020607@gmx.com> Message-ID: aside from xlrd, and the minor HDFStore change (which should go in 0.11.1 in any event), I don't think any API changes in current master why don't we say 1/6wks for 0.11.1...and just try to do bug fixes / avoid big API changes I also would be -1 on maintaining 2 released versions.....I think it's reasonable to just push bigger changes to 0.12 and pull straightforward stuff to 0.11.1 On Fri, Apr 26, 2013 at 8:37 AM, yoval p. wrote: > The story is that it was unclear if a bugfix release > is always done, and there is still no milestone for 0.11.1, which > is how we're managing issues right now. confusion ensued. > > So we're doing a 0.11.1 - Obviously a good idea, already > some serious bugs fixed. > > Breaking changes: > > There are basically 2 breaking changes, requiring > xlrd>=0.9.0 and reworking the repr for various things > (Timestamp, *Index() now produce valid python code). > Let me know what you want rolled back and I'll deal with it. > > On the release engineering side of things: > > I think the wait for 0.11 to be released seriously compromised > the development momentum. We did very well scheduling things > for 0.12, but then things just got stuck waiting for 0.11rc1 then final > and now 0.11.1, I think it's costing pandas too much to do things > this way, there's obviously so much just waiting to be done, and PRs > are for review, not for stagnation. > As a special concern, this sort of thing reduces jeff's output to a > meager 20 PRs per day which is just net loss as far as pandas is concerned. > > OTOH maintaining multiple maintenance branches is extra work > and makes the history more complex. There are alternative > git workflows (gitflow, github are the usual fare), would be > glad to hear opinions on what would work best. > > > Please create a GH milestone for 0.11.1, and set the time-frame > for it. > > Let me know if you want something rolled back for 0.11.1. > > Yoval > > On 04/26/2013 04:52 AM, Wes McKinney wrote: > > hey folks, > > > > what is the story with 0.12.x? Are we breaking any APIs yet? If so we > > should probably plan for a 0.11.1 that is *bugfix *only and be > > somewhat diligent about backporting critical bugfixes. Let me know > > what you think.
> > > > - Wes > > _______________________________________________ > > Pandas-dev mailing list > > Pandas-dev at python.org > > http://mail.python.org/mailman/listinfo/pandas-dev > > > > > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL:

From wesmckinn at gmail.com Fri Apr 26 20:07:33 2013 From: wesmckinn at gmail.com (Wes McKinney) Date: Fri, 26 Apr 2013 11:07:33 -0700 Subject: [Pandas-dev] 0.11.x maintenance In-Reply-To: <517A74F2.3020607@gmx.com> References: <517A74F2.3020607@gmx.com> Message-ID: On Fri, Apr 26, 2013 at 5:37 AM, yoval p. wrote: > The story is that it was unclear if a bugfix release > is always done, and there is still no milestone for 0.11.1, which > is how we're managing issues right now. confusion ensued. > > So we're doing a 0.11.1 - Obviously a good idea, already > some serious bugs fixed. > > Breaking changes: > > There are basically 2 breaking changes, requiring > xlrd>=0.9.0 and reworking the repr for various things > (Timestamp, *Index() now produce valid python code). > Let me know what you want rolled back and I'll deal with it. > > On the release engineering side of things: > > I think the wait for 0.11 to be released seriously compromised > the development momentum. We did very well scheduling things > for 0.12, but then things just got stuck waiting for 0.11rc1 then final > and now 0.11.1, I think it's costing pandas too much to do things > this way, there's obviously so much just waiting to be done, and PRs > are for review, not for stagnation. > As a special concern, this sort of thing reduces jeff's output to a > meager 20 PRs per day which is just net loss as far as pandas is concerned. > In retrospect it would have been better to make a 0.11.x branch earlier to enable forward development/PR-merging to continue while lingering bugs were being chased down for the release. I think we can do better next time. > OTOH maintaining multiple maintenance branches is extra work > and makes the history more complex. There are alternative > git workflows (gitflow, github are the usual fare), would be > glad to hear opinions on what would work best. > > > Please create a GH milestone for 0.11.1, and set the time-frame > for it. > > Let me know if you want something rolled back for 0.11.1. > > Yoval > > On 04/26/2013 04:52 AM, Wes McKinney wrote: >> hey folks, >> >> what is the story with 0.12.x? Are we breaking any APIs yet? If so we >> should probably plan for a 0.11.1 that is *bugfix *only and be >> somewhat diligent about backporting critical bugfixes. Let me know >> what you think. >> >> - Wes >> _______________________________________________ >> Pandas-dev mailing list >> Pandas-dev at python.org >> http://mail.python.org/mailman/listinfo/pandas-dev >> > > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev

From wesmckinn at gmail.com Fri Apr 26 20:09:43 2013 From: wesmckinn at gmail.com (Wes McKinney) Date: Fri, 26 Apr 2013 11:09:43 -0700 Subject: [Pandas-dev] 0.11.x maintenance In-Reply-To: References: <517A74F2.3020607@gmx.com> Message-ID: I definitely wasn't suggesting 2 released versions, but just wanting to have a plan in the event that critical bug fixes require a minor release. Wouldn't want that to prevent significant PRs from getting merged.
Hopefully said critical bug fixes would only require a handful of cherry-picks on top of the maintenance branch-- IPython has been doing as much with their minor releases but it definitely is extra work to have to backport bug fixes. On Fri, Apr 26, 2013 at 5:48 AM, Jeff Reback wrote: > > aside from xlrd, and the minor HDFStore change (which should go in 0.11.1 in > any event), I don't think any API changes in current master > > why don't we say 1/6wks for 0.11.1...and just try to do bug fixes / avoid > big API changes > > I also would be -1 on maintaining 2 released versions.....I think it's > reasonable to just push bigger changes to 0.12 and pull straightforward > stuff to 0.11.1 > > > On Fri, Apr 26, 2013 at 8:37 AM, yoval p. wrote: >> >> The story is that it was unclear if a bugfix release >> is always done, and there is still no milestone for 0.11.1, which >> is how we're managing issues right now. confusion ensued. >> >> So we're doing a 0.11.1 - Obviously a good idea, already >> some serious bugs fixed. >> >> Breaking changes: >> >> There are basically 2 breaking changes, requiring >> xlrd>=0.9.0 and reworking the repr for various things >> (Timestamp, *Index() now produce valid python code). >> Let me know what you want rolled back and I'll deal with it. >> >> On the release engineering side of things: >> >> I think the wait for 0.11 to be released seriously compromised >> the development momentum. We did very well scheduling things >> for 0.12, but then things just got stuck waiting for 0.11rc1 then final >> and now 0.11.1, I think it's costing pandas too much to do things >> this way, there's obviously so much just waiting to be done, and PRs >> are for review, not for stagnation. >> As a special concern, this sort of thing reduces jeff's output to a >> meager 20 PRs per day which is just net loss as far as pandas is >> concerned. >> >> OTOH maintaining multiple maintenance branches is extra work >> and makes the history more complex. There are alternative >> git workflows (gitflow, github are the usual fare), would be >> glad to hear opinions on what would work best. >> >> >> Please create a GH milestone for 0.11.1, and set the time-frame >> for it. >> >> Let me know if you want something rolled back for 0.11.1. >> >> Yoval >> >> On 04/26/2013 04:52 AM, Wes McKinney wrote: >> > hey folks, >> > >> > what is the story with 0.12.x? Are we breaking any APIs yet? If so we >> > should probably plan for a 0.11.1 that is *bugfix *only and be >> > somewhat diligent about backporting critical bugfixes. Let me know >> > what you think. >> > >> > - Wes >> > _______________________________________________ >> > Pandas-dev mailing list >> > Pandas-dev at python.org >> > http://mail.python.org/mailman/listinfo/pandas-dev >> > >> >> _______________________________________________ >> Pandas-dev mailing list >> Pandas-dev at python.org >> http://mail.python.org/mailman/listinfo/pandas-dev > > > > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev >

From swlin at post.harvard.edu Fri Apr 26 23:12:37 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Fri, 26 Apr 2013 17:12:37 -0400 Subject: [Pandas-dev] Setuptools detection of host cpuinfo and/or header file availability Message-ID: Hi Yoval, I had a question about setuptools that Wes suggested I run by you...do you know if there's a canonical (and, hopefully by extension, reliable and cross-platform...)
way of checking for host cpu capabilities and header file availability in a python setuptools script to configure a build? This is for potentially including code using SSE intrinsics into pandas. Stephen _______________________________________________ Pandas-dev mailing list Pandas-dev at python.org http://mail.python.org/mailman/listinfo/pandas-dev

From jeffreback at gmail.com Fri Apr 26 23:36:37 2013 From: jeffreback at gmail.com (Jeff Reback) Date: Fri, 26 Apr 2013 17:36:37 -0400 Subject: [Pandas-dev] Setuptools detection of host cpuinfo and/or header file availability In-Reply-To: References: Message-ID: <9A761FD6-DF81-4D41-95AD-11A9532A60F8@gmail.com> I believe Numexpr does this (not sure it's 'reliable', but it's cross platform) I think this is runtime - but could also be in setup On Apr 26, 2013, at 5:12 PM, Stephen Lin wrote: > Hi Yoval, > > I had a question about setuptools that Wes suggested I run by you...do > you know if there's a canonical (and, hopefully by extension, > reliable and cross-platform...) way of checking for host cpu > capabilities and header file availability in a python setuptools > script to configure a build? > > This is for potentially including code using SSE intrinsics into pandas. > > Stephen > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev

From swlin at post.harvard.edu Fri Apr 26 23:41:17 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Fri, 26 Apr 2013 17:41:17 -0400 Subject: [Pandas-dev] Setuptools detection of host cpuinfo and/or header file availability In-Reply-To: <9A761FD6-DF81-4D41-95AD-11A9532A60F8@gmail.com> References: <9A761FD6-DF81-4D41-95AD-11A9532A60F8@gmail.com> Message-ID: Well, I can do a runtime cpuid check but there's some expense for that (not for the check so much, but in making the executable include both code paths into the same executable and making it possible to select between them at runtime...). Also, that still leaves the header availability issue. On Fri, Apr 26, 2013 at 5:36 PM, Jeff Reback wrote: > I believe Numexpr does this (not sure it's 'reliable', but it's cross platform) > > I think this is runtime - but could also be in setup > > On Apr 26, 2013, at 5:12 PM, Stephen Lin wrote: > >> Hi Yoval, >> >> I had a question about setuptools that Wes suggested I run by you...do >> you know if there's a canonical (and, hopefully by extension, >> reliable and cross-platform...) way of checking for host cpu >> capabilities and header file availability in a python setuptools >> script to configure a build? >> >> This is for potentially including code using SSE intrinsics into pandas. >> >> Stephen >> _______________________________________________ >> Pandas-dev mailing list >> Pandas-dev at python.org >> http://mail.python.org/mailman/listinfo/pandas-dev

From yoval at gmx.com Sat Apr 27 00:16:08 2013 From: yoval at gmx.com (yoval p.) Date: Sat, 27 Apr 2013 01:16:08 +0300 Subject: [Pandas-dev] Setuptools detection of host cpuinfo and/or header file availability In-Reply-To: References: <9A761FD6-DF81-4D41-95AD-11A9532A60F8@gmail.com> Message-ID: <517AFCA8.2050908@gmx.com> Hi stephen, Nothing I know of that you can't google just as well as me or better. There's PyCPUID, which explicitly reports SSEx support, you could document it in the README and use it if it's available at build time. By "including intrinsics" do you mean setting compiler optimization flags that preclude older cpus from the binary? or are you planning to embed asm in the cython files? I wouldn't put it past you.
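You could also just do what autoconf does and try-compile a little probe from setup.py, then gate the extra sources / defines on the result. Untested sketch (the helper name is made up, and distutils will find platform-specific ways to make this annoying):

import os
import tempfile
from distutils.ccompiler import new_compiler
from distutils.errors import CompileError, DistutilsExecError

def have_sse2_intrinsics():
    # probe for <emmintrin.h> and the SSE2 intrinsics at build time
    src = ("#include <emmintrin.h>\n"
           "int main(void) {\n"
           "    __m128d a = _mm_set1_pd(1.0);\n"
           "    a = _mm_add_pd(a, a);\n"
           "    return 0;\n"
           "}\n")
    tmpdir = tempfile.mkdtemp()
    fname = os.path.join(tmpdir, 'sse2_probe.c')
    with open(fname, 'w') as f:
        f.write(src)
    try:
        new_compiler().compile([fname], output_dir=tmpdir)
        return True
    except (CompileError, DistutilsExecError):
        return False

That answers the header half of the question at least; the host-cpu half is separate.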
icc can automatically bloat the binary with parallel paths and choose at runtime like you describe IIRC. Yoval On 04/27/2013 12:41 AM, Stephen Lin wrote: > Well, I can do a runtime cpuid check but there's some expense for that > (not for the check so much, but in making the executable include both > code paths into the same executable and making it possible to select > between them at runtime...). Also, that still leaves the header > availability issue. > > On Fri, Apr 26, 2013 at 5:36 PM, Jeff Reback wrote: >> I believe Numexpr does this (not sure it's 'reliable', but it's cross platform) >> >> I think this is runtime - but could also be in setup >> >> On Apr 26, 2013, at 5:12 PM, Stephen Lin wrote: >> >>> Hi Yoval, >>> >>> I had a question about setuptools that Wes suggested I run by you...do >>> you know if there's a canonical (and, hopefully by extension, >>> reliable and cross-platform...) way of checking for host cpu >>> capabilities and header file availability in a python setuptools >>> script to configure a build? >>> >>> This is for potentially including code using SSE intrinsics into pandas. >>> >>> Stephen >>> _______________________________________________ >>> Pandas-dev mailing list >>> Pandas-dev at python.org >>> http://mail.python.org/mailman/listinfo/pandas-dev

From swlin at post.harvard.edu Sat Apr 27 01:10:32 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Fri, 26 Apr 2013 19:10:32 -0400 Subject: [Pandas-dev] Setuptools detection of host cpuinfo and/or header file availability In-Reply-To: <517AFCA8.2050908@gmx.com> References: <9A761FD6-DF81-4D41-95AD-11A9532A60F8@gmail.com> <517AFCA8.2050908@gmx.com> Message-ID: OK thanks anyway; haven't found anything authoritative online unfortunately. Also, not assembly but SSE intrinsics: http://stackoverflow.com/questions/11228855/header-files-for-simd-intrinsics The headers have data types that correspond to the SIMD registers and intrinsic functions that correspond to instructions, but it's not inline asm...instruction scheduling and register allocation are done for you. I don't know if all compilers we care about ship these headers though. Stephen On Fri, Apr 26, 2013 at 6:16 PM, yoval p. wrote: > Hi stephen, > > Nothing I know of that you can't google just as well as me or better. > There's PyCPUID, which explicitly reports SSEx support, you could > document it in the README and use it if it's available at build time. > > By "including intrinsics" do you mean setting compiler optimization > flags that preclude older cpus from the binary? or are you planning to > embed asm in the cython files? I wouldn't put it past you. > > icc can automatically bloat the binary with parallel paths and choose at > runtime like you describe IIRC. > > Yoval > > On 04/27/2013 12:41 AM, Stephen Lin wrote: >> Well, I can do a runtime cpuid check but there's some expense for that >> (not for the check so much, but in making the executable include both >> code paths into the same executable and making it possible to select >> between them at runtime...). Also, that still leaves the header >> availability issue.
>> >> On Fri, Apr 26, 2013 at 5:36 PM, Jeff Reback wrote: >>> I believe Numexpr does this (not sure it's 'reliable', but it's cross platform) >>> >>> I think this is runtime - but could also be in setup >>> >>> On Apr 26, 2013, at 5:12 PM, Stephen Lin wrote: >>> >>>> Hi Yoval, >>>> >>>> I had a question about setuptools that Wes suggested I run by you...do >>>> you know if there's a canonical (and, hopefully by extension, >>>> reliable and cross-platform...) way of checking for host cpu >>>> capabilities and header file availability in a python setuptools >>>> script to configure a build? >>>> >>>> This is for potentially including code using SSE intrinsics into pandas. >>>> >>>> Stephen >>>> _______________________________________________ >>>> Pandas-dev mailing list >>>> Pandas-dev at python.org >>>> http://mail.python.org/mailman/listinfo/pandas-dev

From swlin at post.harvard.edu Sat Apr 27 01:14:51 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Fri, 26 Apr 2013 19:14:51 -0400 Subject: [Pandas-dev] Setuptools detection of host cpuinfo and/or header file availability In-Reply-To: <517AFCA8.2050908@gmx.com> References: <9A761FD6-DF81-4D41-95AD-11A9532A60F8@gmail.com> <517AFCA8.2050908@gmx.com> Message-ID: > icc can automatically bloat the binary with parallel paths and choose at > runtime like you describe IIRC. Also, yeah, I've heard that it does that, but I don't trust a compiler to do this optimally :D how could it know where to place the check optimally? You don't want to do redundant checks within tight loops but you also don't want to create too many parallel code paths; there's also ABI/linking issues if it duplicates entire functions... Stephen

From yoval at gmx.com Sat Apr 27 01:46:52 2013 From: yoval at gmx.com (yoval p.) Date: Sat, 27 Apr 2013 02:46:52 +0300 Subject: [Pandas-dev] Setuptools detection of host cpuinfo and/or header file availability In-Reply-To: References: <9A761FD6-DF81-4D41-95AD-11A9532A60F8@gmail.com> <517AFCA8.2050908@gmx.com> Message-ID: <517B11EC.6010306@gmx.com> Conventional wisdom says that these days trying to outdo optimizing compilers is a fool's errand. Do you have numbers showing that manually coding at the instruction level can do significantly better than the compiler? If so, you're probably a modern day [John_Henry](https://en.wikipedia.org/wiki/John_Henry_(folklore)), in which case try not to drop dead at the end of a long and agonizing python packaging nightmare, as we'd like to see more of your tricks in pandas in the future. Yoval On 04/27/2013 02:14 AM, Stephen Lin wrote: >> icc can automatically bloat the binary with parallel paths and choose at >> runtime like you describe IIRC. > > Also, yeah, I've heard that it does that, but I don't trust a compiler > to do this optimally :D how could it know where to place the check > optimally? You don't want to do redundant checks within tight loops > but you also don't want to create too many parallel code paths; > there's also ABI/linking issues if it duplicates entire functions...
> > Stephen >

From swlin at post.harvard.edu Sat Apr 27 01:55:17 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Fri, 26 Apr 2013 19:55:17 -0400 Subject: [Pandas-dev] Setuptools detection of host cpuinfo and/or header file availability In-Reply-To: <517B11EC.6010306@gmx.com> References: <9A761FD6-DF81-4D41-95AD-11A9532A60F8@gmail.com> <517AFCA8.2050908@gmx.com> <517B11EC.6010306@gmx.com> Message-ID: The compiler won't use the intrinsics on its own, unfortunately, since it doesn't know about the alignment and buffer size guarantees. I will be fixing this in llvm/clang soon enough :) Stephen On Fri, Apr 26, 2013 at 7:46 PM, yoval p. wrote: > Conventional wisdom says that these days trying to outdo > optimizing compilers is a fool's errand. > Do you have numbers showing that manually coding at the instruction > level can do significantly better than the compiler? > > If so, you're probably a modern day > [John_Henry](https://en.wikipedia.org/wiki/John_Henry_(folklore)), in > which case try not to drop dead at the end of a long and agonizing > python packaging nightmare, as we'd like to see more of your tricks in > pandas in the future. > > Yoval > > On 04/27/2013 02:14 AM, Stephen Lin wrote: >>> icc can automatically bloat the binary with parallel paths and choose at >>> runtime like you describe IIRC. >> >> Also, yeah, I've heard that it does that, but I don't trust a compiler >> to do this optimally :D how could it know where to place the check >> optimally? You don't want to do redundant checks within tight loops >> but you also don't want to create too many parallel code paths; >> there's also ABI/linking issues if it duplicates entire functions... >> >> Stephen >> > > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > http://mail.python.org/mailman/listinfo/pandas-dev

From swlin at post.harvard.edu Sat Apr 27 02:32:05 2013 From: swlin at post.harvard.edu (Stephen Lin) Date: Fri, 26 Apr 2013 20:32:05 -0400 Subject: [Pandas-dev] Setuptools detection of host cpuinfo and/or header file availability In-Reply-To: References: <9A761FD6-DF81-4D41-95AD-11A9532A60F8@gmail.com> <517AFCA8.2050908@gmx.com> <517B11EC.6010306@gmx.com> Message-ID: Also, here's the data :) https://github.com/pydata/pandas/issues/3146 On Fri, Apr 26, 2013 at 7:55 PM, Stephen Lin wrote: > The compiler won't use the intrinsics on its own, unfortunately, since > it doesn't know about the alignment and buffer size guarantees. > > I will be fixing this in llvm/clang soon enough :) > > Stephen > > On Fri, Apr 26, 2013 at 7:46 PM, yoval p. wrote: >> Conventional wisdom says that these days trying to outdo >> optimizing compilers is a fool's errand. >> Do you have numbers showing that manually coding at the instruction >> level can do significantly better than the compiler? >> >> If so, you're probably a modern day >> [John_Henry](https://en.wikipedia.org/wiki/John_Henry_(folklore)), in >> which case try not to drop dead at the end of a long and agonizing >> python packaging nightmare, as we'd like to see more of your tricks in >> pandas in the future. >> >> Yoval >> >> On 04/27/2013 02:14 AM, Stephen Lin wrote: >>>> icc can automatically bloat the binary with parallel paths and choose at >>>> runtime like you describe IIRC. >>> >>> Also, yeah, I've heard that it does that, but I don't trust a compiler >>> to do this optimally :D how could it know where to place the check >>> optimally?
You don't want to do redundant checks within tight loops >>> but you also don't want to create too many parallel code paths; >>> there's also ABI/linking issues if it duplicates entire functions... >>> >>> Stephen >>> >> >> _______________________________________________ >> Pandas-dev mailing list >> Pandas-dev at python.org >> http://mail.python.org/mailman/listinfo/pandas-dev

From jeffreback at gmail.com Tue Apr 30 22:42:21 2013 From: jeffreback at gmail.com (Jeff Reback) Date: Tue, 30 Apr 2013 16:42:21 -0400 Subject: [Pandas-dev] datetimes Message-ID: Currently we allow ONLY datetime64[ns] as an internal representation (and analogously timedelta64[ns] for timedeltas). There are several issues where things like this are done:

a) Series([np.datetime64(datetime(2013,1,1)),np.datetime64(datetime(2013,1,2))],dtype='M8[ms]')
b) Series([datetime(2013,1,1),datetime(2013,1,2)],dtype='M8[D]')

in a) the np.datetime64s are by default [us], so we need to do a conversion to M8[ns], ok, can do that to keep the internal rep, but what about the dtype specified? is this effectively an astype? or is this conceptually just a display thing, e.g. the user wants to view the data as [ms], rather than [ns]?

several options to think about:

1) ignore completely the passed dtype and do some conversions on np.datetime64 (which we already do) to guarantee a M8[ns] internally (we do this now, but bork on a passed dtype that is not M8[ns] when the data is M8)
2) keep the passed dtype (or the inferred dtype) internally, effectively making datetimes a suite of M8[ms,D,s,ns......]
3) keep data as M8[ns] internally and provide an asfreq which works kind of like the PeriodIndex method, which can provide a DatetimeIndex I guess of the requested frequency?

but then I keep thinking, is there any actual difference between 20130101 15:00:01.12345 in [ms], or [ns] (right now no) Any thoughts....I know I am rambling a bit, but confused over what is even necessary here...
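as a strawman for 1), the coercion could look something like this (sketch only, helper name made up):

import numpy as np

def _coerce_to_M8ns(values, dtype=None):
    # normalize whatever datetime64 unit we got ([us], [D], [ms], ...)
    # to the single internal rep, datetime64[ns]
    values = np.asarray(values)
    if values.dtype.kind == 'M':
        values = values.astype('M8[ns]')
    if dtype is not None and np.dtype(dtype).kind == 'M':
        # option 1: ignore (or raise on) the passed unit;
        # option 3 would instead keep it around as a display/frequency hint
        pass
    return values

Jeff
-------------- next part -------------- An HTML attachment was scrubbed... URL: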