[SciPy-User] Pylab - standard packages

Fri Sep 21 16:57:11 EDT 2012

On Fri, Sep 21, 2012 at 4:38 PM, Fernando Perez <fperez.net at gmail.com> wrote:
> Warning: what follows is a highly opinionated, completely biased post.
>  I'll be using a 'we' that refers to the IPython developers because
> the credit for much of what I talk about goes to the whole team, but
> ultimately the rant is my responsibility, so flame me if need be.
>
> self.put_hat(kind='IPython').
>
> I think it's important to address directly the question of the IPython
> notebook.  I realize that not everybody uses it, and it has some extra
> dependencies (though they are really easy ones to satisfy).  But I
> also think it's an important discussion that goes to the question of
> whether we simply are trying to play catch-up with what matlab/R-Rstudio
> offer, or to be truly forward-looking and rethink how scientific
> computing will be done for the coming decade.  Needless to say, I have
> little interest in the former and am putting all my energy into the
> latter: if it were otherwise, I'd have been contributing to Octave for the
> last 10 years instead.
>
> My argument, in short: we should consider *some* notebook-type tool as
> a first-class citizen of this effort, for the simple reason that such
> an approach is one whose time has come.  A notebook environment is the
> only tool that truly tackles in an integrated manner the problem that
> we've been referring to as the 'lifecycle of a scientific idea'
> (https://speakerdeck.com/u/fperez/p/ipython-tools-for-the-lifecycle-of-research-computing?slide=3).
>
> Context: all disciplines are becoming intensely computational, the
> need for real-time collaboration on live computational analysis is
> great, the pressures for moving towards truly reusable, reproducible
> work are coming from multiple angles (major journals, funding
> agencies, ...), we need a much smoother transition between analysis
> codes and publications, and we need better ways to share our analysis
> work over the internet, for education and for archival purposes.
> Having a good IDE is a really important point, and my hat is off to
> the stellar work the Spyder team has done (and coincidentally, another
> Colombian physicist, Carlos Córdoba, is leading the charge on the
> spyder/ipython integration work).  But to be blunt, a matlab-style
> IDE does not tackle the important questions above in any meaningful
> way.
>
> In the last decade's worth of the pylab world (using our new moniker
> in its intended fashion), we've certainly taken inspiration from the
> major systems out there, but it has always been that: *inspiration*,
> never simple copying:
>
> - John Hunter's brilliance with matplotlib was not so much to copy the
> high-level API and look/feel of plot windows to ease the transition
> from matlab.  It was to rethink the question of what a plotting
> library should be, abstracting over GUI toolkits and putting an elegant
> OO architecture underneath the familiar scripting interface.
>
> - Numpy's arrays are similar to matlab/fortran ones, obviously, but
> when used with the full power of slicing, fancy indexing and
> structured dtypes, they make matlab's look like the 1970's relic they
> are.  Jim Hugunin, Perry and Travis led the way to build something
> that has no match.
>
> - The one-man army that is Wes McKinney had R's DataFrame squarely in
> his sights when he built pandas, but he went far, far beyond the basic
> ideas in R to provide one of the most powerful packages we've seen in
> recent memory.
>
> - etc... you get my point.
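
To make the array point above concrete, here is a small sketch of slicing, fancy indexing and a structured dtype working together. The field names and values are invented purely for illustration; nothing here comes from the original post.

```python
import numpy as np

# Hypothetical measurements, invented for illustration only.
samples = np.array(
    [("a", 1.0, 10), ("b", 2.5, 20), ("c", 4.0, 30), ("d", 5.5, 40)],
    dtype=[("label", "U1"), ("value", "f8"), ("count", "i4")],
)

# Slicing: every other record, returned as a view on the same memory.
every_other = samples[::2]

# Fancy indexing: pick records by an explicit list of positions (a copy).
picked = samples[[3, 0]]

# Boolean-mask indexing driven by one field of the structured dtype.
large = samples[samples["value"] > 2.0]

print(every_other["label"])
print(picked["count"])
print(large["label"])
```

Each record keeps its named fields through all three kinds of indexing, so a selection made on one field can be read back through any other.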
>
>
> Now, as I said above, the scientific computing world is changing, and
> more importantly, a lot of things in the broader scientific world are
> also undergoing very drastic changes: the push for open access, data
> sharing and reproducibility of results is likely to make a lot of
> things look very different in 10 years than they do now.  We can argue
> that the whole online education wave of Coursera/Udacity/EdX is a bit
> of a bubble, but there's no denying the internet will play a role in
> how scientists are trained both in and out of traditional academia.
>
> I argue that, after having spent the last decade building up the pylab
> foundations to be competitive with the 'big boys', we are uniquely
> well positioned to stop following and actually lead on many of these
> problems.  And for that, my contention is that it is absolutely
> necessary to have:
>
> - A tool that bridges the gaps between exploratory work,
> collaboration, production, publication and education.
>
> - An open format for sharing, publishing and archiving executable
> computational work.
>
> - A system that is accessible through the browser, so that computation
> can be located where the data is, since we can't move the data to the
> desktop anymore.  Remote collaboration also is most sensibly tackled
> via a browser, as google docs has amply demonstrated.
>
>
> Up until now I have *not* said that we should use the *IPython*
> notebook.  Our efforts on this front are, I am sure, full of
> limitations and imperfections.  But if we're not going to tackle the
> problems above, I would like it to be with an explicit decision on
> whether it is because:
>
> 1. this community only wants to stick to a traditional
> shell+editor/IDE approach.
>
> 2. the IPython solution is the wrong one, it has technical flaws, etc.
>
> If it's #1, I think it would be a huge, huge mistake and one showing a
> lack of foresight, ambition and vision.  If that's the decision, I'm sure
> that we in the IPython team will simply continue fighting for that
> vision on our own, as we are pretty convinced it's the right thing to
> do.  And evidence is mounting that others think the same too:
>
> - Michigan State University is teaching *two* courses on advanced
> genomics that are heavily notebook based:
> http://ged.msu.edu/angus/beacon-2012/index.html,
> https://github.com/ngs-docs/ngs-notebooks.
>
> - At Berkeley we have (but this is not driven by me) both an intensive
> bootcamp and a semester-long course on scientific python with the
> same:
> https://github.com/profjsb/python-bootcamp,
> https://github.com/profjsb/python-seminar.
>
> - We can now blog straight off the notebook
> (http://blog.fperez.org/2012/09/blogging-with-ipython-notebook.html),
> and Jose Unpingco is effectively writing a full book on signal
> processing as a series of blog posts that are notebooks:
> http://python-for-signal-processing.blogspot.com.
>
> - there's more, just google it.
>
> Now, if the reluctance is to go with the *IPython* notebook, then I'd
> like to know what the alternative is.  We have effectively put 10
> years of work into this problem, and the current implementation is the
> third or fourth attempt
> (http://blog.fperez.org/2012/01/ipython-notebook-historical.html).  We
> know it's by no means perfect, but honestly I think it would be a lot
> more sensible to fix whatever our limitations are than to start yet
> once more from scratch.  So by all means beat on the format, work with
> us to improve it so it meets your needs, let us know what's wrong with
> it or help us improve the tooling around it (ipython itself, the
> nbconvert tools, the nbviewer.ipython.org site, etc...).  But to be
> blunt, please don't think that ignoring 10 years of work on this
> problem is the right approach.
>
>
> In summary, I think that sticking to a shell+editor/IDE view of the
> problem would be missing a huge opportunity to play a key role in
> shaping the next decade's worth of scientific computing. And by the
> way, it's not like the others are standing still here:
>
> - Wolfram is busy at work promoting a closed, highly proprietary idea
> (http://www.wolfram.com/cdf-player).
>
> - Matlab is building a solution around Microsoft Word:
> http://www.mathworks.com/help/matlab/matlab_prog/create-a-matlab-notebook-with-microsoft-word.html.
> They have a huge market share and resources, so they can and will
> push pretty deep with this.
>
> - The R community has rapidly banded behind knitr (http://yihui.name/knitr).
>
>
> If the pylab community decides to not tackle this problem (and
> opportunity!) head-on, at least from IPython we will continue.  I
> currently have 5 grants in the pipeline all of which would provide, if
> funded, some measure of support for this kind of work.  We all know
> funding is a crap shoot, but even if only some of them go through we
> should have a decent amount of resources not only for our (this
> includes Brian, who's also involved with several) own time but also
> for students, postdocs and developers, to tackle this.  And I simply
> view it as too important not to continue fighting in this direction.
>
>
> Now, after all this rant, I want to make clear that I'm *not* saying
> that we should stop talking about the simple shell or that everyone
> should switch to *only* using notebooks.  One important property of
> the IPython notebook is that it is very easy to generate a pure .py
> script out of any notebook, any time (and we know how to improve those
> conversion facilities quite a bit).  So even if a project decides to
> ship all of its examples as notebooks, it's trivial to ensure that
> they are also accessible in pure script form to be run from the
> command line or loaded into spyder/IDLE/etc as well as converted to
> clean html in the sphinx-built documentation.
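
A rough sketch of why the notebook-to-script step described above is easy: a notebook file is plain JSON, so the code cells can be pulled out with the standard library alone. This is not the nbconvert implementation. It assumes the later nbformat 4 layout (a top-level "cells" list whose entries carry "cell_type" and "source"), which differs from the format in use when this thread was written, and the function name notebook_to_script is made up for this example.

```python
import json


def notebook_to_script(notebook_json: str) -> str:
    """Return the cells of an nbformat-4 notebook as one Python script."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # "source" may be a single string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        if cell.get("cell_type") == "code":
            chunks.append(text)
        elif cell.get("cell_type") == "markdown":
            # Keep the prose as comments so the script still runs.
            chunks.append("\n".join("# " + line for line in text.splitlines()))
    return "\n\n".join(chunks) + "\n"


# A tiny hand-written notebook, invented for illustration.
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Demo\n", "Add two numbers."]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = 1 + 2\n", "print(x)"]},
    ],
})

script = notebook_to_script(example)
print(script)
```

Turning markdown cells into comments is one design choice among several; dropping them entirely would give a shorter script at the cost of losing the narrative.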
>
> Furthermore, the notebook is not the tool for building large-scale
> library code, so there will always be a place for
> emacs/vim/textmate/spyder, where the focus is more on the
> 'development' than the interactive exploration/analysis.
>
> But having notebooks in the projects, once we also build tools for
> cross-project help indexing, will let us provide users with powerful
> help that can search for a term across all the installed
> pylab-compliant tools and will give one-click access to live,
> executable examples they can modify immediately.  Mathematica has had
> this for over a decade and it is absolutely extraordinary.  The same
> tools can also index the pure .py versions, of course, but after 5
> years of not having a Mathematica license, I still miss this every
> time I have to trawl multiple online galleries looking for something
> in the pylab world.
>
>
> OK, I doubt anyone is reading by now, so I'll stop here...  Flame away.

No argument from me. I just spent a day getting notebooks into the
statsmodels documentation and trying to improve our html repr
rendering.
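
For readers unfamiliar with the "html repr" mentioned above: the notebook's rich display looks for a `_repr_html_` method on an object and renders the HTML string it returns, falling back to the plain repr elsewhere. A minimal sketch follows; the ResultsTable class is a hypothetical stand-in and is not a statsmodels class.

```python
from html import escape


class ResultsTable:
    """Hypothetical results object, used only to illustrate the protocol."""

    def __init__(self, rows):
        self.rows = rows  # list of (name, value) pairs

    def __repr__(self):
        # Plain-text form, shown in an ordinary shell.
        return "\n".join(f"{name}: {value}" for name, value in self.rows)

    def _repr_html_(self):
        # HTML form, picked up by the notebook's rich display.
        body = "".join(
            f"<tr><td>{escape(str(name))}</td><td>{escape(str(value))}</td></tr>"
            for name, value in self.rows
        )
        return f"<table>{body}</table>"


table = ResultsTable([("coef", 1.25), ("R^2 < 1", 0.9)])
print(repr(table))
print(table._repr_html_())
```

Escaping the cell contents matters because labels such as "R^2 < 1" would otherwise be read as markup.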

Notebooks are a great way of getting rendered and commented code, and
all of the many previous attempts were half-baked.

That's for the teaching side; I don't know much about the future of
collaborative, parallel, cloud, ... interpreters.  (Anaconda seems to
have it built in.)

(Even if my development environment is spyder and eclipse.)

Josef

>
> f