[IPython-dev] pyspark and IPython

Brian Granger ellisonbg at gmail.com
Thu Aug 29 17:58:45 EDT 2013


Sorry I wasn't clear in my question.  I am very aware of how amazing
Spark and Shark are, and I agree that they look very promising right
now.  What I don't see is what IPython can offer in working with
them.  Given their architecture, I don't see how, for example, you
could run Spark jobs interactively from the IPython Notebook.  Is
that the type of thing you are thinking about?  Or are you thinking
more about direct integration of Spark and IPython.parallel?  What I
am really wondering is what the benefit of IPython+Spark integration
would be.  I know that Fernando and Min have talked with some of the
AMP Lab people, and I would love to see what can be done.  It would
probably be best to sit down and talk further with the Spark/Shark
devs at some point.  But if you can learn more about their
architecture, investigate the possibilities, and report back, that
would be fantastic.

On Thu, Aug 29, 2013 at 2:41 PM, Nitin Borwankar <nborwankar at gmail.com> wrote:
> Hi Brian,
>
> The advantage IMHO is that pyspark and the larger UCB AMP effort are a huge
> open source effort for distributed parallel computing that improves upon the
> Hadoop model. Spark, the underlying layer, plus Shark, the Hive-compatible
> query language, add performance gains of 10x - 100x.  The effort has 20+
> companies contributing code, including Yahoo, and 70+ contributors. AMP has
> a $10M grant from NSF.  So
> a) it's not going away soon
> b) it may be hard to compete with it without that level of resources
> c) they do have a Python shell (have not used it yet) and they appear
> committed to having Python as a first-class language in their effort.
> d) let's see if we can find ways to integrate with it.
>
> I think integration at the level of the interactive interface might make
> sense.
>
> Just my 2c but I think this effort may leapfrog pure Hadoop over the next
> 2-3 years.
>
>
> Nitin.
>
>
>
>
> ------------------------------------------------------------------
> Nitin Borwankar
> nborwankar at gmail.com
>
>
> On Thu, Aug 29, 2013 at 1:35 PM, Brian Granger <ellisonbg at gmail.com> wrote:
>>
>> From a quick glance, it looks like both pyspark and IPython use
>> similar parallel computing models in terms of the process model.  You
>> might think that would help them to integrate, but in this case I
>> think it will get in the way of integration.  Without learning more
>> about the low-level details of their architecture, it is really
>> difficult to know whether it is possible.  But I think the bigger
>> question is what would the motivation for integration be?  Both
>> IPython and Spark provide self-contained parallel computing
>> capabilities - what use cases are there for using both at the same
>> time?  I think the biggest potential show stopper is that pyspark is
>> not designed in any way to be interactive, as far as I can tell.
>> Pyspark jobs basically run in batch mode, which is going to make it
>> really tough to fit into IPython's interactive model.  Worth looking
>> more into though.
>>
>> Cheers,
>>
>> Brian
>>
>> On Thu, Aug 29, 2013 at 11:28 AM, Nitin Borwankar <nborwankar at gmail.com>
>> wrote:
>> > I'm at AmpCamp3 at UCB and see that there would be huge benefits to
>> > integrating pyspark with IPython and IPyNB.
>> >
>> > Questions:
>> >
>> > a) has this been attempted/done? if so, pointers please.
>> >
>> > b) does this overlap the IPyNB parallel computing effort in
>> > conflicting/competing ways?
>> >
>> > c) if this has not been done yet - does anyone have a sense of how much
>> > effort this might be? (I've done a small hack integrating postgres psql
>> > into ipynb so I'm not terrified by that level of deep digging, but are
>> > there any show stopper gotchas?)
>> >
>> > Thanks much,
>> >
>> > Nitin
>> > ------------------------------------------------------------------
>> > Nitin Borwankar
>> > nborwankar at gmail.com
>> >
>> > _______________________________________________
>> > IPython-dev mailing list
>> > IPython-dev at scipy.org
>> > http://mail.scipy.org/mailman/listinfo/ipython-dev
>> >
>>
>>
>>
>> --
>> Brian E. Granger
>> Cal Poly State University, San Luis Obispo
>> bgranger at calpoly.edu and ellisonbg at gmail.com



-- 
Brian E. Granger
Cal Poly State University, San Luis Obispo
bgranger at calpoly.edu and ellisonbg at gmail.com
