Numpy and Terabyte data

Rustom Mody rustompmody at gmail.com
Tue Jan 2 23:07:45 EST 2018


On Wednesday, January 3, 2018 at 1:43:40 AM UTC+5:30, Paul  Moore wrote:
> On 2 January 2018 at 17:24, Rustom Mody wrote:
> > Someone who works in hadoop asked me:
> >
> > If our data is in terabytes can we do statistical (ie numpy pandas etc)
> > analysis on it?
> >
> > I said: No (I don't think so, at least!), i.e. I expect numpy (pandas, etc.)
> > not to work if the data does not fit in memory.
> >
> > Well, sure, *python* can handle (streams of) terabyte data, I guess, but
> > *numpy* cannot.
> >
> > Is there a more sophisticated answer?
> >
> > ["Terabyte" is just a figure of speech for "too large for main memory"]
> 
> You might want to look at Dask (https://pypi.python.org/pypi/dask,
> docs at http://dask.pydata.org/en/latest/).

Thanks.
Looks like exactly what I was asking about.
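For the archive: Dask handles this by splitting arrays and dataframes into
chunks and scheduling the computation over them, so the whole dataset never
has to be in RAM at once, while exposing a numpy/pandas-like API. The same
out-of-core idea can be sketched with plain numpy using a memory-mapped file
and a chunked reduction. This is a minimal illustration, not Dask itself; the
file path, array size, and chunk size are made up for the example:

```python
import os
import tempfile
import numpy as np

# Create a file-backed array (stand-in for a dataset too large for RAM).
# In a real scenario the data would already exist on disk.
path = os.path.join(tempfile.mkdtemp(), "big.dat")
n = 1_000_000  # stand-in for a terabyte-scale length
arr = np.memmap(path, dtype="float64", mode="w+", shape=(n,))
arr[:] = 1.0
arr.flush()
del arr

# Reopen read-only and compute a statistic chunk by chunk, so only
# one chunk needs to be resident in memory at a time.
mm = np.memmap(path, dtype="float64", mode="r", shape=(n,))
chunk = 100_000
total = 0.0
for start in range(0, n, chunk):
    total += float(mm[start:start + chunk].sum())
mean = total / n
print(mean)
```

Dask automates exactly this kind of chunking (plus parallel scheduling), e.g.
`dask.array.from_array(...).mean().compute()`, which is why it fits the
question above.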



More information about the Python-list mailing list