Numpy and Terabyte data

Rustom Mody rustompmody at gmail.com
Tue Jan 2 12:24:18 EST 2018


Someone who works with Hadoop asked me:

If our data is in terabytes, can we do statistical (i.e. numpy, pandas, etc.)
analysis on it?

I said: No (I don't think so, at least!), i.e. I expect numpy (pandas, etc.)
not to work if the data does not fit in memory.

Well, sure, *python* can handle (streams of) terabyte data, I guess;
*numpy* cannot.
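
To make the "streams" point concrete, here is a minimal sketch (not a tested
recipe) of plain python computing statistics over data that does not fit in
memory. It uses Welford's one-pass algorithm for the mean and sample standard
deviation, so only one value is held at a time. The names streaming_mean_std
and read_floats are made up for illustration, and the one-number-per-line
file format is an assumption.

```python
import math
from typing import Iterable, Iterator, Tuple


def streaming_mean_std(values: Iterable[float]) -> Tuple[int, float, float]:
    """Welford's one-pass algorithm: return (count, mean, sample std).

    Memory use is constant no matter how many values are consumed.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    std = math.sqrt(m2 / (count - 1)) if count > 1 else 0.0
    return count, mean, std


def read_floats(path: str) -> Iterator[float]:
    """Hypothetical helper: lazily yield one float per non-empty line."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield float(line)
```

Something like streaming_mean_std(read_floats("big.txt")), with "big.txt" a
placeholder path, would then run in constant memory, however slowly.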

Is there a more sophisticated answer?

["Terabyte" is just a figure of speech for "too large for main memory"]




More information about the Python-list mailing list