Numpy and Terabyte data

jason at apkudo.com
Tue Jan 2 13:06:02 EST 2018


I'm not sure if I'll be laughed at, but statistics computed on a properly randomized sample should resemble those of the whole data set.
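As a rough illustration of the sampling idea, here is a minimal sketch. The synthetic "population" array, its size, and the sample size are all made up for the example; real terabyte-scale data would of course not fit in one array like this.

```python
import numpy as np

# Hypothetical stand-in for the full data set (assumption: synthetic data).
rng = np.random.default_rng(0)
population = rng.normal(loc=50.0, scale=10.0, size=1_000_000)

# Draw a uniform random sample without replacement.
sample = rng.choice(population, size=10_000, replace=False)

# The sample statistics should land close to the population statistics.
mean_gap = abs(sample.mean() - population.mean())
std_gap = abs(sample.std() - population.std())
print(mean_gap, std_gap)
```

Note that this only works for statistics like the mean or standard deviation; an exact min or max still needs every value to be looked at.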

If you need min/max, then take min(min(each node)) or max(max(each node)).
If you need the average, then you need sum(sum(each node)) / sum(count(each node))*

*You'll likely need to use log here, as you'll probably overflow.

It doesn't really matter what numpy can handle; you just need to collate the data properly and defer the final calculation until the per-node calculations are complete.
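The collation described above might be sketched like this. The function names node_summary and combine are hypothetical, the three small arrays stand in for whatever each node actually holds, and each chunk is assumed to be non-empty. The sums are accumulated in float64, which is one way to sidestep integer overflow; that is my substitution for the example, not the log approach from the footnote.

```python
import numpy as np

def node_summary(chunk):
    """Run on each node: reduce a chunk to a few small partial results."""
    chunk = np.asarray(chunk)
    return {
        "min": chunk.min(),
        "max": chunk.max(),
        # Accumulate in float64 (assumption) to avoid integer overflow.
        "sum": chunk.sum(dtype=np.float64),
        "count": chunk.size,
    }

def combine(summaries):
    """Run once, after every node has reported its partial results."""
    summaries = list(summaries)
    total = sum(s["sum"] for s in summaries)
    count = sum(s["count"] for s in summaries)
    return {
        "min": min(s["min"] for s in summaries),
        "max": max(s["max"] for s in summaries),
        "mean": total / count,  # sum of sums / sum of counts
    }

# Hypothetical per-node chunks, tiny here for illustration.
chunks = [np.array([1, 2, 3]), np.array([10, 20]), np.array([-5, 0, 5, 7])]
result = combine(node_summary(c) for c in chunks)
print(result)
```

Averaging the per-node averages directly would be wrong whenever the nodes hold different numbers of values, which is why the sums and counts are carried separately until the end.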

Also, a numpy array should store values more densely than native Python objects do.
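A quick way to see the density difference, as a rough sketch: the element count is arbitrary, and the list figure is only an estimate, since sys.getsizeof reports per-object sizes on CPython and ignores any sharing of cached small integers.

```python
import sys
import numpy as np

n = 100_000
values = list(range(n))              # list of separate Python int objects
arr = np.arange(n, dtype=np.int64)   # one contiguous buffer of 8-byte ints

# Estimated list footprint: the pointer array plus each int object.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
# Array data footprint: exactly 8 bytes per element.
array_bytes = arr.nbytes

print(list_bytes, array_bytes)
```

A smaller dtype such as np.int32 or np.float32 would shrink the array further, if the value range allows it.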





More information about the Python-list mailing list