[Tutor] Processing CSV files

Alan Gauld alan.gauld at btinternet.com
Wed Oct 9 01:44:38 CEST 2013


On 09/10/13 00:26, Leena Gupta wrote:

> I do have an additional question related to Cassandra & Python. As part
> of data processing, I need to fetch slices of data from Cassandra and
> run computations like sum and percentile calculation on it.

Sorry, I've never even heard of Cassandra before.

> So for calculating the sum & percentile in Python, some of the data
> slices on Cassandra could fetch a lot of rows (e.g.750,000 to 1mill
> rows) … And since I need to compute a sum and percentile, I need to
> consider all the rows.

But not all at the same time. For the sum you can keep a running total
and track the count as you read, assuming the API supports an iterative
read (I've no experience there).
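
As a rough sketch of the running-total idea, assuming the rows can be
consumed one at a time from some iterator. The fake_rows() generator
below is purely hypothetical and stands in for whatever iterative read
the Cassandra client actually offers; it is not a real Cassandra call.

```python
def running_sum_and_count(rows):
    """Accumulate a sum and a count without holding the rows in memory."""
    total = 0.0
    count = 0
    for value in rows:
        total += value
        count += 1
    return total, count


def fake_rows(n):
    """Hypothetical stand-in for an iterative read from the data store."""
    for i in range(n):
        yield float(i % 100)


total, count = running_sum_and_count(fake_rows(1000000))
print(total, count)
```

Only one value is alive at a time here, so memory use stays flat however
many rows come back. Note this covers the sum (and hence the mean); an
exact percentile still needs the values themselves.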

But if the rows are short, even a million rows shouldn't be a big problem
given your RAM. And assuming you are using 64-bit, of course...
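
If the values do fit in memory, one simple way to get a percentile is to
sort them and pick by rank. This is only a sketch using the nearest-rank
definition; the percentile() helper is my own illustration, not part of
any library, and other percentile definitions interpolate between
neighbouring values.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of
    the data at or below it. Sorts a copy, so all values are in memory."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    # Rank is 1-based; clamp so that pct == 0 returns the minimum.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


data = list(range(1, 101))
print(percentile(data, 90))
```

On 64-bit CPython a list of a million floats costs roughly a few tens
of megabytes, so sorting it in one go is usually not a problem.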

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.flickr.com/photos/alangauldphotos


