Efficient processing of large numeric data file

Fri Jan 18 17:43:17 EST 2008

On Fri, 18 Jan 2008 09:58:57 -0800, Paul Rubin wrote:

> David Sanders <dpsanders at gmail.com> writes:
>> The data files are large (~100 million lines), and this code takes a
>> long time to run (compared to just doing wc -l, for example).
> 
> wc is written in carefully optimized C and will almost certainly run
> faster than any python program.

However, wc -l doesn't do the same thing as what the Original Poster is 
trying to do. There is little comparison between counting the number of 
lines and building a histogram, except that both tasks have to see each 
line. Naturally the second task will take longer than wc.

("Why does it take so long to make a three-tier wedding cake? I can boil 
an egg in three minutes!!!")
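
To make the difference concrete, here is a rough sketch of the two jobs side by side. The OP's actual file format and code aren't quoted here, so I'm assuming one number per line; the function names and the bin_width parameter are made up for illustration. Counting only has to notice that a line exists, while the histogram has to convert every line to a float and update a dictionary entry.

```python
from collections import Counter


def count_lines(lines):
    """Roughly the job wc -l does: see each line, keep one running total."""
    return sum(1 for _ in lines)


def build_histogram(lines, bin_width=1.0):
    """Tally each value into a bin of width bin_width.

    Assumes one number per line (a guess at the data format).
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        value = float(line)  # per-line parsing that wc never has to do
        counts[int(value // bin_width)] += 1  # bin index -> count
    return counts


# Tiny in-memory stand-in for a data file; an open file object works the same way.
sample = ["0.5\n", "1.2\n", "1.9\n", "3.0\n"]
print(count_lines(sample))
print(sorted(build_histogram(sample).items()))
```

Both functions make a single pass over the input, so the extra time goes into the float() call and the dictionary update on each of the ~100 million lines, not into reading the file.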

-- 
Steven