Efficient processing of large numeric data file
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Fri Jan 18 17:43:17 EST 2008
On Fri, 18 Jan 2008 09:58:57 -0800, Paul Rubin wrote:
> David Sanders <dpsanders at gmail.com> writes:
>> The data files are large (~100 million lines), and this code takes a
>> long time to run (compared to just doing wc -l, for example).
>
> wc is written in carefully optimized C and will almost certainly run
> faster than any python program.
However, wc -l doesn't do the same thing the Original Poster is trying
to do. Counting the number of lines and building a histogram have little
in common, except that both tasks have to see each line. Naturally the
second task will take longer than wc.
("Why does it take so long to make a three-tier wedding cake? I can boil
an egg in three minutes!!!")
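To make the difference concrete, here is a rough sketch of the two tasks
in Python. It is not the Original Poster's code: the function names are
made up for illustration, and it assumes each line holds a single integer,
which may not match the real data files. Counting only bumps a total per
line, while the histogram has to parse every line and update a table.

```python
from collections import Counter

def count_lines(lines):
    """Roughly the job of wc -l: see each line, keep one running total."""
    return sum(1 for _ in lines)

def build_histogram(lines):
    """Parse each line and tally how often each value occurs.

    Assumes one integer per line (an illustrative assumption only).
    """
    hist = Counter()
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            hist[int(line)] += 1
    return hist

# Tiny in-memory stand-in for a data file; an open file object
# can be passed in the same way, since both functions just iterate.
sample = ["3\n", "1\n", "3\n", "2\n", "3\n"]
print(count_lines(sample))
print(sorted(build_histogram(sample).items()))
```

Both functions make a single pass over the input, but the second does a
string strip, an int conversion and a dictionary update on every line, so
some slowdown relative to a plain line count is to be expected.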
--
Steven