Efficient processing of large numeric data file
David Sanders
dpsanders at gmail.com
Sat Jan 19 10:36:29 EST 2008
On Jan 18, 11:15 am, David Sanders <dpsand... at gmail.com> wrote:
> Hi,
>
> I am processing large files of numerical data. Each line is either a
> single (positive) integer, or a pair of positive integers, where the
> second represents the number of times that the first number is
> repeated in the data -- this is to avoid generating huge raw files,
> since one particular number is often repeated in the data generation
> step.
>
> My question is how to process such files efficiently to obtain a
> frequency histogram of the data (how many times each number occurs in
> the data, taking into account the repetitions). My current code is as
> follows:
Many thanks to all for the very detailed and helpful replies. I'm
glad to see I was on the right track, and even happier to have learnt
some different approaches.
Thanks and best wishes,
David.
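
The histogram task described in the quoted question could be approached roughly as in the sketch below. It is an illustration only, not the code from the original message or from any of the replies. It assumes whitespace-separated fields, one record per line, and a missing second field meaning a count of 1. The names `histogram` and `sample` are hypothetical.

```python
from collections import defaultdict


def histogram(lines):
    """Return {value: total occurrences} for lines of the form 'n' or 'n count'."""
    counts = defaultdict(int)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        value = int(fields[0])
        # A lone integer counts once; a pair carries its own repeat count.
        repeat = int(fields[1]) if len(fields) > 1 else 1
        counts[value] += repeat
    return dict(counts)


# Hypothetical sample data: 3 appears once and then twice more, 5 four times.
sample = ["3", "5 4", "3 2", "7"]
result = histogram(sample)
```

Because the function only iterates over its argument, an open file object can be passed in place of the list, so a large file is read line by line rather than loaded into memory at once.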