Efficient processing of large numeric data file

David Sanders dpsanders at gmail.com
Sat Jan 19 10:36:29 EST 2008


On Jan 18, 11:15 am, David Sanders <dpsand... at gmail.com> wrote:
> Hi,
>
> I am processing large files of numerical data.  Each line is either a
> single (positive) integer, or a pair of positive integers, where the
> second represents the number of times that the first number is
> repeated in the data -- this is to avoid generating huge raw files,
> since one particular number is often repeated in the data generation
> step.
>
> My question is how to process such files efficiently to obtain a
> frequency histogram of the data (how many times each number occurs in
> the data, taking into account the repetitions).  My current code is as
> follows:

Many thanks to all for the very detailed and helpful replies.  I'm
glad to see I was on the right track, and even happier to have learnt
some different approaches.
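
For reference, the task described in the quoted question can be sketched
roughly as below.  This is only an illustration, not the code from the
original post or from any of the replies; the function name `histogram`
and the sample data are made up for the example, and it assumes
whitespace-separated fields with an optional repeat count as the second
field.

```python
from collections import Counter


def histogram(lines):
    """Count how often each number occurs, honouring optional repeat counts."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        value = int(fields[0])
        # A second field, when present, is the repeat count; otherwise 1.
        repeat = int(fields[1]) if len(fields) > 1 else 1
        counts[value] += repeat
    return counts


# Hypothetical sample: "5 4" means the number 5 occurs 4 times.
sample = ["3", "5 4", "3 2", "7"]
hist = histogram(sample)
for value in sorted(hist):
    print(value, hist[value])
```

Because a file object yields its lines one at a time, passing an open
file to `histogram` in place of the sample list should process the data
without loading the whole file into memory.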

Thanks and best wishes,
David.


