[SciPy-user] Incremental histogram?

Robert Kern robert.kern at gmail.com
Tue Mar 11 13:47:40 EDT 2008


On Tue, Mar 11, 2008 at 4:02 AM, Roger Herikstad
<roger.herikstad at gmail.com> wrote:
> Hi list,
>   I need to histogram an array of long ints, but the array itself is
>  too big to keep in memory. I was thinking of using an incremental
>  approach, i.e. assign each sample in the array to the appropriate bin,
>  sample by sample. Right now, I have the array (well, list really)
>  constructed as a generator, and I was wondering if anyone has an
>  efficient algorithm for doing histogram count on such a generator
>  object?

Where does this array usually live? Is it constructed algorithmically,
or is it on disk?

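Anyway, I would batch up the elements into largish arrays that still
fit comfortably in memory, call numpy.histogram() on each batch with
the same fixed bin edges, and add the resulting counts together. (The
edges need to be fixed up front: with an integer bin count, each batch
would get its own range and the counts would not line up.) If the
arrays live on disk in memory-mappable form, I recommend Roberto De
Almeida's arrayterator to do the batching for you: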

  http://pypi.python.org/pypi/arrayterator/0.2.8
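A minimal sketch of the batch-and-sum approach described above. It assumes NumPy, values that fit in int64 (arbitrary-precision Python longs would not), and bin edges chosen before any data is seen. The helper names `chunks`, `incremental_histogram` and `memmap_batches` are hypothetical. `memmap_batches` slices a plain `np.memmap` as a stand-in for arrayterator, whose own API is not shown here.

```python
import itertools
import numpy as np

def chunks(iterable, size):
    """Yield successive int64 arrays of at most `size` elements from any iterable/generator."""
    it = iter(iterable)
    while True:
        # islice pulls the next `size` items; fromiter packs them into one array
        batch = np.fromiter(itertools.islice(it, size), dtype=np.int64)
        if batch.size == 0:
            return
        yield batch

def memmap_batches(path, dtype=np.int64, size=1_000_000):
    """Yield in-memory slices of a raw binary file without loading it all at once."""
    data = np.memmap(path, dtype=dtype, mode="r")
    for start in range(0, data.shape[0], size):
        yield np.asarray(data[start:start + size])

def incremental_histogram(batches, bin_edges):
    """Sum numpy.histogram counts over batches, using one fixed set of bin edges."""
    bin_edges = np.asarray(bin_edges)
    total = np.zeros(len(bin_edges) - 1, dtype=np.int64)
    for batch in batches:
        counts, _ = np.histogram(batch, bins=bin_edges)
        total += counts
    return total, bin_edges
```

For a generator, pass `chunks(my_generator, 1_000_000)`; for a raw binary file on disk, pass `memmap_batches(path)`. As with a single `numpy.histogram()` call, every bin is half-open except the last, which includes its right edge, and values outside the edges are silently dropped, so choose the edges to cover the full data range.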

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco


