[SciPy-user] Incremental histogram?

Robert Kern robert.kern at gmail.com
Tue Mar 11 20:22:04 EDT 2008


On Tue, Mar 11, 2008 at 7:10 PM, Roger Herikstad
<roger.herikstad at gmail.com> wrote:
> Hi,
>   Thanks, I'll definitely look into the arrayiterator. The array is
>  constructed algorithmically, and is actually a pairwise difference
>  between data points belonging to different clusters. I need to
>  histogram these differences to look for points closer than a certain
>  threshold, and also look at the distribution of the differences. Each
>  cluster can contain as many as a few hundred thousand points, and
>  since the data points are long ints, I quickly run out of memory. What
>  I was thinking of was to use an iterator that will allow me to iterate
>  over chunks of an iterator, doing a histogram on each chunk
>  separately. However, I couldn't find any such iterator in the
>  itertools module.

It's easiest to do manually.
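
A minimal sketch of what "manually" could look like, assuming the
differences come out of a generator and that the bin edges are fixed in
advance (counts from separate chunks can only be summed if every chunk
uses the same edges). The names chunked, incremental_histogram and
pairwise_differences are made up for illustration; only itertools.islice
and numpy.histogram are real library calls.

```python
import itertools

import numpy as np


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk


def incremental_histogram(values, bin_edges, chunk_size=100000):
    """Accumulate histogram counts chunk by chunk over fixed bin edges."""
    bin_edges = np.asarray(bin_edges)
    counts = np.zeros(len(bin_edges) - 1, dtype=np.int64)
    for chunk in chunked(values, chunk_size):
        # Same edges for every chunk, so the per-chunk counts simply add up.
        chunk_counts, _ = np.histogram(chunk, bins=bin_edges)
        counts += chunk_counts
    return counts


def pairwise_differences(a, b):
    """Hypothetical stand-in for the algorithmically generated differences."""
    for x in a:
        for y in b:
            yield abs(x - y)
```

Something like incremental_histogram(pairwise_differences(c1, c2), edges)
would then hold only one chunk in memory at a time. Values outside the
range of the edges are silently dropped by numpy.histogram, so the edges
need to cover everything you care about.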

>  Maybe the arrayiterator does that?

No, it works specifically on arrays. Essentially, it generates slice
indices for each of the chunks and yields the slices of the base
array. Thus, it works well on memory-mapped arrays. It won't work for
you.
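
For comparison, the slice-based idea can be sketched roughly as below.
This is not the actual arrayiterator code, just an illustration of the
principle for chunks along the first axis; iter_slices is a hypothetical
name.

```python
import numpy as np


def iter_slices(arr, chunk_size):
    """Yield consecutive slices of `arr` along its first axis."""
    for start in range(0, arr.shape[0], chunk_size):
        # Basic slicing returns a view, so no data is copied here. With a
        # memory-mapped array, only the pages actually touched get read.
        yield arr[start:start + chunk_size]
```

Since it needs an existing array to slice, it does not help when the
values are produced one at a time by a generator.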

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco
