[SciPy-user] Incremental histogram?
Robert Kern
robert.kern at gmail.com
Tue Mar 11 13:47:40 EDT 2008
On Tue, Mar 11, 2008 at 4:02 AM, Roger Herikstad
<roger.herikstad at gmail.com> wrote:
> Hi list,
> I need to histogram an array of long ints, but the array itself is
> too big to keep in memory. I was thinking of using an incremental
> approach, i.e. assign each sample in the array to the appropriate bin,
> sample by sample. Right now, I have the array (well, list really)
> constructed as a generator, and I was wondering if anyone has an
> efficient algorithm for doing histogram count on such a generator
> object?
Where does this array usually live? Is it constructed algorithmically,
or is it on disk?
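Either way, I would batch up the elements into largish but
comfortably-sized arrays, call numpy.histogram() on each batch with the
same bin edges, and add the resulting histograms together. The bin
edges have to be fixed up front; otherwise the per-batch counts do not
line up and cannot be summed. If the arrays live on disk in
memory-mappable form, I recommend Roberto De Almeida's arrayterator to
do the batching for you: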
http://pypi.python.org/pypi/arrayterator/0.2.8
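
For the generator case, the batching approach might look roughly like
the sketch below. It is only an illustration, not a tested recipe for
your data: the function name incremental_histogram, the batch_size
default, and the assumption that the values fit in int64 are all made
up for the example, and the bin edges are assumed to be known in
advance.

```python
import itertools

import numpy as np


def incremental_histogram(samples, bin_edges, batch_size=1_000_000):
    """Histogram an iterable of integers one batch at a time.

    Hypothetical helper: `samples` is any iterable (e.g. a generator),
    `bin_edges` is a fixed, increasing sequence of edges shared by all
    batches, and `batch_size` caps how many samples are held in memory.
    """
    bin_edges = np.asarray(bin_edges)
    counts = np.zeros(len(bin_edges) - 1, dtype=np.int64)
    it = iter(samples)
    while True:
        # Pull at most batch_size items from the generator into an array.
        # Assumes the values fit in a signed 64-bit integer.
        batch = np.fromiter(itertools.islice(it, batch_size), dtype=np.int64)
        if batch.size == 0:
            break
        # Same edges for every batch, so the counts can simply be added.
        batch_counts, _ = np.histogram(batch, bins=bin_edges)
        counts += batch_counts
    return counts


# Usage: a generator of one million values, histogrammed in small batches.
gen = (i % 10 for i in range(1_000_000))
total = incremental_histogram(gen, bin_edges=np.arange(11), batch_size=4096)
```

Values that fall outside the given edges are silently dropped by
numpy.histogram(), so the edges need to cover the full range of the
data if every sample should be counted.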
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco