[Numpy-discussion] efficient 3d histogram creation

josef.pktd at gmail.com josef.pktd at gmail.com
Wed May 6 20:21:45 EDT 2009


On Wed, May 6, 2009 at 7:39 PM, Chris Colbert <sccolbert at gmail.com> wrote:
> i just realized I don't need the line:
>
> cdef int z = img.shape(2)
>
> it's left over from tinkering. sorry. And i should probably convert the out
> array to type float to handle large data sets.
>
> Chris
>
> On Wed, May 6, 2009 at 7:30 PM, <josef.pktd at gmail.com> wrote:
>>
>> On Wed, May 6, 2009 at 6:06 PM, Chris Colbert <sccolbert at gmail.com> wrote:
>> > I decided to hold myself over until being able to take a hard look at
>> > the
>> > numpy histogramdd code:
>> >
>> > Here is a quick thing a put together in cython. It's a 40x speedup over
>> > histogramdd on Vista 32 using the minGW32 compiler. For a (480, 630, 3)
>> > array, this executed in 0.005 seconds on my machine.
>> >
>> > This only works for arrays with uint8 data types having dimensions (x,
>> > y, 3)
>> > (common image format). The return array is a (16, 16, 16) equal width
>> > bin
>> > histogram of the input.
>> >
>> > If anyone wants the cython C-output, let me know and I will email it to
>> > you.
>> >
>> > If there is interest, I will extend this for different size bins and
>> > aliases
>> > for different data types.
>> >
>> > Chris
>> >
>> > import numpy as np
>> >
>> > cimport numpy as np
>> >
>> > DTYPE = np.uint8
>> > DTYPE32 = np.int
>> >
>> > ctypedef np.uint8_t DTYPE_t
>> > ctypedef np.int_t DTYPE_t32
>> >
>> > def hist3d(np.ndarray[DTYPE_t, ndim=3] img):
>> >     cdef int x = img.shape[0]
>> >     cdef int y = img.shape[1]
>> >     cdef int z = img.shape[2]
>> >     cdef int addx
>> >     cdef int addy
>> >     cdef int addz
>> >     cdef np.ndarray[DTYPE_t32, ndim=3] out = np.zeros([16, 16, 16],
>> > dtype=DTYPE32)
>> >     cdef int i, j, v0, v1, v2
>> >
>> >
>> >     for i in range(x):
>> >         for j in range(y):
>> >             v0 = img[i, j, 0]
>> >             v1 = img[i, j, 1]
>> >             v2 = img[i, j, 2]
>> >             addx = (v0 - (v0 % 16)) / 16
>> >             addy = (v1 - (v1 % 16)) / 16
>> >             addz = (v2 - (v2 % 16)) / 16
>> >             out[addx, addy, addz] += 1
>> >
>> >     return out
>> >
>>
>> Thanks for the example for using cython. Once I figure out what the
>> types are, cython will look very convenient for loops, and pyximport
>> takes care of the compiler.
>>
>> Josef
>>
>> import pyximport; pyximport.install()
>> import hist_rgb    #name of .pyx files
>>
>> import numpy as np
>> factors = np.random.randint(256,size=(480, 630, 3))
>> h = hist_rgb.hist3d(factors.astype(np.uint8))
>> print h[:,:,0]


playing some more with cython: here is a baby on the fly code generator
input type int, output type float64

a dispatch function by type is missing

no segfaults, even though most of the time a call the function with
the wrong type.

Josef

code = '''
import numpy as np
cimport numpy as np

__all__ = ["hist3d"]
DTYPE = ${imgtype}
DTYPE32 = ${outtype}

ctypedef ${imgtype}_t DTYPE_t
ctypedef ${outtype}_t DTYPE_t32

def hist3d(np.ndarray[DTYPE_t, ndim=3] img):
    cdef int x = img.shape[0]
    cdef int y = img.shape[1]
    #cdef int z = img.shape[2]
    cdef int addx
    cdef int addy
    cdef int addz
    cdef np.ndarray[DTYPE_t32, ndim=3] out = np.zeros([16, 16, 16],
dtype=DTYPE32)
    cdef int i, j, v0, v1, v2


    for i in range(x):
        for j in range(y):
            v0 = img[i, j, 0]
            v1 = img[i, j, 1]
            v2 = img[i, j, 2]
            addx = (v0 - (v0 % 16)) / 16
            addy = (v1 - (v1 % 16)) / 16
            addz = (v2 - (v2 % 16)) / 16
            out[addx, addy, addz] += 1

    return out
'''

from string import Template
s = Template(code)
src = s.substitute({'imgtype': 'np.int', 'outtype': 'np.float64'})
open('histrgbintfl2.pyx','w').write(src)

import pyximport; pyximport.install()
import histrgbintfl2

import numpy as np
factors = np.random.randint(256,size=(480, 630, 3))
h = histrgbintfl2.hist3d(factors)#.astype(np.uint8))
print h[:,:,0]



More information about the NumPy-Discussion mailing list