[Cython] Memory leak when using Typed Memory View and np array of objects

Haijie Gu gu.haijie at gmail.com
Wed Oct 2 18:36:01 CEST 2013


Hi,

I'm new to Cython's typed memory views, and I have found some cases where a
function that uses a typed memory view leaks memory.
The leak happens when I pass a numpy array of objects, where each object
is itself a numpy array. (You can get this 'weird' object by constructing
a pandas Series from a list of numpy arrays.)

Please see the following code snippet, or use the attached code, to reproduce
the case.
I appreciate any help and suggestions in advance! (I also posted this on the
cython-users Google group. Apologies for the redundancy.)


# BEGIN CONTENT OF test.pyx

# this does not leak
cpdef int do_nothing(arr):
    return 0


# this does leak
cpdef int do_nothing_typed(double[:] arr):
    return 0


# this does leak
cpdef int do_nothing_but_copy(arr):
    cdef double[:] _arr = arr
    return 0

# END CONTENT OF test.pyx
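
A complementary way to check whether a call keeps hold of its argument is to compare the argument's reference count before and after a batch of calls. The following is only a minimal sketch in plain Python, not part of the attached code: `refcount_delta`, `leaky`, `clean`, and `_hoard` are hypothetical names, and `leaky` merely simulates a leak by stashing references in a module-level list. In the real case one would pass `do_nothing_typed` and one of the arrays instead.

```python
import sys


def refcount_delta(func, obj, calls=1000):
    """Return how much obj's reference count changed after `calls` calls of func(obj)."""
    before = sys.getrefcount(obj)
    for _ in range(calls):
        func(obj)
    return sys.getrefcount(obj) - before


_hoard = []  # hypothetical stand-in for references that are never released


def leaky(x):
    # Simulates a leak: every call keeps one extra reference alive.
    _hoard.append(x)
    return 0


def clean(x):
    # Holds no reference to x once the call returns.
    return 0


payload = [1.0, 2.0, 3.0]
print(refcount_delta(clean, payload))
print(refcount_delta(leaky, payload))
```

A delta that grows with the number of calls suggests that something acquired a reference to the argument and never released it.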

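# BEGIN CONTENT OF runtest.py

# Imports implied by the names used below; the module name "test"
# is assumed from the file name test.pyx.
import gc
import operator
from collections import defaultdict

import numpy as np
import pandas as pd

from test import do_nothing, do_nothing_typed, do_nothing_but_copy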

def gc_obj_hist():
    """
    Return a list of (type, count) pairs for the objects currently
    tracked by the garbage collector, sorted by count, largest first.
    """
    hst = defaultdict(int)
    for v in gc.get_objects():
        hst[type(v)] += 1
    return sorted(hst.iteritems(), key=operator.itemgetter(1), reverse=True)




# does not leak
def test1(n=10000):
    s = pd.Series([np.random.randn(10) for i in range(n)])
    for i in range(n):
        do_nothing(s[i])
    print "Top 5 object types after test 1: " + str(gc_obj_hist()[:5])



# leaks
def test2(n=10000):
    s = pd.Series([np.random.randn(10) for i in range(n)])
    for i in range(n):
        do_nothing_typed(s[i])
    print "Top 5 object types after test 2: " + str(gc_obj_hist()[:5])



# leaks
def test3(n=10000):
    s = pd.Series([np.random.randn(10) for i in range(n)])
    for i in range(n):
        do_nothing_but_copy(s[i])
    print "Top 5 object types after test 3: " + str(gc_obj_hist()[:5])



# does not leak
def test4(n=10000):
    s = pd.Series([np.random.randn(10) for i in range(n)])
    for i in range(n):
        do_nothing_but_copy(np.array(s[i]))
    print "Top 5 object types after test 4: " + str(gc_obj_hist()[:5])



if __name__ == "__main__":
    n = 100000
    test1(n)
    test2(n)
    test3(n)
    test4(n)

# END CONTENT OF runtest.py
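
The histogram above reports absolute counts, so a leak has to be spotted by eye among everything else the interpreter keeps alive. One possible refinement, sketched here under assumptions, is to take a snapshot before and after the workload and report only the types whose counts grew. `gc_type_counts`, `growth`, and `Marker` are hypothetical names that do not appear in the attached code, and the list of `Marker` instances only stands in for whatever a leaking call would leave behind.

```python
import gc
from collections import Counter


def gc_type_counts():
    """Count the objects currently tracked by the gc, keyed by type name."""
    return Counter(type(o).__name__ for o in gc.get_objects())


def growth(before, after, top=5):
    """Return the `top` (type name, increase) pairs between two snapshots."""
    # Counter subtraction keeps only the positive differences.
    return (after - before).most_common(top)


class Marker(object):
    """Hypothetical stand-in for an object left behind by a leaking call."""
    pass


gc.collect()
before = gc_type_counts()
kept = [Marker() for _ in range(1000)]  # simulate 1000 objects that stay alive
after = gc_type_counts()
print(growth(before, after))
```

Running each test between two such snapshots would narrow the report to the types that accumulate during that test, which makes it easier to compare test2 and test3 against test1 and test4.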


Thanks,
-jay
-------------- next part --------------
A non-text attachment was scrubbed...
Name: leaktest.tar.gz
Type: application/x-gzip
Size: 893 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/cython-devel/attachments/20131002/de68939d/attachment-0001.bin>

