I was analysing the following code, which compiles and runs correctly, but generates a memory leak.
cfiboheap is a C implementation of a Fibonacci heap, and the following code is part of a Cython wrapper for it.
My doubts start with the insert function. The object data has been created somewhere and passed to insert(). Since the function wants to add this object to the fiboheap, it increases its reference count. But afterwards? Who owns the reference? In my understanding, the C function fh_insertkey() only borrows the object. It then returns a proprietary pointer that needs to be encapsulated and returned by insert(). Cool. But what about my object data and its reference count? By creating the capsule I'm creating a new object, but I'm never decreasing the ref count of data. This produces the memory leak.
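To see the leak concretely, here is a quick check from the Python side (a minimal sketch using the wrapper below; note that sys.getrefcount itself adds one temporary reference to its argument):

    import sys

    data = object()                 # a fresh object whose refcount we can watch
    before = sys.getrefcount(data)

    Q = FiboHeap()
    capsule = Q.insert(1.0, data)   # insert() calls Py_INCREF(data)
    Q.extract()                     # pops data from the heap, but never calls Py_DECREF

    after = sys.getrefcount(data)
    print(before, after)            # after == before + 1: one reference is never given back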
(Note that commenting out Py_INCREF, or adding Py_DECREF before the return of insert(), results in a segmentation fault — presumably because the heap would then hold a dangling void* to an object that has been collected.)
My questions are:
1) Why is it necessary to increment the reference count of data during insert()?
2) Why is it not necessary to call Py_DECREF during extract()?
3) More generally, how can I keep exact track of reference ownership when jumping between C and Python?
4) How do I properly deallocate an object like this FiboHeap? Should I preventively use Py_XDECREF in __dealloc__(), and if so, how? (See my tentative sketch at the very end.)
Thanks!
cimport cfiboheap
from cpython.pycapsule cimport PyCapsule_New, PyCapsule_GetPointer
from cpython.ref cimport Py_INCREF, Py_DECREF

cdef inline object convert_fibheap_el_to_pycapsule(cfiboheap.fibheap_el* element):
    return PyCapsule_New(element, NULL, NULL)

# Inverse of the converter above; it is defined elsewhere in the wrapper,
# but presumably it looks like this:
cdef inline cfiboheap.fibheap_el* convert_pycapsule_to_fibheap_el(object element):
    return <cfiboheap.fibheap_el*>PyCapsule_GetPointer(element, NULL)

cdef class FiboHeap:

    def __cinit__(FiboHeap self):
        self.treeptr = cfiboheap.fh_makekeyheap()
        if self.treeptr is NULL:
            raise MemoryError()

    def __dealloc__(FiboHeap self):
        if self.treeptr is not NULL:
            cfiboheap.fh_deleteheap(self.treeptr)

    cpdef object insert(FiboHeap self, double key, object data=None):
        Py_INCREF(data)  # keep data alive while the C heap holds a raw pointer to it
        cdef cfiboheap.fibheap_el* retValue = cfiboheap.fh_insertkey(self.treeptr, key, <void*>data)
        if retValue is NULL:
            raise MemoryError()
        return convert_fibheap_el_to_pycapsule(retValue)

    cpdef object extract(FiboHeap self):
        cdef void* ret = cfiboheap.fh_extractmin(self.treeptr)
        if ret is NULL:
            raise IndexError("FiboHeap is empty")
        return <object> ret  # note: no matching Py_DECREF anywhere

    cpdef object decrease_key(FiboHeap self, object element, double newKey):
        cdef void* ret = cfiboheap.fh_replacekey(self.treeptr, convert_pycapsule_to_fibheap_el(element), newKey)
        if ret is NULL:
            raise IndexError("New Key is Bigger")
        return <object> ret
Note that this wasn't written by me; I'm using this example to better understand object referencing and to stop the leak (since I'm actually using the code).
The main code that makes use of FiboHeap (and where the leak happens) looks like this:
cdef dijkstra(Graph G, int start_idx, int end_idx):
    cdef np.ndarray[object, ndim=1] fiboheap_nodes = np.empty([G.num_nodes], dtype=object)  # holds all of our FiboHeap node pointers
    Q = FiboHeap()
    fiboheap_nodes[start_idx] = Q.insert(0, start_idx)
    # Then occasionally:
    Q.insert(...)
    Q.decrease_key(...)
    Q.extract()
    return
extract is not a peek but a proper pop: it deletes the corresponding element from the C fiboheap.
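If that is right, then (for question 2) my guess is that the reference taken in insert() should be released exactly here, in extract(). As far as I understand, in Cython the <object> cast of a void* produces a new owned reference, so a balanced extract() would look something like this (untested sketch):

    cpdef object extract(FiboHeap self):
        cdef void* ret = cfiboheap.fh_extractmin(self.treeptr)
        if ret is NULL:
            raise IndexError("FiboHeap is empty")
        value = <object> ret  # the cast takes its own new reference to the object
        Py_DECREF(value)      # give back the reference taken in insert()
        return value          # 'value' still holds one valid reference for the caller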
In conclusion: it seems clear that the handling of data's reference count causes the memory leak, but why exactly? And how can I stop it?
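And for question 4, my tentative idea (again an untested sketch, assuming fh_extractmin() returns NULL on an empty heap, as extract() suggests) is to drain the heap in __dealloc__() before deleting it, so that the reference insert() took on each remaining element is released:

    def __dealloc__(FiboHeap self):
        cdef void* item
        if self.treeptr is not NULL:
            item = cfiboheap.fh_extractmin(self.treeptr)
            while item is not NULL:
                Py_DECREF(<object> item)  # net effect: drops the reference taken in insert()
                item = cfiboheap.fh_extractmin(self.treeptr)
            cfiboheap.fh_deleteheap(self.treeptr)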