I was analysing the following code, which compiles and runs correctly, but generates a memory leak.
cfiboheap is a C implementation of a Fibonacci heap, and the following code is part of a Cython wrapper for it.
My doubts start with the insert function. The object data has been created somewhere and passed to insert(). Since the function wants to add this object to the fiboheap, it increases its reference count. But afterwards? Who owns the reference? In my understanding, the C function fh_insertkey() only borrows the object. It then returns a proprietary pointer that needs to be encapsulated and returned by insert(). Cool. But what about my object data and its reference count? By creating the capsule I'm creating a new object, but I'm never decreasing the ref count of data. This produces the memory leak.
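To see the leak concretely, here is a quick check from the Python side (a minimal sketch using the wrapper below; note that sys.getrefcount itself adds one temporary reference to its argument):

    import sys

    data = object()                 # a fresh object whose refcount we can watch
    before = sys.getrefcount(data)

    Q = FiboHeap()
    capsule = Q.insert(1.0, data)   # insert() calls Py_INCREF(data)
    Q.extract()                     # pops data from the heap, but never calls Py_DECREF

    after = sys.getrefcount(data)
    print(before, after)            # after == before + 1: one reference is never given back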
(Note that commenting out Py_INCREF, or adding Py_DECREF before the return of insert(), results in a segmentation fault — presumably because the heap would then hold a dangling void* to an object that has been collected.)
My questions are:
1) Why is it necessary to increment the reference count of data during insert()?
2) Why is it not necessary to call Py_DECREF during extract()?
3) More generally, how can I keep exact track of reference ownership when jumping between C and Python?
4) How do I properly deallocate an object like this FiboHeap? Should I preventively use Py_XDECREF in __dealloc__(), and if so, how? (See my tentative sketch at the very end.)
Thanks!
cimport cfiboheap
from cpython.pycapsule cimport PyCapsule_New, PyCapsule_GetPointer
from cpython.ref cimport Py_INCREF, Py_DECREF

cdef inline object convert_fibheap_el_to_pycapsule(cfiboheap.fibheap_el* element):
    return PyCapsule_New(element, NULL, NULL)

# Inverse of the converter above; it is defined elsewhere in the wrapper,
# but presumably it looks like this:
cdef inline cfiboheap.fibheap_el* convert_pycapsule_to_fibheap_el(object element):
    return <cfiboheap.fibheap_el*>PyCapsule_GetPointer(element, NULL)

cdef class FiboHeap:

    def __cinit__(FiboHeap self):
        self.treeptr = cfiboheap.fh_makekeyheap()
        if self.treeptr is NULL:
            raise MemoryError()

    def __dealloc__(FiboHeap self):
        if self.treeptr is not NULL:
            cfiboheap.fh_deleteheap(self.treeptr)

    cpdef object insert(FiboHeap self, double key, object data=None):
        Py_INCREF(data)  # keep data alive while the C heap holds a raw pointer to it
        cdef cfiboheap.fibheap_el* retValue = cfiboheap.fh_insertkey(self.treeptr, key, <void*>data)
        if retValue is NULL:
            raise MemoryError()
        return convert_fibheap_el_to_pycapsule(retValue)

    cpdef object extract(FiboHeap self):
        cdef void* ret = cfiboheap.fh_extractmin(self.treeptr)
        if ret is NULL:
            raise IndexError("FiboHeap is empty")
        return <object> ret  # note: no matching Py_DECREF anywhere

    cpdef object decrease_key(FiboHeap self, object element, double newKey):
        cdef void* ret = cfiboheap.fh_replacekey(self.treeptr, convert_pycapsule_to_fibheap_el(element), newKey)
        if ret is NULL:
            raise IndexError("New Key is Bigger")
        return <object> ret
Note that this wasn't written by me; I'm using this example to better understand object referencing and to stop the leak (since I'm actually using the code).
The main code that makes use of FiboHeap (and where the leak happens) looks like this:
cdef dijkstra(Graph G, int start_idx, int end_idx):
    cdef np.ndarray[object, ndim=1] fiboheap_nodes = np.empty([G.num_nodes], dtype=object)  # holds all of our FiboHeap node pointers
    Q = FiboHeap()
    fiboheap_nodes[start_idx] = Q.insert(0, start_idx)
    # Then occasionally:
    Q.insert(...)
    Q.decrease_key(...)
    Q.extract()
    return
extract is not a peek but a proper pop: it deletes the corresponding element from the C fiboheap.
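If that is right, then (for question 2) my guess is that the reference taken in insert() should be released exactly here, in extract(). As far as I understand, in Cython the <object> cast of a void* produces a new owned reference, so a balanced extract() would look something like this (untested sketch):

    cpdef object extract(FiboHeap self):
        cdef void* ret = cfiboheap.fh_extractmin(self.treeptr)
        if ret is NULL:
            raise IndexError("FiboHeap is empty")
        value = <object> ret  # the cast takes its own new reference to the object
        Py_DECREF(value)      # give back the reference taken in insert()
        return value          # 'value' still holds one valid reference for the caller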
In conclusion: it seems clear that the handling of data's reference count causes the memory leak, but why exactly? And how can I stop it?
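And for question 4, my tentative idea (again an untested sketch, assuming fh_extractmin() returns NULL on an empty heap, as extract() suggests) is to drain the heap in __dealloc__() before deleting it, so that the reference insert() took on each remaining element is released:

    def __dealloc__(FiboHeap self):
        cdef void* item
        if self.treeptr is not NULL:
            item = cfiboheap.fh_extractmin(self.treeptr)
            while item is not NULL:
                Py_DECREF(<object> item)  # net effect: drops the reference taken in insert()
                item = cfiboheap.fh_extractmin(self.treeptr)
            cfiboheap.fh_deleteheap(self.treeptr)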