Circular References - What are they and how can they be resolved?

Anonym

In a recent blog post, "What the *bleep* is IDL doing? Garbage collection", Dain discussed IDL's garbage collection mechanism, which runs when the reference count of an object or pointer (collectively referred to as heap variables) reaches zero. There are cases, however, where a heap variable survives even though nothing appears to be using it. When that happens, the code may contain a circular reference.

 

What is a circular reference?

A circular reference occurs when one heap variable contains a reference to a second heap variable, and the second one contains a reference back to the first. For instance, if A is an object, and somewhere in A, there is a reference to B, and within B is a reference back to A, there is a circular reference.

Here is a simple example:

p1 = Ptr_New(/ALLOCATE_HEAP)
p2 = Ptr_New(p1)
*p1 = p2
help, /HEAP


Heap Variables:
    # Pointer: 2
    # Object : 0

<PtrHeapVar1>  refcount=2
                POINTER   = <PtrHeapVar2>
<PtrHeapVar2>  refcount=2
                POINTER   = <PtrHeapVar1>

In this example, there are two references to each pointer. One is held by the variable I created (p1 or p2). The other is held inside the opposite pointer: the first pointer is referenced by the second, and vice versa. If I release the references I am holding (by setting the variables to !NULL), IDL reduces the refcount of each pointer by one. From my perspective, these pointers are gone. However, they still reference each other, so neither refcount ever reaches zero and the pointers are never garbage collected.

p1 = !null
p2 = !null
help, /HEAP


Heap Variables:
    # Pointer: 2
    # Object : 0

<PtrHeapVar1>  refcount=1
                POINTER   = <PtrHeapVar2>
<PtrHeapVar2>  refcount=1
                POINTER   = <PtrHeapVar1>

This commonly occurs in parent/child relationships: the parent keeps track of its children, and sometimes the child needs to know who its parent is.

Circular references can be triangular as well, or the loop can extend through many objects and pointers. Issues related to these more complex circular references can be difficult to debug.
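
As a quick illustration of a triangular loop, here is a minimal sketch that extends the two-pointer example above to three pointers. The variable names a, b and c are my own and purely illustrative.

```idl
; Build a three-pointer loop: a -> b -> c -> a
a = Ptr_New(/ALLOCATE_HEAP)
b = Ptr_New(/ALLOCATE_HEAP)
c = Ptr_New(/ALLOCATE_HEAP)
*a = b
*b = c
*c = a

; Each pointer is referenced by its own variable and by one neighbor
print, Heap_Refcount(a), Heap_Refcount(b), Heap_Refcount(c)

; Drop my references; each pointer is still referenced by its neighbor
a = !null
b = !null
c = !null

; All three pointers should still be listed as heap variables
help, /HEAP
```

No single pair of pointers references each other here, which is part of why longer loops are harder to spot than the two-pointer case.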

Side note:

Although I no longer have a variable that references these pointers, I haven't lost them forever. As long as they are valid pointers and I know their heap identifiers, I can retrieve them using the PTR_VALID function with the /CAST keyword.

p1 = Ptr_Valid(1, /CAST)
help, p1


P1              POINTER   = <PtrHeapVar1>

 

Why are circular references a problem?

Circular references can be a problem for a number of reasons. The main one is unnecessary memory usage. If the variables fall out of scope but the underlying pointers or objects aren't cleaned up, the memory is "leaked." Too much leakage, especially with large objects, slows down processing and can eventually cause IDL to hang.

Additionally, if I call HELP, /HEAP as a form of debugging, I now have to sift through these "dead" heap variables before finding what I'm looking for.

 

How can circular references be resolved?

Manual Cleanup

If you're confident that you will never need a heap variable again, you can manage the memory by manually destroying it with OBJ_DESTROY or PTR_FREE. This is easier said than done, however, and destroying heap variables should be done with caution. Code that attempts to use a pointer or object that has already been destroyed will halt with an error. Furthermore, in the pointer example above, freeing "p1" will also free the second pointer if I am not holding a reference to it. Freeing "p1" releases the reference it contained, so the refcount of the second pointer drops to zero and it is garbage collected. Implicit garbage collection like this can lead to unexpected results.
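
Here is a minimal sketch of manual cleanup applied to the two-pointer loop from earlier. It assumes I still hold both variables when I decide to free them, and it checks validity with PTR_VALID before dereferencing, which is one way to avoid the error mentioned above.

```idl
; Recreate the circular reference from the first example
p1 = Ptr_New(/ALLOCATE_HEAP)
p2 = Ptr_New(p1)
*p1 = p2

; Break the loop by hand while I still hold both references
Ptr_Free, p1, p2

; p1 and p2 are now dangling references, so test before dereferencing
if Ptr_Valid(p1) then begin
  help, *p1
endif else begin
  print, 'p1 has been freed'
endelse
```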

Side note: In the case of lists and hashes, implicit garbage collection is desired. If I have nested hashes, for instance from calling JSON_PARSE, and I manually destroy the root-level hash, the hashes inside it will fall out of scope and be garbage collected. This saves me from needing to recursively clean up every nested hash by hand.

Use Weak References

When I call p2 = Ptr_New(p1), my variable p2 is a strong reference to the new pointer. Additionally, the new pointer contains a strong reference to the first pointer, p1. IDL increments the refcount of a heap variable for every strong reference to it. If I do not wish to directly reference the first pointer from the second, but the second one needs to be aware of the first, I can use a weak reference.

A weak reference means that the heap identifier is stored in place of the object or pointer reference, so the refcount is not incremented. The heap identifier can be retrieved using the /GET_HEAP_IDENTIFIER keyword on OBJ_VALID or PTR_VALID, and, as mentioned above, the object/pointer can be retrieved from the identifier using the /CAST keyword.
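
To make this concrete, here is a minimal sketch of the two-pointer example rewritten so that the back reference is weak. The variable name "back" is illustrative only.

```idl
p1 = Ptr_New(/ALLOCATE_HEAP)
p2 = Ptr_New(/ALLOCATE_HEAP)

; Strong reference: the first pointer holds the second
*p1 = p2

; Weak reference: the second pointer stores only the first one's heap identifier
*p2 = Ptr_Valid(p1, /GET_HEAP_IDENTIFIER)

; When the second pointer needs the first, cast the identifier back
back = Ptr_Valid(*p2, /CAST)
if Ptr_Valid(back) then help, back

; Release the temporary strong reference again
back = !null

; Only my variable p1 references the first pointer now
print, Heap_Refcount(p1)
```

Because the stored identifier is not counted as a reference, setting p1 to !NULL here lets the first pointer be collected. A later /CAST on the stale identifier returns a null pointer, so it is worth checking the result with PTR_VALID before using it.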

Follow Strict Ownership

Following strict ownership rules can help prevent confusing reference circles. For example, whoever created an object can be held responsible for that object's lifecycle. A parent/child relationship is a good case for observing ownership. The parent should contain a strong reference to all of its children, and it is a good idea for the parent to know if and when the children should be destroyed (i.e. if a child becomes irrelevant to the program after the parent is destroyed, then the parent should manually destroy the child within its ::Cleanup method).

The parent should own the child and not the other way around (although if you ask my two year old daughter, she might disagree with that statement!). Therefore, if the child needs information about the parent for any reason, it should use a weak reference and not a strong reference.
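
Here is a rough sketch of what that ownership pattern could look like. The class names DemoParent and DemoChild, their fields, and their methods are hypothetical names I made up for illustration, and the sketch assumes both classes are compiled together (for example, saved in one file and compiled before use).

```idl
; --- DemoChild: knows its parent only through a weak reference ---
function DemoChild::Init, parent
  compile_opt idl2
  ; Store the parent's heap identifier instead of the object reference
  self.parentID = Obj_Valid(parent, /GET_HEAP_IDENTIFIER)
  return, 1
end

function DemoChild::GetParent
  compile_opt idl2
  ; Returns a null object if the parent no longer exists
  return, Obj_Valid(self.parentID, /CAST)
end

pro DemoChild__define
  compile_opt idl2
  void = {DemoChild, parentID: 0L}
end

; --- DemoParent: owns its children through strong references ---
function DemoParent::Init
  compile_opt idl2
  self.children = List()
  return, 1
end

function DemoParent::AddChild
  compile_opt idl2
  child = Obj_New('DemoChild', self)
  children = self.children
  children.Add, child
  return, child
end

pro DemoParent::Cleanup
  compile_opt idl2
  ; The parent owns the children, so it destroys them explicitly
  foreach child, self.children do Obj_Destroy, child
  Obj_Destroy, self.children
end

pro DemoParent__define
  compile_opt idl2
  void = {DemoParent, children: Obj_New()}
end
```

With these definitions, parent = Obj_New('DemoParent') followed by child = parent.AddChild() gives a child that can look up its parent with child.GetParent() without adding to the parent's refcount. Destroying the parent, or simply letting its last reference go, then runs ::Cleanup and takes the children with it.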

Disable Refcounting (use with caution!)

There are a few instances when you may want to turn off IDL's automatic garbage collection. You can do so by calling the HEAP_REFCOUNT function with the /DISABLE keyword set. (This function also returns the current refcount for a heap variable, which can be useful for debugging.)

If you do not provide an argument to this function, garbage collection will be turned off globally.
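
As a minimal sketch, this is how those calls might look for a single pointer and then globally. The IS_ENABLED and /ENABLE keywords are not discussed in this post; I am assuming they behave as described in the HEAP_REFCOUNT documentation, so check them against your IDL version.

```idl
p = Ptr_New(42)

; Debugging aid: how many references does this heap variable have?
print, Heap_Refcount(p)

; Turn off reference counting for this heap variable only
void = Heap_Refcount(p, /DISABLE)

; Ask whether reference counting is still enabled for p
void = Heap_Refcount(p, IS_ENABLED=enabled)
print, enabled

; With counting disabled, I am responsible for freeing p myself
Ptr_Free, p

; No argument: turn reference counting off globally, then back on
void = Heap_Refcount(/DISABLE)
void = Heap_Refcount(/ENABLE)
```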

I advise you to use this with caution: if garbage collection is turned off, then you as the programmer are fully responsible for the lifecycle of every object created within your program, including ones you may not immediately think of, such as those inside nested lists or hashes. The garbage can will get full very quickly if it isn't regularly emptied.