Sunday, July 27, 2014

A Red-Black Tree - slightly unusual

The Red-Black Tree is not something I can implement without referring a standard textbook - it's implementation is just way too intricate. Even so, when you read about it, everything conceptually clicks, somehow, but you wonder how someone could have got around to actually inventing this beast !

The Red-Black Tree is the data structure of choice when a worst case O(log N) complexity is desired. As such AVL trees have preceded them and they make the same guarantees but it seems the AVL trees have lost out to the Red-Black Trees in recent implementations. The AVL tree seems to be too eager to re-balance the heights of the sub-trees resulting in larger constant factor in typical usages. Red-Black trees just don't care too much about imbalances, by only guaranteeing that any path to leaf contains a maximum of twice the number of nodes than the other paths from the same node.

So, we see that the Red-Black Tree is incorporated in some very commonly used C++ - STL structures (maps).

I have implemented the tree now, it's a standard implementation, but with a few twists on how it is used. See my GitHub source shared here...

The Red-Black tree in my implementation has the following features :

1. It has a C++ interface, but there are no pointers in the interface or internal code. There are only 4 byte unsigned integers serving as offsets into a linear structure, such as a vector or array.

2. Tree operations, even insertions, do not require copying an element or compound objects. As such the objects are maintained in a vector by the client code and the tree only maintains offsets into the client's vector. Goes without saying that the objects to be inserted need not have a default constructor either.

3. Comparison function (actually a class) is supplied by the client code and invoked by the tree. Nothing new here though ...

4. Removing elements from the tree is complemented by a free-list feature. The free-list holds a list of freed elements so that subsequent insertions can re-use memory from removed elements.

I hope to derive these advantages from these characteristics :

1. Interleaved insertions-deletions should behave better, with reduced need to allocate-release memory, maybe to the OS or to a shared memory pool.

2. Not using any pointers presents multiple benefits - the memory consumed by the tree will not change just by changing between a 32-bit and a 64-bit OS ! Since the 4 byte unsigned integral offsets all refer to memory within a single contiguously allocated array, the memory references should be more compact/localized. Cache hits should be better than what could have been if memory for nodes was allocated individually.

3. No copying of objects involved. Default constructor on elements to be inserted is not required. Only 4 byte offsets referring to the container are stored. Unfortunately, such references mean that offsets of existing elements cannot be changed, that might happen if the container's elements were to be deleted.

This is something in work too - I am planning to implement a companion container to this tree, that can support deletions too. Also to come are a few other data structures that are not commonly found in libraries...

No comments: