Sunday, June 4, 2017

More code optimization notes

Continuing from my earlier post about code optimization techniques here.

This is all about making the code more data oriented. Which means that the code is written such that its data is organized intelligently in memory, considering CPU cache access patterns and data sizes.

As it is, extracting benefits out of any of these techniques is a black art. Some may back-fire in specific circumstances. So, benchmark... be absolutely sure to measure(twice) before cutting(once).
Time to consider a few more ways to make code run faster...

  1. Inlining functions: Yes, nothing too special about this but worth noting. Take care about its impact on size of the executables.
  2. Mark functions as constexpr: The newer C++ standards make it simple to go further than inlines. If you tell the compiler that the function can compute statically for all its inputs, make it constexpr so that the computations finish at compilation stage itself. Of course, the inputs that the callers pass to such a function should also be known fully at compile time. Inlining the function produces no guarantee of actual inlining by the compiler but if we do make the function as constexpr, we are sure that there are absolutely no runtime costs of this function at all.
  3. Provide specialized versions of a function per data type. This is not just about your function taking a polymorphic base type as an argument. C++ allows overloaded function names. It is sometimes possible to write a function slightly differently for a particular data type so that it is more optimal. Perhaps it saves you a remote call or two. But at a very minimum, if it helps runtime conditional checks, it could be worth doing.
  4. Unrolling loops: Doing more actions in one iteration of a loop.
    for(int i = 0; i < 4*x; ++i){
                // process(arr[i])
    changes to...
    for(int i = 0; i < x; i += 4){
                   process(arr[i + 1]);
                   process(arr[i + 2]);
                   process(arr[i + 3]);
    which just saves some book-keeping conditional checks and increments in the for-loop that were happening every iteration.
  5. Pass values for anti-aliasing: Sometimes you have got to assure the compiler that some value is certainly not going to change for every iteration of some big loop. Then the compiler will not add a fresh load for that variable.
    void func(int arr[N], int* p){ // 'p' passed as pointer. Not sure if it is actually aliasing some arr[i].
                   for(int i = 0; i < N; ++i){
                       // also uses 'p' 
                      // load 'p' again to ensure it is latest.
    Here, we could pass p by value, that is good enough for the compiler.
  6. Avoid guessing for branch predictions: Sometimes, calculating all conditional paths in a piece of code and then deciding at the very end what to return could help save from a bad branch prediction cost.
  7. Process large arrays in parallel. Batch them and take care to avoid false sharing, i.e - threads that still access the same data and then block on mutual access. Make sure each thread gets a full cache line.
  8. If you process something in parallel as in the previous point, you might have to reduce at the end to get a combined result. Reduce pair-wise instead of making all threads block on a single shared data.
  9. Unroll loops based on latency of inner operations and order the operations so that maximum registers/cache is used at each point within the iteration.
  10. Separate fast and slow code. Remove exception handling out of tight code.

That's it for now.
Hoping to be mindful of such finer points.

Thursday, January 21, 2016

Find the permutation !

I first thought that solving this problem would require much longer piece of code ! Urge you to have a go at this first.

The first thought is to see if each inversion sequence has a one-to-one correspondence with the actual permutation. Given a permutation, its inversion sequence is obviously unique. Other way, if an inversion sequence has two different permutations, then we can start verifying numbers and their positions in ascending order as follows - Suppose x is the lowest number whose position does not match in the two permutations but their inversion sequence is the same. But the higher of the inversion values is not consistent since it must be made of the numbers lower than itself. So the permutation is unique too.

The brutest of brute force ideas for this problem could be to just generate all permutations of the input length and see if the criteria is met.

So the clear next approach is to try generating the permutation by considering the positions in the reverse order. Since the inversion sequence (input) only worries about numbers lesser than the current appearing to the right, we use the reverse ordering while writing out the output permutation. So for the permutation of numbers from 1-8, we should be able to write out the answer by deciding the position of '8' first then descending towards '1'.

At each step, we keep track of how many numbers lie to the right of the current position. And here is the core problem - For a permutation of size 'N' numbers ('N' in 10^^5 range), how to maintain efficiently the relative position-from-the-right for each number while some positions keep getting filled iteratively ? O(N) numbers and O(N) per number meaning overall O(N^^2) is not an option with that large 'N'.

The next option is maintaining an augmented balanced R-B Tree wherein each node keeps the count of nodes under it. Too long and complex to code...

Another data structure that comes to mind is a Binary Indexed Tree - BIT . This will help fill up a position and then update the relative positions appropriately in O(N.log N) time. This would require referring to some known implementations.

Somehow I felt that this should require something less than a BIT in full generality. Basically I got this -
1. Start with an array of size 'N' filled with values equal to the initial relative position from the right. So the values are in descending order from '(N - 1)' towards '1'.
2. Going iteratively in reverse order as thought earlier, do a modified binary search for the right position.
3. During the binary search, whenever going right, mark the current node to say that its relative position will reduce by '1' since we are on the way to fill up a position towards it's right.
4. When traversing to the left, increment the position that is sought since we know that those many positions to the right have been occupied !
5. When the correct position is found (sought position matches the calculated position), mark that position as deleted but nothing to delete from the array or shift values.

This takes O(log N) time for each of the number being placed with an overall O(N.log N)...

Here is the "accepted" code I got,


class Position{
        int pos;
        int change;
        bool deleted;
        Position():change(0), deleted(false){

int bin_search(std::vector& vecpos, int seek, int left, int right){
        if(right <= left){
                vecpos[left].deleted = true;
                return left;
        int mid = left + (right - left) / 2;
        int curr = vecpos[mid].pos - vecpos[mid].change;
        if(vecpos[mid].deleted == false && seek == curr){
                vecpos[mid].deleted = true;
                return mid;
        if(seek > curr){
                return bin_search(vecpos, seek + vecpos[mid].change, left, mid - 1);
                return bin_search(vecpos, seek, mid + 1, right);

int main(){
int N;
std::cin >> N;

std::vector inv(N);
std::vector vecpos(N);
std::vector ans(N);
for(int i = 0; i < N; ++i){
        std::cin >> inv[i];
        vecpos[i].pos = N - i - 1;

for(int i = N - 1; i >= 0; --i){
        int pos = bin_search(vecpos, inv[i], 0, N - 1);
        ans[pos] = i + 1;

for(int i = 0; i < N; ++i){
        std::cout << ans[i];
        if(i < N - 1)
                std::cout << " ";
std::cout << "\n";
return 0;

This works because for any 'leaf' node position, we always use the same binary search traversal !

It's always interesting to see patterns linking linear structures like arrays with hierarchical ones like trees. The connection almost always requires something novel.

Sunday, December 13, 2015

Comments on : Functional programming in java script

I highly recommend a wonderful set of interactive exercises here !

It starts with set of exercises for functional programming but later on takes an unexpected turn and goes on to demonstrate features of Microsoft's reactive extensions library for java script. Nevertheless, I jumped into it initially being curious about functional programming techniques common to java script.

Starting with functional programming, there are at least 25 exercises that show how defining a small set of simple abstractions can lead to short and powerful programs. I could work out defining and using the five primary functions (map, reduce, filter, zip and concatAll).

Totally convincing exposition of the power of functional programming - the way that truly complex business logic can be described by surprisingly few lines of code involving just intelligent application of those 5 primary functions ! Basically this was the most 'fun' part of this tutorial.

Later on, not knowing how asynchronous coding and the functional programming techniques are related, the reactive extensions library is introduced. It was striking to learn and find similarities between traversing arrays (using the functional techniques) with processing events (using special objects called 'Observables'). Observables are supposed to be (possibly infinite length) sequences of events. The sequences relate to arrays and thus we can use functional programming primitives to process them. Being observables, though, allows the object and underlying sequence to change over time. Whenever the observable changes, the listeners can run code to process the event.

So the crux of the tutorial is to show how the same kind of programming techniques can help process any kind of data collections or events. It is a unified way of dealing with synchronous data like arrays as well as asynchronous objects.

I am quite used to multi-threaded programming with C++ but not exactly with writing asynchronous java script code that is commonly involved with event handling. So even though I have not really faced the difficulties in handling asynchronous actions in code first hand, I can still appreciate that asynchronicity is bound to present considerable complexity in writing efficient and correct code. The tutorial also comes with examples that show how complex the code can get once certain asynchronous calls are made in parallel and we usually define success and failure callbacks on each request. If it is required that some activity happens once *all* parallel tasks complete then then we have to check and track each request status and perform accordingly.

I still found the narrative promising - that the unified programming model is very helpful in managing that complexity. Functional programming strikes one as hard, sometimes I wondered why one should bother about functional stuff when the equivalent procedural code was so easy and a mechanical activity to write (even if it was 20 times the size). One has to get over a significant learning curve to understand why.

But that is how programming has evolved it seems over the years, in pursuing procedural code for a lower learning curve, we now realize that some things cannot be too simple.

Monday, May 11, 2015

How hard is web development ?

I have trained and worked for most of my career as a back-end engineer and I'm used to struggling with C++, algorithms and distributed computing. Recently, I have taken up some assignments in web development as part of my work. So how does this front-end development with the ubiquitous web technologies feel like and how hard can it be to do it well ?

The web technologies (javascript, HTML, CSS, HTTP and their ilk) are quite inviting to new-comers. There is an attempt to simplify creating your first web page and showing it in a browser. With just a couple of days in learning, you can achieve surprisingly significant work. Very unlike C++/algorithms/multi-threading, they absolutely have no such pretensions.

But it is a costly mistake to go any further (than developing an HTML "Hello World") without a sound understanding of the concepts. There is a vast technology stack that helps the web do what it can do and it should be systematically learnt.

Javascript, the currency of today's web development, is said to be the worlds most widely misunderstood and abused language. It works like a charm in small doses but any piece of code more than 2K lines can become a maintenance nightmare with numerous security vulnerabilities, unless developed correctly.

Web Security is a large topic by itself. A lot of serious vulnerabilities have surfaced due to a rush to introduce new functionality with security only coming in as an afterthought or post damages. Why does HTTP even exist when the technical chops for using HTTPS was already available ?

It's not just depth but also breadth of the technologies that complicate matters. We know that anything that looks particularly impressive and made using CSS wizardry, can break badly on some other browser. There is a whole matrix on browser compatibility for even HTML support.

Anyway, here is what I had to do as a first assignment : find a way to download/upload files from/to a web-server. I struggled to find a comprehensive resource that covered all the relevant issues involved here. I found myself looking at scattered bits of code all over StackOverflow that claimed to do what was required but each one had something missing. And very soon, I was reduced to testing and searching for what just "worked" instead of knowing what should work and why. Backward compatibility was found broken by Ext JS, a client-side javascript framework in terms of whether (and how) it allowed setting HTTP headers for form submits. Server code requirements were completely unspecified. Finally I got the following code working, notice how the code for downloads looks completely different from that for uploads. It uses Ext JS as a client framework so some specific API is used. The server is assumed to have a couple of REST APIs that manage the persistence of the files being transacted on, checks the CSRF token and does certain useful validations on allowable file sizes and file types.

Downloads :

  alias: 'formaction.standardsubmit', 
  doSubmit: function() {
   var formInfo = this.buildForm(); 
methodURL = 'SomeURL' + 'fetchAttachment'; // 'fetchAttachment is a particular REST API in my server.
methodParamsObject = { // this is an object required by the server code in my case
     user: Ext.encode({

// Insert your CSRF token if used to the methodParamsObject here ...

fileDownloadForm = Ext.create(
      standardSubmit: true, 
      url: methodURL, 
      method: 'POST'

fileDownloadForm.submit({params: methodParamsObject});

Uploads :

var file_content;
var new_form;
var file_name;

var fileChangeCallback = function (e) {
 var file = file_content.files[0];
 var reader = new FileReader();
 reader.onloadend = function (e) {
  alert("called CB loadend");
  if ( == FileReader.DONE) {
   var l_data = reader.result;
   l_data =;
  } else {
   alert("Failed to read file");
 alert("called CB");
 file_name.value =;

new_form = document.createElement("form"); = 'file_upload';
new_form.method = 'POST';
new_form.action = 'some URL' + 'addAttachment'; // Again, one REST API in my server implementation.
new_form.enctype = 'multipart/form-data'; // required for correct file handling on server.

var csrf_dummy = {

// Insert CSRF token here if used to the csrf_dummy object In my case, CSRF was being checked by server so passing a token string with a particular key 'CSRFTOKEN' was essential.

var csrf_token = document.createElement('input');
csrf_token.type = 'hidden'; = 'CSRFTOKEN';
csrf_token.value = csrf_dummy.CSRFTOKEN;


var user = document.createElement('input');
user.type = 'hidden'; = 'user';
user.value = "{'userId':'1059','roleId':'1058','roleName':'Administrator','userName':'admin'}";

file_content = document.createElement('input');
file_content.type = 'file'; = 'fileContents';
file_content.addEventListener('change', fileChangeCallback);


Sunday, August 31, 2014

Code optimization patterns - my quick notes

These patterns are based on what I read about here. I am still reproducing (dumping !) it here (with as high fidelity as I can manage), primarily because these patterns are what I do use in assorted situations and also that I want to keep in deep touch with all of them and keep applying them as much as I can. These are just my notes.

Some of these ideas are thrown up as directly conflicting with Object Orientation principles, by breaking encapsulation and trashing inheritance, in favor of composition. It really is a "black art" to know the appropriate situations where I need to apply them.

Here they are :

1/ Data locality -
a/ Converting a list of pointers to a list of offsets(4 byte integers) into contiguously allocated arrays. Iterating over an array of pointers is bad - it makes memory access to random locations - array of pointers "thrash the cache" (too bad, it rhymes !). Allocating array of objects in contiguous memory locations makes it likely that successive objects will be "found" in the cache.
b/ Hot-cold separation - Perhaps the object is large and the array of objects will not fit in the cache anyway. Perhaps for most iterations, only a few specific attributes of each object are processed. Splitting a larger object into smaller sets so that more frequently accessed parts are together. This might make the entire array of objects fit in the cache too.
c/ Reduced branch mis-prediction by
c1/ sorting a list of objects based on some criteria such that iteration only happens for a subset without enclosed 'if' condition
c2/ Creating separate lists of objects based on same criteria so that sorting an entire list is avoided
c3/ Just placing the iteration loop inside the 'if' condition, not other way - simple enough.
c4/ Not mixing different types of objects together in an array if they are to be iterated upon, even if they have same base class types and having separate arrays of each type.

2/ Dirty flag - I can think of it as the poor man's lazy evaluation. Basic dirty flagging is to have an additional boolean flag to tell if the object / cache is stale and requires refreshing. It helps avoid computation until it is required and only if it is required. Another flavor to the pattern is time-stamping or sequence numbering - We have a number or time-stamp that only increments without any danger of repeating and use it to mark the 'version' of the object. Whenever the sequence number is less than some other 'master', we update the cache.

3/ Object pooling - This is basically re-using allocations of some sort - either resources like database connections, threads or simple objects or even plain memory. We just want to avoid creating and destroying objects in memory when we know they really are required in future in some other contexts. There could be low and high watermarks set so that a minimum and a maximum limit on the number of objects in the pool is enforced, for not consuming too much memory and also being ready for multiple requests in burst. If we think of multi-threading scenarios, then each thread dynamically allocating memory from Operating systems is known to be slow, due to locking required for using the common heap. If each thread can use its own pool of memory, then no locking is required, re-use is achieved and thread-specific memory pool should not end up being fragmented.

4/ Spatial partitioning - Organize objects that are "close" to each other so that given an object, the "neighbors" can be efficiently located and processed. This is obviously helpful with real geometric spaces such as nearby particles being checked for collisions. However, the distance as a metric can be extended in other ways - Given a tree like structure, the distance between any two nodes may be expressed in terms of the number of common parents between them. This helps us partition the nodes better, so that queries like finding the number of child nodes for any given node may be made more efficient. This partitioning may be limited to nodes at same levels in the tree too. This pattern does present difficulties when the nodes can actually change distances to other nodes and we have to come up with efficient ways of updating the partitioning depending on the problem context.

Monday, August 11, 2014

USA Computing Olympiad Training Problem 3.3.3 (Camelot)

This is about an algorithmic coding problem that I saw on the USA Computing Olympiad training pages. There were several others that I solved earlier to reach this stage but this one is special ... named Camelot, in section 3.3.3. It seems to have been first posed in the IOI 1998.

Attempting to solve it, swept me through the full range of emotions, going from cocky over-confidence to being nastily surprised to bewilderment to despair, frustration and finally to immense relief of eventually getting rid of it by solving it ! There was absolutely no elation at all on solving it, just massive relief, that's all !

The statement goes like this :

Centuries ago, King Arthur and the Knights of the Round Table used to meet every year on New Year's Day to celebrate their fellowship. In remembrance of these events, we consider a board game for one player, on which one chess piece king and several knight pieces are placed on squares, no two knights on the same square.

During the play, the player can place more than one piece in the same square. The board squares are assumed big enough so that a piece is never an obstacle for any other piece to move freely.

The player's goal is to move the pieces so as to gather them all in the same square - in the minimal number of moves. To achieve this, he must move the pieces as prescribed above. Additionally, whenever the king and one or more knights are placed in the same square, the player may choose to move the king and one of the knights together from that point on, as a single knight, up to the final gathering point. Moving the knight together with the king counts as a single move.

Write a program to compute the minimum number of moves the player must perform to produce the gathering. The pieces can gather on any square, of course.


Line 1: Two space-separated integers: R,C, the number of rows and columns on the board. There will be no more than 26 columns and no more than 30 rows.
Line 2..end: The input file contains a sequence of space-separated letter/digit pairs, 1 or more per line. The first pair represents the board position of the king; subsequent pairs represent positions of knights. There might be 0 knights or the knights might fill the board. Rows are numbered starting at 1; columns are specified as upper case characters starting with `A'.

8 8
D 4
A 3 A 8
H 1 H 8

The king is positioned at D4. There are four knights, positioned at A3, A8, H1, and H8.

A single line with the number of moves to aggregate the pieces.
SAMPLE OUTPUT (file camelot.out)


They gather at B5. 
Knight 1: A3 - B5 (1 move) 
Knight 2: A8 - C7 - B5 (2 moves) 
Knight 3: H1 - G3 - F5 - D4 (picking up king) - B5 (4 moves) 
Knight 4: H8 - F7 - D6 - B5 (3 moves) 
1 + 2 + 4 + 3 = 10 moves.

You could assume that the King and Knight pieces move just like the standard chess moves. The board co-ordinate annotation is exactly the same too : The columns (A-Z) start go from left to right, while the rows (1 - 30) go from down to upwards.

Basically it requires full appreciation of breadth first search. To solve it, you need to use breadth first search in novel ways, tweaking it to suit the fact that the King and the Knight pieces can join/merge while moving and adjust shortest distances accordingly.

I struggled with the problem since initially I (wrongly) assumed that even two or more Knights could join just like a King and a Knight. That could make this problem even tougher. But as such, the join only happens between King and a Knight.

The input limits on the problem are very delicately adjusted to time-out the solutions that have even one excessive nested iteration. Suppose for 'R' rows, 'C' columns and 'P' pieces, the expected time complexity should be O(R*C*P) ... For the maximum input dimensions, it works out to be approximately 600K operations, should be comfortable for a time limit of 1 second. My initial solution was working OK, but had an extra iteration on number of pieces in the innermost loop, that was timing out my solution.

Finally, I got the code below, that does these things :
1. Records the distance to each square for the King.
2. For each square on the board, records the shortest path to every other square, using breadth first search, and saves all of it.
3. Tweaks breadth first search to record shortest path from each Knight to all other squares & assuming that this Knight picks up the King. This uses information stored in (1)
4. Iterates over the entire board, checking if the square can be the final gathering point. In the innermost loop, each Knight is examined if that Knight can pick up the King to reach the candidate gathering square in the cheapest way.

#include <fstream>
#include <vector>
#include <queue>
#include <math.h>

const int INF = 10000000;

int min(int a, int b){if(a < b) return a; return b;}

int max(int a, int b){if(a > b) return a; return b;}

int kr[8] = {-2, -1, 1, 2, 2, 1, -1, -2};
int kc[8] = {1, 2, 2, 1, -1, -2, -2, -1};

class co_ords{
 int r; int c;

class Knight_King{
 int for_knight;
 int for_king;
 int r;
 int c;

void init(Knight_King dists[30][26], int rows, int cols, int king_row, int king_col){
 for(int rr = 0; rr < rows; ++rr){
  for(int cc = 0; cc < cols; ++cc){
   dists[rr][cc].for_king = max(abs(rr - king_row), abs(cc - king_col));
   dists[rr][cc].for_knight = INF;

class Knight_Cost{
 Knight_King KK[30][26];

bool valid(int r, int c, int rows, int cols){
 if(r >= 0 && c >= 0 && r < rows && c < cols)
  return true;
 return false;
void reset(int arr[30][26], int rows, int cols){
 for(int x = 0; x < rows; ++x){
  for(int y = 0; y < cols; ++y){
   arr[x][y] = INF;
void knights_shortest(int dists[30][26], int rows, int cols, int r, int c, std::queue& q){
 dists[r][c] = 0;
 co_ords rc; rc.r = r; rc.c = c;
 int rr, cc;
  co_ords rc = q.front();
  for(int i = 0; i < 8; ++i){
   rr  = rc.r + kr[i];
   cc = rc.c + kc[i];
   if(valid(rr, cc, rows, cols) && dists[rr][cc] == INF){
    co_ords rrcc; rrcc.r = rr; rrcc.c = cc;
    dists[rr][cc] = min(dists[rr][cc], dists[rc.r][rc.c] + 1);

void knights_paths(Knight_King costs[30][26], int rows, int cols, int r, int c, std::queue& q){
 costs[r][c].for_knight = 0;

 Knight_King rc; rc.r = r; rc.c = c; rc.for_king = costs[r][c].for_king; rc.for_knight = costs[r][c].for_knight;
 int rr, cc;
  Knight_King rc = q.front();
  for(int i = 0; i < 8; ++i){
   rr  = rc.r + kr[i];
   cc = rc.c + kc[i];
   if(valid(rr, cc, rows, cols)){
    bool enq = ((min(rc.for_king, costs[rr][cc].for_king) + rc.for_knight + 1) < (costs[rr][cc].for_king + costs[rr][cc].for_knight));
     Knight_King rrcc; rrcc.r = rr; rrcc.c = cc; rrcc.for_king = min(rc.for_king, costs[rr][cc].for_king); rrcc.for_knight = rc.for_knight + 1;
     costs[rr][cc].for_king = min(rc.for_king, costs[rr][cc].for_king);
     costs[rr][cc].for_knight = rrcc.for_knight;

int get_sum(int dists[30][26], const std::vector& knights, int num_knights){
 int ret = 0;
 for(int i = 0; i < num_knights; ++i){
  ret += dists[knights[i].r][knights[i].c];
 return ret;

class square{
 int dists[30][26];
 square(int rows, int cols){
  for(int i = 0; i < rows; ++i){
   for(int j = 0; j < cols; ++j){
    dists[i][j] = INF;

int main(){
 int rows, cols;
 char col;
 std::vector knights;
 std::queue qkk;
 std::queue q;

 int row, num_knights = 0;
 int king_row, king_col;
 std::ifstream ifs("");
 ifs >> rows >> cols;

 ifs >> col >> row;
 king_col = (col - 'A');
 king_row = (rows - row);
 while(ifs >> col){
  ifs >> row;
  co_ords rc;
  rc.r = rows - row;
  rc.c = col - 'A';
 std::ofstream ofs("camelot.out");
 if(num_knights == 0){
  ofs << "0\n";
  return 0;

 std::vector board[30];
 for(int i = 0; i < 30; ++i){
  for(int j = 0; j < 26; ++j){
   square s(rows, cols);
   knights_shortest(board[i][j].dists, rows, cols, i, j, q);

 std::vector cost(num_knights);

 for(int x = 0; x < num_knights; ++x){
  init(cost[x].KK, rows, cols, king_row, king_col);
  knights_paths(cost[x].KK, rows, cols, knights[x].r, knights[x].c, qkk);

 int min_so_far = INF;
 int sum = INF, tmp_sum, sim_king_dist;

 for(int r = 0; r < rows; ++r){
  for(int c = 0; c < cols; ++c){
   sim_king_dist = max(abs(king_row - r), abs(king_col - c));
   sum = get_sum(board[r][c].dists, knights, num_knights) + sim_king_dist;

   for(int k = 0; k < num_knights; ++k){
    tmp_sum = sum;

    sum -= (sim_king_dist + board[r][c].dists[knights[k].r][knights[k].c]);

    sum += (cost[k].KK[r][c].for_king + cost[k].KK[r][c].for_knight);

    if(sum < min_so_far)
     min_so_far = sum;
    sum = tmp_sum;

 ofs << min_so_far << "\n";
 return 0;

A result of a rivetting journey over the weekend...

Sunday, July 27, 2014

A Red-Black Tree - slightly unusual

The Red-Black Tree is not something I can implement without referring a standard textbook - it's implementation is just way too intricate. Even so, when you read about it, everything conceptually clicks, somehow, but you wonder how someone could have got around to actually inventing this beast !

The Red-Black Tree is the data structure of choice when a worst case O(log N) complexity is desired. As such AVL trees have preceded them and they make the same guarantees but it seems the AVL trees have lost out to the Red-Black Trees in recent implementations. The AVL tree seems to be too eager to re-balance the heights of the sub-trees resulting in larger constant factor in typical usages. Red-Black trees just don't care too much about imbalances, by only guaranteeing that any path to leaf contains a maximum of twice the number of nodes than the other paths from the same node.

So, we see that the Red-Black Tree is incorporated in some very commonly used C++ - STL structures (maps).

I have implemented the tree now, it's a standard implementation, but with a few twists on how it is used. See my GitHub source shared here...

The Red-Black tree in my implementation has the following features :

1. It has a C++ interface, but there are no pointers in the interface or internal code. There are only 4 byte unsigned integers serving as offsets into a linear structure, such as a vector or array.

2. Tree operations, even insertions, do not require copying an element or compound objects. As such the objects are maintained in a vector by the client code and the tree only maintains offsets into the client's vector. Goes without saying that the objects to be inserted need not have a default constructor either.

3. Comparison function (actually a class) is supplied by the client code and invoked by the tree. Nothing new here though ...

4. Removing elements from the tree is complemented by a free-list feature. The free-list holds a list of freed elements so that subsequent insertions can re-use memory from removed elements.

I hope to derive these advantages from these characteristics :

1. Interleaved insertions-deletions should behave better, with reduced need to allocate-release memory, maybe to the OS or to a shared memory pool.

2. Not using any pointers presents multiple benefits - the memory consumed by the tree will not change just by changing between a 32-bit and a 64-bit OS ! Since the 4 byte unsigned integral offsets all refer to memory within a single contiguously allocated array, the memory references should be more compact/localized. Cache hits should be better than what could have been if memory for nodes was allocated individually.

3. No copying of objects involved. Default constructor on elements to be inserted is not required. Only 4 byte offsets referring to the container are stored. Unfortunately, such references mean that offsets of existing elements cannot be changed, that might happen if the container's elements were to be deleted.

This is something in work too - I am planning to implement a companion container to this tree, that can support deletions too. Also to come are a few other data structures that are not commonly found in libraries...