Sunday, July 27, 2014

A Red-Black Tree - slightly unusual

The Red-Black Tree is not something I can implement without referring a standard textbook - it's implementation is just way too intricate. Even so, when you read about it, everything conceptually clicks, somehow, but you wonder how someone could have got around to actually inventing this beast !

The Red-Black Tree is the data structure of choice when a worst case O(log N) complexity is desired. As such AVL trees have preceded them and they make the same guarantees but it seems the AVL trees have lost out to the Red-Black Trees in recent implementations. The AVL tree seems to be too eager to re-balance the heights of the sub-trees resulting in larger constant factor in typical usages. Red-Black trees just don't care too much about imbalances, by only guaranteeing that any path to leaf contains a maximum of twice the number of nodes than the other paths from the same node.

So, we see that the Red-Black Tree is incorporated in some very commonly used C++ - STL structures (maps).

I have implemented the tree now, it's a standard implementation, but with a few twists on how it is used. See my GitHub source shared here...

The Red-Black tree in my implementation has the following features :

1. It has a C++ interface, but there are no pointers in the interface or internal code. There are only 4 byte unsigned integers serving as offsets into a linear structure, such as a vector or array.

2. Tree operations, even insertions, do not require copying an element or compound objects. As such the objects are maintained in a vector by the client code and the tree only maintains offsets into the client's vector. Goes without saying that the objects to be inserted need not have a default constructor either.

3. Comparison function (actually a class) is supplied by the client code and invoked by the tree. Nothing new here though ...

4. Removing elements from the tree is complemented by a free-list feature. The free-list holds a list of freed elements so that subsequent insertions can re-use memory from removed elements.

I hope to derive these advantages from these characteristics :

1. Interleaved insertions-deletions should behave better, with reduced need to allocate-release memory, maybe to the OS or to a shared memory pool.

2. Not using any pointers presents multiple benefits - the memory consumed by the tree will not change just by changing between a 32-bit and a 64-bit OS ! Since the 4 byte unsigned integral offsets all refer to memory within a single contiguously allocated array, the memory references should be more compact/localized. Cache hits should be better than what could have been if memory for nodes was allocated individually.

3. No copying of objects involved. Default constructor on elements to be inserted is not required. Only 4 byte offsets referring to the container are stored. Unfortunately, such references mean that offsets of existing elements cannot be changed, that might happen if the container's elements were to be deleted.

This is something in work too - I am planning to implement a companion container to this tree, that can support deletions too. Also to come are a few other data structures that are not commonly found in libraries...

Saturday, July 26, 2014

About API design and copy semantics

This is a tale of how a combination of counter-intuitive copy semantics and API design kept me debugging a piece of code, late into the night, for several hours... Late night coding might be fun but it there is never any fun in working in a crisis mode, with time pressure and against a bug.

I suspect that the reasons behind my ordeal, are also the reasons why most programmers tend to vastly under-estimate the time it will take them to finish a programming task.

Sequences or contiguously allocated arrays usually offer some facilities to copy their internal representation, that is efficient, and shallow. Certainly more efficient than simply iterating over the entire sequence and copying each object by value. Typically such sequences have a raw buffer where the data is managed, a 4 or 8 byte 'length' to know the size of that buffer and whether the sequence 'owns' the buffer or not.

Ownership of the raw buffer is the "hot potato", the sequence which ends up with it should be the one releasing the buffer's memory. The consequences are ghastly - If both of them release or if both fail to release.

When a copy of a sequence is required, into another sequence, the pointer to the raw buffer is assigned, the 'length' is copied and the ownership is relinquished by the original sequence depending on what the caller wants.

Consider such feature from a CORBA sequence, and an implementation from 'Orbacus',
typedef OB::FixSeq< ::CORBA::ULong, int > MSeq; /* CORBA's fixed sequence of integers */

MSeq source;
source.length(2); /* Fill some data to test copy */
source[0] = 1;
source[1] = 2;

MSeq target; /* Empty sequence */

bool takeOwnership = true; /* Let's have target sequence acquire ownership, and source sequence relinquish it too ! */

/* Now actual call to copy sequence into target  */
target.replace(source.maximum(), 
               source.length(), 
               source.get_buffer(takeOwnership), 
               takeOwnership);


I noted the following painful aspects of the 'replace()' call, that is supposed to do the efficient copy :

1. The name itself, 'replace' is suggestive of an action to 'change the guts' of the target sequence, but implies nothing about the fact that even the source sequence can be totally 'gutted' through passing certain choice of options transferring ownership.

2. How much of grunt work is expected from the caller ! The function takes 4 arguments including the length, maximum allocation of the source sequence and buffer, all internal details of the source sequence... Why does it not do with just a reference to source sequence and ownership option, and use all such internal detail from the source sequence itself ? After all, 'replace()' is a member function.

3. I find that boolean option of 'takeOwnership' is accepted twice in the same function call - first to tell the source that it should relinquish ownership and second to have the target sequence acquire it. Now, I can't think of a situation where the caller might desire to pass 'takeOwnership' as 'true' in first and 'false' in second or the other way. In fact, using different values for the ownership caused a crash that I debugged. The code was deep into the implementation and the results are totally counter-intuitive.

/* This code crashes  */
/* We are copying between 3 sequences : original -> intermediate -> final */
#include "OB/CORBA.h"
#include 

int main(){

typedef OB::FixSeq< ::CORBA::ULong, int > MSeq;

MSeq original;
original.length(2);
original[0] = 1;
original[1] = 2;


MSeq intermediate; 
intermediate.replace(original.maximum(), 
                     original.length(), 
                     original.get_buffer(false), 
                     true);

std::cout << intermediate.length() << " " << intermediate[0] << " " << intermediate[1] << "\n";


MSeq final;
final.replace(intermediate.maximum(), 
              intermediate.length(), 
              intermediate.get_buffer(false), 
              true);

std::cout << final.length() << "\n"; /* Length is as expected ==> 2 */

std::cout << final[0] << " " << final[1] << "\n"; // Still ... Crashes here...

return 0;

}

/*
To build :
g++ -I  -L  -lOB -lJTC -ldl -lpthread 

*/


What is certainly bad is that the length of the 'final' sequence is 2 but the internal buffer is just not there, any access causes a crash. The resulting sequence is simply not internally consistent.

The 'replace()' function API is designed such that it accepted too many arguments, that should have been deduced from a source reference. And that made it easy for a caller to mis-use the API. In that sense, it fails to have certain characteristics mentioned in an excellent piece from Joshua Bloch here !

Saturday, January 18, 2014

Beginning with Haskell

I just gave myself permission to spend some time learning a new programming language, and here are my notes on my initial effort. I bet anyone who has read this, has on their radar, to learn some functional programming language. I chose Haskell. Judging by the experiences of others attempting the same thing, and with vastly more neural firepower than me, maybe that is a brave decision !

So, yes, Haskell is daunting from the outside, but I am only first checking out if it is really possible to make any progress without being an expert in Category Theory or Type Systems ... After all, Haskell is supposed to be very general purpose. Can it give me some prop by allowing a humble "Hello World" come out easy ?

Here is one of the first programs I attempted : calculating the 'n'th Fibonacci number.

With Haskell, this is a first attempt,

fib n
       | n == 0 = 0
       | n == 1 = 1
       | n > 1 = fib(n - 1) + fib(n - 2)


So terse and clean. I could challenge someone to write a specification of the Fibonacci program in plain English and in fewer characters. Even without any introduction to the syntax, we can see through this affair easily, as if someone just wrote the pseudo code. Of course, as horribly performing as a naive C++ program following the same pattern ! Calculating the 30th Fibonacci takes upwards of 5 seconds and if someone asks for the 60th Fibonacci, this program won't return before the Haley's comet and much later. Exponential algorithmic complexity can bite hardest, even with Haskell.

So here is a result after more hours of reading this fantastic introductory online book named "Learn You a Haskell" :

fib_improved n = fst $ fib_helper n (0, 1)

fib_helper 0 (a, b) = (a, b)
fib_helper n (a, b) = fib_helper (n - 1) (b, a + b)


Somehow, this is equally short as the earlier version but is super fast. Asking for the 100th Fibonacci instantly gets me this result : "354224848179261915075" !

The last line of this code tells the whole story, which is using recursion to iterate over a "tuple" repeatedly and transforming the tuple in the process. Probably this textual explanation does a far worse job (at explaining) than what the single line of Haskell makes obvious. The first two lines are just special cases and presentation.

The patterns I see so far are twofold :
1. Recursion is cool, but the algorithmic considerations are paramount. Haskell, with its principles about immutable data, presents additional twists to the efficient coding game. We need to worry about unnecessary copying (to implement data immutability), and how not to blow up the stack even while using recursion.
2. One of the techniques to make things efficient in Haskell is to pass additional arguments to function calls and "build" an answer through them, instead of first calling the function and then processing the returned data from a function. That is what the second version of the Fibonacci program does differently from the first.

Encouraging !


Tuesday, June 19, 2012

Money matters

I never thought that any development in Graph Theory would have anything to do with war between countries. But it seems to. Sometime during the cold war era, the Americans were solving the problem of how to break up the extensive soviet railroad network with the least effort. So which are the places to bomb and destroy railway lines such that there is a maximum disconnect between railway stations ? You just find a "min-cut" for the graph of towns and connecting railway lines. The same min-cut helps in efficiently blowing up oil pipelines and electricity supply as well. And so, in due course of time, we got a few more pages added to the book of the Theory of Graphs.

Well, the deeper point is notice what motivates innovation. During times such as the cold war, innovation in America was driven by a country-wide feeling of competition, threat, fear or maybe mass hysteria. It can lead people to work harder and think harder. There was rapid development in not just missile technologies but also space missions, electronics, energy and of course, computing. Now, the cold war is over and the emotions have cooled down. But people are still doing smart things.

My guess is this. I feel that the collective demographic feelings can only partly explain the rapid developments in that period or any period. For any creative effort to truly succeed, someone has to finance all the efforts. In that period, the governments were rather generous in funding research into all areas that were deemed important. So a lot of money went into missions to the moon and it bore spectacular results. Today, the money is flowing into different areas and that is where all the action is happening. The things that people value can change from time to time but the research projects which click are the ones where the funding is active. Of course, people can build great things without much money (software) or without much secrecy (Open Source Software). But they won't be able to send a man to Mars and certainly cannot build a controlled nuclear fusion reactor. My point is that development is not stalled because of limits of human intellect, but because nobody really values it enough to spend hard-earned money on it.

An article by an astrophysicist named Neil Tyson here makes the same point very elegantly. Elected governments may say that their first priority is to build "clean and green" renewable energy projects but how can we find if the powerful folks really serious about it ? I'd simply look at the kind of money they are spending on it.

Monday, February 13, 2012

Monstrosity as in C++

I am writing this piece out of a renewed terror of C++, particularly after reading some of the items from "Effective C++/More Effective C++" by Scott Meyers (http://aristeia.com/books.html) and now moving on to "Modern C++ Design" by Andrei Alexandrescu (http://www.moderncppdesign.com).
I have been using C++ for practically most of the last 10 years but it still succeeds in throwing up dark corner cases and quirks. Particularly jarring are some of them which go against my intuition of how the code should behave. My experience is in the same vein as here and here. More criticism of C++ is easily found all over the web, some of it coming from star programmers like Linus Torvalds as well. Whenever I am comfortable with using C++ and reading C++ code, it is just when the code uses the small sub-set of C++ that I use and I know. And everyone just seems to use their own favorite sub-set of C++.
An example is the items in the books by Meyers. Many of them are very informative and it never feels like a drab listing, thanks to the lively style of Meyers' writing. Most appreciable are items about class design and efficiency matters. In contrast, most troubling items are those that attempt to explain, cope with or "engineer" around C++ syntax. Operator overloading still tops my list of nonsense and unnecessary features. If it’s just syntactic sugar, it is making everyone a diabetic.
To be fair, C++ has had to deal with supporting new features as well as maintaining backward compatibility with C. It is feature-rich and multi-paradigm. It seems that individual features of C++ like Classes, Inheritance, Templates, Exception Handling and others are very useful and well thought of. However, when these facilities combine in a real world code, they have a devastating effect. Explanations of smart "techniques" to somehow handle each and every combination case increasingly venture into the territory of the obscure and the ugly. Lack of garbage collection and memory/system issues only worsen the situation. So when Meyers tries to work out such complex issues, it just strikes me that such situations should just be avoided at all costs. There should be simpler ways of doing such things. Knowing C++, it is apparent that there would be many ways of doing the same thing, some simple, others not so.
I hope to write something more over the next weeks about C++ and particularly, "simpler" ways of doing things and anything truly useful, like compile time checks.

Tuesday, November 22, 2011

Hunting Eichmann By Neal Bascomb : An international thriller.

I finished reading an absolute thriller this week, named 'Hunting Eichmann' written by Neal Bascomb, and found it simply un-putdownable. Its about a notorious Nazi war criminal named Lt. Col. Adolf Eichmann, his flight out of war-torn Germany into Argentina and then capture by Israeli secret service agents and trial.
Eichmann was a head of department on Jewish issues during the War and was in direct charge of deporting and plundering the jews of Hungary, Austria, Poland among other places to eventually dispatching to extermination in concentration camps. Eichmann got orders from Himmler but was known to be extravagantly aggressive in executing those orders. His infamy grew towards the end of the war, in 1944, through his publicized mass murders in Hungary. Bascomb has done great work in unraveling what Eichmann was up to, on the run just after Germany lost the war. Its surprising that he could manage to linger for as many as 6 years in Germany itself doing various odd jobs. Surprising because of the lack of attention from the Allied forces and failure to apprehend senior Nazi leaders and known mass murderers, such as Eichmann. Eventually it seems that Eichmann escaped to Argentina, not because of threat of capture in Germany but because of boredom and in search of a new life in a new country. The allies were clearly busy with countering the soviet threats and also capitalizing on the German technology and scientific manpower.
Eichmann, in Argentina was leading the life of a lower middle class citizen and not something like a celebrated fugitive in a favorable foreign country. Through meticulous planning and daring, more than 10 Israeli agents belonging to the Mossad reach Buenos Aires, Argentina. They are able to locate Eichmann's house through information provided primarily by the father of the girlfriend of Eichmann's son. Each of the Israeli agents have their own story to tell, mostly of losing family and kin in the concentration camps driven by Eichmann and others. The agents thankfully kept their cool and did not summarily assassinate Eichmann, but managed to smuggle him on an Israeli civilian jet to Israel for a proper trial.Eichmann was hanged until death in Israel and the ashes after the incineration of his corpse were scattered into the sea to avoid giving any neo-nazis a place of worship. After hanging, when Eichmann was released off the noose, there was a loud sound - of air escaping from the lungs of the corpse. I tried to imagine what that would sound like ! It is said to have given nightmares to the executioners ...

About the writing
The events are historical and dramatic but the narrative is colorful as well. Perhaps the reading seems even more gripping to me because the events are based on documented facts and the author manages to weave them into a superb storyline. It is pacy, there is never a dull moment anywhere. On several occasions, Bascomb does strain to impress upon the reader the dangers and suspense that the Israeli team faced. Sometimes he really overdoes it. Why should Vera Eichmann have a nightmare the very night before her husband is apprehended on the street ? Why should a car with four men stop Eichmann a few days prior to the operation just to ask directions to the city and put Eichmann in a state of tenuous doubt ? Such mentions border on the cinematic and the attempt to incite thrill just shows. Regardless, I found it eminently absorbing. The tension that builds when the Israelis are about to grab Eichmann on the street and push into a car is something to be experienced.
Some areas are thoroughly unconvincing and jarring. How come the Mossad chief even aimed to capture the Nazi doctor Josef Mengele also hiding in Argentina and take him along with Eichmann ? That too without informing other agents at all before Eichmann was captured ? The planning, reconnaissance and effort that went into trapping Eichmann was phenomenal, how could Mengele be captured as if it is just a casual takeaway ?

Banality of Evil
None of the characters in this affair are 'larger-than-life'. No matter what the 'James Bond' movies suggest, I have known that the secret service agents and spies are never so, they have to make every attempt to blend in with their surroundings and not catch attention. They need to be good at evading questions, telling lies, forgery, camouflaging through make-up, calm even under life-threatening pressure and be infinitely courageous. The Mossad agents, who trapped Eichmann were perfectly so in every degree. Further, these men had families and were settled in life otherwise. A comment on Mossad chief Isser Harel's recruitment strategy says that Mossad needed perfectly honest men to do a scoundrel's work.
Eichmann on the other hand was a revelation. During the war, this guy had a wife, three sons and a good house. He was slightly arrogant and had a taste for extravagant life but his appearance was no way close to resembling a bloodthirsty monster that he actually was. Eichmann's mass murders were purely hate-driven. When the Israeli agents caught him, Eichmann was nearly a nervous wreck, weak and balding, eking out a living on the city outskirts. Although its not unusual for such a weakling to be cruel, for a settled family guy to send millions of families to their death appears to be. That's where the banality of evil strikes.

Impact
Arresting Eichmann and hanging him was truly significant event in the context of our understanding of the terrible carnage in the war. Our collective consciousness of the Holocaust was enhanced because of the Eichmann trial. In contrast, the Nuremberg trials were really about the topmost leaders in the Nazi hierarchy and those were the public faces of the war crimes, not the real executioners. It also seems to have brought the Israeli Prime Minister David Ben Gurion a lot of political momentum and for Mossad, international awareness. According to Bascomb, Ben Gurion hurried the announcement of Eichmann's capture once he knew that more than 50 israelis were already aware of it. Such an announcement clearly endangered the Mossad operatives who were still in and around Argentina. Politicians are the same everywhere !

Israel : At founding and Modern times.
The nation of Israel is shining today as a progressive nation in spite of limited resources (human and natural) and being surrounded by Arab enemies. Terrible events like the Holocaust and Arab wars preceded the founding of this country some 60 years ago. Wars have continued. In Bascomb's words, what is admirable is the 'strong sense of purpose' that the young Israelis have, to withstand all calamities, fight for survival and be unique in its middle eastern neighborhood. Through ingenuity and hard work, they have succeeded in even farming in a desert through water management technologies. Somewhere I have heard about the Israelis retro-fitting an american F-16 fighter plane engine into the body of a french Mirage-2000 to gain performance advantages of both such aeroplanes.

Overall, it was a happy reading.

Monday, October 24, 2011

From "How Life imitates Chess" By Garry Kasparov

It has been a thought-provoking and pleasurable three weeks with "How Life imitates Chess", by Garry Kasparov. Well-written, in the sense that it does not feel like a self-help book at all (and it is not anyway). But it introduces the elements of success : approach, attitude, strategy, tactics, habits and thinking style. The famous world chess champion starts by an anecdote of his own chess encounters and then projects the concept into a real-life situation. Points to note are -

1/ Become more self-aware. Be constantly questioning the self on all decisions. Avoid being on an autopilot and relying on plain instincts to take any decisions.

2/ Play your own game. Everyone has his own style. Play a game that suits the style and in that way, make detailed notes on one self much more than focusing on any opponent.

3/ Add an element of surprise. Introduce some imagination and fantasy in the game. Break the routine every once in a while. Let the mind wander and bring in any radical ideas.

4/ Become a strategist as well as a good tactician. Pay attention to the bigger picture in addition to the detailed and sometimes routine calculations.

5/ MTQ ( Material, Time, Quality) is a crucial triad that tells the governing factors important for evaluating a situation. Material could be money, and physical resources. Time is universal. Quality is often understated in importance. Quality could mean having some strategic, knowledge, energy, ideas or skills advantage. Although quality is desirable as an end by itself, it can be utilized later for material or time. Somewhere, more than adequate importance is showered on material. Time is certainly sacrificed routinely for it. Quality is rarely in the picture when deciding what long-term strategy needs to be followed.

6/ Being able to evaluate a situation is quite different from just listing possibilities. Better decision making requires better evaluation and better evaluation relies on a proper blend of the MTQ factors. In particular, the MTQ blend must fit with temperament, strategy and willingness to take risks. MTQ factorization should provide at least a systematic way of beginning an evaluation. It should help take some weight off the instinctive and reactionary tendency.

7/ There can be a deadlock in some circumstances when I cannot see a good course of action. It is then a good time to introduce a radically different idea just to break the routine. Such a step can surprise others and bide some time for me. Importantly, it wins some Quality component of MTQ because it improves position, going from stagnant to dynamic, in such a way that I can take advantage later.

8/ Understand the rationale behind some decisions and events. Merely following a precedent does not go a long way because every situation is distinct from others and there is no sure recipe for all.

9/ SWOT (Strength, Weakness, Opportunity, Threat) analysis is the real life counterpart of MTQ. Only if I know the current position (or state of material, time and quality) can I really decide how to plan ahead. In chess, the opening game is the time for creativity but primarily for following certain tried-and-tested methods. Similarly, life's early years are spent in a grind on a beaten path of growing-up, schooling, college and career. Rarely does it vary by much. The middle game in chess is greatly dynamic, where there are not only various ways of doing things but also each move can potentially dispatch us either on a road to victory or total disaster. I am well into the middle game. The end game is where cold calculation and predictability rise to prominence. Everyone can get complacent and bored by the routine and predictability which makes one vulnerable to mistakes. Each day and each moment is therefore a challenge - to be constantly critical and questioning about ones decisions, searching for better alternatives, not become complacent or go on an auto-pilot.

10/ Seize the attacker's advantage, which means taking the initiative, having courage, taking calculated risks and innovation. The attacker gets a positive momentum going in the favour. The attacker seems to have a positive pressure to take action and before that, to take a decision, whereas the defender simply needs to wait and watch. The attacker always wants to disrupt the status quo and in that way, is naturally complacency-proof. According to Kasparov, attack is better not because it is the only way, but that it works best, and certainly did for him.

11/ Question success, and failure too. We should try to understand why things succeed or fail. We love to find agreement and consensus because it saves us from taking hard decisions and confrontation. Which is why we are surrounded by like-minded people and those with similar habits. But it is important not to be.

12/ Intuition is not about some amateur coming up with the right answer without much thinking. That is more like luck. Intuition is all about someone with experience and skill hitting on an unconventional approach or a novel idea.

13/ Have a multidimensional personality. It is better to be good at something other than the profession as well. Being a good public speaker will instill confidence in all other areas of occupation as well. Richard Feynman is said to have improved as a physicist by being a better drummer as well.

14/ When there is a crisis, it essentially means that there is a clear and present danger as well as an opportunity. Its very difficult to stir up opportunities in a still, static and silent environment. Only a crisis provides a window to create a break.

15/ In real life as in chess, there is never much doubt on what to do when a problem is seen at hand. Our minds begin to take it up and solve it in whatever manner appropriate. But the grander question is when there is no apparent problem, what should be done then ? Should a plain old routine be followed, or should there be some kind of improvisation ? Too often, such situations lead to complacency. How to detect a crisis before it forms, and particularly when there is nothing really visibly wrong with anything ? I feel the key to a great strategy is thinking of such questions and then tackling them. Ordinary thinking only leads us to answer questions that we see. The better minds come up with the right questions, worthy enough to be solved. This last one is my favorite...