Sunday, August 31, 2014

Code optimization patterns - my quick notes

These patterns are based on what I read about here. I am still reproducing (dumping !) it here (with as high fidelity as I can manage), primarily because these patterns are what I do use in assorted situations and also that I want to keep in deep touch with all of them and keep applying them as much as I can. These are just my notes.

Some of these ideas are thrown up as directly conflicting with Object Orientation principles, by breaking encapsulation and trashing inheritance, in favor of composition. It really is a "black art" to know the appropriate situations where I need to apply them.

Here they are :

1/ Data locality -
a/ Converting a list of pointers to a list of offsets(4 byte integers) into contiguously allocated arrays. Iterating over an array of pointers is bad - it makes memory access to random locations - array of pointers "thrash the cache" (too bad, it rhymes !). Allocating array of objects in contiguous memory locations makes it likely that successive objects will be "found" in the cache.
b/ Hot-cold separation - Perhaps the object is large and the array of objects will not fit in the cache anyway. Perhaps for most iterations, only a few specific attributes of each object are processed. Splitting a larger object into smaller sets so that more frequently accessed parts are together. This might make the entire array of objects fit in the cache too.
c/ Reduced branch mis-prediction by
c1/ sorting a list of objects based on some criteria such that iteration only happens for a subset without enclosed 'if' condition
c2/ Creating separate lists of objects based on same criteria so that sorting an entire list is avoided
c3/ Just placing the iteration loop inside the 'if' condition, not other way - simple enough.
c4/ Not mixing different types of objects together in an array if they are to be iterated upon, even if they have same base class types and having separate arrays of each type.

2/ Dirty flag - I can think of it as the poor man's lazy evaluation. Basic dirty flagging is to have an additional boolean flag to tell if the object / cache is stale and requires refreshing. It helps avoid computation until it is required and only if it is required. Another flavor to the pattern is time-stamping or sequence numbering - We have a number or time-stamp that only increments without any danger of repeating and use it to mark the 'version' of the object. Whenever the sequence number is less than some other 'master', we update the cache.

3/ Object pooling - This is basically re-using allocations of some sort - either resources like database connections, threads or simple objects or even plain memory. We just want to avoid creating and destroying objects in memory when we know they really are required in future in some other contexts. There could be low and high watermarks set so that a minimum and a maximum limit on the number of objects in the pool is enforced, for not consuming too much memory and also being ready for multiple requests in burst. If we think of multi-threading scenarios, then each thread dynamically allocating memory from Operating systems is known to be slow, due to locking required for using the common heap. If each thread can use its own pool of memory, then no locking is required, re-use is achieved and thread-specific memory pool should not end up being fragmented.

4/ Spatial partitioning - Organize objects that are "close" to each other so that given an object, the "neighbors" can be efficiently located and processed. This is obviously helpful with real geometric spaces such as nearby particles being checked for collisions. However, the distance as a metric can be extended in other ways - Given a tree like structure, the distance between any two nodes may be expressed in terms of the number of common parents between them. This helps us partition the nodes better, so that queries like finding the number of child nodes for any given node may be made more efficient. This partitioning may be limited to nodes at same levels in the tree too. This pattern does present difficulties when the nodes can actually change distances to other nodes and we have to come up with efficient ways of updating the partitioning depending on the problem context.

Monday, August 11, 2014

USA Computing Olympiad Training Problem 3.3.3 (Camelot)

This is about an algorithmic coding problem that I saw on the USA Computing Olympiad training pages. There were several others that I solved earlier to reach this stage but this one is special ... named Camelot, in section 3.3.3. It seems to have been first posed in the IOI 1998.

Attempting to solve it, swept me through the full range of emotions, going from cocky over-confidence to being nastily surprised to bewilderment to despair, frustration and finally to immense relief of eventually getting rid of it by solving it ! There was absolutely no elation at all on solving it, just massive relief, that's all !

The statement goes like this :

Centuries ago, King Arthur and the Knights of the Round Table used to meet every year on New Year's Day to celebrate their fellowship. 
In remembrance of these events, we consider a board game for one player, on which one chess piece king and several knight pieces are placed on squares, no two knights on the same square.

During the play, the player can place more than one piece in the same square. 
The board squares are assumed big enough so that a piece is never an obstacle for any other piece to move freely.

The player's goal is to move the pieces so as to gather them all in the same square - in the minimal number of moves. 
To achieve this, he must move the pieces as prescribed above. Additionally, whenever the king and one or more 
knights are placed in the same square, the player may choose to move the king and one of the knights together 
from that point on, as a single knight, up to the final gathering point. Moving the knight together with the king 
counts as a single move.

Write a program to compute the minimum number of moves the player must perform to produce the gathering. The pieces
 can gather on any square, of course.



Line 1: Two space-separated integers: R,C, the number of rows and columns on the board. There will be no more than 26 
columns and no more than 30 rows.
Line 2..end: The input file contains a sequence of space-separated letter/digit pairs, 1 or more per line. The first pair 
represents the board position of the king; subsequent pairs represent positions of knights. There might be 0 knights or the 
knights might fill the board. Rows are numbered starting at 1; columns are specified as upper case characters starting with `A'.

8 8
D 4
A 3 A 8
H 1 H 8

The king is positioned at D4. There are four knights, positioned at A3, A8, H1, and H8.

A single line with the number of moves to aggregate the pieces.
SAMPLE OUTPUT (file camelot.out)


They gather at B5. 
Knight 1: A3 - B5 (1 move) 
Knight 2: A8 - C7 - B5 (2 moves) 
Knight 3: H1 - G3 - F5 - D4 (picking up king) - B5 (4 moves) 
Knight 4: H8 - F7 - D6 - B5 (3 moves) 
1 + 2 + 4 + 3 = 10 moves.

You could assume that the King and Knight pieces move just like the standard chess moves. The board co-ordinate annotation is exactly the same too : The columns (A-Z) start go from left to right, while the rows (1 - 30) go from down to upwards.

Basically it requires full appreciation of breadth first search. To solve it, you need to use breadth first search in novel ways, tweaking it to suit the fact that the King and the Knight pieces can join/merge while moving and adjust shortest distances accordingly.

I struggled with the problem since initially I (wrongly) assumed that even two or more Knights could join just like a King and a Knight. That could make this problem even tougher. But as such, the join only happens between King and a Knight.

The input limits on the problem are very delicately adjusted to time-out the solutions that have even one excessive nested iteration. Suppose for 'R' rows, 'C' columns and 'P' pieces, the expected time complexity should be O(R*C*P) ... For the maximum input dimensions, it works out to be approximately 600K operations, should be comfortable for a time limit of 1 second. My initial solution was working OK, but had an extra iteration on number of pieces in the innermost loop, that was timing out my solution.

Finally, I got the code below, that does these things :
1. Records the distance to each square for the King.
2. For each square on the board, records the shortest path to every other square, using breadth first search, and saves all of it.
3. Tweaks breadth first search to record shortest path from each Knight to all other squares & assuming that this Knight picks up the King. This uses information stored in (1)
4. Iterates over the entire board, checking if the square can be the final gathering point. In the innermost loop, each Knight is examined if that Knight can pick up the King to reach the candidate gathering square in the cheapest way.

#include <fstream>
#include <vector>
#include <queue>
#include <math.h>

const int INF = 10000000;

int min(int a, int b){if(a < b) return a; return b;}

int max(int a, int b){if(a > b) return a; return b;}

int kr[8] = {-2, -1, 1, 2, 2, 1, -1, -2};
int kc[8] = {1, 2, 2, 1, -1, -2, -2, -1};

class co_ords{
 int r; int c;

class Knight_King{
 int for_knight;
 int for_king;
 int r;
 int c;

void init(Knight_King dists[30][26], int rows, int cols, int king_row, int king_col){
 for(int rr = 0; rr < rows; ++rr){
  for(int cc = 0; cc < cols; ++cc){
   dists[rr][cc].for_king = max(abs(rr - king_row), abs(cc - king_col));
   dists[rr][cc].for_knight = INF;

class Knight_Cost{
 Knight_King KK[30][26];

bool valid(int r, int c, int rows, int cols){
 if(r >= 0 && c >= 0 && r < rows && c < cols)
  return true;
 return false;
void reset(int arr[30][26], int rows, int cols){
 for(int x = 0; x < rows; ++x){
  for(int y = 0; y < cols; ++y){
   arr[x][y] = INF;
void knights_shortest(int dists[30][26], int rows, int cols, int r, int c, std::queue& q){
 dists[r][c] = 0;
 co_ords rc; rc.r = r; rc.c = c;
 int rr, cc;
  co_ords rc = q.front();
  for(int i = 0; i < 8; ++i){
   rr  = rc.r + kr[i];
   cc = rc.c + kc[i];
   if(valid(rr, cc, rows, cols) && dists[rr][cc] == INF){
    co_ords rrcc; rrcc.r = rr; rrcc.c = cc;
    dists[rr][cc] = min(dists[rr][cc], dists[rc.r][rc.c] + 1);

void knights_paths(Knight_King costs[30][26], int rows, int cols, int r, int c, std::queue& q){
 costs[r][c].for_knight = 0;

 Knight_King rc; rc.r = r; rc.c = c; rc.for_king = costs[r][c].for_king; rc.for_knight = costs[r][c].for_knight;
 int rr, cc;
  Knight_King rc = q.front();
  for(int i = 0; i < 8; ++i){
   rr  = rc.r + kr[i];
   cc = rc.c + kc[i];
   if(valid(rr, cc, rows, cols)){
    bool enq = ((min(rc.for_king, costs[rr][cc].for_king) + rc.for_knight + 1) < (costs[rr][cc].for_king + costs[rr][cc].for_knight));
     Knight_King rrcc; rrcc.r = rr; rrcc.c = cc; rrcc.for_king = min(rc.for_king, costs[rr][cc].for_king); rrcc.for_knight = rc.for_knight + 1;
     costs[rr][cc].for_king = min(rc.for_king, costs[rr][cc].for_king);
     costs[rr][cc].for_knight = rrcc.for_knight;

int get_sum(int dists[30][26], const std::vector& knights, int num_knights){
 int ret = 0;
 for(int i = 0; i < num_knights; ++i){
  ret += dists[knights[i].r][knights[i].c];
 return ret;

class square{
 int dists[30][26];
 square(int rows, int cols){
  for(int i = 0; i < rows; ++i){
   for(int j = 0; j < cols; ++j){
    dists[i][j] = INF;

int main(){
 int rows, cols;
 char col;
 std::vector knights;
 std::queue qkk;
 std::queue q;

 int row, num_knights = 0;
 int king_row, king_col;
 std::ifstream ifs("");
 ifs >> rows >> cols;

 ifs >> col >> row;
 king_col = (col - 'A');
 king_row = (rows - row);
 while(ifs >> col){
  ifs >> row;
  co_ords rc;
  rc.r = rows - row;
  rc.c = col - 'A';
 std::ofstream ofs("camelot.out");
 if(num_knights == 0){
  ofs << "0\n";
  return 0;

 std::vector board[30];
 for(int i = 0; i < 30; ++i){
  for(int j = 0; j < 26; ++j){
   square s(rows, cols);
   knights_shortest(board[i][j].dists, rows, cols, i, j, q);

 std::vector cost(num_knights);

 for(int x = 0; x < num_knights; ++x){
  init(cost[x].KK, rows, cols, king_row, king_col);
  knights_paths(cost[x].KK, rows, cols, knights[x].r, knights[x].c, qkk);

 int min_so_far = INF;
 int sum = INF, tmp_sum, sim_king_dist;

 for(int r = 0; r < rows; ++r){
  for(int c = 0; c < cols; ++c){
   sim_king_dist = max(abs(king_row - r), abs(king_col - c));
   sum = get_sum(board[r][c].dists, knights, num_knights) + sim_king_dist;

   for(int k = 0; k < num_knights; ++k){
    tmp_sum = sum;

    sum -= (sim_king_dist + board[r][c].dists[knights[k].r][knights[k].c]);

    sum += (cost[k].KK[r][c].for_king + cost[k].KK[r][c].for_knight);

    if(sum < min_so_far)
     min_so_far = sum;
    sum = tmp_sum;

 ofs << min_so_far << "\n";
 return 0;

A result of a rivetting journey over the weekend...