Monday, October 24, 2011

From "How Life imitates Chess" By Garry Kasparov

It has been a thought-provoking and pleasurable three weeks with "How Life Imitates Chess" by Garry Kasparov. It is well-written, in the sense that it does not feel like a self-help book at all (and it is not one anyway). But it introduces the elements of success: approach, attitude, strategy, tactics, habits and thinking style. The famous world chess champion starts with an anecdote from his own chess encounters and then projects the concept onto a real-life situation. Points to note are -

1/ Become more self-aware. Constantly question yourself on all decisions. Avoid running on autopilot and relying on plain instinct to take decisions.

2/ Play your own game. Everyone has his own style. Play a game that suits that style and, in that spirit, make detailed notes on yourself much more than on any opponent.

3/ Add an element of surprise. Introduce some imagination and fantasy into the game. Break the routine every once in a while. Let the mind wander and bring in radical ideas.

4/ Become a strategist as well as a good tactician. Pay attention to the bigger picture in addition to the detailed and sometimes routine calculations.

5/ MTQ (Material, Time, Quality) is a crucial triad of governing factors for evaluating a situation. Material could be money and physical resources. Time is universal. Quality is often understated in importance; it could mean having an advantage in strategy, knowledge, energy, ideas or skills. Although quality is desirable as an end in itself, it can later be traded for material or time. In practice, more than adequate importance is showered on material. Time is routinely sacrificed for it. Quality is rarely in the picture when deciding what long-term strategy to follow.

6/ Being able to evaluate a situation is quite different from just listing possibilities. Better decision making requires better evaluation, and better evaluation relies on a proper blend of the MTQ factors. In particular, the MTQ blend must fit one's temperament, strategy and willingness to take risks. MTQ factorization should at least provide a systematic way of beginning an evaluation. It should help take some weight off the instinctive, reactive tendency.

7/ There can be a deadlock in some circumstances when I cannot see a good course of action. That is a good time to introduce a radically different idea just to break the routine. Such a step can surprise others and buy me some time. Importantly, it wins some of the Quality component of MTQ, because it improves the position from stagnant to dynamic in a way I can take advantage of later.

8/ Understand the rationale behind some decisions and events. Merely following a precedent does not go a long way because every situation is distinct from others and there is no sure recipe for all.

9/ SWOT (Strength, Weakness, Opportunity, Threat) analysis is the real-life counterpart of MTQ. Only if I know the current position (or state of material, time and quality) can I really decide how to plan ahead. In chess, the opening game is the time for creativity, but primarily for following certain tried-and-tested methods. Similarly, life's early years are spent in a grind along a beaten path of growing up, schooling, college and career. Rarely does it vary by much. The middle game in chess is greatly dynamic: there are not only various ways of doing things, but each move can potentially dispatch us either on a road to victory or to total disaster. I am well into the middle game. The end game is where cold calculation and predictability rise to prominence. Everyone can get complacent and bored by the routine and predictability, which makes one vulnerable to mistakes. Each day and each moment is therefore a challenge - to be constantly critical and questioning about one's decisions, searching for better alternatives, and not becoming complacent or going on autopilot.

10/ Seize the attacker's advantage, which means taking the initiative, having courage, taking calculated risks and innovating. The attacker gets positive momentum going in his favour. The attacker has a positive pressure to take action and, before that, to take a decision, whereas the defender simply waits and watches. The attacker always wants to disrupt the status quo and, in that way, is naturally complacency-proof. According to Kasparov, attack is better not because it is the only way, but because it works best - and it certainly did for him.

11/ Question success, and failure too. We should try to understand why things succeed or fail. We love to find agreement and consensus because it saves us from hard decisions and confrontation, which is why we surround ourselves with like-minded people and those with similar habits. But it is important not to.

12/ Intuition is not about some amateur coming up with the right answer without much thinking. That is more like luck. Intuition is all about someone with experience and skill hitting on an unconventional approach or a novel idea.

13/ Have a multidimensional personality. It is better to be good at something other than the profession as well. Being a good public speaker will instill confidence in all other areas of occupation. Richard Feynman is said to have improved as a physicist by becoming a better drummer.

14/ When there is a crisis, it essentially means that there is a clear and present danger as well as an opportunity. It's very difficult to stir up opportunities in a still, static and silent environment. Only a crisis provides a window to create a break.

15/ In real life as in chess, there is never much doubt about what to do when a problem is at hand. Our minds begin to take it up and solve it in whatever manner is appropriate. But the grander question is: when there is no apparent problem, what should be done then? Should a plain old routine be followed, or should there be some kind of improvisation? Too often, such situations lead to complacency. How does one detect a crisis before it forms, particularly when there is nothing visibly wrong with anything? I feel the key to a great strategy is thinking of such questions and then tackling them. Ordinary thinking only leads us to answer the questions that we see. The better minds come up with the right questions, worthy enough to be solved. This last one is my favorite...

Tuesday, September 27, 2011

Commentary on "Russia's War" by Richard Overy

I had great expectations of this book, Russia's War by Richard Overy, but I had not been able to get around to reading it even two years after I bought it. And so I have finally begun. This book recounts the history of the Second World War attack on Soviet Russia by Germany, but it is written with Russia - its people, army and polity - in the fore. It leverages some recently released archives to build a complete but not excessively detailed account. It is full of references to the monumental sacrifices borne by the Russian people and to their unimaginable bravery, grit and fight against all odds. It all begins with the background of the Communist revolution and the bloody civil war in 1917-1918. It then goes on to focus on Stalin's leadership and the misery heaped on the Russian people by a range of calamities, from famines, collective farming and the secret police to purges and torture.

Some of my impressions on reading this book are as follows,

About the Soviet psyche: I came closer to understanding the Soviet people. I had heard that the Soviets are very lovable and gentle people who have borne a monstrous deal of suffering, sacrifice and bloodshed. I would like to attribute all this to the Russians specifically, even if I write "Soviet". The Soviet army fought ferociously in the war and dumbfounded the Germans, who thought it was all suicidal. When Germany attacked the Soviet Union, it was widely expected to win in less than a few months, even in the minds of observers as far away as the US. The Germans had won most of Europe in less than 18 months and had tremendous momentum going for them. The Soviet army was supposed to be a primitive fighting force and the country full of semi-Asiatic people; they were the proverbial underdogs. I just love to see an underdog getting into a fight against all odds and making history. Prof. Overy has made a case for the Soviets by giving them due credit for forcing a win over Germany. This conflicted with my understanding hitherto that it was primarily the bitter Russian winter and a series of blunders by Hitler that made Germany lose the war. So it's not that Germany simply lost the war; the Soviets did something to win it. Improvisation, military reforms and massive hard work did it for the Soviets. In recent times, reports of mathematical skill among Russians abound, and they have been among the more notorious criminal hackers in the computing world.

The Soviet feeling of being underestimated by Westerners: There is persistent mention in this book of the Soviets being underestimated by the Western powers. On one occasion, the Soviets were not invited to peace talks with Germany. Above all, there is an unmistakable expression of disdain and racial/ethnic hatred that the Germans and the British reserved for the Russians. The Soviet Union suffered 80% of the casualties among all the Allied powers. Germany, too, had most of its defences lined up against the Soviets rather than the British and Americans. So even as I feel that British losses in the Second World War, more than anything else, won India its independence, it is the German defeat at the hands of the Soviets, more than anything else, that won the Allies the world war.

Numbers of people dead, vanquished: The numbers are just mind-numbing. Just how cheap a Soviet life was! The pages are full of mention of thousands and millions of lives lost. By the time I finished, I could not really recollect even some of the infamous massacres, like that of the Polish nationalists in the Katyn forest. I am certain that in spite of the many invasions my country, India, has faced, there has been no war, massacre or plunder that comes anywhere close to what the Soviets suffered.

Stalin: This is the personality at the centre of the Russian response to German aggression. As in many other books, Stalin is a mysterious, power-hungry, unpredictable and extremely cruel being. To me, Stalin appears an almost legendary master at accumulating power, setting people against each other and taking credit for others' achievements. But most notable is Stalin's capability to retain supreme power even as he goes on extracting bloody sacrifices out of everyone. He vanquished ("purged" is the apt word) not only the obvious enemies of collective farming (the rich peasants or kulaks) but also Ukrainians, minorities, intellectuals, the army, and even the Communist party itself. He arrested and exiled the wife of a serving army general. Not even the celebrated Marshal Zhukov was spared Stalin's caprice, hysterical suspicion and pathological cruelty. He let the secret police arrest, torture and kill any threatening opponents and then proceeded to purge the secret police themselves! I am reminded of Martin Amis (in "Koba the Dread"), where he says that Stalin was exceptionally harsh on Ukrainians, Jews, minorities, and also Georgians. The wonder is, Stalin was himself a Georgian. He hated his son because his son was a Georgian, and his son was a Georgian because Stalin was a Georgian. Amis writes all this with remarkable effect.

Finally, I loved reading this book. Towards the end, Stalin dies and Beria is executed. The book has a slight documentary feel to it (it has ample tables of wartime figures and maps, though the maps are tacky). It is not quite as colourful and expressive as Martin Amis's piece (Koba the Dread). But Prof. Overy has changed some of the impressions I had about the war and has also shown how the Russians actually did it. It challenges some myths about Stalin, particularly the manner in which he bowed to several of his generals' wartime decisions in times of crisis. The end is poignant too. The final assertion of the book is that even as the war was won and the Soviet people had a momentary sweet taste of victory, the despotic oppression lasted well beyond the war.

Saturday, June 4, 2011

Development with pre-requisites and source code security.

Admittedly, I am running the risk of combining two different subjects, but they are closely linked. One is development with pre-requisite, compiled libraries; the other is source code security.
I want to write about some of the problems arising out of an extra-large codebase and large, geographically scattered development teams. Certainly, we are looking at distributed version control to help us out here. But I want to see what an individual developer needs to do to accomplish the daily bread and butter, i.e. to read and develop programs, build and test. This is particularly about individual source code management, ease of development and source code security.

Simple things first: how does a developer working on a small piece of that monstrous 4 million lines of code actually build and run the application on a measly desktop? The actual run-time size of the application is much less troublesome than the time to download the code every morning, because I have no idea which other developers have modified what other parts of the code and how that impacts my own code. Basically, there could be so many scattered changes coming in daily over the entire spectrum that the entire codebase on my desktop needs a refresh. Things go out of hand when you consider that each developer will compile and build the entire codebase to test any part of the application. The daily build time for each developer is the biggest blow here.

But really, each developer should just download and build the code that is being touched by his/her team and not have to worry about all other code, treating it like a black box. The UI development team almost never touches any code in the Application Framework. There is a clear separation of concerns among those development teams. For testing, one needs the entire set of executables.

I have encountered this situation in my previous stint working as a CAD-PLM application developer, for code written in C++. The method employed there is what I want to share.
We need a really sturdy server with a shared (and read-only) directory that holds a complete, "best-so-far" version of the executables. A build machine does a daily build of the entire source code checked in by end of day and places all executables in this shared directory. All developers then map/mount this shared directory to some local directory.
Given this, a developer only downloads the code that his/her team deals with and provides the shared directory as an 'include' and 'library' path after the local code. The compilers as well as the runtime environments happily work with the source code on the local machine and the pre-requisites on the shared path.

An intentional side-effect of this is that not all developers can see all source code, although they have all the executables to play with. The source is neatly divided into sub-directories containing code that belongs to a logical functionality. The version control system can explicitly disallow all but some users from checking-out certain such directories.
Why is hiding source sometimes desirable in today's computing world? I can easily imagine a piece of source code that is a true breakthrough, with patented algorithms, technologies and what not. It follows that the organization does not want just "any" developer to get his hands on that source code - to secure it against tampering as well as source-code theft. What else but source code is the most important asset of a software development company? We have had several cases so far where a sacked, disgruntled employee went out to sell his company's intellectual property for cool bucks. See reports here and here. After such incidents, those companies must have gone overboard in searching employee baggage on each transit for any smuggled electronic media.

All this might seem banal, but if things are not arranged in this manner to begin with, it is difficult to pull them back clean later.

In the next few entries, I will explore a software licensing mechanism to control software usage, which will also hinge on some source code security. It all depends on some parts of the source code not being readable by anyone. Secured source code is such a good thing.

Monday, April 25, 2011

XML file comparison

Another problem is to compare two XML files with the same structure and produce a result that spells out the differences between the two files. Again, I have set out to discuss a solution more involved than the simple one we actually chose to implement.

Simple approach:
A simple approach is to start by referring to the topmost parent node in the code and try to find it in both XML files. If the node is found in both files, then no difference is found at the first level and we can proceed to query the next, deeper level of nodes. The actual XML tags are thus hard-coded, and the same process can be repeated. If a node is found in only one of the files, then the first difference is found. We can note this difference and then generate some other structure to describe the difference, or "delta". We can call this the difference structure. If the XML is a representation of a relational database, then this difference structure can be translated into a set of RDBMS queries that equalise the database contents. Thus it is quite plausible that the difference structure is a linear structure rather than a tree-like XML.

Drawbacks of the simple approach:
The logic to parse the structure is inextricably mixed up with the logic to generate the (linear) difference structure. If the XML structure changes, the parsing logic also changes, and changes to the generated delta follow. What would be ideal is for the parsing of the nodes to be independent of what the XML is about, with only the actual difference structure (and the subsequent RDBMS queries) depending on the actual differences in the XML. It is the changes needed to the parsing code that are the matter of contention. I think we can do better.

A better approach:
As usually happens with computer algorithms, the better approach is more complex than the simple one. As an aside, the situation is quite different when dealing with mathematical statements and proofs - the better proofs are shorter and have an "elegance" quality to them; in the long run, they prove to be more readily intuitive.
OK, here we should let the coding logic be independent of any actual XML tags. We create a multi-dimensional array of strings and "hand-write" the XML tags into it. So if the XML hierarchy goes two levels deep, the array will actually be a list of lists of strings. We simply create a string structure that mirrors the XML. This looks tedious but is way simpler than coding that structure into something the compiler finds acceptable for parsing.
The parsing code traverses this string array and treats it as a descriptor of the XML. It does the same for both files, and if it finds a difference, it proceeds to generate an in-memory, linear difference structure. A separate piece of code translates the differences into something like a set of RDBMS queries as required, and the important fact is that this query generator is strictly separate from the parser.
If the XML structure changes (or, more likely, is extended), no change is needed to the parsing logic. Only the hand-written string array needs to change to reflect it. Some skill is involved in coding the parsing logic, but once done, this approach scores over the earlier one on maintainability and extensibility.

I see parallels between this approach and one that could be described as a "data-driven" approach (my term, my quotes). The actual code is only like a markup processor, very generic and simplified. The key point is not that compilation is avoided but that the changes are not in code but in an array-like structure of simple strings or values.
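
To make this concrete, here is a minimal sketch in C++ of what such a descriptor-driven comparison could look like. Everything in it is invented for illustration - the tag names, the Descriptor shape, and the flattened path-to-value map standing in for a parsed file (a real version would fill that map using an XML parser) - but it shows the point: the code that walks and compares never mentions an actual tag.

// A minimal sketch of the descriptor-driven comparison (illustrative names only).
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Hand-written descriptor mirroring the XML: a tag plus its child tags.
struct Descriptor {
    std::string tag;
    std::vector<Descriptor> children;
};

// One parsed file, flattened to path -> text value (what the parser would produce).
using ParsedXml = std::map<std::string, std::string>;

// A flat (linear) difference record; a separate generator can turn these into queries.
struct Difference {
    std::string path, leftValue, rightValue;
};

// Generic walk: knows nothing about the actual tags, it only follows the descriptor.
void diffNodes(const Descriptor& d, const std::string& parentPath,
               const ParsedXml& a, const ParsedXml& b,
               std::vector<Difference>& out) {
    std::string path = parentPath.empty() ? d.tag : parentPath + "/" + d.tag;
    auto ia = a.find(path), ib = b.find(path);
    std::string va = (ia == a.end()) ? "<absent>" : ia->second;
    std::string vb = (ib == b.end()) ? "<absent>" : ib->second;
    if (va != vb)
        out.push_back({path, va, vb});
    for (const Descriptor& child : d.children)
        diffNodes(child, path, a, b, out);
}

int main() {
    // If the XML structure changes, only this descriptor changes, not the code above.
    Descriptor root{"Config", {{"Database", {}}, {"Port", {}}}};
    ParsedXml fileA{{"Config/Database", "prod"}, {"Config/Port", "8080"}};
    ParsedXml fileB{{"Config/Database", "prod"}, {"Config/Port", "9090"}};

    std::vector<Difference> diffs;
    diffNodes(root, "", fileA, fileB, diffs);
    for (const auto& d : diffs)
        std::cout << d.path << ": " << d.leftValue << " -> " << d.rightValue << "\n";
}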

XML specification and duplicate tag processing

XML has been long touted as a very promising method for information exchange. Some count it as too verbose and doubt how efficient XML turns out to be if the information is voluminous. However, XML still reigns as the most widely accepted method to convey structured data, in a human readable form, for which parsers are widely available and one that is extensible.

One pattern of usage I noticed in my work product: referring to another tag to copy content -

A huge XML file that carries the product control configuration of the entire application is usually edited by hand. It basically stores configuration properties for the various services that run as part of the product. What should we do if there are multiple duplicate services and they have essentially identical properties?

For example -

<Top_Parent_Node attr1="val1">
<Service_Node attr2="val21">
<Prop attr3="val3">
....
... Complex set of enclosed tags ....
....
</Prop>
</Service_Node>

<Service_Node attr2="val22"> <!-- duplicated service tag : we need this for the application -->
<Prop attr3="val3"> <!-- Forced to repeat this from the previous tag -->
....
... Complex set of enclosed tags ....
....
</Prop>
</Service_Node>
.... More such repetitions ....
</Top_Parent_Node>

The simplest way is to repeat the properties at both locations by copy-paste. We are rather good at that.
We, however, screw up miserably when it comes to propagating changes made to one set of properties to all the other, identical locations.

I suspect this is a common situation that others run into as well, which makes a good case for formalising this requirement in the XML specification itself. The specification should allow a choice - either spell out the tags, or make a reference to another tag that is treated as copied into this tag while parsing.
For example -

<Top_Parent_Node attr1="val1">
<Service_Node attr2="val21" ?xmlref="N1" > <!-- Label this tag as a reference -->
<Prop attr3="val3">
....
... Complex set of enclosed tags ....
....
</Prop>
</Service_Node>

<Service_Node attr2="val22"> <!-- duplicated service tag : we need this for the application -->
<?xmlref="N1" /> <!-- No need to repeat - referred label is treated as copied -->
</Service_Node>
.... More such repetitions ....
</Top_Parent_Node>

A few points to note:
- Only one place holds the entire spec of a node that will possibly be duplicated.
- Any change made in that one place is reflected in all other places which refer to it.
- The first Service_Node, which carries the complete spec, is labelled in a unique manner. This label is part of the specification and any node can be labelled in this manner; thus it need not appear in any DTDs or XSDs as an available attribute.
- Any node can refer to this label by enclosing a <?xmlref> with a label identifier. The parser should copy the entire specification within the referred node into this node.
- The referring node and the referred node need not be in the same hierarchy or at the same tree depth. The parser should deal with a referring node appearing before the referred node in the file; this keeps the XML parsing independent of ordering. If the referred node is not found, the parser should throw an exception. I can see that DOM parsers can handle this in a straightforward manner. A SAX parser, however, would need to parse to the end in search of a referred node.
- I don't quite see how to provide partial overriding capability without unnecessarily complicating the idea and obfuscating the XML specification.
- The fact that integrity is easily maintained when the spec changes gives this idea some credence and value, against the fact that readability of the XML is somewhat hampered.
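
To illustrate how a parser could honour this, here is a minimal sketch in C++ of a two-pass resolution over an already-built tree. The Node type is a made-up, simplified stand-in for a real DOM node, and the way the reference element is represented (a node with tag "?xmlref" carrying a "label" attribute) is just one assumed encoding of the syntax above; it is not any existing parser's API.

// Minimal sketch: resolve the proposed xmlref references in two passes.
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

struct Node {
    std::string tag;
    std::map<std::string, std::string> attributes;       // a labelled node carries "?xmlref" -> "N1"
    std::vector<std::shared_ptr<Node>> children;
};

// Pass 1: remember every node that carries a label, wherever it sits in the tree.
void collectLabels(const std::shared_ptr<Node>& n,
                   std::map<std::string, std::shared_ptr<Node>>& labels) {
    auto it = n->attributes.find("?xmlref");
    if (it != n->attributes.end())
        labels[it->second] = n;
    for (const auto& c : n->children)
        collectLabels(c, labels);
}

// Pass 2: replace each reference element with the labelled node's children,
// so the document behaves as if the content had been repeated in place.
void resolveReferences(const std::shared_ptr<Node>& n,
                       const std::map<std::string, std::shared_ptr<Node>>& labels) {
    std::vector<std::shared_ptr<Node>> resolved;
    for (const auto& c : n->children) {
        if (c->tag == "?xmlref") {                        // assumed encoding of <?xmlref="N1"/>
            auto it = labels.find(c->attributes.at("label"));
            if (it == labels.end())
                throw std::runtime_error("xmlref: referred label not found");
            for (const auto& copied : it->second->children)
                resolved.push_back(copied);               // sharing nodes is fine for a read-only tree
        } else {
            resolveReferences(c, labels);
            resolved.push_back(c);
        }
    }
    n->children = resolved;
}

Because the labels are collected in a first full pass, a referring node may appear before the referred node, which keeps the resolution independent of ordering, as required above.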

When I encountered this problem at my workplace, the problem was solved at the application layer, i.e. a new tag was added inside the duplicates to refer to the other node. It was a simple hack that does not so much solve the problem as work around it. As you would know, this is what happens in a commercial context under time pressure.

Friday, February 4, 2011

Software development (requirements and design)

Two strictly random thoughts on two aspects of software development - requirements and design - follow. Just something to play with over the weekend -

1/ A systematic and scientific approach to software development instead of an intuitive, programmer-driven one.

An intuitive, programmer-driven approach is the typical one, heavily based on experience in solving problems of a similar nature and on observing other programmers' practices. Programmers are well trained to follow patterns of solutions. Most of the time this goes OK, since the thought process has been reviewed by several programmers and has proven workability. However, what if a fresh look is taken in situations where the chosen solution is based only on tried-and-tested formulas? How about deriving a solution based on a totally rational, logical and scientific approach? Can it find gaps in understanding, or find other, hitherto overlooked determining factors? Most research in academia takes this approach and does succeed in finding better solutions than those in the industry. The industry is sometimes too focused on finding cost-effective, workable and time-bound solutions, to the detriment of doing something that will prove profitable and efficient in the long term. The hard, complex problems in software suffer in this regard much more than run-of-the-mill software, such as CRUD applications.
As a sunshine industry, software is still evolving, with new practices being thought of and proposed at regular intervals. It should be a place full of such opportunities for betterment.

2/ Allowing maximum configurability but not exposing it to users.

Customers use software built by developers, but they usually end up complaining more often about features not implemented according to what they wanted, and less often about things that don't work correctly as per specifications. If something does not work correctly as per specifications, it's a clear bug and the developers feel obliged to make the corrections. However, the point about missed specifications, misunderstood specifications and conventions is not so easy to rectify.
Can a solution be to allow for configuring everything and anything that is feasible? Users will be quick to complain about huge configuration choices and complicated installation tasks. So the configuration out of the box is chosen to be a vanilla, typically acceptable one and not exposed to users at all. If some user needs something different from how the software behaves, we will need to customize and configure, but we will very likely find a way out without making code changes, since we made the software as configurable as possible even where we never expected users to change anything.
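A minimal sketch of this "configurable but hidden" idea, assuming a simple key=value settings file; the class name, file name, key names and default values are all invented for illustration.

#include <fstream>
#include <map>
#include <sstream>
#include <string>

// Every behaviour goes through get(); out of the box the settings file simply
// does not ship, so users only ever see the vanilla defaults baked into the code.
class HiddenConfig {
public:
    explicit HiddenConfig(const std::string& path) {
        std::ifstream in(path);                         // missing file => no overrides
        std::string line;
        while (std::getline(in, line)) {
            std::istringstream kv(line);
            std::string key, value;
            if (std::getline(kv, key, '=') && std::getline(kv, value))
                overrides_[key] = value;
        }
    }

    std::string get(const std::string& key, const std::string& builtInDefault) const {
        auto it = overrides_.find(key);
        return it == overrides_.end() ? builtInDefault : it->second;
    }

private:
    std::map<std::string, std::string> overrides_;
};

// Usage (hypothetical names): a support engineer can later drop in a settings
// file for one demanding customer without any code change.
//   HiddenConfig cfg("advanced.settings");
//   std::string port = cfg.get("admin.console.port", "8443");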
When I thought of this, I was really thinking about issues faced with various customers and the varied preferences each one has with respect to the same software. Some customers could complain about software components needing bi-directional access on network ports and interfering with firewalls; others crib about the GUI of the administration consoles and the layouts. Somehow, it is a bad bet for the developer to make a design choice, code something accordingly, and then face urgent situations because some customer expected otherwise. Even if we take care to implement according to well-known standards and conventions, are we in a position to deny something to a prospective customer (and maintain the same specs) if they don't like how it behaves?

Friday, November 26, 2010

Software Product Features

Software Product Features, Qualities and Best Practices

This document is about the desirable features a software product should have and the practices to be followed during its conception, design and development. They are, in a way, guidelines at a most generic level, applicable to most software. Certainly, this document is incremental in nature.

For a diverse, dispersed team trying to develop software, it is important to follow certain common practices, so that the integration phase is simplified in the later stages. This document aims to lay out broad principles and techniques for achieving that. I have gathered several software features, processes, tools and techniques that I have found impressive. I need to expand each section in much more detail.

The various design and coding issues that are to be taken care of are listed below along with their desired solutions/techniques,

A. Platform –
Hardware as well as software infrastructure should be chosen carefully to satisfy all functional or other (financial…) constraints. For now, we consider only PC desktops and open systems. The software itself should be developed to be compliant with the widest possible range of infrastructure. The common way to achieve this is to use conditional compilation, as available in C++-like languages, to allow different versions of platform-specific code to run without changes on different platforms. To maintain portability of C++-like code, for instance on a 64-bit OS, it is imperative to use explicit type definitions for data types of different sizes. If any third-party software is being used, such as an RDBMS, it is preferred to use a wrapper library, such as SQLAPI. For XML parsing, it is certainly preferred to use Xerces. Follow the common denominator among database standards: for an RDBMS, use ANSI SQL data-types and wrap any table/column names with appropriate prefixes and namespaces to avoid name clashes and conflicts with keywords in a particular database.
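A small sketch of the two portability techniques mentioned here - explicit type definitions for sizes and conditional compilation around platform-specific code. The alias names and the wrapper function are made up for illustration; the platform macro is the usual compiler-defined one.

#include <cstdint>
#include <string>

// Fixed-width aliases so persistent structures keep the same size on 32-bit and 64-bit builds.
using Int32  = std::int32_t;
using Int64  = std::int64_t;
using UInt32 = std::uint32_t;

// Platform-specific code isolated behind one small wrapper function.
std::string pathSeparator() {
#if defined(_WIN32)
    return "\\";
#else
    return "/";
#endif
}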

B. Architecture (assuming PC-based software!)
It could be web-based, desktop (n-tier) or custom (both). The first two should be clear; the third requires explanation. As usual, we try here to separate the engine (business logic) from the UI part. The UI part is further separated into 'Content' versus 'Presentation'. The Content expresses the widgets (components) to be used and their placement on the screen. The Presentation takes care of converting the content into a format suitable for the medium. Thus, given a screen, the Content can be expressed in the form of an XMLdlg (an XML document specifying the dialog look and feel). The web presentation layer is then generic code to convert the XMLdlg to HTML, suitable for browser display. The desktop presentation is again generic, and converts the XMLdlg into MFC/Xmotif API calls.
Most 'saleable and business' software is of a transactional nature. The business logic generally consists of certain objects going through phases of operation (like guests doing check-in, stay and check-out in hotel management software). I propose to keep the business logic generic by making use of the Workflow infrastructure, so that the main engine can be reused for many other applications once done. If a client-server/distributed architecture is involved, it is better to observe restrictions like not having server call-backs on the client, for compliance with firewalls, SSL etc. Use distributed components to enhance scalability, fault tolerance (through replication) and load-balancing (also through replication). Consider a service-oriented architecture for generic interfaces. Create clean, terse and generic interfaces, which clients can query and discover. The importance of "loose coupling and high cohesion" cannot be overstated. Consider deployment constraints - blocking of ports, physical memory, utilization of resources like multi-core processors, threading models - and be scalable.

C. Framework
A common framework binds the various application components together and provides a consistent set of rules/coding conventions that each part of the application can follow. A good framework should provide wrapping functions (for platform-specific areas, threading/locking, middleware, file-systems and other applications like databases), support component creation & extension (like COM, CAA), help integrate well with third-party software (like SSL), provide utility functions (like math libraries), and offer a central, localized logging engine.

D. Configurability
This refers to the ability to quickly change certain aspects of the software according to the customer's environment. The usage of XMLdlg, as mentioned above, makes the software configurable as web-based or desktop-based without changing any code at all. Other practices for ensuring maximum configurability are commonly known ones, like not hard-coding essential parameters of operation (server names, database ports etc.). Using plain-text settings files is OK, but XML is preferred.

E. Localization
The product should allow the complete user interface (all visible text) to be in any local language (Hindi, Marathi etc.). The common technique to achieve this is to set up text files (Nls files), which map the 'keys' used in code to the 'specific language text'. Each language has its own Nls file. The local language for the software should, in turn, be configurable! The software should not assume common conventions, e.g. the decimal point - certain locales have a comma instead of a dot as the decimal point. The suitable data structures are programming-language dependent but should be Unicode capable.

F. Customizability
It adds more control over specific behaviour of the software. For example, one banking customer could have two types of accounts (Saving, Current) and some other customer might have four types (Saving, Current, Salary and Fixed Deposit). There is no reason why the first customer should see all four account types while in the process of creating new bank accounts. The software design should be such that no code changes are required for such customization. Again, read customizable information from a text file rather than hard-coding it.

G. Licensing
It differs from customizability or configurability. It is the ability to have different versions of the software, like a light demo version, a typical version and a full-featured version, for different customers. Further granularity should be added, like enabling specific functions for certain customers through the facility of 'environment flags'.

H. Code placement (directory hierarchy)
Related code should go into specific directories (frameworks). Code of the same functionality should go into a single directory inside a framework (module). Each module should compile into a .dll or .so (or a Java package jar!). The interfaces to be shared between frameworks should be collected in a directory at the framework level of the hierarchy. We may need to create special make-files. The build should allow maximum granularity and flexibility in packaging.

I. Testing
Unit testing should include the creation of automatic regression tests. Testing scripts or code should be gathered in their own test framework. For an automatic test, typically a shell or batch script starts a separate test program executable. The executable contains calls to the functional code. The test results in an output trace. The shell script then compares the output trace to a reference trace, known to be correct output. If there is a difference, the test fails and the regression is automatically detected. Such debug tracing should be enabled at the time of running tests or debugging, not in production. If the UI is to be tested, this facility will need support from within the UI wrappers/framework classes, which will run in a special mode and not accept any user inputs while tests are being run. Only the recorded UI inputs can be re-dispatched by the test UI to the functionality.
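A minimal sketch of that trace comparison in C++; the file names are illustrative, and here the comparison is folded into the test executable itself rather than into a shell script.

#include <fstream>
#include <iostream>
#include <string>

// Compare the freshly produced output trace against the known-good reference trace.
bool tracesMatch(const std::string& outputTrace, const std::string& referenceTrace) {
    std::ifstream out(outputTrace), ref(referenceTrace);
    std::string a, b;
    int lineNo = 0;
    while (true) {
        bool gotA = static_cast<bool>(std::getline(out, a));
        bool gotB = static_cast<bool>(std::getline(ref, b));
        ++lineNo;
        if (!gotA && !gotB) return true;                 // both ended together: no regression
        if (gotA != gotB || a != b) {
            std::cerr << "Regression at trace line " << lineNo << "\n";
            return false;
        }
    }
}

int main() {
    // The test program would first call the functional code and write test.trace,
    // then the comparison decides pass (0) or fail (non-zero).
    return tracesMatch("test.trace", "reference.trace") ? 0 : 1;
}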

J. Accessibility
It is recommended to maintain compatibility with accessibility-aiding software like JAWS, popularly used by visually impaired persons.

K. Scripting Engine
A scripting engine is vital for some applications, in which either admin operations are involved or a large number of inputs are required for an operation. This is even more important when a UI is not feasible for some operations. A JavaScript-compliant scripting engine can be used to expose script objects to user scripts; the Mozilla project's JavaScript engine is a reference.

L. Version Control

1/ http://en.wikipedia.org/wiki/Distributed_revision_control -
Main advantages - a/ Some project members can have privileges to decide on what changes to merge.
b/ Web of Trust - Changes from many repositories can be merged based on quality of changes.
c/ Network not required all the time - Check-ins can be into a private workspace/View.

Some open source software is available which will fulfil the main requirements of distributed development (version control, release management) and security.

2/ The entire application source code can be split into workspaces - logical partitions of the source code (organized by directories) that each developer can work on. Each workspace can be a set of directories for which the developer can check out source code. The source code is not available for all other directories, for which the individual developer has no check-out privileges. Therefore, the developer must get all executables/libraries for the protected source to be able to build and run in the local workspace.

3/ Intellectual property will additionally have to be protected against unauthorized usage by licensing.
Licensing can be done in two ways -
a/ The simpler way is to have environment flags, which are read and verified by the application to detect authorization (see the sketch after this list).
b/ A licensing utility can be built that generates a license for a developer machine. The source-protected part of the software checks this license before allowing the application to start.
This license-checking component of the application (like other protected source code) is available to developers only as libraries/executables.
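
A minimal sketch of option (a), the environment-flag check, done by the protected part of the application at startup. The variable name and expected content are invented; a real scheme would verify something harder to forge, such as a signed license string.

#include <cstdlib>
#include <cstring>
#include <stdexcept>

// Called from the source-protected startup code; throws if no authorization is found.
void verifyLicenseOrAbort() {
    const char* flag = std::getenv("MYAPP_LICENSED_FEATURES");   // hypothetical variable name
    if (flag == nullptr || std::strstr(flag, "core") == nullptr)
        throw std::runtime_error("No valid license found for the core feature set");
}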

4/ Options -
a/ CVS / Open CVS - Open source, Client-Server - http://www.nongnu.org/cvs/
It has an emphasis on security and source code correctness; a web interface is available as well as a desktop one.
b/ Subversion - Open Source, Client-Server - http://subversion.tigris.org/
Offers directory-level security; it's not strictly distributed; can use Apache/HTTP as the web server, alongside desktop clients.
c/ Git - Open source, distributed version control - http://git-scm.com/
It offers desktop and web hosting.

M. Release Management
Also see section on “Version Control”.

N. Report Generation
TODO - Can we have configurable report generation?

O. Documentation
This needs to be done for 5 areas/users - Interfaces, Code, Users, Administrators and Implementers. Create Release Notes for a release and a Pre-Release Bulletin. Follow the coding standards needed by ‘Doxygen’ or some other common tool.

P. Logging
Logs need to be generated principally for administrators and users, and should support enabling special DEBUG logs for bug fixing with developers.

Q. Administration console or UI
For use by the administrators, offering an authenticated and full view of the system. One should be able to configure the system at runtime, as well as install and uninstall it, just from this console - from the cradle to the grave!

R. Security
Integrate with other authentication mechanisms like PAM, LDAP, SSO and NTLM.

S. Programming Language
It should be garbage-collected, efficient and clean, focused on minimizing the pain for the developer and making it easy to code. It should allow the developer to focus on the algorithm rather than make a mess with low-level management and intricate syntax. Multiple languages may be used, as per suitability and the functionality offered by each component.

T. Build Management
Daily builds. Multi-level workspaces. A limited workspace view for each project and for each developer. Views and check-outs to workspaces authenticated on developer roles. Unit and integration testing at each level in the workspace hierarchy. Automated tests to run daily at the highest level.

U. Quality Metrics
Metrics based on the proportion of failing automated tests (memory leaks, memory overwrites and null-pointer checks where applicable). Test quality based on code coverage. Performance tests to detect performance regressions in addition to functional ones.

V. Aspiration stuff
Some very advanced features … User Exits and Public Interfaces! For customers who are programmers themselves and who want to build upon the existing product or make deep customizations (among other reasons), User Exits and Public Interfaces should be provided. User Exits are calls that the existing software makes to code written by customers; they allow extreme levels of customization. Public Interfaces are interfaces placed at the framework level. These interfaces are implemented by the vendor and provided to the customers (along with the usual libraries). The customers can then compile their own code, which makes calls to these interfaces.
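A minimal sketch of what a User Exit could look like in C++. The interface, the registry and the banking example are invented for illustration - the product would ship the header and call the exit point; the customer supplies the implementation.

#include <memory>
#include <string>
#include <vector>

// Public interface shipped to customers as a header plus library.
class AccountCreationExit {
public:
    virtual ~AccountCreationExit() = default;
    // The product calls this before committing; returning false vetoes the operation.
    virtual bool onBeforeCreate(const std::string& accountType) = 0;
};

class ExitRegistry {
public:
    void registerExit(std::shared_ptr<AccountCreationExit> handler) { exits_.push_back(handler); }

    // Called from inside the product at the documented exit point.
    bool fireBeforeCreate(const std::string& accountType) {
        for (const auto& e : exits_)
            if (!e->onBeforeCreate(accountType))
                return false;
        return true;
    }

private:
    std::vector<std::shared_ptr<AccountCreationExit>> exits_;
};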
There is a need for a special tool to record automatic regression tests, which involve the UI. The tool should be able to detect if the software is being run in a Test mode (as opposed to an interactive mode). On such occasions, it should be able to create a trace dump of the UI for test comparisons as stated earlier.
Another possibility is interfacing with Scanning or OCR hardware, which allows direct data entry?

That is all for now. The key thread running through the whole document is the indisputable need to invest enough in design and architecture so that the effort involved in improving/customizing/bug-fixing the software is kept to a minimum. Some of the techniques mentioned above are well known, while others are a product of experience and a bit of imagination. These guidelines will drive further software development.


Hrishikesh Kulkarni.
20th Jan 2007

Last Updated – 8th Dec 2010
Updated – 26th Nov 2010
Updated – 11th Apr 2009
Updated – 4th Feb 2009
Updated – 21st Jan 2007