Thursday, October 21, 2010

The man who loved china - Simon Winchester

I just finished reading a lively biography of Prof. Joseph Needham of Cambridge (the old one). Titled "The man who loved china", written by Simon Winchester. Winchester describes the life and work of the Prof. Needham in manner that kept me hooked on to it for the whole week, until I finished it. He is plain blunt at times, but always colorful in describing the life and times of Prof. Needham.
I hadn't heard about the 'sinophile' professor before I read this book. But the man is most remarkable. The professor began as a researcher in bio-chemistry at the Caius college, Cambridge. Prof. Needham was eclectically intellectual, charming, outspoken and eccentric. When he had made his mark in the top-line research in his area in his early years, he shifted track. Enticed and egged on by his chinese mistress, he learnt the Chinese language and dabbled with calligraphy. He undertook a dangerous mission into China at the height of japanese invasion in the second world war. He pursued a deep study of China, its history of scientific development and eventually creating systematic and detailed notes. 24 large volumes of the scientific and intellectual history of China were written and published by him. He smoked cigars, researched, ventured, lectured and lived a life of fruitful writing till the very end.

He lived a long life, but I found the sheer volume of his research and intellectual output staggering. His roving eye for pretty women did not waver even at age 95. The author tells that at age 94, when the professors wife died, he married his chinese mistress, after having been in courtship for more than 50 years. When she too died, he made proposals to three other women and all of them politely declined.

I got glimpses of the life of the 'dons' at Cambridge university from the book. It sounds like utopia for the serious researchers - wonder if the IITs are a poor imitation.

The sad part is that Prof. Needham was a socialist (but not a communist according to the author. I tend to disagree). He was right from the "red's nest" that cambridge seems to have become prior to 1950s. I am dismayed by the fact that communism had so much of appeal to the intellectual class, in that period of history. Its not as if the murderous excesses of the Stalins and Maos were unknown to the world at large. But their heads were buried deep in the sand. Prof. Needham was found to be an unequivocal and fanatic supporter of almost everything that Chinese government stood for. He was certainly charmed and in love with chinese history, science and language, but mistook the voice of the chinese communist governments as the voice of the chinese people.

There is also a fact that the professor could not really get to the analysis of why chinese contemporary science lagged behind so much, even as science in its history was so far ahead of europeans. One notable cause was that chinese did not develop a competitive mercantile class who needed to innovate to stay in business. The biggest ambition of medival chinese youth was to join the corrupt and burgeoning bureaucracy and in that manner, earn the security and continuance of the government establishment. Justs struck me that this idea runs parallel to the same trend in maharashtrian youth. And the lag of marathis among other communities is also evident.

Prof. Needham has published the most comprehensive and systematic treatment of chinese scientific history. I don't think I would have the stamina to read up on all of those 24 volumes, but Winchester's byte sized offering suits me great. Its something similar to what I thought on reading "Koba the dread" by Martin Amis. I could not imagine myself reading all the volumes by Solzenitsyn on life in the Gulags. Amis's work covered all of it in a far more colourful and concise manner. Compilations from research are needed but commentaries on those same research subjects, if well-written are much more worth my time. A big thank you to the Amis's and Winchester's.

Saturday, October 2, 2010

Project Euler problem involving optimization

This is about my try at a moderate level programming problem from Project Euler, problem # 15.
With usual approach (and attitude), I thought I should be done in 15 minutes on this one, but stretched me for six times that estimate.
Problem is about finding the number of routes through a n*n grid, starting from top-left and ending at bottom-right (for n=20).

My sequence of attempts -
1/ Simplest recursive function that traversed right and down and counted routes on reaching destination. Too bad, it took unacceptably long for n=20 case. In retrospect, I should have avoided coding this approach, knowing the size of inputs.
2/ Making a recurrence relation, with the hope that I would not even need a program to calculate this. That is, if the problem of size N can be expressed in terms of problems of (size < N) and the if I could get a closed form, simple calculation would have done it.
I got this -
For a grid size (m*n),
T(m, n) = T(m-1, n)+T(m, n-1) ..... m <> n
2*T(m-1, n) ..............m==n

For the case when m==n, we have utilized the symmetry : T(m-1, n)=T(m, n-1).

Bringing this recurrence to a closed form is beyond my technical means today, so I modified the code to follow this recurrence and calculate. This too failed since the size n=20 was overwhelming my 64-bit dual core server even then.

3/ After almost giving up after some frustrating debugging and some old fashioned pen-paper workout of the recursion, I wanted to create a cache of T(m, n) values that were getting calculated. Certainly, many values were being repeatedly calculated in various "branches" of the recursion. In this case, the cache needed was no more bigger than n*n=400. So now the code only calculated a T(m, n) if it was being caclulated for the first time and re-used that.
Now the code was terminating in a flash even for large input sizes, but throwing wrong answers.
4/ The last mistake was about the return data-type of the function, that needs to be kept in min for almost all Euler problems. It was a silly 'int'. C++ did not warn of truncation, never mind what the compiler option was.

So I have a few things to take home today, caching at appropriate places being the most important of them.

Wednesday, September 1, 2010

long time


Just made it a point to put in a snap of my little princess here, born on 2nd Nov 2009 ...

Saturday, June 20, 2009

File locking protocol with only shell scripting primitives

Writing this post out of some faint feeling of guilt and disappointment of not being able to keep up to the writing habit and discipline. It happens to be my first post ofthe year 2009. A lot many things have happened over these months, but as usual,the more things change, the more they seem to remain the same. So I jump directly to the point that pushed me to post this.


Encountered a problem today, regarding certain shell script running on a linux box.The script could be invoked by many users simultaneously. Now the problem is that the script used to write some text to a file. The changes could be updates through a redirect ('>>') or even deletes through 'gnu_sed'. As you might have guessed, the file being written to by the script was common to all users as well. I was looking for a way to serialize the 'write' access. Several incidents showed that multiple concurrent users eventually messed up the file contents through the concurrent shell script invocations.


Now, someone suggested a file locking mechanism. Each invocation of the script will lock the file and other invocations wait till it is released. I don't know for sure, but there is no shell interface for locking some file. I am aware of seeing something like that interface only at the system/C programming level through fcntl/flock calls. I had to think for a while before getting to the following algorithm, that is purely based on shell scripting primitives and guarantees a predictable first-in-first-out ordering. It also ensures that starvation cannot occur. Deadlocks are out of question, with there being only one resource to lock.


Here it is,
1/ Lets assume that a shell script (S) modifies the file (F) and we have a set of processes {P0, P1, ... Pn} that can concurrently invoke S to modify F according to some arbitrary logic.
2/ We make a kind of wrapper script (S2) to control access to S and add some logic.
3/ S2 works like this,
It creates a temporary file in the same directory with a name whose uniqueness is guaranteed e.g - a file named lock., pid - known quantity of the process P.
4/ Next, it checks for the creation dates on all lock.<*> files in that directory.
case 1 : If it finds that the date on its own lock file is earliest, it can invoke the script.
case 2 : If it finds another lock file with an earlier date, it sleeps for a configurable duration and repeats step '4/' once it wakes up. If the dates of another lock file co-incide and they are earliest among all locks, we can make the process with the lower pid the winner. The higher pid process waits and repeats 4/ after sleep.
5/ S2 finally invokes S which does all he work in exclusive mode for 'writes'.
6/ Delete the lock file created by self (having the own pid) after done and S returns control to S2.



We can assert that the FIFO ordering will be maintained, ensured by having the creation dates as the decider.Starvation is prevented by having the processes first create a lock file and thus marking the availability for ascript invocation.A process also has the luxury of relinquishing lock on the file by simply deleting its lock file, regaining lock by simply re-creating the lock file when it needs and queueing up again for access. If a process wants to justread the file, we can allow the script to make a copy of the file and release the lock as soon as possible.



Issues -
1/ The script S should be 'well-behaved'. Otherwise, it can simply use 'exit' to finish the processinstead of returning control to S2 for cleaning up the lock. file of that process. Ifany such pid file is left out in the system ,we can end up with a total starvation because noprocess will find its lock file better than the dead process' lock. A simple workaround is thatif a process P finds that is is deprived of a lock because of only one other process P, it verifies if the process who owns the lock P is still alive with the 'ps -ax' commands.If the process P is not alive, this process P can delete the lock file created by the dead process and assumeto be owner. Only one process will do this check because only one process will be in the 'second' position from gaining access.



2/ Another problem is much tougher to crack. The overall response/efficiency of this locking protocol heavily dependson the sleep time for the processes. The process has to go to a sleep if it finds that the script is already locked. We need the sleep time to be equal to the approximate time that the script S takes to run. If the sleep time is too low, processes will spend computing resources checking locks too frequently, only to find that theyhave to continue sleeping. If it is too high, the processing is under-utilised. In any case, the sleep has to happen, since I cannot find a way that the OS notifies the sleepingprocesses with this scheme. The OS does not participate in any locking here so it cannot help. Overall, If the requesting processes are assumed to be arriving frequently and steadily, then I will prefer having the sleep time on the higher-end. This is because once enough processes are sleeping, there will be a steady state when enough processes keep waking up and continue processing without the under-utilization problem. On the other hand, if the incoming processes are bursty in nature, then I think the sleep time needs tobe kept on the lower-end.
The above protocol certanly solves the central problem mentionned in the link, http://members.toast.net/art.ross/rute/node24.html.


- QED -

Friday, September 26, 2008

Challenge problem - From a classic Steven Krantz book -

I have simply been unable to crack this one, particularly the general version of this problem... any hints ?
You have six points on a piece of paper. Every point will be joined to every other point by either by a red line segment or by a blue line segment. You can colour any line in anyway you choose. Show that there will have to be either be a triangle that is all blue or a triangle that is all red.
Formulate a generalization of the last problem. Suppose that you have k colors. How many points are required to guarantee that the process of joining all possible pairs of points with line segment segments of one of these colors will guarantee that there is a triangle of just one color?

Saturday, September 13, 2008

Interesting problems - first batch

Housie Magic -
This one arose from a party in my company, where we played Housie. The game goes like this - Every participant is distributed Housie tickets, each of which has 15 random numbers (distinct on a given ticket) printed on them. Numbers are between 1 to 100. Next, random numbers in the same range are selected and the numbers are called out. All participants who have the number on their ticket cross them. The first participant who gets all numbers crossed on their own ticket wins, and gets the prize.
Certainly, all numbers on the ticket are random and those called are random too. So everyone has an equal chance of winning. But there is a twist .....
What if you are allowed to take up more than one ticket ? You then get a bigger range of numbers to cross. Does that increase your chances of winning the prize ? Remember that to win, you need to have all numbers on any single ticket crossed.
Comments welcome ...
Sorting -
I ask this one for interviews and get varying answers... Given an array of integers (of size N), what is the most efficient algorithm to find out the 'K' largest numbers ? Note that K <= N, K > 0.
We don't want the largest K numbers to be sorted by themselves, just find the largest set. Is there a way to find the K numbers in less than O(N*K) ?

Wednesday, May 7, 2008

IPL - Manoranjan ka Baap

I and at least the few gentlemen around me have begun to wonder when the IPL circus is going to end ... Not that I don't like cricket - I have watched matches even in my university exam days. With IPL, its like if you love chocolate cake, you are asked to eat it everyday for two months. The organisers should have known how cricket tournaments tend drag on and on.

I watch cricket to get a sense of competition, aggression, and the drive to win among the teams. To give you an idea, remember the classic moments - Javed Miandad's last ball match winning six, Sachin's consecutive Sharjah centuries against Australia or Kluesener's 1999 world cup exploits. But the overwhelming feeling I get while watching IPL is that of a money-spinning exercise, the junk food of cricket. You don't get to savour the chocolate cake, because you are made to eat it rapidly before another helping arrives.

Saving grace - IPL's all purely market-driven. If it becomes unpopular, it won't sustain even with Cricket Board's backing ... Nobody needs to kill it... like dead rubber Test Matches (imagine a Test match played between India - Sri Lanka on flat, dusty wicket). If spectators stick with it, it will live, else it will die on its own.