Sunday, April 26, 2020

How a simplistic pattern retains its hold

It looks like a virus is going to help preserve the reputation of a commonly believed recession indicator.
As per this report from exactly a year ago, the US bond yield curve had inverted. Basically, that meant investors preferred to put money into bonds with a 10 year maturity (a longer term bond) rather than shorter term bonds (say, a 3 month one). Investors do that when they find the short term uncertainties and risks too high, so a longer term investment in an instrument considered universally safe, such as a US treasury, looks better.
In all of the last 11 instances of yield curve inversion, a recession followed within a few months to a year. The coronavirus-battered world is indeed staring at a recession now, exactly a year after the yield curve inversion. Several years later, will we say that this recession was indeed the 12th success out of 12 for the yield curve recession indicator? We know that correlation does not imply causation, but in this truly black swan case of the virus, even counting the correlation as the 12th success seems unwarranted. There is no common cause for the inverted yield curve and the COVID-19 pandemic either.

Saturday, April 25, 2020

Folds unfolded


I wish that we Haskell programmers never fall into the ditch Alan Perlis imagined when he quipped that a functional programmer is someone who knows the value of everything and the cost of nothing. Haskell is a compiled language and runs at nearly the speed of C++. In learning Haskell, I am not taking up a new language merely for the promise of elegant, concise expression, but for comparable runtime performance too. The ability to write concurrent code with no more difficulty than a single-threaded program should be great for improving runtime performance. Over and above that, it should also help deliver software in less development time and with definitely fewer bugs. Haskell, as a pure functional programming language, promises all of that.

However, I would be far less obsessed with Haskell if I couldn't find a way to create software with superb runtime performance. Now, Haskell can show surprising runtime behavior, and can shoot you in one foot while blowing up the other. For larger pieces of code, it can be tricky to predict runtime performance because of lazy evaluation and non-strict function arguments. I ran into such problems while understanding a very basic Haskell function that 'reduces' a list or some more complex structure into a single value - "fold" and its variants foldl and foldr. An excellent introduction to the folds, along with comments on efficiency, is written here. This writeup adds to that wiki's explanation. Lazy evaluation is known to consume a lot of memory while holding references to pending computations, so understanding how to use folds is critical.

To understand folds, it is necessary to also understand two concepts: lazy evaluation and non-strict function arguments. Both are quite unusual for any programmer born and bred in C++ or Java.

Lazy evaluation - Haskell evaluates expressions lazily, meaning it does no processing until the programmer needs to print the result to the console, save it to a database, or some such thing. Further, Haskell will only evaluate to the extent the printing requires and no more. So it is possible to declare and manipulate an infinitely sized list of objects in Haskell, as long as you require processing of only a finite number of them -

sumOf10Primes = sum (take 10 primes)
  where
    primes = [x | x <- [2..], divisors x == [x]]
    divisors x = [d | d <- [2..x], x `mod` d == 0]
-- Dutifully prints 129

last [1..]
-- gone forever !
Non-strict functions - The lazy evaluation described above is made possible by the non-strictness of some of the functions involved. The function 'take' stops touching its second argument the moment it has finished returning 10 numbers. 'take' does not evaluate its second argument strictly, though it is strict in the first one. Basically, a function that can be passed 'undefined' as one of its arguments and still sometimes complete successfully is non-strict in that argument.
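A quick way to see non-strictness in action is to probe a function with 'undefined' and check whether it still succeeds. A small sketch using standard Prelude functions (pairFirst and takeNone are just illustrative names):

```haskell
-- Probing strictness with 'undefined': if the call completes,
-- the function never evaluated that argument.
pairFirst :: Int
pairFirst = fst (1, undefined)  -- returns 1; fst never looks at the second component

takeNone :: [Int]
takeNone = take 0 undefined     -- returns []; take 0 never touches the list at all
```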
Folds - Finally, coming to the pitfalls that await a novice programmer using foldl or the foldr variant. The following piece of code, taken from this wiki, exhausts available RAM -

foldl (+) 0 [1..100000000]
-- Sum to 100M, are you crazy ? At least on my Ubuntu with GHCi ver 7.10.3
-- bloats memory, freezes GHCi for several seconds even after a ^C...

foldr (+) 0 [1..100000000]
-- foldr doesn't help either !
-- A stack overflow exception is seen fairly quickly.
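To see why foldl misbehaves, it helps to expand a small call by hand - the additions are only deferred, piling up until the final result is demanded (expansion shown in comments; smallSum is just an illustrative name):

```haskell
-- Hand expansion of the lazy left fold:
--   foldl (+) 0 [1,2,3]
-- = foldl (+) (0 + 1) [2,3]
-- = foldl (+) ((0 + 1) + 2) [3]
-- = foldl (+) (((0 + 1) + 2) + 3) []
-- = ((0 + 1) + 2) + 3
-- No addition happens until the result is demanded; with 100M
-- elements, these pending computations are what exhaust memory.
smallSum :: Int
smallSum = foldl (+) 0 [1, 2, 3]  -- harmless at this size
```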

Well, both foldl and foldr failed, but in apparently different ways. foldl tried to allocate memory for its so-called thunks - an unevaluated chain of function applications piled up in memory. foldr exhausted the stack because its recursion, which replaces list constructors with the folding function, is not tail recursive. A different implementation of foldl, described in the wiki, works well and finishes in under 2 seconds, even for 100M numbers -
foldl' f z []     = z
foldl' f z (x:xs) = let z' = z `f` x in seq z' $ foldl' f z' xs

sum' = foldl' (+) 0
Here, foldl' works by suppressing lazy evaluation (or, if you prefer, by forcing eager evaluation) through 'seq', a primitive function. Otherwise, foldl' is very much like foldl. What about foldr? It works from the opposite end of the list and maintains the original order of the list when the output of the fold is another list. Also, foldr can short circuit evaluation under certain conditions, while foldl is condemned to process the entire list before returning anything, always. This means foldr may be used to process an infinite list, but foldl never can. There are some intricacies in how the 'seq' function works; it won't force evaluation as expected if used incorrectly, but that is not the topic for now.
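As a sketch of that last point, here is a foldr that finishes on an infinite list because the folding function can ignore its accumulator (firstEven is just an illustrative name):

```haskell
-- foldr terminates on [1..] because once an even element is found,
-- the accumulator 'rest' (the fold of the remaining list) is never demanded.
firstEven :: Int
firstEven = foldr (\x rest -> if even x then x else rest) (error "no even element") [1..]
```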
Here is an example where the short circuiting while using foldr makes a helpful difference.
foldr (\e a -> if mod e 10 == 0 then 0 else (mod e 10) * a) 1 [1..10000000]
-- returns 0 instantly.
-- Short circuiting based on non-strict evaluation of the second argument
-- of the lambda function.
-- It works only because of the way the lambda is implemented.

foldr (\e a -> (mod e 10) * a) 1 [1..10^7]
-- Even foldr does NOT work well here.
-- The lambda "uses" both its arguments.
Recommendations - So we are left with three possible choices whenever we want to fold or reduce a list or a more complex structure.

Use foldl' when - performing constant-memory operations on inputs where short circuiting is not going to help. For possibly infinite inputs, consider foldr instead.

Use foldr when - almost always, if foldl' is not usable. If inputs may be infinite, or short circuiting will lead to better performance, this is the only option.

Use foldl when - almost never! It may be needed when there is some possibility that the input arguments are undefined and their evaluation can be skipped thanks to the lazy behavior of the function passed to foldl.

Consider this concocted example -
mult _ 0 = 0
mult x y = x * y

foldl mult 1 [2, 3, undefined, 5, 0]
-- works and prints 0

foldl' mult 1 [2, 3, undefined, 5, 0]
-- blows up for working on undefined.
Haskell being lazy and garbage collected, and I being someone who comes from a C++ background, I personally prefer to use functions that have predictable and low memory consumption. The strictly evaluating foldl' serves that purpose well - unless, of course, there is a good reason to use foldr or even the god-forsaken foldl.
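Incidentally, there is no need to hand-roll the strict fold in real code - GHC ships one as Data.List.foldl'. A minimal usage sketch:

```haskell
import Data.List (foldl')  -- the standard strict left fold shipped with GHC

total :: Int
total = foldl' (+) 0 [1 .. 10000000]  -- runs in constant memory
```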