There is a very important distinction between programming language paradigms. Two kinds of programming styles stand against each other:
Imperative programming
Declarative programming
The imperative programming style has its origin in Fortran, and the declarative programming paradigm's roots can be followed back to Lisp. The following sections will compare the two styles and explain why we prefer one over the other in testing.
Imperative programming
Imperative programming has become the standard of programming methodology with the rise of languages such as BASIC, Pascal, C, and C#. It means describing to the computer how to calculate a result, in a way that you would describe how to make a cake:
// imperative programming
let double_imperative numbers =
let doubled = System.Collections.Generic.List<int>()
for number in numbers do
let double = number * 2
doubled.Add(double)
doubled
This code example will double all the values in a list by first creating a result list and then doubling it number by number and adding it to the result. Once it goes through all the numbers, the result list will be returned.
In this code, there is a clear path of how double values are calculated, and the program describes how this is accomplished.
Declarative programming
Instead of instructing the computer how to calculate a result, you can describe what the result is. If the result is too complex, break it down into parts and describe the parts and then concatenate those parts. It leads you to a series of descriptions instead of a workflow of what should be done:
// declarative programming
let double numbers =
let double = (*) 2
Seq.map double numbers
This code example also doubles all the values in a list, but instead of describing how this is done, the code describes what this means:
The double function means taking a value and multiplying it by 2
Result implies by mapping double onto the numbers function
The main difference between these two methods is that imperative programming describes how and declarative programming describes what. The imperative example created a state to bake the cake, whereas the declarative example broke down the problem into describable parts.
Declarative programming is less prone to faults because there is no mutable state. It is also easier to test as you break down the problem into isolated parts that are not only individually describable but also individually testable. This way, there is a high increase in quality and testability when you use declarative programming.
Let's look at another example. You need to parse roman numerals into integers. In an imperative programming style, you would go about doing this in the following manner:
// parse a roman numeral and return its value
let parse (s : string) =
// mutable value
let mutable input = s
let mutable result = 0
let romanNumerals =
[|("M",1000); ("CM" ,900); ("D",500); ("CD",400); ("C",100 );
("XC",90); ("L",50); ("XL",40); ("X",10 ); ("IX",9); ("V",5);
("IV",4); ("I", 1)|]
while not (String.IsNullOrEmpty input) do
let mutable found = false
let mutable i = 0
// iterate over romanNumerals matching it with input string
while (not found) || i < romanNumerals.Length do
let romanNumeral, value = romanNumerals.[i]
// does input start with current romanNumeral?
if input.StartsWith(romanNumeral, StringComparison.CurrentCultureIgnoreCase) then
result <- result + value
input <- input.Substring(romanNumeral.Length)
found <- true
// iterate
i <- i + 1
// no roman numeral found at beginning of string
if (not found) then
failwith "invalid roman numeral"
result
The following is the basic usage of the parse function:
> parse "MMXIV";;val it : int = 2014
The code starts by creating two mutable variables: one is a working copy of the incoming value and the other is a result variable. The program will then work on the input variable, matching the start of the string with roman numerals and as a numeral is found adding its value to the result and removing it from the input string. Once the string is empty, the parsing is complete and the value is in the result variable.
There are a lot of moving parts in this code, and it is hard to debug because of the different states that the variables can be in.
Here's an example of how to write this in a more expressive and declarative way:
// partial active pattern to match start of string
let (|StartsWith|_|) (p : string) (s : string) =
if s.StartsWith(p, StringComparison.CurrentCultureIgnoreCase) then
Some(s.Substring(p.Length))
else
None
// parse a roman numeral and return its value
let rec parse = function
| "" -> 0
| StartsWith "M" rest -> 1000 + (parse rest)
| StartsWith "CM" rest -> 900 + (parse rest)
| StartsWith "D" rest -> 500 + (parse rest)
| StartsWith "CD" rest -> 400 + (parse rest)
| StartsWith "C" rest -> 100 + (parse rest)
| StartsWith "XC" rest -> 90 + (parse rest)
| StartsWith "L" rest -> 50 + (parse rest)
| StartsWith "XL" rest -> 40 + (parse rest)
| StartsWith "X" rest -> 10 + (parse rest)
| StartsWith "IX" rest -> 9 + (parse rest)
| StartsWith "V" rest -> 5 + (parse rest)
| StartsWith "IV" rest -> 4 + (parse rest)
| StartsWith "I" rest -> 1 + (parse rest)
| _ -> failwith "invalid roman numeral"
The following is the basic usage of the function:
> parse "MMXIV";;val it : int = 2014
This code used an F# concept called active pattern and then used pattern matching to match the beginning of the string with the different roman numerals. The value of the parsed numeral is added by parsing the rest of the string.
There are no moving parts in this example, only expressions of what each Roman numeral is worth. The functionality is easier to debug and to test.
Tail call optimization
Writing expressive code is very powerful and gives your code a lot of readability, but it can also be hurtful.
The following will create a comma-separated string out of a list of values:
// join values together to a comma separated string
let rec join = function
| [] -> ""
| hd :: [] -> hd
| hd :: tl -> hd + ", " + (join tl)
This code will expand into the following call tree.
The execution lasts for about 6 seconds. In the second test, we run the join function with 1,00,000 items:
> [1..100000] |> List.map (fun n ?> n.ToString()) |> join;;Process is terminated due to StackOverflowException.Session termination detected. Press Enter to restart.
The result is a crash because there was not enough room on the stack for that many items.
The F# compiler has something called tail call optimization to deal with these problems. You can write expressive code and still have it running safe without the risk of the StackOverflowException exception. You only need to be aware of how to write code so it becomes optimized:
// join values together to a comma separated string
let join list =
let rec _join res = function
| [] -> res
| hd :: tl -> _join (res + ", " + hd) tl
if list = List.Empty then
""
else
(_join (List.head list) (List.tail list))
The structure of this function is somewhat different. Instead of returning the result, the result is built up into a result argument for each recursion. The fact that the recursive call is the last call of the function makes it possible for the compiler to optimize it.
I'm using an inner function join because of two reasons. First, to avoid exposing the res argument to the caller, as it should not be a part of the function definition.
The other is to simplify the match case, as the first iteration should only add the head element of the list to the result.
We can write a test to make sure that it can handle a lot of items:
[<Test>]
let ``should be able to handle 100000 items`` () =
let values = [1..100000] |> List.map (fun n -> n.ToString())
(fun () -> (join values) |> ignore) |> should not' (throw typeof<System.StackOverflowException>)
This gives us an idea of whether the function is able to handle the amount of data we're expecting to throw at it, but it really doesn't say whether it will eventually overflow the stack.
You can use a tool such as ilspy to decompile the assemblies and look at the Intermediate Language (IL) code. Let's take a look at what our first example looks like in C# after decompiling it:
public static string join(FSharpList<string> _arg1)
{
if (_arg1.TailOrNull == null)
{
return "";
}
if (_arg1.TailOrNull.TailOrNull == null)
{
return _arg1.HeadOrDefault;
}
FSharpList<string> tl = _arg1.TailOrNull;
string hd = _arg1.HeadOrDefault;
return hd + ", " + _1232OS_04_08.join(tl);
}
The last line is the culprit where the method makes a call to itself. Now, let's look at the second example with the tail recursion optimization:
public static string join(FSharpList<string> list)
{
FSharpFunc<string, FSharpFunc<FSharpList<string>, string>> _join = new _1232OS_04_09._join@7();
return FSharpFunc<string, FSharpList<string>>.InvokeFast<string>(_join, ListModule.Head<string>(list), ListModule.Tail<string>(list));
}
[Serializable]
internal class _join@7 : OptimizedClosures.FSharpFunc<string, FSharpList<string>, string>
{
internal _join@7()
{
}
public override string Invoke(string res, FSharpList<string> _arg1)
{
while (true)
{
FSharpList<string> fSharpList = _arg1;
if (fSharpList.TailOrNull == null)
{
break;
}
FSharpList<string> fSharpList2 = fSharpList;
FSharpList<string> tl = fSharpList2.TailOrNull;
string hd = fSharpList2.HeadOrDefault;
string arg_33_0 = res + ", " + hd;
_arg1 = tl;
res = arg_33_0;
}
return res;
}
}
F# is not very beautiful anymore after it got decompiled to C#, but the message here is clear. In the second example, the compiler optimized the recursive call into a while(true) loop, and this solved the whole problem with the stack overflow.
When writing expressive, declarative code, it is important to understand tail recursion. It is one of those things that will bite you, not while writing the code but in production where the data is much more diverse and hard to predict.
It is important to understand when to strive for tail call optimization and how to verify that it's actually there.
Parallel execution
The following was quoted in 2005:
"The free lunch is over."
--Herb Sutter
Since this happened, we have been looking for a way to do parallel execution in a simple way. Our computers aren't getting faster, but they are gaining more and more parallel execution power. This is, however, useless unless we find a way of utilizing this power.
We struggle to make our applications scale out to several CPU cores. Our tools have threads, locks, mutexes, and semaphores, and still we're not able to write concurrent programs that work well. This is because using those tools is very hard. Predicting the execution flow in a concurrent program is very hard and so is debugging a concurrent program.
The result of threading and locks leads to problems such as race conditions and deadlocks. One of the major caveats for concurrent programming is the state, and a mutable state is the beloved best friend of imperative programming. There is a major difference in parallelizing the imperative and declarative programs.
You have two reasons for your application to become concurrent:
Your program is CPU-bound
Your program is I/O-bound
For this purpose, F# has the async computational expression:
// download contents to url
let download url =
printfn "Start: %s" url
async {
// create client
let webClient = new WebClient()
try
let uri = System.Uri(url)
// download string
return! webClient.AsyncDownloadString(uri)
finally
printfn "Finish: %s" url
webClient.Dispose()
}
We can quite easily provide a test for this code if we are aware of the consequences of its side effects:
[<Test>]
let ``should download three urls in parallel`` () =
let baseUrl = "http://blog.mikaellundin.name"
let paths = [
"/2014/08/31/how-to-fix-your-sony-pulse-headset.html";
"/2014/05/02/tdd-is-not-dead.html";
"/2014/04/25/bugs-and-defects.html"]
// build urls
let urls = paths |> List.map (fun path -> baseUrl + path)
// test & await
let result = urls |> List.map (download) |> Async.Parallel |> Async.RunSynchronously
// assert
Assert.That((result |> (Seq.nth 0)), Is.StringContaining "Sony Pulse Headset")
Assert.That((result |> (Seq.nth 1)), Is.StringContaining "Writing code is not easy")
Assert.That((result |> (Seq.nth 2)), Is.StringContaining "Lexical Errors")
The important aspect here is that execution of individual computations don't need to end in the same order as they begin, but they will still render the same result.
This can be seen in the test output, as follows.
Writing code that is without side effects can help when you need to do concurrent programming because it will simplify the scenarios of having no state and needing no lock mechanisms. The declarative way of writing asynchronous code in F# makes it easier to test, as parallelization can be isolated within a function member and tested from a black box perspective.