官术网_书友最值得收藏!

Expressiveness through functional programming

There is a very important distinction between programming language paradigms. Two kinds of programming styles stand against each other:

  • Imperative programming
  • Declarative programming

The imperative programming style has its origin in Fortran, and the declarative programming paradigm's roots can be followed back to Lisp. The following sections will compare the two styles and explain why we prefer one over the other in testing.

Imperative programming

Imperative programming has become the standard of programming methodology with the rise of languages such as BASIC, Pascal, C, and C#. It means describing to the computer how to calculate a result, in a way that you would describe how to make a cake:

// imperative programming
let double_imperative numbers =
    let doubled = System.Collections.Generic.List<int>()

    for number in numbers do
        let double = number * 2
        doubled.Add(double)

    doubled 

This code example will double all the values in a list by first creating a result list and then doubling it number by number and adding it to the result. Once it goes through all the numbers, the result list will be returned.

In this code, there is a clear path of how double values are calculated, and the program describes how this is accomplished.

Declarative programming

Instead of instructing the computer how to calculate a result, you can describe what the result is. If the result is too complex, break it down into parts and describe the parts and then concatenate those parts. It leads you to a series of descriptions instead of a workflow of what should be done:

// declarative programming
let double numbers =
    let double = (*) 2
    Seq.map double numbers 

This code example also doubles all the values in a list, but instead of describing how this is done, the code describes what this means:

  • The double function means taking a value and multiplying it by 2
  • Result implies by mapping double onto the numbers function

The main difference between these two methods is that imperative programming describes how and declarative programming describes what. The imperative example created a state to bake the cake, whereas the declarative example broke down the problem into describable parts.

Declarative programming is less prone to faults because there is no mutable state. It is also easier to test as you break down the problem into isolated parts that are not only individually describable but also individually testable. This way, there is a high increase in quality and testability when you use declarative programming.

Let's look at another example. You need to parse roman numerals into integers. In an imperative programming style, you would go about doing this in the following manner:

// parse a roman numeral and return its value
let parse (s : string) =
    // mutable value
    let mutable input = s
    let mutable result = 0

    let romanNumerals = 
        [|("M",1000); ("CM" ,900); ("D",500); ("CD",400); ("C",100 ); 
        ("XC",90); ("L",50); ("XL",40); ("X",10 ); ("IX",9); ("V",5); 
        ("IV",4); ("I", 1)|]

    while not (String.IsNullOrEmpty input) do
        let mutable found = false
        let mutable i = 0

        // iterate over romanNumerals matching it with input string
        while (not found) || i < romanNumerals.Length do
            let romanNumeral, value = romanNumerals.[i] 

            // does input start with current romanNumeral?
            if input.StartsWith(romanNumeral, StringComparison.CurrentCultureIgnoreCase) then
                result <- result + value
                input <- input.Substring(romanNumeral.Length)
                found <- true

            // iterate
            i <- i + 1

        // no roman numeral found at beginning of string
        if (not found) then
            failwith "invalid roman numeral"

    result 

The following is the basic usage of the parse function:

> parse "MMXIV";;
val it : int = 2014

The code starts by creating two mutable variables: one is a working copy of the incoming value and the other is a result variable. The program will then work on the input variable, matching the start of the string with roman numerals and as a numeral is found adding its value to the result and removing it from the input string. Once the string is empty, the parsing is complete and the value is in the result variable.

There are a lot of moving parts in this code, and it is hard to debug because of the different states that the variables can be in.

Here's an example of how to write this in a more expressive and declarative way:

// partial active pattern to match start of string
let (|StartsWith|_|) (p : string) (s : string) =
    if s.StartsWith(p, StringComparison.CurrentCultureIgnoreCase) then
        Some(s.Substring(p.Length))
    else
        None

// parse a roman numeral and return its value
let rec parse = function
| "" -> 0
| StartsWith "M" rest  -> 1000 + (parse rest)
| StartsWith "CM" rest -> 900 + (parse rest)
| StartsWith "D" rest  -> 500 + (parse rest)
| StartsWith "CD" rest -> 400 + (parse rest)
| StartsWith "C" rest  -> 100 + (parse rest)
| StartsWith "XC" rest -> 90 + (parse rest)
| StartsWith "L" rest  -> 50 + (parse rest)
| StartsWith "XL" rest -> 40 + (parse rest)
| StartsWith "X" rest  -> 10 + (parse rest)
| StartsWith "IX" rest -> 9 + (parse rest)
| StartsWith "V" rest  -> 5 + (parse rest)
| StartsWith "IV" rest -> 4 + (parse rest)
| StartsWith "I" rest  -> 1 + (parse rest)
| _ -> failwith "invalid roman numeral" 

The following is the basic usage of the function:

> parse "MMXIV";;
val it : int = 2014

This code used an F# concept called active pattern and then used pattern matching to match the beginning of the string with the different roman numerals. The value of the parsed numeral is added by parsing the rest of the string.

There are no moving parts in this example, only expressions of what each Roman numeral is worth. The functionality is easier to debug and to test.

Tail call optimization

Writing expressive code is very powerful and gives your code a lot of readability, but it can also be hurtful.

The following will create a comma-separated string out of a list of values:

// join values together to a comma separated string
let rec join = function
| [] -> ""
| hd :: [] -> hd
| hd :: tl -> hd + ", " + (join tl) 

This code will expand into the following call tree.

> ["1"; "2"; "3"; "4"; "5"] |> join;;
join ["1"; "2"; "3"; "4"; "5"]
<? join ["2"; "3"; "4"; "5"]
  <? join ["3"; "4"; "5"]
    <? join ["4"; "5"]
      <? join ["5"]
        <? join []

There are two problems with this:

  • It is very inefficient to perform these many function calls
  • Each function call will occupy a record on the stack, leading to stack overflow for large recursion trees

We can show this by running the following tests. First, we run the join function with 30,000 items with timing on:

> #time;;
> [1..30000] |> List.map (fun n ?> n.ToString()) |> join;;
Real: 00:00:06.288, CPU: 00:00:06.203, GC gen0: 140, gen1: 61, gen2: 60
val it : string = "1, 2, .... 30000"

The execution lasts for about 6 seconds. In the second test, we run the join function with 1,00,000 items:

> [1..100000] |> List.map (fun n ?> n.ToString()) |> join;;
Process is terminated due to StackOverflowException.
Session termination detected. Press Enter to restart.

The result is a crash because there was not enough room on the stack for that many items.

The F# compiler has something called tail call optimization to deal with these problems. You can write expressive code and still have it running safe without the risk of the StackOverflowException exception. You only need to be aware of how to write code so it becomes optimized:

// join values together to a comma separated string
let join list =
    let rec _join res = function
    | [] -> res
    | hd :: tl -> _join (res + ", " + hd) tl

    if list = List.Empty then
        ""
    else
        (_join (List.head list) (List.tail list))

The structure of this function is somewhat different. Instead of returning the result, the result is built up into a result argument for each recursion. The fact that the recursive call is the last call of the function makes it possible for the compiler to optimize it.

I'm using an inner function join because of two reasons. First, to avoid exposing the res argument to the caller, as it should not be a part of the function definition.

The other is to simplify the match case, as the first iteration should only add the head element of the list to the result.

We can write a test to make sure that it can handle a lot of items:

[<Test>]
let ``should be able to handle 100000 items`` () =
    let values = [1..100000] |> List.map (fun n -> n.ToString())
    (fun () -> (join values) |> ignore) |> should not' (throw typeof<System.StackOverflowException>)

This gives us an idea of whether the function is able to handle the amount of data we're expecting to throw at it, but it really doesn't say whether it will eventually overflow the stack.

You can use a tool such as ilspy to decompile the assemblies and look at the Intermediate Language (IL) code. Let's take a look at what our first example looks like in C# after decompiling it:

public static string join(FSharpList<string> _arg1)
{
  if (_arg1.TailOrNull == null)
  {
    return "";
  }
  if (_arg1.TailOrNull.TailOrNull == null)
  {
    return _arg1.HeadOrDefault;
  }
  FSharpList<string> tl = _arg1.TailOrNull;
  string hd = _arg1.HeadOrDefault;
 return hd + ", " + _1232OS_04_08.join(tl);
} 

The last line is the culprit where the method makes a call to itself. Now, let's look at the second example with the tail recursion optimization:

public static string join(FSharpList<string> list)
{
  FSharpFunc<string, FSharpFunc<FSharpList<string>, string>> _join = new _1232OS_04_09._join@7();
  return FSharpFunc<string, FSharpList<string>>.InvokeFast<string>(_join, ListModule.Head<string>(list), ListModule.Tail<string>(list));
}

[Serializable]
internal class _join@7 : OptimizedClosures.FSharpFunc<string, FSharpList<string>, string>
{
  internal _join@7()
  {
  }
  public override string Invoke(string res, FSharpList<string> _arg1)
  {
 while (true)
    {
      FSharpList<string> fSharpList = _arg1;
      if (fSharpList.TailOrNull == null)
      {
        break;
      }
      FSharpList<string> fSharpList2 = fSharpList;
      FSharpList<string> tl = fSharpList2.TailOrNull;
      string hd = fSharpList2.HeadOrDefault;
      string arg_33_0 = res + ", " + hd;
      _arg1 = tl;
      res = arg_33_0;
    }
    return res;
  }
} 

F# is not very beautiful anymore after it got decompiled to C#, but the message here is clear. In the second example, the compiler optimized the recursive call into a while(true) loop, and this solved the whole problem with the stack overflow.

When writing expressive, declarative code, it is important to understand tail recursion. It is one of those things that will bite you, not while writing the code but in production where the data is much more diverse and hard to predict.

It is important to understand when to strive for tail call optimization and how to verify that it's actually there.

Parallel execution

The following was quoted in 2005:

 

"The free lunch is over."

 
  --Herb Sutter

Since this happened, we have been looking for a way to do parallel execution in a simple way. Our computers aren't getting faster, but they are gaining more and more parallel execution power. This is, however, useless unless we find a way of utilizing this power.

We struggle to make our applications scale out to several CPU cores. Our tools have threads, locks, mutexes, and semaphores, and still we're not able to write concurrent programs that work well. This is because using those tools is very hard. Predicting the execution flow in a concurrent program is very hard and so is debugging a concurrent program.

The result of threading and locks leads to problems such as race conditions and deadlocks. One of the major caveats for concurrent programming is the state, and a mutable state is the beloved best friend of imperative programming. There is a major difference in parallelizing the imperative and declarative programs.

You have two reasons for your application to become concurrent:

  • Your program is CPU-bound
  • Your program is I/O-bound

For this purpose, F# has the async computational expression:

// download contents to url
let download url =
    printfn "Start: %s" url
    async {
        // create client
        let webClient = new WebClient()

        try
            let uri = System.Uri(url)

            // download string
            return! webClient.AsyncDownloadString(uri)
                
        finally
            printfn "Finish: %s" url
            webClient.Dispose()
    }

We can quite easily provide a test for this code if we are aware of the consequences of its side effects:

[<Test>]
let ``should download three urls in parallel`` () =
    let baseUrl = "http://blog.mikaellundin.name"
    let paths = [
        "/2014/08/31/how-to-fix-your-sony-pulse-headset.html";
        "/2014/05/02/tdd-is-not-dead.html";
        "/2014/04/25/bugs-and-defects.html"]

    // build urls
    let urls = paths |> List.map (fun path -> baseUrl + path)

    // test & await
    let result = urls |> List.map (download) |> Async.Parallel |> Async.RunSynchronously

    // assert
    Assert.That((result |> (Seq.nth 0)), Is.StringContaining "Sony Pulse Headset")
    Assert.That((result |> (Seq.nth 1)), Is.StringContaining "Writing code is not easy")
    Assert.That((result |> (Seq.nth 2)), Is.StringContaining "Lexical Errors") 

The important aspect here is that execution of individual computations don't need to end in the same order as they begin, but they will still render the same result.

This can be seen in the test output, as follows.

Parallel execution

Writing code that is without side effects can help when you need to do concurrent programming because it will simplify the scenarios of having no state and needing no lock mechanisms. The declarative way of writing asynchronous code in F# makes it easier to test, as parallelization can be isolated within a function member and tested from a black box perspective.

主站蜘蛛池模板: 汉源县| 城固县| 高雄县| 图片| 澎湖县| 怀宁县| 龙井市| 漳平市| 布尔津县| 顺平县| 宁都县| 永兴县| 呈贡县| 曲松县| 潼关县| 牟定县| 临夏县| 益阳市| 增城市| 开化县| 乌审旗| 昌都县| 荔波县| 灵台县| 钟祥市| 阿拉尔市| 无为县| 平山县| 龙南县| 永城市| 保德县| 平邑县| 岗巴县| 元朗区| 桐城市| 靖西县| 繁峙县| 济源市| 全南县| 东台市| 新巴尔虎左旗|