
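Book: Go語言精進之路：從新手到高手的編程思想、方法和技巧（2） (The Road to Go Mastery: Programming Ideas, Methods, and Techniques from Novice to Expert, Volume 2)
Author: 白明 (Bai Ming)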
Updated: 2022-01-04 17:42:26

46.2 Sequential and Parallel Benchmarks

Depending on whether the benchmark loop runs in parallel, Go benchmarks fall into two categories: sequential benchmarks and parallel benchmarks.

1. Sequential benchmarks

A sequential benchmark is written as follows:

func BenchmarkXxx(b *testing.B) {
    // ...
    for i := 0; i < b.N; i++ {
        // code under test
    }
}

The benchmarks of the various string concatenation methods shown earlier belong to this category. The following example illustrates how a sequential benchmark is actually executed:

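// chapter8/sources/benchmark-impl/sequential_test.go
package benchmark // package name assumed; the original listing omits it

import (
    "fmt"
    "sync"
    "sync/atomic"
    "testing"

    // tls.ID() returns the ID of the current goroutine. The import path is
    // an assumption, because the original listing omits its imports.
    tls "github.com/huandu/go-tls"
)

var (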
    m     map[int64]struct{} = make(map[int64]struct{}, 10)
    mu    sync.Mutex
    round int64 = 1
)

func BenchmarkSequential(b *testing.B) {
    fmt.Printf("\ngoroutine[%d] enter BenchmarkSequential: round[%d], b.N[%d]\n",
           tls.ID(), atomic.LoadInt64(&round), b.N)
    defer func() {
        atomic.AddInt64(&round, 1)
    }()

    for i := 0; i < b.N; i++ {
        mu.Lock()
        _, ok := m[round]
        if !ok {
            m[round] = struct{}{}
            fmt.Printf("goroutine[%d] enter loop in BenchmarkSequential: round[%d], b.N[%d]\n",
                tls.ID(), atomic.LoadInt64(&round), b.N)
        }
        mu.Unlock()
    }
    fmt.Printf("goroutine[%d] exit BenchmarkSequential: round[%d], b.N[%d]\n",
           tls.ID(), atomic.LoadInt64(&round), b.N)
}

Run the example:

$go test -bench . sequential_test.go

goroutine[1] enter BenchmarkSequential: round[1], b.N[1]
goroutine[1] enter loop in BenchmarkSequential: round[1], b.N[1]
goroutine[1] exit BenchmarkSequential: round[1], b.N[1]
goos: darwin
goarch: amd64
BenchmarkSequential-8
goroutine[2] enter BenchmarkSequential: round[2], b.N[100]
goroutine[2] enter loop in BenchmarkSequential: round[2], b.N[100]
goroutine[2] exit BenchmarkSequential: round[2], b.N[100]
goroutine[2] enter BenchmarkSequential: round[3], b.N[10000]
goroutine[2] enter loop in BenchmarkSequential: round[3], b.N[10000]
goroutine[2] exit BenchmarkSequential: round[3], b.N[10000]

goroutine[2] enter BenchmarkSequential: round[4], b.N[1000000]
goroutine[2] enter loop in BenchmarkSequential: round[4], b.N[1000000]
goroutine[2] exit BenchmarkSequential: round[4], b.N[1000000]

goroutine[2] enter BenchmarkSequential: round[5], b.N[65666582]
goroutine[2] enter loop in BenchmarkSequential: round[5], b.N[65666582]
goroutine[2] exit BenchmarkSequential: round[5], b.N[65666582]
65666582           20.6 ns/op
PASS
ok         command-line-arguments 1.381s

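We can see that:

BenchmarkSequential was executed for several rounds (see the round values in the output);
the value of b.N in the for loop differed in every round: 1, 100, 10000, 1000000, and 65666582, in that order;
except for the first round, where b.N was 1, all rounds ran sequentially in the same goroutine (goroutine[2]).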
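The round-by-round growth of b.N can also be observed without a goroutine-ID helper. The following is a minimal sketch, not the book's code: it assumes the benchmark is driven from an ordinary program through testing.Benchmark, and the names collectRounds and sink are made up for illustration.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the compiler from optimizing the measured loop away.
var sink int

// collectRounds runs a trivial sequential benchmark through
// testing.Benchmark and returns the b.N value seen in each round, in order.
func collectRounds() []int {
	var rounds []int
	testing.Benchmark(func(b *testing.B) {
		// The benchmark function is re-entered once per round.
		rounds = append(rounds, b.N)
		for i := 0; i < b.N; i++ {
			sink += i
		}
	})
	return rounds
}

func main() {
	for i, n := range collectRounds() {
		fmt.Printf("round[%d]: b.N = %d\n", i+1, n)
	}
}
```

The exact values after the first round depend on the machine and the Go version, so the sketch prints them rather than assuming them.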

By default, each benchmark function (such as BenchmarkSequential) is given a run time of 1 second. If a round takes less than 1 second, go test increases b.N along a sequence of round numbers: 1, 2, 3, 5, 10, 20, 30, 50, 100, and so on. If the benchmark finishes quickly while b.N is small, the go test benchmark framework skips some of the intermediate values and picks a larger one, as it does here, where b.N jumps straight from 1 to 100. Once a new b.N has been chosen, the framework starts another round of the benchmark function, and it keeps doing so until a round takes longer than 1 second. In the example above, b.N in the last round is 65666582, a value that go test presumably computed from the average time per iteration measured in the previous round. go test found that multiplying that average by an N another 100 times larger would push the next round far past 1 second, so it used 1 second and the previous round's average time per iteration to estimate an iteration count, namely 65666582.

If a benchmark runs for only 1 second and completes only 10 iterations in that second, the resulting average may have a high standard deviation. If the benchmark runs millions or billions of iterations, the average tends to be more accurate. To increase the number of iterations, use the -benchtime command-line option to lengthen the benchmark's run time.

In the following example, we use the -benchtime command-line option of go test to change the default benchmark run time from 1 second to 2 seconds:

$go test -bench . sequential_test.go -benchtime 2s
...

goroutine[2] enter BenchmarkSequential: round[4], b.N[1000000]
goroutine[2] enter loop in BenchmarkSequential: round[4], b.N[1000000]
goroutine[2] exit BenchmarkSequential: round[4], b.N[1000000]

goroutine[2] enter BenchmarkSequential: round[5], b.N[100000000]
goroutine[2] enter loop in BenchmarkSequential: round[5], b.N[100000000]
goroutine[2] exit BenchmarkSequential: round[5], b.N[100000000]
100000000          20.5 ns/op
PASS
ok         command-line-arguments 2.075s

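We can see that after the benchmark run time is changed to 2 seconds, b.N in the final round grows to 100000000.

-benchtime can also be used to specify b.N explicitly, in which case go test uses the N you give as the iteration count of the final round: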

$go test -v -benchtime 5x -bench . sequential_test.go
goos: darwin
goarch: amd64
BenchmarkSequential

goroutine[1] enter BenchmarkSequential: round[1], b.N[1]
goroutine[1] enter loop in BenchmarkSequential: round[1], b.N[1]
goroutine[1] exit BenchmarkSequential: round[1], b.N[1]

goroutine[2] enter BenchmarkSequential: round[2], b.N[5]
goroutine[2] enter loop in BenchmarkSequential: round[2], b.N[5]
goroutine[2] exit BenchmarkSequential: round[2], b.N[5]
BenchmarkSequential-8            5             5470 ns/op
PASS
ok        command-line-arguments 0.006s

Although each benchmark function above (such as BenchmarkSequential) actually ran for several rounds, that still counts as a single execution. Because the data from a single execution may not be representative, we may explicitly ask go test to execute the benchmark several times, collect the data from each execution, and take the statistically processed result as the final one. The -count command-line option sets the number of times each benchmark function is executed:

$go test -v -count 2 -bench . benchmark_intro_test.go
goos: darwin
goarch: amd64
BenchmarkConcatStringByOperator
BenchmarkConcatStringByOperator-8       12665250            89.8 ns/op
BenchmarkConcatStringByOperator-8       13099075            89.7 ns/op
BenchmarkConcatStringBySprintf
BenchmarkConcatStringBySprintf-8         2781075             433 ns/op
BenchmarkConcatStringBySprintf-8         2662507             433 ns/op
BenchmarkConcatStringByJoin
BenchmarkConcatStringByJoin-8           23679480            49.1 ns/op
BenchmarkConcatStringByJoin-8           24135014            49.6 ns/op
PASS
ok         command-line-arguments 8.225s

In the example above, each benchmark function was executed twice (each execution of course still runs several rounds with different b.N values) and produced two results.

2. Parallel benchmarks

A parallel benchmark is written as follows:

func BenchmarkXxx(b *testing.B) {
    // ...
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            // code under test
        }
    })
}

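Parallel benchmarks are mainly used to establish performance baselines for code under test that contains multi-goroutine synchronization facilities (such as mutexes, read-write locks, and atomic operations). Compared with a sequential benchmark, a parallel benchmark reflects more faithfully what goroutine synchronization really costs the code under test when multiple goroutines are involved. Consider the following example: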

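// chapter8/sources/benchmark_paralell_demo_test.go
package benchmark // package name assumed; the original listing omits it

import (
    "sync"
    "sync/atomic"
    "testing"
)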

var n1 int64

func addSyncByAtomic(delta int64) int64 {
    return atomic.AddInt64(&n1, delta)
}

func readSyncByAtomic() int64 {
    return atomic.LoadInt64(&n1)
}

var n2 int64
var rwmu sync.RWMutex

func addSyncByMutex(delta int64) {
    rwmu.Lock()
    n2 += delta
    rwmu.Unlock()
}

func readSyncByMutex() int64 {
    var n int64
    rwmu.RLock()
    n = n2
    rwmu.RUnlock()
    return n
}

func BenchmarkAddSyncByAtomic(b *testing.B) {
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            addSyncByAtomic(1)
        }
    })
}

func BenchmarkReadSyncByAtomic(b *testing.B) {
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            readSyncByAtomic()
        }
    })
}

func BenchmarkAddSyncByMutex(b *testing.B) {
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            addSyncByMutex(1)
        }
    })
}

func BenchmarkReadSyncByMutex(b *testing.B) {
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            readSyncByMutex()
        }
    })
}

Run the benchmark:

$go test -v -bench . benchmark_paralell_demo_test.go -cpu 2,4,8
goos: darwin
goarch: amd64
BenchmarkAddSyncByAtomic
BenchmarkAddSyncByAtomic-2        75208119              15.3 ns/op
BenchmarkAddSyncByAtomic-4        70117809              17.0 ns/op
BenchmarkAddSyncByAtomic-8        68664270              15.9 ns/op
BenchmarkReadSyncByAtomic
BenchmarkReadSyncByAtomic-2       1000000000           0.744 ns/op
BenchmarkReadSyncByAtomic-4       1000000000           0.384 ns/op
BenchmarkReadSyncByAtomic-8       1000000000           0.240 ns/op
BenchmarkAddSyncByMutex
BenchmarkAddSyncByMutex-2         37533390              31.4 ns/op
BenchmarkAddSyncByMutex-4         21660948              57.5 ns/op
BenchmarkAddSyncByMutex-8         16808721              72.6 ns/op
BenchmarkReadSyncByMutex
BenchmarkReadSyncByMutex-2        35535615              32.3 ns/op
BenchmarkReadSyncByMutex-4        29839219              39.6 ns/op
BenchmarkReadSyncByMutex-8        29936805              39.8 ns/op
PASS
ok         command-line-arguments 12.454s

In the example above, the -cpu 2,4,8 command-line option tells go test to run each benchmark function once with GOMAXPROCS set to 2, 4, and 8 respectively. From the output, it is easy to see how the performance of each function under test changes as GOMAXPROCS increases.

Unlike a sequential benchmark, a parallel benchmark starts multiple goroutines that execute the loop in the benchmark function in parallel. Again, an example illustrates the execution flow:

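// chapter8/sources/benchmark-impl/paralell_test.go
package benchmark // package name assumed; the original listing omits it

import (
    "fmt"
    "sync"
    "sync/atomic"
    "testing"

    // tls.ID() returns the ID of the current goroutine. The import path is
    // an assumption, because the original listing omits its imports.
    tls "github.com/huandu/go-tls"
)

var (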
    m     map[int64]int = make(map[int64]int, 20)
    mu    sync.Mutex
    round int64 = 1
)

func BenchmarkParalell(b *testing.B) {
    fmt.Printf("\ngoroutine[%d] enter BenchmarkParalell: round[%d], b.N[%d]\n",
           tls.ID(), atomic.LoadInt64(&round), b.N)
    defer func() {
        atomic.AddInt64(&round, 1)
    }()

    b.RunParallel(func(pb *testing.PB) {
        id := tls.ID()
        fmt.Printf("goroutine[%d] enter loop func in BenchmarkParalell: round[%d], b.N[%d]\n", tls.ID(), atomic.LoadInt64(&round), b.N)
        for pb.Next() {
            mu.Lock()
            _, ok := m[id]
            if !ok {
                m[id] = 1
            } else {
                m[id] = m[id] + 1
            }
            mu.Unlock()
        }

        mu.Lock()
        count := m[id]
        mu.Unlock()

        fmt.Printf("goroutine[%d] exit loop func in BenchmarkParalell: round[%d], loop[%d]\n", tls.ID(), atomic.LoadInt64(&round), count)
    })

    fmt.Printf("goroutine[%d] exit BenchmarkParalell: round[%d], b.N[%d]\n",
        tls.ID(), atomic.LoadInt64(&round), b.N)
}

Run the example with -cpu=2:

$go test -v  -bench . paralell_test.go -cpu=2
goos: darwin
goarch: amd64
BenchmarkParalell

goroutine[1] enter BenchmarkParalell: round[1], b.N[1]
goroutine[2] enter loop func in BenchmarkParalell: round[1], b.N[1]
goroutine[2] exit loop func in BenchmarkParalell: round[1], loop[1]
goroutine[3] enter loop func in BenchmarkParalell: round[1], b.N[1]
goroutine[3] exit loop func in BenchmarkParalell: round[1], loop[0]
goroutine[1] exit BenchmarkParalell: round[1], b.N[1]

goroutine[4] enter BenchmarkParalell: round[2], b.N[100]
goroutine[5] enter loop func in BenchmarkParalell: round[2], b.N[100]
goroutine[5] exit loop func in BenchmarkParalell: round[2], loop[100]
goroutine[6] enter loop func in BenchmarkParalell: round[2], b.N[100]
goroutine[6] exit loop func in BenchmarkParalell: round[2], loop[0]
goroutine[4] exit BenchmarkParalell: round[2], b.N[100]

goroutine[4] enter BenchmarkParalell: round[3], b.N[10000]
goroutine[7] enter loop func in BenchmarkParalell: round[3], b.N[10000]
goroutine[8] enter loop func in BenchmarkParalell: round[3], b.N[10000]
goroutine[8] exit loop func in BenchmarkParalell: round[3], loop[4576]
goroutine[7] exit loop func in BenchmarkParalell: round[3], loop[5424]
goroutine[4] exit BenchmarkParalell: round[3], b.N[10000]

goroutine[4] enter BenchmarkParalell: round[4], b.N[1000000]
goroutine[9] enter loop func in BenchmarkParalell: round[4], b.N[1000000]
goroutine[10] enter loop func in BenchmarkParalell: round[4], b.N[1000000]
goroutine[9] exit loop func in BenchmarkParalell: round[4], loop[478750]
goroutine[10] exit loop func in BenchmarkParalell: round[4], loop[521250]
goroutine[4] exit BenchmarkParalell: round[4], b.N[1000000]

goroutine[4] enter BenchmarkParalell: round[5], b.N[25717561]
goroutine[11] enter loop func in BenchmarkParalell: round[5], b.N[25717561]
goroutine[12] enter loop func in BenchmarkParalell: round[5], b.N[25717561]
goroutine[12] exit loop func in BenchmarkParalell: round[5], loop[11651491]
goroutine[11] exit loop func in BenchmarkParalell: round[5], loop[14066070]
goroutine[4] exit BenchmarkParalell: round[5], b.N[25717561]
BenchmarkParalell-2       25717561               43.6 ns/op
PASS
ok         command-line-arguments 1.176s

We can see that for each round of the BenchmarkParalell benchmark, go test starts GOMAXPROCS new goroutines. These goroutines jointly execute the b.N loop iterations, and each goroutine takes on a reasonably balanced share of them.
