- Go语言精进之路 (The Road to Go Mastery): Programming Ideas, Methods and Techniques from Novice to Expert (2)
- 白明 (Bai Ming)
- 2264 characters
- 2022-01-04 17:42:26
46.2 Sequential and Parallel Performance Benchmarks
Go performance benchmarks fall into two categories depending on whether they execute in parallel: sequential benchmarks and parallel benchmarks.
1. Sequential benchmarks
A sequential benchmark is written like this:
```go
func BenchmarkXxx(b *testing.B) {
	// ...
	for i := 0; i < b.N; i++ {
		// code under test
	}
}
```
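This template is normally run by go test. The standard library's testing.Benchmark function can also drive a benchmark function from an ordinary program and return the result of its final round, which is useful for experimenting outside go test. The following is a minimal sketch of that approach. concatByOperator, its input strings and the sink variable are hypothetical stand-ins for the code under test.

```go
package main

import (
	"fmt"
	"testing"
)

// sink stores each result so the compiler cannot discard the measured work.
var sink string

// concatByOperator is a hypothetical "code under test".
func concatByOperator(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p
	}
	return s
}

// BenchmarkConcatByOperator follows the sequential template: the loop body
// runs b.N times, one iteration after another, in a single goroutine.
func BenchmarkConcatByOperator(b *testing.B) {
	parts := []string{"hello", ", ", "gopher", "!"}
	for i := 0; i < b.N; i++ {
		sink = concatByOperator(parts)
	}
}

func main() {
	// testing.Benchmark grows b.N over several rounds and returns the
	// result of the final round.
	res := testing.Benchmark(BenchmarkConcatByOperator)
	fmt.Println(res.N, "iterations,", res.NsPerOp(), "ns/op")
}
```

testing.Benchmark performs the same multi-round search for b.N that go test does, so expect the program to run for a little over a second.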
The string concatenation benchmarks shown earlier all belong to this category. The following example shows how a sequential benchmark is actually executed:
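```go
// chapter8/sources/benchmark-impl/sequential_test.go
package bench // package name assumed; any name works with "go test <file>"

import (
	"fmt"
	"sync"
	"sync/atomic"
	"testing"

	// Assumed: tls.ID() returns the current goroutine's ID;
	// github.com/huandu/go-tls provides such a function.
	tls "github.com/huandu/go-tls"
)

var (
	m     map[int64]struct{} = make(map[int64]struct{}, 10)
	mu    sync.Mutex
	round int64 = 1
)

func BenchmarkSequential(b *testing.B) {
	fmt.Printf("\ngoroutine[%d] enter BenchmarkSequential: round[%d], b.N[%d]\n",
		tls.ID(), atomic.LoadInt64(&round), b.N)
	defer func() {
		atomic.AddInt64(&round, 1) // this round is finished
	}()

	for i := 0; i < b.N; i++ {
		mu.Lock()
		// Print only once per round, on the first loop iteration.
		_, ok := m[round]
		if !ok {
			m[round] = struct{}{}
			fmt.Printf("goroutine[%d] enter loop in BenchmarkSequential: round[%d], b.N[%d]\n",
				tls.ID(), atomic.LoadInt64(&round), b.N)
		}
		mu.Unlock()
	}
	fmt.Printf("goroutine[%d] exit BenchmarkSequential: round[%d], b.N[%d]\n",
		tls.ID(), atomic.LoadInt64(&round), b.N)
}
```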
Run this example:
```
$go test -bench . sequential_test.go
goroutine[1] enter BenchmarkSequential: round[1], b.N[1]
goroutine[1] enter loop in BenchmarkSequential: round[1], b.N[1]
goroutine[1] exit BenchmarkSequential: round[1], b.N[1]
goos: darwin
goarch: amd64
BenchmarkSequential-8
goroutine[2] enter BenchmarkSequential: round[2], b.N[100]
goroutine[2] enter loop in BenchmarkSequential: round[2], b.N[100]
goroutine[2] exit BenchmarkSequential: round[2], b.N[100]
goroutine[2] enter BenchmarkSequential: round[3], b.N[10000]
goroutine[2] enter loop in BenchmarkSequential: round[3], b.N[10000]
goroutine[2] exit BenchmarkSequential: round[3], b.N[10000]
goroutine[2] enter BenchmarkSequential: round[4], b.N[1000000]
goroutine[2] enter loop in BenchmarkSequential: round[4], b.N[1000000]
goroutine[2] exit BenchmarkSequential: round[4], b.N[1000000]
goroutine[2] enter BenchmarkSequential: round[5], b.N[65666582]
goroutine[2] enter loop in BenchmarkSequential: round[5], b.N[65666582]
goroutine[2] exit BenchmarkSequential: round[5], b.N[65666582]
65666582          20.6 ns/op
PASS
ok      command-line-arguments  1.381s
```
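The output shows that:
- BenchmarkSequential runs in several rounds (see the round values in the output);
- b.N differs from round to round: 1, 100, 10000, 1000000 and 65666582;
- apart from the first round (b.N = 1), every round runs sequentially in the same goroutine (goroutine[2]).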
By default, go test aims to spend about 1 second on each benchmark function, such as BenchmarkSequential. When a round finishes in less time than that, go test picks a larger b.N and starts another round. It repeats this until a round takes at least 1 second or b.N reaches its cap of 1e9. The new b.N is predicted from the average time per iteration measured in the previous round, plus a margin of about 20%. However, b.N may grow by at most a factor of 100 from one round to the next. The early rounds finish almost instantly, so this factor-of-100 cap is what sets b.N here. That is why b.N jumps from 1 straight to 100, then to 10000 and 1000000. (Older Go releases also rounded b.N up to a number from the sequence 1, 2, 3, 5, 10, 20, 30, 50, 100, and so on.) In the last round, 100 times the previous b.N would have run far longer than 1 second, so the prediction took over instead. Dividing the 1-second target by the previous round's average time per iteration, plus the margin, gave 65666582.
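To make this growth rule concrete, here is a minimal simulation. It assumes a benchmark in which every iteration costs exactly the same number of nanoseconds. nextN is a hypothetical helper that reproduces, in simplified form, the prediction logic of recent Go releases: estimate from the previous round, add about 20%, grow at most 100× per round, and stop at 1e9. It is not an API of the testing package.

```go
package main

import "fmt"

// nextN is a hypothetical, simplified version of the b.N prediction step:
// estimate how many iterations fit into goalNs, given that prevIters
// iterations took prevNs nanoseconds, then apply the growth limits.
func nextN(last, prevIters, prevNs, goalNs int64) int64 {
	if prevNs <= 0 {
		prevNs = 1 // avoid division by zero
	}
	n := goalNs * prevIters / prevNs // multiply first to keep precision
	n += n / 5                       // run about 20% more than predicted
	if n > 100*last {
		n = 100 * last // grow by at most 100x per round
	}
	if n <= last {
		n = last + 1 // always make progress
	}
	if n > 1e9 {
		n = 1e9
	}
	return n
}

// simulateRounds returns the b.N of each round for a benchmark whose
// iterations each take nsPerOp nanoseconds, given a goal of goalNs.
func simulateRounds(nsPerOp, goalNs int64) []int64 {
	n := int64(1)
	rounds := []int64{n}   // the first round always uses b.N = 1
	elapsed := n * nsPerOp // simulated duration of that round
	for elapsed < goalNs && n < 1e9 {
		n = nextN(n, n, elapsed, goalNs)
		rounds = append(rounds, n)
		elapsed = n * nsPerOp
	}
	return rounds
}

func main() {
	fmt.Println(simulateRounds(20, 1_000_000_000)) // [1 100 10000 1000000 60000000]
	fmt.Println(simulateRounds(20, 2_000_000_000)) // [1 100 10000 1000000 100000000]
}
```

With a constant 20 ns per iteration, the simulation reproduces the 1, 100, 10000, 1000000 pattern from the real run. Its last value, 60000000, differs from the measured 65666582 only because real per-iteration times are not constant. With a 2-second goal, the factor-of-100 cap limits the last round to 100000000.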
Suppose a benchmark runs for only 1 second and completes only about 10 iterations in that time. The resulting average may then have a high standard deviation. If the benchmark runs millions or billions of iterations, the average tends to be much more accurate. To get more iterations, lengthen the benchmark time with the -benchtime command-line option.
The following example uses -benchtime to raise the default benchmark time from 1 second to 2 seconds:
```
$go test -bench . sequential_test.go -benchtime 2s
...
goroutine[2] enter BenchmarkSequential: round[4], b.N[1000000]
goroutine[2] enter loop in BenchmarkSequential: round[4], b.N[1000000]
goroutine[2] exit BenchmarkSequential: round[4], b.N[1000000]
goroutine[2] enter BenchmarkSequential: round[5], b.N[100000000]
goroutine[2] enter loop in BenchmarkSequential: round[5], b.N[100000000]
goroutine[2] exit BenchmarkSequential: round[5], b.N[100000000]
100000000         20.5 ns/op
PASS
ok      command-line-arguments  2.075s
```
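With the benchmark time raised to 2 seconds, b.N in the final round grows to 100000000, exactly 100 times the 1000000 of the previous round.
-benchtime can also set b.N directly with the Nx form. go test then uses the specified N as the iteration count of the final round: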
```
$go test -v -benchtime 5x -bench . sequential_test.go
goos: darwin
goarch: amd64
BenchmarkSequential
goroutine[1] enter BenchmarkSequential: round[1], b.N[1]
goroutine[1] enter loop in BenchmarkSequential: round[1], b.N[1]
goroutine[1] exit BenchmarkSequential: round[1], b.N[1]
goroutine[2] enter BenchmarkSequential: round[2], b.N[5]
goroutine[2] enter loop in BenchmarkSequential: round[2], b.N[5]
goroutine[2] exit BenchmarkSequential: round[2], b.N[5]
BenchmarkSequential-8          5          5470 ns/op
PASS
ok      command-line-arguments  0.006s
```
Each benchmark function above, such as BenchmarkSequential, actually runs for several rounds, but this still counts as a single execution. The data from a single execution is sometimes not representative. In that case we can explicitly ask go test to execute each benchmark several times, collect the results, and use a statistically processed value as the final result. The -count command-line option sets how many times each benchmark function is executed:
```
$go test -v -count 2 -bench . benchmark_intro_test.go
goos: darwin
goarch: amd64
BenchmarkConcatStringByOperator
BenchmarkConcatStringByOperator-8    12665250        89.8 ns/op
BenchmarkConcatStringByOperator-8    13099075        89.7 ns/op
BenchmarkConcatStringBySprintf
BenchmarkConcatStringBySprintf-8      2781075         433 ns/op
BenchmarkConcatStringBySprintf-8      2662507         433 ns/op
BenchmarkConcatStringByJoin
BenchmarkConcatStringByJoin-8        23679480        49.1 ns/op
BenchmarkConcatStringByJoin-8        24135014        49.6 ns/op
PASS
ok      command-line-arguments  8.225s
```
In this example each benchmark function was executed twice and produced two results. Each execution again consisted of several rounds with different b.N values.
2. Parallel benchmarks
A parallel benchmark is written like this:
```go
func BenchmarkXxx(b *testing.B) {
	// ...
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			// code under test
		}
	})
}
```
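Below is a minimal runnable version of this template. It is driven by testing.Benchmark from an ordinary program rather than by go test. The shared counter and the atomic increment used as the code under test are hypothetical. The extra done counter only checks that the pb.Next() calls of all goroutines together hand out exactly b.N iterations.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"testing"
)

var (
	counter int64 // hypothetical shared state touched by the code under test
	total   int64 // iterations executed across all goroutines in the latest round
)

func BenchmarkAtomicAdd(b *testing.B) {
	var done int64
	b.RunParallel(func(pb *testing.PB) {
		// Each goroutine keeps calling pb.Next() until the shared
		// budget of b.N iterations has been handed out.
		for pb.Next() {
			atomic.AddInt64(&counter, 1) // code under test
			atomic.AddInt64(&done, 1)    // bookkeeping: count iterations
		}
	})
	// RunParallel returns only after all its goroutines have finished.
	total = atomic.LoadInt64(&done)
}

func main() {
	res := testing.Benchmark(BenchmarkAtomicAdd)
	fmt.Println("b.N:", res.N, "iterations run:", total, "ns/op:", res.NsPerOp())
}
```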
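Parallel benchmarks are mainly used to set performance baselines for code that synchronizes goroutines, for example with mutexes, read-write locks or atomic operations. They show what that synchronization really costs when many goroutines run at once much more faithfully than sequential benchmarks do. Consider the following example: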
```go
// chapter8/sources/benchmark_paralell_demo_test.go
package bench // package name assumed; any name works with "go test <file>"

import (
	"sync"
	"sync/atomic"
	"testing"
)

// n1 is protected by atomic operations.
var n1 int64

func addSyncByAtomic(delta int64) int64 {
	return atomic.AddInt64(&n1, delta)
}

func readSyncByAtomic() int64 {
	return atomic.LoadInt64(&n1)
}

// n2 is protected by a read-write mutex.
var n2 int64
var rwmu sync.RWMutex

func addSyncByMutex(delta int64) {
	rwmu.Lock()
	n2 += delta
	rwmu.Unlock()
}

func readSyncByMutex() int64 {
	var n int64
	rwmu.RLock()
	n = n2
	rwmu.RUnlock()
	return n
}

func BenchmarkAddSyncByAtomic(b *testing.B) {
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			addSyncByAtomic(1)
		}
	})
}

func BenchmarkReadSyncByAtomic(b *testing.B) {
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			readSyncByAtomic()
		}
	})
}

func BenchmarkAddSyncByMutex(b *testing.B) {
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			addSyncByMutex(1)
		}
	})
}

func BenchmarkReadSyncByMutex(b *testing.B) {
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			readSyncByMutex()
		}
	})
}
```
Run these benchmarks:
```
$go test -v -bench . benchmark_paralell_demo_test.go -cpu 2,4,8
goos: darwin
goarch: amd64
BenchmarkAddSyncByAtomic
BenchmarkAddSyncByAtomic-2       75208119        15.3 ns/op
BenchmarkAddSyncByAtomic-4       70117809        17.0 ns/op
BenchmarkAddSyncByAtomic-8       68664270        15.9 ns/op
BenchmarkReadSyncByAtomic
BenchmarkReadSyncByAtomic-2    1000000000       0.744 ns/op
BenchmarkReadSyncByAtomic-4    1000000000       0.384 ns/op
BenchmarkReadSyncByAtomic-8    1000000000       0.240 ns/op
BenchmarkAddSyncByMutex
BenchmarkAddSyncByMutex-2        37533390        31.4 ns/op
BenchmarkAddSyncByMutex-4        21660948        57.5 ns/op
BenchmarkAddSyncByMutex-8        16808721        72.6 ns/op
BenchmarkReadSyncByMutex
BenchmarkReadSyncByMutex-2       35535615        32.3 ns/op
BenchmarkReadSyncByMutex-4       29839219        39.6 ns/op
BenchmarkReadSyncByMutex-8       29936805        39.8 ns/op
PASS
ok      command-line-arguments  12.454s
```
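The -cpu 2,4,8 command-line option tells go test to run each benchmark function three times, with GOMAXPROCS set to 2, 4 and 8 in turn. These values appear as the -2, -4 and -8 suffixes of the benchmark names. The output shows how each function's performance changes as GOMAXPROCS grows. BenchmarkReadSyncByAtomic gets faster per operation, from 0.744 to 0.240 ns/op. BenchmarkAddSyncByMutex gets slower, from 31.4 to 72.6 ns/op, presumably because more goroutines contend for the same lock.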
Unlike a sequential benchmark, a parallel benchmark starts several goroutines that execute the loop of the benchmark function in parallel. Again, an example illustrates the execution flow:
```go
// chapter8/sources/benchmark-impl/paralell_test.go
package bench // package name assumed; any name works with "go test <file>"

import (
	"fmt"
	"sync"
	"sync/atomic"
	"testing"

	// Assumed: tls.ID() returns the current goroutine's ID;
	// github.com/huandu/go-tls provides such a function.
	tls "github.com/huandu/go-tls"
)

var (
	m     map[int64]int = make(map[int64]int, 20)
	mu    sync.Mutex
	round int64 = 1
)

func BenchmarkParalell(b *testing.B) {
	fmt.Printf("\ngoroutine[%d] enter BenchmarkParalell: round[%d], b.N[%d]\n",
		tls.ID(), atomic.LoadInt64(&round), b.N)
	defer func() {
		atomic.AddInt64(&round, 1) // this round is finished
	}()

	b.RunParallel(func(pb *testing.PB) {
		id := tls.ID()
		fmt.Printf("goroutine[%d] enter loop func in BenchmarkParalell: round[%d], b.N[%d]\n",
			tls.ID(), atomic.LoadInt64(&round), b.N)
		for pb.Next() {
			// Count the iterations executed by this goroutine.
			mu.Lock()
			_, ok := m[id]
			if !ok {
				m[id] = 1
			} else {
				m[id] = m[id] + 1
			}
			mu.Unlock()
		}

		mu.Lock()
		count := m[id]
		mu.Unlock()

		fmt.Printf("goroutine[%d] exit loop func in BenchmarkParalell: round[%d], loop[%d]\n",
			tls.ID(), atomic.LoadInt64(&round), count)
	})

	fmt.Printf("goroutine[%d] exit BenchmarkParalell: round[%d], b.N[%d]\n",
		tls.ID(), atomic.LoadInt64(&round), b.N)
}
```
Run the example with -cpu=2:
```
$go test -v -bench . paralell_test.go -cpu=2
goos: darwin
goarch: amd64
BenchmarkParalell
goroutine[1] enter BenchmarkParalell: round[1], b.N[1]
goroutine[2] enter loop func in BenchmarkParalell: round[1], b.N[1]
goroutine[2] exit loop func in BenchmarkParalell: round[1], loop[1]
goroutine[3] enter loop func in BenchmarkParalell: round[1], b.N[1]
goroutine[3] exit loop func in BenchmarkParalell: round[1], loop[0]
goroutine[1] exit BenchmarkParalell: round[1], b.N[1]
goroutine[4] enter BenchmarkParalell: round[2], b.N[100]
goroutine[5] enter loop func in BenchmarkParalell: round[2], b.N[100]
goroutine[5] exit loop func in BenchmarkParalell: round[2], loop[100]
goroutine[6] enter loop func in BenchmarkParalell: round[2], b.N[100]
goroutine[6] exit loop func in BenchmarkParalell: round[2], loop[0]
goroutine[4] exit BenchmarkParalell: round[2], b.N[100]
goroutine[4] enter BenchmarkParalell: round[3], b.N[10000]
goroutine[7] enter loop func in BenchmarkParalell: round[3], b.N[10000]
goroutine[8] enter loop func in BenchmarkParalell: round[3], b.N[10000]
goroutine[8] exit loop func in BenchmarkParalell: round[3], loop[4576]
goroutine[7] exit loop func in BenchmarkParalell: round[3], loop[5424]
goroutine[4] exit BenchmarkParalell: round[3], b.N[10000]
goroutine[4] enter BenchmarkParalell: round[4], b.N[1000000]
goroutine[9] enter loop func in BenchmarkParalell: round[4], b.N[1000000]
goroutine[10] enter loop func in BenchmarkParalell: round[4], b.N[1000000]
goroutine[9] exit loop func in BenchmarkParalell: round[4], loop[478750]
goroutine[10] exit loop func in BenchmarkParalell: round[4], loop[521250]
goroutine[4] exit BenchmarkParalell: round[4], b.N[1000000]
goroutine[4] enter BenchmarkParalell: round[5], b.N[25717561]
goroutine[11] enter loop func in BenchmarkParalell: round[5], b.N[25717561]
goroutine[12] enter loop func in BenchmarkParalell: round[5], b.N[25717561]
goroutine[12] exit loop func in BenchmarkParalell: round[5], loop[11651491]
goroutine[11] exit loop func in BenchmarkParalell: round[5], loop[14066070]
goroutine[4] exit BenchmarkParalell: round[5], b.N[25717561]
BenchmarkParalell-2     25717561        43.6 ns/op
PASS
ok      command-line-arguments  1.176s
```
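For every round of BenchmarkParalell, go test starts GOMAXPROCS new goroutines (two here), and together they execute the b.N iterations of the loop. The iterations are shared among the goroutines roughly evenly: 4576 and 5424 in round 3, and 11651491 and 14066070 in round 5. In the tiny early rounds the split can be lopsided. In round 2, for example, one goroutine ran all 100 iterations and the other ran none.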