Hands-On GPU Programming with Python and CUDA
Hands-On GPU Programming with Python and CUDA hits the ground running: you'll start by learning how to apply Amdahl's Law, use a code profiler to identify bottlenecks in your Python code, and set up an appropriate GPU programming environment. You'll then see how to "query" the GPU's features and copy arrays of data to and from the GPU's own memory. As you make your way through the book, you'll launch code directly onto the GPU and write full-blown GPU kernels and device functions in CUDA C. You'll get to grips with profiling GPU code effectively and fully test and debug your code using the Nsight IDE. Next, you'll explore some of the more well-known NVIDIA libraries, such as cuFFT and cuBLAS. With a solid background in place, you will now apply your new-found knowledge to develop your very own GPU-based deep neural network from scratch. You'll then explore advanced topics, such as warp shuffling, dynamic parallelism, and PTX assembly. In the final chapter, you'll see some topics and applications related to GPU programming that you may wish to pursue, including AI, graphics, and blockchain. By the end of this book, you will be able to apply GPU programming to problems related to data science and high-performance computing.
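The blurb touches on two concrete techniques covered early in the book: applying Amdahl's Law and copying arrays to and from GPU memory with PyCUDA's gpuarray. Below is a minimal sketch of both, assuming a CUDA-capable GPU with the CUDA Toolkit and PyCUDA installed (the environment setup covered in Chapter 2); the parallel fraction, processor count, and array size are arbitrary illustrative values.

```python
# Minimal sketch: Amdahl's Law plus a host <-> GPU round trip with PyCUDA.
# Assumes a working CUDA install and PyCUDA; values below are illustrative only.
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
import pycuda.gpuarray as gpuarray


def amdahl_speedup(p, n):
    """Theoretical speedup for parallelizable fraction p on n processors."""
    return 1.0 / ((1.0 - p) + p / n)


print(amdahl_speedup(0.9, 256))  # ~9.66: the serial 10% caps the speedup

host_data = np.random.rand(1024).astype(np.float32)  # array in host (CPU) memory
device_data = gpuarray.to_gpu(host_data)             # copy host -> GPU memory
doubled = (2.0 * device_data).get()                  # pointwise op on the GPU, copy back
print(np.allclose(doubled, 2.0 * host_data))         # True
```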
Table of Contents (201 entries)
- Cover Page
- Title Page
- Dedication
- About Packt
- Why subscribe?
- Packt.com
- Contributors
- About the author
- About the reviewer
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Reviews
- Why GPU Programming?
- Technical requirements
- Parallelization and Amdahl's Law
- Using Amdahl's Law
- The Mandelbrot set
- Profiling your code
- Using the cProfile module
- Summary
- Questions
- Setting Up Your GPU Programming Environment
- Technical requirements
- Ensuring that we have the right hardware
- Checking your hardware (Linux)
- Checking your hardware (Windows)
- Installing the GPU drivers
- Installing the GPU drivers (Linux)
- Installing the GPU drivers (Windows)
- Setting up a C++ programming environment
- Setting up GCC, Eclipse IDE, and graphical dependencies (Linux)
- Setting up Visual Studio (Windows)
- Installing the CUDA Toolkit
- Installing the CUDA Toolkit (Linux)
- Installing the CUDA Toolkit (Windows)
- Setting up our Python environment for GPU programming
- Installing PyCUDA (Linux)
- Creating an environment launch script (Windows)
- Installing PyCUDA (Windows)
- Testing PyCUDA
- Summary
- Questions
- Getting Started with PyCUDA
- Technical requirements
- Querying your GPU
- Querying your GPU with PyCUDA
- Using PyCUDA's gpuarray class
- Transferring data to and from the GPU with gpuarray
- Basic pointwise arithmetic operations with gpuarray
- A speed test
- Using PyCUDA's ElementWiseKernel for performing pointwise computations
- Mandelbrot revisited
- A brief foray into functional programming
- Parallel scan and reduction kernel basics
- Summary
- Questions
- Kernels, Threads, Blocks, and Grids
- Technical requirements
- Kernels
- The PyCUDA SourceModule function
- Threads, blocks, and grids
- Conway's game of life
- Thread synchronization and intercommunication
- Using the __syncthreads() device function
- Using shared memory
- The parallel prefix algorithm
- The naive parallel prefix algorithm
- Inclusive versus exclusive prefix
- A work-efficient parallel prefix algorithm
- Work-efficient parallel prefix (up-sweep phase)
- Work-efficient parallel prefix (down-sweep phase)
- Work-efficient parallel prefix — implementation
- Summary
- Questions
- Streams, Events, Contexts, and Concurrency
- Technical requirements
- CUDA device synchronization
- Using the PyCUDA stream class
- Concurrent Conway's game of life using CUDA streams
- Events
- Events and streams
- Contexts
- Synchronizing the current context
- Manual context creation
- Host-side multiprocessing and multithreading
- Multiple contexts for host-side concurrency
- Summary
- Questions
- Debugging and Profiling Your CUDA Code
- Technical requirements
- Using printf from within CUDA kernels
- Using printf for debugging
- Filling in the gaps with CUDA-C
- Using the Nsight IDE for CUDA-C development and debugging
- Using Nsight with Visual Studio in Windows
- Using Nsight with Eclipse in Linux
- Using Nsight to understand the warp lockstep property in CUDA
- Using the NVIDIA nvprof profiler and Visual Profiler
- Summary
- Questions
- Using the CUDA Libraries with Scikit-CUDA
- Technical requirements
- Installing Scikit-CUDA
- Basic linear algebra with cuBLAS
- Level-1 AXPY with cuBLAS
- Other level-1 cuBLAS functions
- Level-2 GEMV in cuBLAS
- Level-3 GEMM in cuBLAS for measuring GPU performance
- Fast Fourier transforms with cuFFT
- A simple 1D FFT
- Using an FFT for convolution
- Using cuFFT for 2D convolution
- Using cuSolver from Scikit-CUDA
- Singular value decomposition (SVD)
- Using SVD for Principal Component Analysis (PCA)
- Summary
- Questions
- The CUDA Device Function Libraries and Thrust
- Technical requirements
- The cuRAND device function library
- Estimating π with Monte Carlo
- The CUDA Math API
- A brief review of definite integration
- Computing definite integrals with the Monte Carlo method
- Writing some test cases
- The CUDA Thrust library
- Using functors in Thrust
- Summary
- Questions
- Implementation of a Deep Neural Network
- Technical requirements
- Artificial neurons and neural networks
- Implementing a dense layer of artificial neurons
- Implementation of the softmax layer
- Implementation of Cross-Entropy loss
- Implementation of a sequential network
- Implementation of inference methods
- Gradient descent
- Conditioning and normalizing data
- The Iris dataset
- Summary
- Questions
- Working with Compiled GPU Code
- Launching compiled code with Ctypes
- The Mandelbrot set revisited (again)
- Compiling the code and interfacing with Ctypes
- Compiling and launching pure PTX code
- Writing wrappers for the CUDA Driver API
- Using the CUDA Driver API
- Summary
- Questions
- Performance Optimization in CUDA
- Dynamic parallelism
- Quicksort with dynamic parallelism
- Vectorized data types and memory access
- Thread-safe atomic operations
- Warp shuffling
- Inline PTX assembly
- Performance-optimized array sum
- Summary
- Questions
- Where to Go from Here
- Furthering your knowledge of CUDA and GPGPU programming
- Multi-GPU systems
- Cluster computing and MPI
- OpenCL and PyOpenCL
- Graphics
- OpenGL
- DirectX 12
- Vulkan
- Machine learning and computer vision
- The basics
- cuDNN
- Tensorflow and Keras
- Chainer
- OpenCV
- Blockchain technology
- Summary
- Questions
- Assessment
- Chapter 1 Why GPU Programming?
- Chapter 2 Setting Up Your GPU Programming Environment
- Chapter 3 Getting Started with PyCUDA
- Chapter 4 Kernels, Threads, Blocks, and Grids
- Chapter 5 Streams, Events, Contexts, and Concurrency
- Chapter 6 Debugging and Profiling Your CUDA Code
- Chapter 7 Using the CUDA Libraries with Scikit-CUDA
- Chapter 8 The CUDA Device Function Libraries and Thrust
- Chapter 9 Implementation of a Deep Neural Network
- Chapter 10 Working with Compiled GPU Code
- Chapter 11 Performance Optimization in CUDA
- Chapter 12 Where to Go from Here
- Other Books You May Enjoy
- Leave a review - let other readers know what you think