
Configuring kernel parameters

To start multiple threads on the device in parallel, we configure parameters in the kernel call. These parameters are written inside the kernel launch operator (<<< >>>) and specify the number of blocks and the number of threads per block. We can launch many blocks in parallel, with many threads in each block. The number of threads per block is limited: the maximum is 512 on older GPUs and 1,024 on devices of compute capability 2.0 and later. Each block runs on a single streaming multiprocessor, and threads within one block can communicate with one another via shared memory. The programmer cannot choose which multiprocessor executes a particular block, nor the order in which blocks or threads execute.
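Because the per-block limit depends on the GPU, it is safer to query it at runtime than to hard-code 512 or 1,024. The following is a minimal sketch that reads the limits with the CUDA runtime call cudaGetDeviceProperties, assuming device 0 is the GPU of interest. The helper fitsInBlock is a hypothetical name introduced only for this illustration. It checks a proposed block shape against both the total-threads limit and the per-dimension limits.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns true if a block of the given shape respects
// the per-block limits reported by the device (total threads and the
// maximum size of each block dimension).
bool fitsInBlock(dim3 block, const cudaDeviceProp& prop) {
    unsigned long long total =
        static_cast<unsigned long long>(block.x) * block.y * block.z;
    return total <= static_cast<unsigned long long>(prop.maxThreadsPerBlock) &&
           block.x <= static_cast<unsigned>(prop.maxThreadsDim[0]) &&
           block.y <= static_cast<unsigned>(prop.maxThreadsDim[1]) &&
           block.z <= static_cast<unsigned>(prop.maxThreadsDim[2]);
}

int main() {
    cudaDeviceProp prop;
    // Assumption: device 0 is the GPU we want to launch on.
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("No CUDA device found\n");
        return 1;
    }
    std::printf("Device: %s\n", prop.name);
    std::printf("Max threads per block: %d\n", prop.maxThreadsPerBlock);
    std::printf("Max block dims: %d x %d x %d\n",
                prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    std::printf("Max grid dims: %d x %d x %d\n",
                prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    std::printf("Streaming multiprocessors: %d\n", prop.multiProcessorCount);

    // Check the one-block-of-500-threads configuration discussed in this section.
    std::printf("1 block of 500 threads fits: %s\n",
                fitsInBlock(dim3(500), prop) ? "yes" : "no");
    return 0;
}
```

Checking the product x * y * z matters as much as checking each dimension. For example, a 64 x 32 block stays within typical per-dimension limits but needs 2,048 threads, which exceeds a 1,024-thread limit.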

Suppose you want to start 500 threads in parallel. How should the kernel launch syntax shown previously be modified? One option is to start one block of 500 threads with the following syntax:

gpuAdd<<<1, 500>>>(1, 4, d_c);

We can also start 500 blocks of one thread each, or two blocks of 250 threads each; in each case, the values in the kernel launch operator change accordingly. The programmer must take care that the number of threads per block does not exceed the maximum supported by the GPU device. In this book, we target computer vision applications, which work on two- and three-dimensional images. For such data, it is convenient to organize blocks and threads in two or three dimensions rather than one, because this maps naturally onto image coordinates and makes the processing easier to express and visualize.
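To see that the three configurations just described are equivalent in coverage, the sketch below launches each one and checks that every one of the 500 global indices is written exactly as expected. The kernel fillIndex and the helper globalIndex1D are hypothetical names used only for this illustration; they are not the gpuAdd kernel from the earlier example. The global index is computed as blockIdx.x * blockDim.x + threadIdx.x, the usual way to give each thread in a one-dimensional launch a unique position.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: unique, grid-wide index of a thread in a 1D launch.
__host__ __device__ int globalIndex1D(int blockIdxX, int blockDimX, int threadIdxX) {
    return blockIdxX * blockDimX + threadIdxX;
}

// Hypothetical kernel: each thread writes its own global index into out.
__global__ void fillIndex(int* out) {
    int i = globalIndex1D(blockIdx.x, blockDim.x, threadIdx.x);
    out[i] = i;
}

int main() {
    const int N = 500;
    int* d_out = nullptr;
    cudaMalloc(&d_out, N * sizeof(int));

    // Three launch configurations that all start exactly 500 threads:
    // {blocks, threadsPerBlock}
    const int configs[3][2] = {{1, 500}, {500, 1}, {2, 250}};
    for (const auto& c : configs) {
        cudaMemset(d_out, 0xFF, N * sizeof(int));  // every int becomes -1
        fillIndex<<<c[0], c[1]>>>(d_out);

        int h_out[N];
        // cudaMemcpy waits for the kernel on the default stream to finish.
        cudaMemcpy(h_out, d_out, N * sizeof(int), cudaMemcpyDeviceToHost);

        bool ok = true;
        for (int i = 0; i < N; ++i) ok = ok && (h_out[i] == i);
        std::printf("<<<%d, %d>>>: %s\n", c[0], c[1],
                    ok ? "all 500 indices written" : "mismatch");
    }
    cudaFree(d_out);
    return 0;
}
```

Which configuration performs best is a separate question. A single thread per block wastes most of each warp, so one-thread blocks are mainly useful for illustration.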

CUDA supports three-dimensional grids of blocks and three-dimensional blocks of threads, with the following syntax:

mykernel<<<dim3(Nbx, Nby, Nbz), dim3(Ntx, Nty, Ntz)>>>();

Here, Nbx, Nby, and Nbz indicate the number of blocks in the grid along the x, y, and z axes, respectively. Similarly, Ntx, Nty, and Ntz indicate the number of threads in a block along the x, y, and z axes. If the y and z dimensions are not specified, they default to 1. For example, to process an image, you can start a 16 x 16 grid of blocks, each containing 16 x 16 threads. The syntax is as follows:

mykernel<<<dim3(16, 16), dim3(16, 16)>>>();
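A fixed 16 x 16 grid of 16 x 16 blocks covers exactly a 256 x 256 image. For other image sizes, a common approach is to keep the 16 x 16 block shape and round the number of blocks up, then let threads that fall outside the image do nothing. The sketch below applies this to a hypothetical invertImage kernel on an assumed 1920 x 1080 8-bit grayscale image stored row-major. The helper blocksFor is likewise a hypothetical name for this illustration. Each thread derives its pixel column from the x components of blockIdx, blockDim, and threadIdx, and its row from the y components.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: number of blocks of blockSize threads needed to cover
// n pixels, rounded up so that partial tiles at the image border are covered.
__host__ __device__ unsigned int blocksFor(unsigned int n, unsigned int blockSize) {
    return (n + blockSize - 1) / blockSize;
}

// Hypothetical kernel: inverts an 8-bit grayscale image, one thread per pixel.
__global__ void invertImage(unsigned char* img, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // row
    if (x < width && y < height) {                  // skip threads outside the image
        int idx = y * width + x;
        img[idx] = 255 - img[idx];
    }
}

int main() {
    const int width = 1920, height = 1080;           // assumed image size
    std::vector<unsigned char> h_img(width * height, 10);

    unsigned char* d_img = nullptr;
    cudaMalloc(&d_img, h_img.size());
    cudaMemcpy(d_img, h_img.data(), h_img.size(), cudaMemcpyHostToDevice);

    dim3 block(16, 16);                              // z defaults to 1
    dim3 grid(blocksFor(width, block.x),             // 120 blocks along x
              blocksFor(height, block.y));           // 68 blocks along y
    invertImage<<<grid, block>>>(d_img, width, height);
    if (cudaGetLastError() != cudaSuccess) {
        std::printf("Kernel launch failed\n");
        return 1;
    }

    cudaMemcpy(h_img.data(), d_img, h_img.size(), cudaMemcpyDeviceToHost);
    cudaFree(d_img);

    std::printf("grid = %u x %u x %u, first pixel = %d\n",
                grid.x, grid.y, grid.z, h_img[0]);
    return 0;
}
```

Since 1080 is not a multiple of 16, the last row of blocks extends 8 pixels past the image. The bounds check in the kernel is what keeps those extra threads from writing outside the buffer.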

To summarize, configuring the number of blocks and the number of threads is very important when launching a kernel. The configuration should be chosen carefully, based on the application we are working on and the available GPU resources. The next section explains some important functions that CUDA adds to regular ANSI C.
