
CUDA program structure

We saw a simple Hello, CUDA! program earlier that showcased some important concepts related to CUDA programs. A CUDA program is a combination of functions that are executed either on the host CPU or on the GPU device. Functions that do not exhibit parallelism run on the CPU, while functions that exhibit data parallelism run on the GPU. The compiler segregates these functions during compilation: as seen in the previous chapter, functions meant for execution on the device are defined using the __global__ keyword and compiled by the NVCC compiler, while normal host code is compiled by the C compiler. CUDA code is thus essentially ANSI C code with the addition of some keywords needed for exploiting data parallelism.
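The host/device split described above can be sketched as follows. This is a minimal illustration, not code from the book; the kernel name myKernel is a placeholder:

```cuda
#include <stdio.h>

// Device function: the __global__ qualifier tells NVCC this runs on the GPU
__global__ void myKernel(void)
{
    // Kernel body executed by GPU threads (empty in this sketch)
}

// Ordinary host function: compiled by the host C compiler
int main(void)
{
    // Kernel launch from host code; <<<1, 1>>> requests one block of one thread
    myKernel<<<1, 1>>>();

    // Wait for the GPU to finish before the host continues
    cudaDeviceSynchronize();

    printf("Hello, CUDA!\n");
    return 0;
}
```

The <<<...>>> launch syntax is the visible seam between the two compilation paths: NVCC rewrites it into runtime calls, while the rest of main is plain C.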

In this section, a simple two-variable addition program is used to explain important concepts related to CUDA programming: kernel calls, passing parameters from the host to kernel functions on the device, the configuration of kernel launch parameters, the CUDA APIs needed to exploit data parallelism, and how memory allocation takes place on the host and the device.
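A program of the kind this section develops might look like the sketch below. It is an assumed outline, not the section's exact listing; the kernel name gpuAdd, the h_/d_ naming convention, and the operand values are illustrative choices:

```cuda
#include <stdio.h>

// Device kernel: adds two integers passed by value and writes the
// result through a device pointer
__global__ void gpuAdd(int d_a, int d_b, int *d_c)
{
    *d_c = d_a + d_b;
}

int main(void)
{
    int h_c;   // host copy of the result
    int *d_c;  // pointer to the result in device memory

    // Allocate memory for the result on the device
    cudaMalloc((void **)&d_c, sizeof(int));

    // Launch the kernel with one block of one thread; scalar
    // parameters are passed from host to device by value
    gpuAdd<<<1, 1>>>(1, 4, d_c);

    // Copy the result from device memory back to host memory
    cudaMemcpy(&h_c, d_c, sizeof(int), cudaMemcpyDeviceToHost);
    printf("1 + 4 = %d\n", h_c);

    // Free the device allocation
    cudaFree(d_c);
    return 0;
}
```

Each of the concepts listed above appears here in miniature: the kernel call with its <<<...>>> configuration, parameter passing, and the cudaMalloc/cudaMemcpy/cudaFree memory-management APIs.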
