
Global memory

All blocks have read and write access to global memory. This memory is slow, but it can be accessed from anywhere in your device code; caching is used to speed up access to it. All memory allocated using cudaMalloc is global memory. The following simple example demonstrates how you can use global memory from your program:

#include <stdio.h>
#define N 5

__global__ void gpu_global_memory(int *d_a)
{
    // Each thread writes its own thread ID into global memory.
    d_a[threadIdx.x] = threadIdx.x;
}

int main(int argc, char **argv)
{
    int h_a[N];
    int *d_a;

    // Allocate device (global) memory and copy the host array to it.
    cudaMalloc((void **)&d_a, sizeof(int) * N);
    cudaMemcpy((void *)d_a, (void *)h_a, sizeof(int) * N, cudaMemcpyHostToDevice);

    // Launch one block of N threads.
    gpu_global_memory <<<1, N>>> (d_a);

    // Copy the results back to the host for printing.
    cudaMemcpy((void *)h_a, (void *)d_a, sizeof(int) * N, cudaMemcpyDeviceToHost);

    printf("Array in Global Memory is: \n");
    for (int i = 0; i < N; i++)
    {
        printf("At Index: %d --> %d \n", i, h_a[i]);
    }

    // Free the device memory allocated with cudaMalloc.
    cudaFree(d_a);
    return 0;
}

This code demonstrates how you can write to global memory from your device code. The memory is allocated using cudaMalloc from the host code, and a pointer to this array is passed as a parameter to the kernel function. The kernel function populates this memory chunk with the values of the thread IDs. These values are copied back to host memory for printing. The result is as follows:

Array in Global Memory is: 
At Index: 0 --> 0 
At Index: 1 --> 1 
At Index: 2 --> 2 
At Index: 3 --> 3 
At Index: 4 --> 4 
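Note that the CUDA runtime calls in the example return error codes that are silently ignored. A common pattern (a sketch of standard practice, not part of this book's listing; the macro name CUDA_CHECK is our own) is to wrap each runtime call in a checking macro that reports failures using cudaGetErrorString:

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with file/line info if a CUDA call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err));     \
            exit(EXIT_FAILURE);                                      \
        }                                                             \
    } while (0)

// Usage: wrap each runtime API call from the example above, for instance:
// CUDA_CHECK(cudaMalloc((void **)&d_a, sizeof(int) * N));
// CUDA_CHECK(cudaMemcpy(h_a, d_a, sizeof(int) * N, cudaMemcpyDeviceToHost));
```

With this in place, a failed allocation or copy stops the program immediately with a readable message instead of producing incorrect output later.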

Because we are using global memory, this operation will be slow. There are advanced concepts to speed it up, which will be explained later on. In the next section, we will explain local memory and registers, which are private to each thread.
