
How a CUDA Program Works

by Peter Choi 2024. 6. 18.

The operation of a CUDA program is divided into three steps.

First, data is copied from host memory to device memory.

Second, the GPU performs the computation.

Third, the results are copied from device memory back to host memory.
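The three steps above can be sketched end-to-end as follows. This is an illustrative example, not code from this post: the kernel name `addOne` and the launch configuration are assumptions.

```cuda
#include "cuda_runtime.h"
#include <stdio.h>

// Illustrative kernel (an assumption, not from the post): add 1 to each element.
__global__ void addOne(int* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main(void)
{
    const int n = 1024;
    int hData[1024];
    for (int i = 0; i < n; i++) hData[i] = i;

    int* dData;
    cudaMalloc((void**)&dData, sizeof(int) * n);

    // Step 1: copy input data from host memory to device memory.
    cudaMemcpy(dData, hData, sizeof(int) * n, cudaMemcpyHostToDevice);

    // Step 2: run the computation on the GPU.
    addOne<<<(n + 255) / 256, 256>>>(dData, n);

    // Step 3: copy the results from device memory back to host memory.
    cudaMemcpy(hData, dData, sizeof(int) * n, cudaMemcpyDeviceToHost);

    cudaFree(dData);
    printf("hData[0] = %d\n", hData[0]); // prints 1 on a working CUDA device
    return 0;
}
```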

 

cudaError_t cudaMalloc (void** ptr, size_t size)

Here, cudaError_t is defined as the enumeration enum cudaError.

enum cudaError

CUDA error types and values:

cudaSuccess = 0: The API call returned with no errors. In the case of query calls, this also means that the operation being queried is complete (see cudaEventQuery() and cudaStreamQuery()).

cudaErrorInvalidValue = 1: This indicates that one or more of the parameters passed to the API call is not within an acceptable range of values.

cudaErrorMemoryAllocation = 2: The API call failed because it was unable to allocate enough memory or other resources to perform the requested operation.

cudaErrorInitializationError = 3: The API call failed because the CUDA driver and runtime could not be initialized.

cudaErrorCudartUnloading = 4: This indicates that a CUDA Runtime API call cannot be executed because it is being called during process shutdown, at a point in time after the CUDA driver has been unloaded.

cudaErrorProfilerDisabled = 5: This indicates the profiler is not initialized for this run. This can happen when the application is running with external profiling tools like the Visual Profiler.

cudaErrorProfilerNotInitialized = 6: Deprecated. This error return is deprecated as of CUDA 5.0. It is no longer an error to attempt to enable/disable profiling via cudaProfilerStart or cudaProfilerStop without initialization.

cudaErrorProfilerAlreadyStarted = 7: Deprecated. This error return is deprecated as of CUDA 5.0. It is no longer an error to call cudaProfilerStart() when profiling is already enabled.

cudaErrorProfilerAlreadyStopped = 8: Deprecated. This error return is deprecated as of CUDA 5.0. It is no longer an error to call cudaProfilerStop() when profiling is already disabled.

cudaErrorInvalidConfiguration = 9: This indicates that a kernel launch is requesting resources that can never be satisfied by the current device. Requesting more shared memory per block than the device supports will trigger this error, as will requesting too many threads or blocks. See cudaDeviceProp for more device limitations.
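Because every CUDA Runtime API call returns a cudaError_t, it is common to wrap calls in a small checking helper. The macro below is a sketch: the name CUDA_CHECK is an assumption, not part of the CUDA API.

```cuda
#include "cuda_runtime.h"
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper: abort with the error name and location on any failure.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error %s at %s:%d\n",               \
                    cudaGetErrorName(err), __FILE__, __LINE__);       \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main(void)
{
    int* dPtr;
    CUDA_CHECK(cudaMalloc((void**)&dPtr, sizeof(int) * 1024));
    CUDA_CHECK(cudaFree(dPtr));
    return 0;
}
```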
 
There are many other error codes as well; see the full list at the link below.
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038


 

Example: allocating, initializing, and freeing device memory

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>

// Print the amount of free and total device memory.
void checkDeviceMemory(void)
{
	size_t free, total;

	cudaMemGetInfo(&free, &total);
	printf("Device memory (free/total) = %zu/%zu bytes\n", free, total);
}

int main(void)
{
	int* dDataPtr;
	cudaError_t errorCode;

	checkDeviceMemory();
	errorCode = cudaMalloc((void**)&dDataPtr, sizeof(int) * 1024 * 1024); // 4 * 1024 * 1024 = 4 MB
	printf("cudaMalloc - %s\n", cudaGetErrorName(errorCode));
	checkDeviceMemory();

	errorCode = cudaMemset(dDataPtr, 0, sizeof(int) * 1024 * 1024);
	printf("cudaMemset - %s\n", cudaGetErrorName(errorCode));

	errorCode = cudaFree(dDataPtr);
	printf("cudaFree - %s\n", cudaGetErrorName(errorCode));
	checkDeviceMemory();

	return 0;
}

 

 

