# Netmap enable zero copy with host stack how to

Nvidia offers an example of how to profile the bandwidth between host and device; you can find the code here: (search "bandwidth").

The experiment is carried out on a 64-bit Ubuntu 12.04 computer. I am inspecting pinned memory with the mapped access mode, which can be tested by invoking:

    bandwidthtest -memory=pinned -access=mapped

The core test loop for host-to-device bandwidth is at around lines 736~748. I list it here with some comments and context code added:

    // create a buffer cmPinnedData in host
    cmPinnedData = clCreateBuffer(cxGPUContext, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, memSize, NULL, &ciErrNum);

    // (initialize cmPinnedData with some data)

    // create a buffer cmDevData on the device
    cmDevData = clCreateBuffer(cxGPUContext, CL_MEM_READ_WRITE, memSize, NULL, &ciErrNum);

    // get pointer mapped to host buffer cmPinnedData
    h_data = (unsigned char*)clEnqueueMapBuffer(cqCommandQueue, cmPinnedData, CL_TRUE, CL_MAP_READ, 0, memSize, 0, NULL, NULL, &ciErrNum);

    // get pointer mapped to device buffer cmDevData
    void* dm_idata = clEnqueueMapBuffer(cqCommandQueue, cmDevData, CL_TRUE, CL_MAP_WRITE, 0, memSize, 0, NULL, NULL, &ciErrNum);

    // copy data from host to device by memcpy
    for(unsigned int i = 0; i < MEMCOPY_ITERATIONS; i++)
    {
        memcpy(dm_idata, h_data, memSize);
    }
    ciErrNum = clEnqueueUnmapMemObject(cqCommandQueue, cmDevData, dm_idata, 0, NULL, NULL);

The measured host-to-device bandwidth is 6430.0MB/s when the transfer size is 33.5MB. When the transfer size is reduced to 1MB (with MEMCOPY_ITERATIONS changed from 100 to 10000 in case the timer is not precise enough), the reported bandwidth becomes 12540.5MB/s.

We all know that the highest bandwidth of a PCI-e x16 Gen2 interface is 8000MB/s, so I suspect there is a problem with the profiling method. Let's recap the core profiling code:

    // get pointer mapped to device buffer cmDevData

Can we call a kernel right after the memcpy? I don't think so. I think the problem is that memcpy can't guarantee that the data has really been transferred to the device, because there isn't any explicit synchronization API call inside the loop. So if we try to call a kernel after the memcpy, the kernel may or may not get valid data.