C++, the GPU, and Thrust: Sorting Numbers on the GPU

Sep 26, 2009, 02:04
"The code below is based on the first example from the Thrust website, with additions to show the input and sorted output on standard error. Notice that there is a host_vector and a device_vector; these are std::vector-like containers that use main memory and VRAM respectively. The thrust::sort() call will transfer control from the CPU to the GPU, and the sort will be processed on the graphics card. Once the sort is complete, execution resumes on the CPU at the line after the thrust::sort() call. As the second-to-last line of main() shows, you can directly access an element of the device vector from code running on the CPU, but because that involves accessing VRAM from the CPU, it is a slow operation. It is faster to copy the whole device vector back into main memory (a host vector) before iterating over its elements.

"You can clearly see the host and device (RAM and VRAM) vectors used in the code to move the input and output data around. You might be wondering where the kernel functions mentioned in the introduction of the series are. The closest you get to one in this example is the invocation of thrust::sort, which provides the same functionality as std::sort. While the outcome is the same, thrust::sort compiles its code to work on the GPU; in particular, a version of thrust::less is used for element comparison."
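
For comparison, here is a small sketch of the CPU-side call that thrust::sort mirrors, with the comparator written out explicitly. The helper name sort_on_host is hypothetical and not part of the article.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical helper (not from the article): sorts ascending on the CPU.
std::vector<int> sort_on_host(std::vector<int> values)
{
    // std::less<int> is the CPU-side comparator. The quoted text says
    // thrust::sort uses a version of thrust::less in the same role.
    std::sort(values.begin(), values.end(), std::less<int>());
    return values;
}
```

Passing std::less<int>() gives the same ascending order as calling std::sort with no comparator. It is spelled out here only to make the comparison step visible.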

