“The code below is based on the first example from the Thrust
website, with additions to show the input and sorted output on
standard error. Notice that there is a host_vector and a
device_vector; these are std::vector-like containers that
use main memory and VRAM respectively. The thrust::sort() call
will transfer control from the CPU to the GPU, and the sort will be
processed on the graphics card. Once the sort is complete,
execution resumes on the CPU at the line after the
thrust::sort() call. As the second-to-last line of main() shows, you
can directly access an element of the device vector from code
running on the CPU, but because that involves accessing VRAM from the
CPU it is a slow operation. It is faster to copy the whole
device vector back into main memory (a host vector) before
iterating over its elements.”

“You can clearly see the host and device (RAM and VRAM) vectors
used in the code to move the input and output data around. You
might be wondering where the kernel functions mentioned in the
introduction of the series are. The closest you get to
one in this example is the invocation of thrust::sort, which
provides the same functionality as std::sort. While the outcome is
the same, thrust::sort compiles its code to work on the GPU; in
particular, a version of thrust::less is used for element
comparison.”
C++, the GPU, and Thrust: Sorting Numbers on the GPU