CUDA 7 adds C++11 feature support to nvcc, the CUDA C++ compiler. This means that you can use C++11 features not only in your host code compiled with nvcc, but also in device code. In my post "The Power of C++11 in CUDA 7" I covered some of the major new features of C++11, such as lambda functions, range-based for loops, and automatic type deduction (auto). In this post, I'll cover variadic templates.

There are times when you need to write functions that take a variable number of arguments: variadic functions. To do this in a typesafe manner for polymorphic functions, you really need to take a variable number of types in a template. Before C++11, the only way to write variadic functions was with the C-style ellipsis (...) syntax and the va_* facilities of <cstdarg>. These facilities do not provide type safety and can be difficult to use.

As an example, let's say we want to abstract the launching of GPU kernels. In my case, I want to provide simpler launch semantics in the Hemi library. There are many cases where you don't care to specify the number and size of thread blocks; you just want to run a kernel with "enough" threads to fully utilize the GPU, or to cover your data size. In that case we can let the library decide how to launch the kernel, simplifying our code. But to launch arbitrary kernels, we have to support arbitrary type signatures. Well, we can do that like this:

```cpp
template <typename... Arguments>
void cudaLaunch(const ExecutionPolicy &p,
                void (*f)(Arguments...),
                Arguments... args);
```

Here `Arguments...` is a template parameter pack. We can use it to refer to the type signature of our kernel function pointer f, and to the arguments of cudaLaunch. To do the same thing before C++11 (and CUDA 7) required providing multiple implementations of cudaLaunch, one for each number of arguments we wanted to support. That meant you had to limit the maximum number of arguments allowed, and it multiplied the amount of code you had to maintain. The definition configures the launch and then launches the kernel (sketched here; see Hemi on Github for the exact implementation):

```cpp
template <typename... Arguments>
void cudaLaunch(const ExecutionPolicy &policy,
                void (*f)(Arguments...),
                Arguments... args)
{
    ExecutionPolicy p = policy;
    // configureGrid uses the CUDA Occupancy API to choose grid/block dimensions
    configureGrid(p, f);
    f<<<p.getGridSize(), p.getBlockSize()>>>(args...);
}
```

Here you can see how we access the types of the arguments (Arguments...) in the definition of our variadic template function, in order to specify the type signature of the kernel function pointer *f. Inside the function, we unpack the parameters using args... and pass them to our kernel function when we launch it. C++11 also lets you query the number of parameters in a pack using sizeof...(). There is also a wrapper for the default execution policy, i.e.:

```cpp
template <typename... Arguments>
void cudaLaunch(void (*f)(Arguments...), Arguments... args)
{
    cudaLaunch(ExecutionPolicy(), f, args...);
}
```

Using hemi::cudaLaunch, I can launch any __global__ kernel, regardless of how many parameters it has, like this (here I'm launching the xyzw_frequency kernel from my post "The Power of C++11 in CUDA 7"):

```cpp
hemi::cudaLaunch(xyzw_frequency, count, text, n);
```

Here we leave the launch configuration up to the runtime, and if we write our kernel in a portable way, this code can be made fully portable. This simplified launch code is currently available in a development branch of Hemi, which you can find on Github.

Of course, you can also define kernel functions and __device__ functions with variadic arguments. I'll finish up with a little program that demonstrates a few things. The __global__ function Kernel is a variadic template function which just forwards its parameter pack to the function adder, which is where the really interesting use of variadic templates happens. adder demonstrates how a variadic parameter pack can be unpacked recursively to operate on each parameter in turn. (I borrowed the adder example from an excellent post on variadic templates by Eli Bendersky.) Note that to terminate the recursion we define the "base case" function template adder(T v), so that when the parameter pack is just a single parameter it just returns its value. The second adder function unpacks one argument at a time because it is defined to take one parameter and then a parameter pack. Clever trick, and since all the recursion happens at compile time, the resulting code is very efficient.