DirectX Ray Tracing, Vulkan’s NV Ray Tracing extension, and OptiX (or collectively, the RTX APIs) build on the same execution model for running user code to trace and process rays. The user creates a Shader Binding Table (SBT), which consists of a set of shader function handles and embedded parameters for these functions. The shaders in the table are executed depending on whether or not a geometry was hit by a ray, and which geometry was hit. When a geometry is hit, a set of parameters specified on both the host and device side of the application combine to determine which shader is executed. The RTX APIs provide a great deal of flexibility in how the SBT can be set up and indexed into during rendering, leaving a number of options open to applications. However, with incorrect SBT access leading to crashes and difficult bugs, sparse examples or documentation, and subtle differences in naming and SBT setup between the APIs, properly setting up and accessing the SBT is an especially thorny part of the RTX APIs for new users.
In this post we’ll look at the similarities and differences of each ray tracing API’s shader binding table to gain a fundamental understanding of the execution model. I’ll then present an interactive tool for constructing the SBT, building a scene which uses it, and executing trace calls on the scene to see which hit groups and miss shaders are called. Finally, we’ll look at how this model can be brought back to the CPU using Embree, to potentially build a unified low-level API for ray tracing.
To motivate why the RTX APIs use a shader binding table, we need to look at how ray tracing differs from rasterization. In a rasterizer we can batch objects by the shader they use and thus always know the set of shaders which must be called to render a set of objects. However, in a ray tracer we don’t know which object a ray will hit when we trace it, and thus need the entire scene available in memory (or some proxy of it) along with a function to call for each object which can process intersections with it. Our ray tracer needs access to all of the shaders which might be called for the scene, and a way to associate them with the objects in the scene. Each of the RTX APIs implements this using the Shader Binding Table. An analogy in the rasterization pipeline is bindless rendering, where the required data (textures, buffers) is uploaded to the GPU and accessed as needed by ID at runtime in the shader. In some sense, our shader dispatch is now “bindless”. The RTX execution pipeline is shown below.
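The different shaders used in the ray tracing pipeline are:
Ray Generation: The entry point of the pipeline, run once per launch index, which typically generates the initial rays and traces them into the scene.
Intersection: Computes intersections between a ray and custom, non-triangle geometry; triangles use the API’s built-in intersection test.
Any Hit: Called for each potential intersection with non-opaque geometry found during traversal, and can reject the hit (e.g., for alpha-tested geometry).
Closest Hit: Called for the closest intersection found along the ray, typically to perform shading; it can also trace further rays.
Miss: Called when the ray does not hit any geometry, e.g., to return a background color.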
The intersection, any hit and closest hit shaders are used together as a Hit Group to describe how to process rays and intersections with a geometry. The closest hit shader is required, while the intersection and any hit shaders are optional.
Note: for geometry not using an any hit shader, explicitly disable it using the corresponding force opaque or disable any hit geometry, instance or ray flags, otherwise an empty any hit shader will be called.
For a detailed overview and other interesting applications and use cases, see the Ray Tracing Gems book, or check out the Introduction to DirectX Ray Tracing course given at SIGGRAPH 2018, or the OptiX 7 Tutorial given at SIGGRAPH 2019.
The Shader Binding Table contains the entire set of shaders which may be called when ray tracing the scene, along with embedded parameters to be passed to these shaders. Each pair of shader functions and embedded parameters is referred to as a Shader Record. Since it’s common for geometries to share the same shader code but access different data, the embedded parameters in the record can be used to pass such data to the shaders. Thus, there should be at least one Shader Record in the table for each unique combination of shader functions and embedded parameters. It is possible to write the same shader record multiple times in the table, and this may be necessary depending on how the instances and geometries in the scene are set up. Finally, it is also possible to use the instance and geometry IDs available in the shaders to perform indirect access into other tables containing the scene data.
A Shader Record combines one or more shader functions with a set of parameters to be passed to these functions when they’re called by the runtime. Each shader record is written into the SBT as a set of function handles followed by the embedded parameters. While the size of the handles, alignment requirements for the records and parameters which can be embedded in the table differ across the RTX APIs, the functionality of the shader record is the same.
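To make the record layout concrete, here is a minimal C++ sketch that appends one record to a byte buffer: the opaque handle bytes, then the embedded parameters, then zero padding up to the record alignment. The handle bytes, the alignment value and the `MaterialParams` struct are placeholders; each API supplies its own handle and alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Round x up to the next multiple of align.
constexpr std::size_t align_to(std::size_t x, std::size_t align) {
    return (x + align - 1) / align * align;
}

// Append one shader record: handle bytes, then the embedded parameters,
// zero-padded to record_alignment. Returns the record's byte offset.
template <typename Params>
std::size_t write_shader_record(std::vector<std::uint8_t> &sbt,
                                const std::uint8_t *handle,
                                std::size_t handle_size,
                                const Params &params,
                                std::size_t record_alignment) {
    static_assert(std::is_trivially_copyable<Params>::value,
                  "embedded parameters are copied as raw bytes");
    assert(sbt.size() % record_alignment == 0);  // each record starts aligned
    const std::size_t start = sbt.size();
    const std::size_t record_size =
        align_to(handle_size + sizeof(Params), record_alignment);
    sbt.resize(start + record_size, 0);  // new bytes (incl. padding) are zeroed
    std::memcpy(sbt.data() + start, handle, handle_size);
    std::memcpy(sbt.data() + start + handle_size, &params, sizeof(Params));
    return start;
}

// Hypothetical embedded parameters: a material ID for the geometry.
struct MaterialParams {
    std::uint32_t material_id;
};
```

In a real table every record shares one stride, so in practice each record would be padded to the largest record in its table rather than record by record.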
The Ray Generation shader record consists of a single function referring to the ray generation shader to be called, along with any desired embedded parameters for the function. While some parameters can be passed in the shader record, for parameters that get updated each frame (e.g., the camera position) it is better to pass them separately through a different globally accessible buffer. While multiple ray generation shaders can be written into the table, only one can be called for each launch.
Each Hit Group shader record consists of a Closest Hit shader, Any Hit shader (optional) and Intersection shader (optional), followed by the set of embedded parameters to be made available to the three shaders. As the hit group which should be called is dependent on the instance and geometry which were hit and the ray type, the indexing rules for hit groups are the most complicated. The rules for hit group indexing are discussed in detail below.
The Miss shader record consists of a single function referring to the miss shader to be used, along with any desired embedded parameters for the function, similar to the ray generation record. The miss shader to call is selected by the ray type, though it is specified separately from the hit group ray type to allow greater flexibility. This flexibility can be used to implement optimizations for occlusion rays, for example.
The main point of difficulty in setting up the SBT and scene geometry is understanding how the two are coupled together, i.e., if a geometry is hit by a ray, which shader record is called? The shader record to call is determined by parameters set on the instance, trace ray call and the order of geometries in the bottom-level acceleration structure. These parameters are set on both the host and device during different parts of the scene and pipeline setup and execution, making it difficult to see how they fit together.
When setting up the scene, the bottom-level acceleration structure referenced by each instance can contain an array of geometries. The index of each geometry in this array is the geometry’s ID ($G_{ID}$). Each instance can be assigned a starting offset within the SBT ($I_{offset}$) where its sub-table of hit group records starts.
When tracing a ray on the device we can specify an additional SBT offset for the ray ($R_{offset}$), often referred to as the ray “type”, an SBT stride to apply ($R_{stride}$), typically referred to as the number of ray “types”, and the miss shader index to call ($R_{miss}$).
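The equation used to determine which hit group record is called when a ray with SBT offset $R_{offset}$ and stride $R_{stride}$ hits a geometry with ID $G_{ID}$ in an instance with offset $I_{offset}$ is:

$$HG = \&HG[0] + HG_{stride} \times \left(R_{offset} + R_{stride} \times G_{ID} + I_{offset}\right) \tag{1}$$

$\&HG[0]$ is the starting address of the table containing the hit group records, and $HG_{stride}$ is the stride between hit group records (in bytes) in the SBT.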
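Equation 1 can be modeled as a small host-side function, which is handy for checking an SBT layout before launching. This is a sketch of the indexing rule, not code from any of the APIs (the runtime performs this computation for you); the struct and function names are hypothetical.

```cpp
#include <cstdint>

// Set per trace call on the device.
struct RayParams {
    std::uint32_t sbt_offset;  // R_offset: the ray "type"
    std::uint32_t sbt_stride;  // R_stride: the number of ray "types"
};

// Set at launch on the host.
struct HitGroupTable {
    std::uint64_t start_address;    // &HG[0]
    std::uint64_t stride_in_bytes;  // HG_stride
};

// The parenthesized term of Equation 1: the record index within the table.
constexpr std::uint64_t hit_group_index(RayParams ray, std::uint32_t geometry_id,
                                        std::uint32_t instance_offset) {
    return std::uint64_t(ray.sbt_offset) +
           std::uint64_t(ray.sbt_stride) * geometry_id + instance_offset;
}

// Equation 1: the address of the hit group record that will be called.
constexpr std::uint64_t hit_group_address(HitGroupTable table, RayParams ray,
                                          std::uint32_t geometry_id,
                                          std::uint32_t instance_offset) {
    return table.start_address +
           table.stride_in_bytes * hit_group_index(ray, geometry_id, instance_offset);
}
```

For instance, an occlusion ray ($R_{offset} = 1$, $R_{stride} = 2$) hitting geometry 1 of an instance with $I_{offset} = 2$ selects hit group record 1 + 2 × 1 + 2 = 5.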
Note: If you’re coming from Ray Tracing Gems, section 3.10 refers to two of these parameters by different names. While the equations are the same, the distinction of which parameters come from the ray, geometry and instance is clearer when written as above.
The ray offset ($R_{offset}$) and stride ($R_{stride}$) parameters are set per-ray when you call trace ray on the device. In a typical ray tracer, $R_{offset}$ is the ray “type”, e.g., primary (0) or occlusion (1), and $R_{stride}$ is the total number of ray types, in this example, 2. These parameters allow us to change which hit group is called based on the desired ray query. For example, we can often perform a cheaper intersection test for occlusion rays, since we only care whether the object was hit and don’t need the exact hit point. The hit groups for the different ray types of a geometry are written consecutively in the SBT, so the ray stride is used to step to the next geometry’s set of hit groups. This acts like a flattened 2D array indexed by geometry and ray type. In a typical ray tracer where we would have a separate primary and occlusion hit group record per-geometry, this stride would be 2.
The instance offset ($I_{offset}$) and geometry ID ($G_{ID}$) come from how each instance and bottom-level acceleration structure are configured when setting up the scene on the host. Each instance is assigned a base offset into the SBT, $I_{offset}$, which defines where its sub-table of hit group records begins. Note that this is not multiplied by $R_{stride}$ in Equation 1. The geometry ID, $G_{ID}$, is set implicitly as the index of the geometry in the bottom-level acceleration structure being instanced, and is multiplied by $R_{stride}$. In a typical ray tracer with two ray types (primary and occlusion), a hit group record for each ray type per-geometry, and where instances do not share hit group records, the offset for instance $i$ can be calculated as:

$$I_{offset}(0) = 0 \tag{2}$$

$$I_{offset}(i) = I_{offset}(i - 1) + 2 \times I_{geom}(i - 1) \tag{3}$$

where $I_{geom}(i)$ is the number of geometries in instance $i$.
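The hit group records in the SBT would then be written in order by instance and by the geometry order within each instance’s bottom-level acceleration structure, with separate primary and occlusion hit groups. A scene with two instances, the first with one geometry and the second with two, would have its hit group records laid out as shown in Figure 2: the primary and occlusion records for the first instance’s geometry occupy indices 0 and 1, and those for the second instance’s two geometries occupy indices 2 to 5, giving the second instance an offset of 2.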
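The following C++ sketch computes the instance offsets with Equations 2 and 3 and then checks, via Equation 1, that every (instance, geometry, ray type) combination lands on its own record. It assumes the layout described above (one record per ray type per geometry, no sharing between instances); the function and struct names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Equations 2 and 3: each instance's SBT offset.
std::vector<std::uint32_t> instance_offsets(const std::vector<std::uint32_t> &geometry_counts,
                                            std::uint32_t num_ray_types) {
    std::vector<std::uint32_t> offsets(geometry_counts.size(), 0);  // Eq. 2: first is 0
    for (std::size_t i = 1; i < geometry_counts.size(); ++i) {
        // Eq. 3: skip the previous instance's geometries times the ray types.
        offsets[i] = offsets[i - 1] + num_ray_types * geometry_counts[i - 1];
    }
    return offsets;
}

// Which (instance, geometry, ray type) a hit group record belongs to.
struct RecordOwner {
    std::uint32_t instance, geometry, ray_type;
};

// Evaluate Equation 1's index for every combination and record its owner.
std::vector<RecordOwner> hit_group_layout(const std::vector<std::uint32_t> &geometry_counts,
                                          std::uint32_t num_ray_types) {
    const auto offsets = instance_offsets(geometry_counts, num_ray_types);
    std::uint32_t total = 0;
    for (auto n : geometry_counts) total += n * num_ray_types;
    std::vector<RecordOwner> layout(total);
    std::vector<bool> written(total, false);
    for (std::uint32_t i = 0; i < geometry_counts.size(); ++i) {
        for (std::uint32_t g = 0; g < geometry_counts[i]; ++g) {
            for (std::uint32_t r = 0; r < num_ray_types; ++r) {
                // Eq. 1 index with R_offset = r and R_stride = num_ray_types.
                const std::uint32_t idx = r + num_ray_types * g + offsets[i];
                assert(idx < total && !written[idx] && "records must not collide");
                written[idx] = true;
                layout[idx] = {i, g, r};
            }
        }
    }
    return layout;
}
```

For the two-instance scene above, `instance_offsets({1, 2}, 2)` gives offsets 0 and 2, and records 4 and 5 belong to the second geometry of the second instance.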
The indexing rules for miss shader records are far simpler than for hit groups. When tracing a ray we pass an additional miss shader index, $R_{miss}$, which is just the index of the miss shader to call if the ray does not hit an object:

$$M = \&M[0] + M_{stride} \times R_{miss} \tag{4}$$

As with the hit group records, $\&M[0]$ is the starting address of the table containing the miss records and $M_{stride}$ is the stride between miss records in bytes.
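Equation 4 is simple enough that a sketch mostly shows what is not involved: neither the instance nor the geometry affects which miss record runs. The `MissTable` struct and helpers are hypothetical; with two ray types, a renderer might, for example, use miss index 0 for primary rays and 1 for occlusion rays.

```cpp
#include <cstdint>

struct MissTable {
    std::uint64_t start_address;    // &M[0]
    std::uint64_t stride_in_bytes;  // M_stride
    std::uint32_t count;            // number of miss records in the table
};

// Equation 4: only the miss index passed to trace ray selects the record.
constexpr std::uint64_t miss_address(MissTable table, std::uint32_t miss_index) {
    return table.start_address + table.stride_in_bytes * miss_index;
}

// A miss index past the end of the table would read garbage at runtime.
constexpr bool miss_index_valid(MissTable table, std::uint32_t miss_index) {
    return miss_index < table.count;
}
```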
Now that we have a unified terminology to work with across the RTX APIs and have taken a general look at how the Shader Binding Table works, we’ll dive into the API-specific details of the SBT for each API. In each API the SBT is passed as a set of one or more buffers of shader records to the launch call (each record type can be in a separate buffer, or all in one), with information specifying the starting address and stride of each group of records. Each record consists of an API-specific shader handle followed by any embedded parameters for the record. The biggest difference between the APIs is in how the embedded parameters for a shader record are specified on the host and retrieved on the device, and the types of parameters which can be embedded. The sizes of the shader record handles and their alignment requirements can also differ between the APIs.
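The following sketch lays out the three groups of records back to back in a single buffer, computing the start offset and stride that would be passed to the launch call for each. The handle size and both alignments are parameters because they differ per API; `TableDesc`, `TableRegion` and `layout_sbt` are hypothetical names, not part of any of the APIs.

```cpp
#include <cstdint>

// Where one group of records lives in the SBT buffer.
struct TableRegion {
    std::uint64_t offset;  // byte offset of the first record in the buffer
    std::uint64_t stride;  // bytes between consecutive records
    std::uint64_t size;    // stride * record count
};

// What a table must hold: its largest parameter block and its record count.
struct TableDesc {
    std::uint64_t max_param_bytes;
    std::uint64_t record_count;
};

struct SbtLayout {
    TableRegion raygen, miss, hitgroup;
    std::uint64_t total_size;  // bytes to allocate for the whole buffer
};

constexpr std::uint64_t align_up(std::uint64_t x, std::uint64_t a) {
    return (x + a - 1) / a * a;
}

// Place one table at the next table-aligned offset. All records in a table
// share one stride, sized for the largest record and rounded up to the
// record alignment.
constexpr TableRegion place_table(std::uint64_t &cursor, TableDesc desc,
                                  std::uint64_t handle_size,
                                  std::uint64_t record_alignment,
                                  std::uint64_t table_alignment) {
    const std::uint64_t offset = align_up(cursor, table_alignment);
    const std::uint64_t stride =
        align_up(handle_size + desc.max_param_bytes, record_alignment);
    cursor = offset + stride * desc.record_count;
    return TableRegion{offset, stride, stride * desc.record_count};
}

// Ray generation, miss and hit group tables back to back in one buffer.
constexpr SbtLayout layout_sbt(TableDesc raygen, TableDesc miss, TableDesc hit,
                               std::uint64_t handle_size,
                               std::uint64_t record_alignment,
                               std::uint64_t table_alignment) {
    std::uint64_t cursor = 0;
    const TableRegion rg = place_table(cursor, raygen, handle_size, record_alignment, table_alignment);
    const TableRegion ms = place_table(cursor, miss, handle_size, record_alignment, table_alignment);
    const TableRegion hg = place_table(cursor, hit, handle_size, record_alignment, table_alignment);
    return SbtLayout{rg, ms, hg, cursor};
}
```

With a 32-byte handle, 32-byte record alignment and 64-byte table alignment, for example, a single parameter-less ray generation record is followed by 32 bytes of padding before the miss table starts at byte 64.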
For more documentation about the DXR API, also see the MSDN DXR Documentation, the DXR HLSL Documentation and the DXR Specification. Here we’ll just focus on the parts specific to the Shader Binding Table indexing.
In DXR, the parameters embedded in the shader record can be 8-byte handles (e.g., buffers, textures, etc.) or pairs of 4-byte constants (a single 4-byte constant must be padded to 8 bytes). The mapping of these input parameters from the shader record to the shader “registers” is specified using a Local Root Signature. The registers used for the local root signature parameters should not overlap with those used by the global root signature, which is shared by all shaders. One way to avoid conflicts is to use separate register spaces for the global and local root signature parameter registers.
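The shader handle size is defined by D3D12_SHADER_IDENTIFIER_SIZE_IN_BYTES (32 bytes). Each table of shader records must start at an address aligned to D3D12_RAYTRACING_SHADER_TABLE_BYTE_ALIGNMENT (64 bytes), and the stride between records must be a multiple of D3D12_RAYTRACING_SHADER_RECORD_BYTE_ALIGNMENT (32 bytes); aligning every record to 64 bytes satisfies both requirements. The maximum size allowed for the stride is 4096 bytes, placing an upper bound on the number of parameters which can be embedded in a shader record.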
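As a rough budget for DXR records, this sketch sizes the embedded parameters under the packing rule described above (8-byte handles, 4-byte constants padded to a multiple of 8 bytes) and checks the result against the 4096-byte stride limit. It assumes all root constants live in a single parameter and rounds each record up to 64 bytes; the constants are copied by value rather than included from d3d12.h so the sketch builds anywhere.

```cpp
#include <cstdint>

constexpr std::uint32_t kShaderIdentifierSize = 32;  // D3D12_SHADER_IDENTIFIER_SIZE_IN_BYTES
constexpr std::uint32_t kRecordAlignment = 64;       // 64 bytes satisfies both alignments
constexpr std::uint32_t kMaxRecordStride = 4096;     // maximum shader record stride

constexpr std::uint32_t align_up(std::uint32_t x, std::uint32_t a) {
    return (x + a - 1) / a * a;
}

// Embedded parameter bytes for num_handles 8-byte handles plus num_constants
// 4-byte constants, with the constants padded to a multiple of 8 bytes.
constexpr std::uint32_t local_root_param_bytes(std::uint32_t num_handles,
                                               std::uint32_t num_constants) {
    return 8 * num_handles + align_up(4 * num_constants, 8);
}

// Full record size: shader identifier plus parameters, rounded up.
constexpr std::uint32_t dxr_record_size(std::uint32_t num_handles,
                                        std::uint32_t num_constants) {
    return align_up(kShaderIdentifierSize +
                        local_root_param_bytes(num_handles, num_constants),
                    kRecordAlignment);
}

constexpr bool fits_in_stride(std::uint32_t num_handles, std::uint32_t num_constants) {
    return dxr_record_size(num_handles, num_constants) <= kMaxRecordStride;
}
```

Under these assumptions a record can hold at most 508 8-byte handles (32 + 508 × 8 = 4096 bytes).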
Instances in DXR are specified through an instance description struct.
The parameters which affect the SBT indexing are:
InstanceContributionToHitGroupIndex: This sets the instance’s SBT offset, $I_{offset}$.
InstanceMask: While the mask does not affect which hit group is called, it can be used to skip traversal of instances entirely by masking them out of the traversal.
In the ray generation, closest hit and miss shaders, the HLSL TraceRay function can be called to trace rays through the scene.
TraceRay takes the acceleration structure to trace against, a set of ray flags to adjust the traversal being performed, the instance mask, SBT indexing parameters, the ray and the payload to be updated by the closest hit or miss shaders. The parameters which affect the SBT indexing are:
InstanceInclusionMask: This mask determines which instances are masked out, by and’ing it with each instance’s InstanceMask.
RayContributionToHitGroupIndex: This is the ray’s SBT offset, $R_{offset}$.
MultiplierForGeometryContributionToHitGroupIndex: This is the ray’s SBT stride, $R_{stride}$.
MissShaderIndex: This is the miss shader index to call, $R_{miss}$.
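The mask test and the index computation can be sketched in terms of the TraceRay argument names. The `TraceRayArgs` struct and both functions are purely illustrative; the sketch assumes, per the DXR specification, that only the low 8 bits of InstanceInclusionMask are used and that an instance is skipped when the AND with its 8-bit InstanceMask is zero.

```cpp
#include <cstdint>

// The TraceRay arguments that take part in SBT indexing.
struct TraceRayArgs {
    std::uint32_t InstanceInclusionMask;                            // low 8 bits used
    std::uint32_t RayContributionToHitGroupIndex;                   // R_offset
    std::uint32_t MultiplierForGeometryContributionToHitGroupIndex; // R_stride
    std::uint32_t MissShaderIndex;                                  // R_miss
};

// An instance is traversed only if its InstanceMask shares a set bit with
// the low 8 bits of the ray's InstanceInclusionMask.
constexpr bool instance_visible(std::uint8_t instance_mask, std::uint32_t inclusion_mask) {
    return (instance_mask & (inclusion_mask & 0xFFu)) != 0;
}

// Equation 1's record index, written with the DXR names.
constexpr std::uint32_t hit_group_entry(TraceRayArgs args, std::uint32_t geometry_index,
                                        std::uint32_t InstanceContributionToHitGroupIndex) {
    return args.RayContributionToHitGroupIndex +
           args.MultiplierForGeometryContributionToHitGroupIndex * geometry_index +
           InstanceContributionToHitGroupIndex;
}
```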
In Vulkan, the parameters embedded in the shader record can only be 4-byte constants, but they do not require extra padding to 8 bytes as in DXR. The embedded parameters are accessed in the shader through a special buffer type declared with a dedicated shader record layout qualifier.
For example, if we wanted to pass a material ID for the geometry in the shader record, we could declare the buffer as follows:
The size of the shader handles and alignment requirements for the shader records are queried at runtime from the physical device’s ray tracing properties. On my desktop with an RTX 2070 and Nvidia driver 441.12, the shader handle size is 16 bytes and the shader record alignment requirement is 64 bytes. The maximum allowed size for the shader record stride is 4096 bytes.
Instances in Vulkan are specified through the same structure layout as in DXR (see Vulkan Spec on Acceleration Structures). However, a definition is not provided in the headers and we must declare our own struct which matches the specified layout:
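A minimal C++ sketch of such a struct, assuming the 64-byte instance layout given in the Vulkan spec: a 3x4 row-major transform, two 32-bit words of packed 24/8-bit fields, and the 64-bit acceleration structure handle. Field names other than instance_offset and mask are illustrative, and because bit-field packing is implementation-defined in C++ (though it matches on the usual MSVC, GCC and Clang targets), the static_asserts guard the size and offset.

```cpp
#include <cstddef>
#include <cstdint>

struct GeometryInstance {
    float transform[12];                          // 3x4 row-major object-to-world
    std::uint32_t instance_custom_index : 24;     // user value visible in shaders
    std::uint32_t mask : 8;                       // visibility mask
    std::uint32_t instance_offset : 24;           // SBT offset, I_offset
    std::uint32_t flags : 8;                      // VkGeometryInstanceFlagsNV bits
    std::uint64_t acceleration_structure_handle;  // from vkGetAccelerationStructureHandleNV
};

static_assert(sizeof(GeometryInstance) == 64, "instance must be 64 bytes");
static_assert(offsetof(GeometryInstance, acceleration_structure_handle) == 56,
              "handle follows the transform and the two packed words");
```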
The parameters which affect the SBT indexing are:
instance_offset: This sets the instance’s SBT offset, $I_{offset}$.
mask: While the mask does not affect which hit group is called, it can be used to skip traversal of instances entirely by masking them out of the traversal.
In the ray generation, closest hit and miss shaders, the function traceNV from the GLSL NV Ray Tracing extension can be called to trace rays.
traceNV takes the acceleration structure to trace against, a set of ray flags to adjust the traversal being performed, the instance mask, SBT indexing parameters, the ray parameters and the index of the ray payload to be updated by the closest hit or miss shaders. The parameters which affect the SBT indexing are:
cullMask: This mask determines which instances are masked out, by and’ing it with each instance’s mask.
sbtRecordOffset: This is the ray’s SBT offset, $R_{offset}$.
sbtRecordStride: This is the ray’s SBT stride, $R_{stride}$.
missIndex: This is the miss shader index to call, $R_{miss}$.
In contrast to HLSL, the ray payloads are specified as special shader input/output variables, where the value of the payload argument passed to traceNV selects which one will be used. For example:
In OptiX, the parameters embedded in the shader record can be arbitrary structs, potentially containing CUDA device pointers or texture handles. A pointer to the embedded parameters can be retrieved in the shader by calling optixGetSbtDataPointer(), which returns a void* to the portion of the SBT record after the shader handle.
The size of the shader handle is defined by OPTIX_SBT_RECORD_HEADER_SIZE (32 bytes), and the shader record alignment requirement is OPTIX_SBT_RECORD_ALIGNMENT (16 bytes).
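On the host, OptiX records are commonly declared as a template over the parameter struct, similar to the pattern in the OptiX SDK samples; the header is filled in with optixSbtRecordPackHeader before the records are copied to the device. This sketch mirrors the two constants quoted above so it compiles without the OptiX headers, and includes a host-side stand-in for what optixGetSbtDataPointer() returns; `HitGroupData` is a hypothetical parameter struct.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kSbtRecordHeaderSize = 32;  // OPTIX_SBT_RECORD_HEADER_SIZE
constexpr std::size_t kSbtRecordAlignment = 16;   // OPTIX_SBT_RECORD_ALIGNMENT

// The opaque header followed directly by an arbitrary user struct.
template <typename T>
struct alignas(kSbtRecordAlignment) SbtRecord {
    alignas(kSbtRecordAlignment) char header[kSbtRecordHeaderSize];
    T data;
};

// Host-side model of optixGetSbtDataPointer(): the bytes after the header.
inline const void *sbt_data_pointer(const void *record) {
    return static_cast<const char *>(record) + kSbtRecordHeaderSize;
}

// Hypothetical embedded parameters for a hit group.
struct HitGroupData {
    const float *vertices;  // e.g. a CUDA device pointer
    std::uint32_t material_id;
};
```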
Instances in OptiX are specified through an instance struct.
The parameters which affect the SBT indexing are:
sbtOffset: This sets the instance’s SBT offset, $I_{offset}$.
visibilityMask: While the mask does not affect which hit group is called, it can be used to skip traversal of instances entirely by masking them out of the traversal.
In the ray generation, closest hit and miss shaders, the optixTrace function can be called to trace rays through the scene.
optixTrace takes the acceleration structure to trace against, a set of ray flags to adjust the traversal being performed, the instance mask, SBT indexing parameters, the ray parameters and up to 8 unsigned 32-bit values which are passed by reference through registers to the closest hit and miss shaders. To pass a struct larger than these 32 bytes, it’s possible to pass a pointer to a stack variable in the calling shader by splitting it into two 32-bit ints, and then packing the pointer back together in the closest hit or miss shader. The parameters which affect the SBT indexing are:
visibilityMask: This mask determines which instances are masked out, by and’ing it with each instance’s visibilityMask.
SBToffset: This is the ray’s SBT offset, $R_{offset}$.
SBTstride: This is the ray’s SBT stride, $R_{stride}$.
missSBTIndex: This is the miss shader index to call, $R_{miss}$.
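The pointer-splitting trick can be sketched as two small helpers. They are written as plain C++ so the round trip can be checked on the host; in device code they would be marked __device__, and the two halves would be passed as two of optixTrace's payload arguments and read back with optixGetPayload_0() and optixGetPayload_1(). Putting the high word first is an arbitrary convention here, and `PerRayData` is a hypothetical struct.

```cpp
#include <cassert>
#include <cstdint>

// Split a pointer into two 32-bit payload values (high word first).
inline void pack_pointer(void *ptr, std::uint32_t &p0, std::uint32_t &p1) {
    const std::uint64_t bits =
        static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(ptr));
    p0 = static_cast<std::uint32_t>(bits >> 32);
    p1 = static_cast<std::uint32_t>(bits & 0xFFFFFFFFu);
}

// Reassemble the pointer from the two payload values.
inline void *unpack_pointer(std::uint32_t p0, std::uint32_t p1) {
    const std::uint64_t bits = (static_cast<std::uint64_t>(p0) << 32) | p1;
    return reinterpret_cast<void *>(static_cast<std::uintptr_t>(bits));
}

// Hypothetical per-ray data too large for the 8 payload registers (44 bytes).
struct PerRayData {
    float radiance[3];
    float throughput[3];
    std::uint32_t depth;
    std::uint32_t seed;
    float origin[3];
};

// Writes through the unpacked pointer land in the caller's struct.
inline void payload_pointer_round_trip() {
    PerRayData prd{};
    std::uint32_t p0 = 0, p1 = 0;
    pack_pointer(&prd, p0, p1);
    auto *restored = static_cast<PerRayData *>(unpack_pointer(p0, p1));
    restored->depth = 3;
    assert(restored == &prd && prd.depth == 3);
}
```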
Now that we’ve discussed how the SBT works and which parts of the SBT, instance and trace ray setup are similar or different between the RTX APIs, let’s do some hands-on activities! Using the interactive tool below you can build a shader binding table, set up a scene, set the trace ray parameters and see which hit groups are called for the different geometries. Use this tool to explore different possible configurations for the SBT, scene and trace ray parameters to get a better understanding of how the different parameters can be combined for different renderer and scene configurations.
Here are some suggested configurations to try setting up:
Here you can add new hit and miss records with the buttons below, or remove them by double-clicking the record. Click a record to select it and add or remove parameters. Add parameters using the buttons below or double-click a parameter to remove it. When you select a hit group record, the instance containing the geometry which would call the record when hit by a ray, for the current scene and trace ray setup, is selected in the scene setup widget. If more than one geometry shares the same record, the first one will be highlighted. The hit group records which can be called by the currently selected instance in the scene setup widget are highlighted in light blue. The miss shader which will be called for the current trace ray call is also highlighted in light purple.
You can also change the ray tracing API to see how the different handle sizes and alignment requirements affect the SBT layout in memory. While it is also possible to use separate buffers for the ray generation, hit group and miss records, I’ve kept them all in one buffer here to simplify the visualization.
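To see how the handle sizes and alignments alone change the layout, this sketch computes the record stride each API would need for the same embedded parameters, using the values quoted in this post (DXR: 32-byte handles with records rounded to 64 bytes; Vulkan NV: 16-byte handles and 64-byte alignment as reported on the RTX 2070 above; OptiX: 32-byte headers and 16-byte alignment). Real Vulkan code should use the values queried at runtime; the struct and function names are hypothetical.

```cpp
#include <cstdint>

struct ApiSbtLimits {
    const char *name;
    std::uint32_t handle_size;       // bytes of the opaque shader handle
    std::uint32_t record_alignment;  // records are padded to a multiple of this
};

constexpr ApiSbtLimits kDxr{"DXR", 32, 64};
constexpr ApiSbtLimits kVulkanNv{"Vulkan NV", 16, 64};
constexpr ApiSbtLimits kOptix{"OptiX", 32, 16};

constexpr std::uint32_t align_up(std::uint32_t x, std::uint32_t a) {
    return (x + a - 1) / a * a;
}

// Stride needed for records carrying param_bytes of embedded parameters.
constexpr std::uint32_t record_stride(ApiSbtLimits api, std::uint32_t param_bytes) {
    return align_up(api.handle_size + param_bytes, api.record_alignment);
}
```

For 8 bytes of parameters the stride is 64 bytes for DXR and Vulkan but only 48 bytes for OptiX; for 40 bytes it is 128, 64 and 80 bytes respectively.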
Here you can set up the scene you want to trace rays against by adding or removing instances, changing the number of geometries within instances, or changing each instance’s mask. To add an instance use the button below; to remove one, double-click on its BVH icon or geometries. Select an instance by clicking on it to modify its SBT offset, number of geometries or visibility mask with the inputs below. Setting the SBT offset to the recommended offset will set it to match a configuration like that shown in Figure 2, using Equations 2 and 3. The geometry ID ($G_{ID}$) of each geometry in the instance is displayed next to the geometry in the widget.
The hit groups accessed by the selected instance will also be highlighted in light blue in the shader binding table. Click a specific geometry in the scene to see the corresponding hit group which will be called when intersected by a ray traced in the current trace ray call. If a geometry would access an out-of-bounds hit group record for the current trace call, it will be highlighted in red. If a geometry in the instance potentially accesses an out-of-bounds hit group record (i.e., across the ray stride), a warning will be displayed when it is selected. Instances which are masked out of the current ray traversal will be grayed out.
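The two bounds checks can be reasoned about with Equation 1: the highest index an instance reaches is that of its last geometry. This sketch is one reading of the checks described above, not the tool's source; `in_bounds_for_ray` checks the current ray offset, while `potentially_out_of_bounds` assumes any ray offset from 0 to $R_{stride} - 1$ may be used.

```cpp
#include <cstdint>

// True if a ray with (ray_offset, ray_stride) stays inside a table of
// num_hit_records for every geometry of the instance; the last geometry
// (G_ID = num_geometries - 1) reaches the highest index.
constexpr bool in_bounds_for_ray(std::uint32_t instance_offset, std::uint32_t num_geometries,
                                 std::uint32_t ray_offset, std::uint32_t ray_stride,
                                 std::uint32_t num_hit_records) {
    return num_geometries == 0 ||
           ray_offset + ray_stride * (num_geometries - 1) + instance_offset < num_hit_records;
}

// True if some ray offset in [0, ray_stride) would run past the table.
constexpr bool potentially_out_of_bounds(std::uint32_t instance_offset,
                                         std::uint32_t num_geometries,
                                         std::uint32_t ray_stride,
                                         std::uint32_t num_hit_records) {
    return num_geometries != 0 && ray_stride != 0 &&
           !in_bounds_for_ray(instance_offset, num_geometries, ray_stride - 1,
                              ray_stride, num_hit_records);
}
```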
Here you can set up your trace ray call to set the ray SBT offset, stride and miss index. After setting up the trace call, click on geometries in the scene to see which hit group will be called! The select miss shader button will select the miss shader which will be called in the SBT widget above.
Now that we’ve seen the similarities between the RTX APIs’ shader binding table setups, this leads us to an interesting question: if we wanted to write a unified programming model for the RTX APIs, we’d implement something to wrap over the RTX execution model and shader binding table, but what about the CPU? On the CPU we can use Embree to accelerate ray traversal to act as our “hardware-accelerated” API, ISPC as our SPMD programming language for vectorization and TBB for multi-threading; however, the natural way to write our CPU ray tracer differs significantly from how we’d write an RTX one, since we don’t have the rest of the RTX execution model, shader binding table and so on. But since we’re on the CPU we have pretty much full control over how the code is set up and run, so what if we just implemented the same execution model on top of Embree, ISPC and TBB?
I’ve begun exploring exactly this idea in the embree-sbt branch of ChameleonRT, which now supports enough features to re-implement my original Embree path tracer backend for ChameleonRT in this Embree-SBT model (two ray types, shader record parameters, opaque geometry). Since ISPC is somewhat similar to CUDA, I’ve followed the OptiX style SBT in my implementation. The shader handles are just ISPC function pointers which are passed a void* to the region following the shader handle, which contains any user-provided struct of embedded parameters.
On the ISPC side I provide a trace ray wrapper function which chooses between Embree’s closest-hit intersection query and rtcOccluded based on the ray flags, and computes Equations 1 or 4 to determine which hit group or miss shader to call from the shader binding table.
What’s really exciting to note here is that not only does this work, but in my limited testing it actually seems to perform similarly to, or even slightly better than, my original implementation!
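The shape of this Embree-SBT dispatch can be sketched in plain C++: records hold a function pointer and a pointer to their embedded parameters, and a trace wrapper applies Equation 1 on a hit and Equation 4 on a miss. This is not the ChameleonRT code (which is written in ISPC and calls Embree); traversal is stubbed out by passing in a hypothetical `TraversalResult`, and all names are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Payload {
    float value = 0.f;
    bool hit = false;
};

// A shader "handle" is a function pointer given the record's parameters.
using ShaderFn = void (*)(const void *params, Payload &payload);

struct ShaderRecord {
    ShaderFn shader;     // the handle
    const void *params;  // embedded parameters following the handle
};

struct ShaderBindingTable {
    std::vector<ShaderRecord> hit_groups;
    std::vector<ShaderRecord> miss;
};

// Stand-in for what an Embree query would report; geometry_id < 0 is a miss.
struct TraversalResult {
    int geometry_id;
    std::uint32_t instance_offset;
};

void trace_ray(const ShaderBindingTable &sbt, TraversalResult result,
               std::uint32_t ray_offset, std::uint32_t ray_stride,
               std::uint32_t miss_index, Payload &payload) {
    if (result.geometry_id < 0) {
        // Equation 4 as a record index.
        assert(miss_index < sbt.miss.size() && "miss index out of range");
        const ShaderRecord &rec = sbt.miss[miss_index];
        rec.shader(rec.params, payload);
        return;
    }
    // Equation 1 as a record index.
    const std::uint32_t idx = ray_offset +
        ray_stride * static_cast<std::uint32_t>(result.geometry_id) + result.instance_offset;
    assert(idx < sbt.hit_groups.size() && "hit group index out of range");
    const ShaderRecord &rec = sbt.hit_groups[idx];
    rec.shader(rec.params, payload);
}

// Example shaders with a hypothetical parameter struct.
struct MaterialParams {
    float albedo;
};

void closest_hit(const void *params, Payload &p) {
    p.value = static_cast<const MaterialParams *>(params)->albedo;
    p.hit = true;
}

void miss_shader(const void *, Payload &p) {
    p.value = -1.f;
    p.hit = false;
}
```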
Now that we’ve gotten an understanding of the RTX and SBT execution model and even seen how this model can be successfully brought back to the CPU, we find ourselves pointed in a pretty exciting direction. Although not discussed in this post, the rest of the host-side RTX APIs (e.g., setting up geometries) are quite similar, and we can implement anything we want on top of Embree to make it fit in. The most challenging differences between the APIs to hide are the different languages used to write the shaders (HLSL, GLSL, CUDA, ISPC), the different ways the shader modules are set up, and how they receive parameters embedded in the shader record and from global state. If we squint a little bit the four languages are actually very similar, but we’re still left with difficult differences to hide in how the host sets up the shader modules and parameters, and how those parameters are received by the shaders.
To unify these final differences it seems like what I really need is a programming language similar to HLSL, GLSL, ISPC and CUDA, but which gives me enough information that I can hook up the shader record and global parameters the user wants across all four APIs. The compiler would then output HLSL, GLSL, ISPC or CUDA as appropriate for the selected backend. Since the rest of the APIs are so similar, I think this is not a big stretch to implement, but will take some careful API and language design. My end goal is to write a single host and device code path for my path tracer in ChameleonRT which can run on all three RTX APIs and Embree. I’ve got to learn about compilers to do that, but watch this blog or follow me on Twitter for updates! If you have questions or comments about this post, Twitter or email are the best ways to get in touch.
Published: 20 November 2019