Prerequisites

  • Basic knowledge of Unity and C#
  • Basic knowledge of shaders

Goal

  • An introduction to compute shaders and their use in Unity
  • Structure of compute shaders
  • A compute shader that can convert an image into grayscale

Compute Shader Basics

Compute Shaders allow us to run programs on the GPU outside of the normal graphics pipeline. This is usually referred to as GPGPU or Compute. It is a powerful way to speed up tasks that can be parallelized, and Unity has an awesome framework that allows us to easily interact with it.

Compared to the normal graphics pipeline, which involves stages such as Vertex and Fragment, we now only work with a single stage called a Kernel. A kernel program no longer works in object or pixel space but rather in a more generalized space where we interact directly with the data structures we use. For example, we will no longer index into our textures using a UV coordinate but rather use the direct pixel index.

Since we no longer use the normal ShaderLab structure to create our GPU program, we are closer to a raw HLSL implementation than usual. Let's break down the core aspects of how we create a Compute Shader we can call from C# in Unity.

The Compute Shader

#pragma kernel CSMain

[numthreads(1,1,1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{

}

This is the smallest amount of code you need to create a Compute Shader. It won't be able to do anything useful for you yet since we haven't defined any inputs or outputs.

We start by typing #pragma kernel CSMain, which tells Unity that we will define a callable function called CSMain. Any function we create that we want to call from C# needs to be declared with such a #pragma.

Next up is [numthreads(1,1,1)], and its use will seem confusing at first. It defines the size of each thread group, that is, how many threads run the kernel together as one group over the input data. It is used in tandem with the C# call computeShader.Dispatch(kernelID, x, y, z), where the x, y, z arguments and the (1,1,1) group size are used together by the GPU to split the workload into groups that can run the program in parallel.

An input size of (512, 512, 1) and a group size of (32, 32, 1) requires that we call computeShader.Dispatch(kernelID, 512 / 32, 512 / 32, 1) in order to properly divide the workload between the groups.

Let's use some pseudocode to explain this in more detail:

// In the compute shader (HLSL):
// Set numthreads to work on the data in groups of 32x32x1
[numthreads(32, 32, 1)]
void CSMain(...){...}

// In C#:
// Create a 1024x1024 texture
Texture tex = new Texture(1024, 1024);

// Locate the kernel we want to run
int kernelID = computeShader.FindKernel("CSMain");

// We now have to figure out how many groups are needed to cover
// the data we want the compute shader to run on.
// Since the kernel is configured to run on sections of 32x32 "items"
// we divide our texture size by this value.
computeShader.Dispatch(kernelID, 1024 / 32, 1024 / 32, 1);

For now this is enough to build a base understanding of a Compute Shader and how it can be used. Certain quirks appear when our texture does not have power-of-two dimensions. In that case we need slightly different parameters if we want to run a compute shader on a texture that is, say, 1920x1080, in order to avoid running our program outside the texture's bounds.
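
A common way to handle this, sketched below with the hypothetical names from the pseudocode above, is to round the group count up on the C# side and then have the kernel discard the threads that land outside the texture:

// C#: round the group count up so the whole texture is covered
int groupsX = Mathf.CeilToInt(tex.width / 32f);
int groupsY = Mathf.CeilToInt(tex.height / 32f);
computeShader.Dispatch(kernelID, groupsX, groupsY, 1);

// HLSL: skip the threads that fall outside the texture
// (_OutputTexture stands in for whatever texture the kernel writes to)
uint width, height;
_OutputTexture.GetDimensions(width, height);
if (id.x >= width || id.y >= height)
    return;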

Converting a Texture to Grayscale

Now let's take a look at how we can pass data to and from the Compute Shader. We don't have access to a Properties {...} block as we do in regular shaders, so we need to do some manual work here.

Several categories come up when dealing with binding variables to shaders. Simple types such as a float or a vector are stored in cbuffers, which live in a more generalized location in GPU memory. Textures, on the other hand, are bound as dedicated resource views: read-only textures as SRVs and writable ones as UAVs. As such, they need to be bound to a specific kernel in order for us to use them.
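
Conceptually, and only as an illustration of what the compiled HLSL roughly boils down to (Unity generates this for us, we don't write it by hand), the two categories look like this:

// Simple values are gathered into a constant buffer
cbuffer Params
{
    float _Intensity;
};

// Textures are bound as individual resources
Texture2D<float4>   _InputTexture;   // read-only (SRV)
RWTexture2D<float4> _OutputTexture;  // read/write (UAV)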

Let's expand the bare compute shader we made above and add in some variables so we can convert a texture to grayscale.

// Declare our program
#pragma kernel ConvertToGrayscale

// Grayscale conversion
float ToGrayscale(float3 rgb) {
    static const float3 rgb2grayscale = float3(0.3, 0.59, 0.11);
    return dot(rgb, rgb2grayscale);
}

// Our input parameters
Texture2D<float4> _InputTexture;
RWTexture2D<float4> _OutputTexture;
float _Intensity;

// Our program
[numthreads(32,32,1)]
void ConvertToGrayscale(uint3 id : SV_DispatchThreadID) {
    float4 textureSample = _InputTexture[id.xy];
    float3 grayscale = ToGrayscale(textureSample.rgb) * _Intensity;
    _OutputTexture[id.xy] = float4(grayscale, textureSample.a);
}

For a compute shader called from contexts other than Unity we usually need a bit more syntax than what we have here. Unity handles several things for us, such as binding our texture UAVs to the proper registers and creating cbuffers that store simpler inputs such as a float or a vector.

We now have an actual example with a real-world use case. It will take the contents of the texture we pass into _InputTexture, convert it to grayscale and then write the result into _OutputTexture.

We define a function, ToGrayscale, that is only accessible from inside this file. It performs the standard conversion from RGB to grayscale using a weighted sum of the channels. You might notice that the weights inside rgb2grayscale sum to 1, so even though more of the green channel ends up in the final image, the total intensity is still preserved.
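
As a quick sanity check: a pure white pixel (1, 1, 1) becomes 0.3 + 0.59 + 0.11 = 1.0, so white stays white, while a pure green pixel (0, 1, 0) becomes a fairly bright 0.59.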

Next up we define our first variables that are accessible from outside of the shader, namely Texture2D<float4> _InputTexture; and RWTexture2D<float4> _OutputTexture;. The important part for us is that we want a texture we can both read from and write into, but this puts some restrictions on what type of texture we can send to it from C#. A regular Texture2D is not usable in this context, so we need to make use of a RenderTexture to do our work here. The one we have defined as Texture2D<float4> accepts a regular Unity Texture2D, but we can only retrieve data from it.

I've also included a float called _Intensity as an example of how to bind a primitive value from C# so it is accessible in our Compute Shader.

We can now send some data into and get some data out of our Compute Shader, nice!

The texture we pass into _InputTexture needs the Read/Write Enabled option turned on in the image asset's import settings. If it isn't, we can't actually read from it.
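
If you would rather get an explicit error than a silently broken result, a small runtime guard (just a sketch) could look like this:

if (!texture.isReadable)
    Debug.LogError(texture.name + " needs Read/Write Enabled in its import settings.");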

ConvertToGrayscale Kernel

Lets take a closer look at the program we run to convert the texture into grayscale:

...
[numthreads(32,32,1)]
void ConvertToGrayscale(uint3 id : SV_DispatchThreadID) {
    float4 textureSample = _InputTexture[id.xy];
    float3 grayscale = ToGrayscale(textureSample.rgb) * _Intensity;
    _OutputTexture[id.xy] = float4(grayscale, textureSample.a);
}

We start by indexing into _InputTexture to get a value from the texture. One big difference between a Compute Shader and a regular shader is that we now operate on unsigned integer coordinates and not the normal UV space that goes from 0 to 1. Do not fret though, as we can index our texture by using the array accessor. In other use cases we might have to convert to and from "UV space", but we can ignore this for now.
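
If you do need to go back to UV space at some point, for example to use a proper sampler, a sketch of the conversion looks like this (offset by half a pixel to land on the pixel centre):

uint width, height;
_InputTexture.GetDimensions(width, height);
float2 uv = (id.xy + 0.5) / float2(width, height);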

We then call the ToGrayscale function; this step is no different from what we would do in a regular shader. We pass in the sampled colour and multiply the result by our intensity value.

Lastly we write the grayscale value into the _OutputTexture variable in order to access it from outside the compute program. Note that _InputTexture and _OutputTexture need to be the exact same size in this instance; the C# code below guarantees this by creating the output from the input's dimensions.

Calling the Compute Shader from C#

Let's go back into C# land and see how we can configure and dispatch our ConvertToGrayscale program. This can be done from anywhere as long as we can locate the ComputeShader asset, but in this example I'm using a stripped-down MonoBehaviour.

[SerializeField] ComputeShader grayscaleCompute;
int grayscaleKernel;

void Awake()
{
    // We cache the index of the kernel
    grayscaleKernel = grayscaleCompute.FindKernel("ConvertToGrayscale");
}

// Create a grayscale texture from the given texture
Texture2D ConvertToGrayscale(Texture2D texture, float intensity = 1f)
{
    // Create and configure a RenderTexture as the _OutputTexture target
    // (a depth buffer of 0, since we don't need one)
    RenderTexture output =
        new RenderTexture(texture.width, texture.height,
        0, RenderTextureFormat.ARGB32);
    output.enableRandomWrite = true;
    output.Create();

    // Set the variables on the compute shader
    // Textures require that we pass in the kernel ID
    grayscaleCompute.SetTexture(grayscaleKernel, "_OutputTexture", output);
    grayscaleCompute.SetTexture(grayscaleKernel, "_InputTexture", texture);
    // Primitives such as "float" do not need the kernel ID
    grayscaleCompute.SetFloat("_Intensity", intensity);

    // Create the thread group sizes
    // We make sure the value is at least 1, in case the size of the
    // texture is less than 32 as we defined in '[numthreads(...)]'.
    Vector2Int groupSize = new Vector2Int(
        Mathf.Max(1, texture.width / 32),
        Mathf.Max(1, texture.height / 32));

    // We dispatch our compute program
    grayscaleCompute.Dispatch(grayscaleKernel, groupSize.x, groupSize.y, 1);

    // We copy our output from the compute into a new Texture2D containing 
    // the grayscale image
    // We create it without a mip chain so its layout matches the RenderTexture
    Texture2D result = new Texture2D(texture.width, texture.height,
        TextureFormat.RGBA32, false);
    Graphics.CopyTexture(output, result);

    return result;
}

To set the texture variables we declared on the shader we call grayscaleCompute.SetTexture(...). There are also a few more variable types we can set, through methods such as SetFloat(...) and SetColor(...). Some of them require a specific kernel to be assigned to, while others are "global" for the entire Compute Shader. Exactly why you need to assign to a specific kernel or program is loosely described in the section above, but I won't go into more detail on it here.
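
To tie it all together, here is a minimal usage sketch. It assumes the script sits on an object with a Renderer and that sourceTexture is a hypothetical field assigned in the Inspector:

[SerializeField] Texture2D sourceTexture;

void Start()
{
    Texture2D grayscale = ConvertToGrayscale(sourceTexture);
    GetComponent<Renderer>().material.mainTexture = grayscale;
}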

Summary

Compute Shaders can be applied to a ton of different tasks that are efficient when run in parallel. Right now we have done work that could have been done in a regular shader, but avoiding having to spin up the entire graphics pipeline can be beneficial.

There are technically very few limits to what you can do in a Compute Shader. Of course, any task you end up running should be able to run in parallel. There is limited support for sequential algorithms where the result at index (10, 10) depends on index (20, 20), but more advanced topics also cover solutions for this.

Upcoming texts will cover more in-depth topics. Some topics you can look out for are:

  • A Multi-Gradient Texture Generator
  • Simple GPU Particles
  • Custom 2D Lights

Got any feedback or questions?