This is a short writeup on designing a GLSL compute shader that calculates the Mean Squared Error (MSE) between a pair of images produced by a graphics API. I wanted to perform regression testing for my renderer by comparing scene images rendered by the Vulkan and OpenGL backends, which in theory should be pixel-perfect matches of each other.
The compute shader itself is actually super short, so I'll dump the full GLSL code here before discussing the details. The shader is written once in the Vulkan GLSL dialect and transpiled into OpenGL GLSL when OpenGL is used as the backend.
#version 460

layout (local_size_x = 32, local_size_y = 32, local_size_z = 1) in;

layout (set = 0, binding = 0) buffer uResult
{
    uint error[];
} Result;

layout (set = 0, binding = 1, rgba8) uniform readonly image2D uImage1;
layout (set = 0, binding = 2, rgba8) uniform readonly image2D uImage2;

void main()
{
    vec3 pixel1 = imageLoad(uImage1, ivec2(gl_GlobalInvocationID.xy)).rgb;
    vec3 pixel2 = imageLoad(uImage2, ivec2(gl_GlobalInvocationID.xy)).rgb;

    vec3 dist = pixel1 - pixel2;
    uint dist_squared = uint(dot(dist, dist) * 10000.0);

    uint workgroup = gl_WorkGroupID.y * gl_NumWorkGroups.x + gl_WorkGroupID.x;
    atomicAdd(Result.error[workgroup], dist_squared);
}
Each workgroup stores its partial error sum in the Result storage buffer; we will sum these partial sums on the CPU side later. Note that errors are stored as uint instead of float, because core GLSL has no atomic operations on floats (Vulkan offers them through VK_EXT_shader_atomic_float, and I am not sure about OpenGL). Since the squared error is upscaled by 10000.0 before the conversion to uint, we need to remember to downscale by the same factor on the CPU side.
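For reference, the number of workgroups and the size of the Result buffer follow directly from the 32 x 32 local size, and the buffer must be zeroed before the dispatch so the atomicAdd accumulation starts from zero. The sketch below uses raw Vulkan commands rather than my vi_* wrappers, and it assumes the command buffer, pipeline, and descriptor sets are set up elsewhere:

#include <vulkan/vulkan.h>

/* Sketch: sizing and recording the MSE dispatch with raw Vulkan.
 * Pipeline and descriptor set binding are assumed to happen elsewhere. */
void record_mse_dispatch(VkCommandBuffer cmd, VkBuffer result_buffer,
                         uint32_t width, uint32_t height)
{
    /* One workgroup covers a 32x32 tile; round up so the whole image is covered. */
    uint32_t workgroup_x = (width  + 31) / 32;
    uint32_t workgroup_y = (height + 31) / 32;

    /* The Result buffer holds one uint partial sum per workgroup. */
    VkDeviceSize result_size =
        (VkDeviceSize)workgroup_x * workgroup_y * sizeof(uint32_t);

    /* Clear the partial sums to zero before accumulating with atomicAdd. */
    vkCmdFillBuffer(cmd, result_buffer, 0, result_size, 0);

    /* (Barrier between the fill and the dispatch, plus pipeline/descriptor
     * binding, omitted for brevity.) */
    vkCmdDispatch(cmd, workgroup_x, workgroup_y, 1);
}

Since error[] is a runtime-sized array in the shader, it simply spans whatever buffer range is bound to that descriptor.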
It is very important that we use unorm image formats here, since that limits each channel to [0, 1] and therefore the maximum squared distance between two pixels to 3. Each workgroup partial sum then has an upper bound of 3 * local_size_x * local_size_y = 3072 in my case. Even with the upscale factor of 10000.0, the worst case per workgroup is 30,720,000, which fits comfortably in a 32-bit uint (maximum of about 4.29 billion), so no overflow occurs.
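If you want the compiler to enforce that bound, a compile-time check is enough. This is purely illustrative; the MSE_* names are made up for the sketch, and the constants mirror the shader and have to be kept in sync by hand:

#include <assert.h>
#include <stdint.h>

/* Mirrors the shader constants; must be kept in sync manually. */
#define MSE_LOCAL_SIZE_X  32u
#define MSE_LOCAL_SIZE_Y  32u
#define MSE_ERROR_SCALE   10000u
#define MSE_MAX_DIST_SQ   3u   /* unorm channels: max squared distance = 1 + 1 + 1 */

/* Worst-case partial sum per workgroup must fit in a 32-bit uint. */
static_assert((uint64_t)MSE_MAX_DIST_SQ * MSE_LOCAL_SIZE_X * MSE_LOCAL_SIZE_Y
                  * MSE_ERROR_SCALE <= UINT32_MAX,
              "per-workgroup MSE partial sum can overflow a 32-bit uint");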
Now we have a portable MSE compute shader that does not rely on any extensions!
After the dispatch has finished, we accumulate all partial sums on the CPU side and divide by the total number of pixels.
double squared_errors = 0.0;

// map the storage buffer and accumulate the per-workgroup partial sums
uint32_t* storage_data = (uint32_t*)vi_buffer_map(storage_buffer);

for (uint32_t i = 0; i < workgroup_x * workgroup_y; i++)
{
    squared_errors += storage_data[i] / 1e4; // undo the 10000.0 upscale
}

vi_buffer_unmap(storage_buffer);

double mse = squared_errors / (width1 * height1);
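One practical detail: the shader's writes have to be visible to the host before the buffer is mapped. However your abstraction handles that, with raw Vulkan it roughly amounts to a compute-to-host buffer barrier before submission, plus waiting on a fence (or the queue) afterwards. A sketch, with the surrounding submission code assumed:

#include <vulkan/vulkan.h>

/* Sketch: make the compute shader's writes to the result buffer visible
 * to the host before mapping it. Recorded after vkCmdDispatch. */
void record_readback_barrier(VkCommandBuffer cmd, VkBuffer result_buffer)
{
    VkBufferMemoryBarrier barrier = {
        .sType               = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER,
        .srcAccessMask       = VK_ACCESS_SHADER_WRITE_BIT,
        .dstAccessMask       = VK_ACCESS_HOST_READ_BIT,
        .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .buffer              = result_buffer,
        .offset              = 0,
        .size                = VK_WHOLE_SIZE,
    };

    vkCmdPipelineBarrier(cmd,
                         VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
                         VK_PIPELINE_STAGE_HOST_BIT,
                         0,
                         0, NULL,     /* no global memory barriers */
                         1, &barrier, /* one buffer barrier */
                         0, NULL);    /* no image barriers */

    /* After submitting, wait on a fence (or vkQueueWaitIdle) before mapping. */
}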
It was quite easy to get this up and running, so I figured I should do a writeup. This code path should also be easy to extend to other image error metrics such as PSNR.
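For example, since the pixel values are normalized to [0, 1], PSNR falls straight out of the MSE we just computed. A minimal sketch (not part of the original code path):

#include <math.h>

/* PSNR in decibels from MSE, assuming pixel values normalized to [0, 1]
 * (so the peak signal value is 1.0). Identical images give infinite PSNR. */
double psnr_from_mse(double mse)
{
    const double peak = 1.0;
    if (mse == 0.0)
        return INFINITY;
    return 10.0 * log10((peak * peak) / mse);
}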