This is a short writeup on designing a GLSL compute shader that calculates the Mean Squared Error (MSE) between a pair of images produced by a graphics API. I wanted to perform regression testing for my renderer by comparing scene images rendered by the Vulkan and OpenGL backends, which in theory should be pixel-perfect matches of each other.
The compute shader itself is actually super short, so I'll dump the full GLSL code here before discussing the details. The shader is written once in the Vulkan GLSL dialect and transpiled into OpenGL GLSL when OpenGL is used as the backend.
#version 460

layout (local_size_x = 32, local_size_y = 32, local_size_z = 1) in;

layout (set = 0, binding = 0) buffer uResult
{
    uint error[];
} Result;

layout (set = 0, binding = 1, rgba8) uniform readonly image2D uImage1;
layout (set = 0, binding = 2, rgba8) uniform readonly image2D uImage2;

void main()
{
    vec3 pixel1 = imageLoad(uImage1, ivec2(gl_GlobalInvocationID.xy)).rgb;
    vec3 pixel2 = imageLoad(uImage2, ivec2(gl_GlobalInvocationID.xy)).rgb;

    vec3 dist = pixel1 - pixel2;
    uint dist_squared = uint(dot(dist, dist) * 10000.0);

    uint workgroup = gl_WorkGroupID.y * gl_NumWorkGroups.x + gl_WorkGroupID.x;
    atomicAdd(Result.error[workgroup], dist_squared);
}
Each workgroup stores its partial error sum in the Result storage buffer; we will sum these partial sums on the CPU side later. Note that errors are stored as uint instead of float, because core GLSL has no atomic operations on floats (Vulkan offers them through VK_EXT_shader_atomic_float, and I am not sure about OpenGL). Since the squared error is upscaled by 10000.0 before the conversion to uint, we need to remember to downscale by the same factor on the CPU side.
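For reference, the number of workgroups and the size of the Result buffer follow directly from the 32 x 32 local size, and the buffer must be zeroed before the dispatch so the atomicAdd accumulation starts from zero. The sketch below uses raw Vulkan commands rather than my vi_* wrappers, and it assumes the command buffer, pipeline, and descriptor sets are set up elsewhere:

#include <vulkan/vulkan.h>

/* Sketch: sizing and recording the MSE dispatch with raw Vulkan.
 * Pipeline and descriptor set binding are assumed to happen elsewhere. */
void record_mse_dispatch(VkCommandBuffer cmd, VkBuffer result_buffer,
                         uint32_t width, uint32_t height)
{
    /* One workgroup covers a 32x32 tile; round up so the whole image is covered. */
    uint32_t workgroup_x = (width  + 31) / 32;
    uint32_t workgroup_y = (height + 31) / 32;

    /* The Result buffer holds one uint partial sum per workgroup. */
    VkDeviceSize result_size =
        (VkDeviceSize)workgroup_x * workgroup_y * sizeof(uint32_t);

    /* Clear the partial sums to zero before accumulating with atomicAdd. */
    vkCmdFillBuffer(cmd, result_buffer, 0, result_size, 0);

    /* (Barrier between the fill and the dispatch, plus pipeline/descriptor
     * binding, omitted for brevity.) */
    vkCmdDispatch(cmd, workgroup_x, workgroup_y, 1);
}

Since error[] is a runtime-sized array in the shader, it simply spans whatever buffer range is bound to that descriptor.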
It is very important that we use unorm image formats here, since that limits each channel to [0, 1] and therefore the maximum squared distance between two pixels to 3. Each workgroup partial sum then has an upper bound of 3 * local_size_x * local_size_y = 3072 in my case. Even with the upscale factor of 10000.0, the worst case per workgroup is 30,720,000, which fits comfortably in a 32-bit uint (maximum of about 4.29 billion), so no overflow occurs.
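If you want the compiler to enforce that bound, a compile-time check is enough. This is purely illustrative; the MSE_* names are made up for the sketch, and the constants mirror the shader and have to be kept in sync by hand:

#include <assert.h>
#include <stdint.h>

/* Mirrors the shader constants; must be kept in sync manually. */
#define MSE_LOCAL_SIZE_X  32u
#define MSE_LOCAL_SIZE_Y  32u
#define MSE_ERROR_SCALE   10000u
#define MSE_MAX_DIST_SQ   3u   /* unorm channels: max squared distance = 1 + 1 + 1 */

/* Worst-case partial sum per workgroup must fit in a 32-bit uint. */
static_assert((uint64_t)MSE_MAX_DIST_SQ * MSE_LOCAL_SIZE_X * MSE_LOCAL_SIZE_Y
                  * MSE_ERROR_SCALE <= UINT32_MAX,
              "per-workgroup MSE partial sum can overflow a 32-bit uint");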
Now we have a portable MSE compute shader that does not rely on any extensions!
After the dispatch has finished, we accumulate all partial sums on the CPU side and divide by the total number of pixels.
double squared_errors = 0.0;

// map the storage buffer and accumulate the per-workgroup partial sums
uint32_t* storage_data = (uint32_t*)vi_buffer_map(storage_buffer);

for (uint32_t i = 0; i < workgroup_x * workgroup_y; i++)
{
    squared_errors += storage_data[i] / 1e4; // undo the 10000.0 upscale
}

vi_buffer_unmap(storage_buffer);

double mse = squared_errors / (width1 * height1);
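One practical detail: the shader's writes have to be visible to the host before the buffer is mapped. However your abstraction handles that, with raw Vulkan it roughly amounts to a compute-to-host buffer barrier before submission, plus waiting on a fence (or the queue) afterwards. A sketch, with the surrounding submission code assumed:

#include <vulkan/vulkan.h>

/* Sketch: make the compute shader's writes to the result buffer visible
 * to the host before mapping it. Recorded after vkCmdDispatch. */
void record_readback_barrier(VkCommandBuffer cmd, VkBuffer result_buffer)
{
    VkBufferMemoryBarrier barrier = {
        .sType               = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER,
        .srcAccessMask       = VK_ACCESS_SHADER_WRITE_BIT,
        .dstAccessMask       = VK_ACCESS_HOST_READ_BIT,
        .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .buffer              = result_buffer,
        .offset              = 0,
        .size                = VK_WHOLE_SIZE,
    };

    vkCmdPipelineBarrier(cmd,
                         VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
                         VK_PIPELINE_STAGE_HOST_BIT,
                         0,
                         0, NULL,     /* no global memory barriers */
                         1, &barrier, /* one buffer barrier */
                         0, NULL);    /* no image barriers */

    /* After submitting, wait on a fence (or vkQueueWaitIdle) before mapping. */
}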
It was quite easy to get this up and running, so I figured I should do a writeup. This code path should also be easy to extend to other image error metrics such as PSNR.
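For example, since the pixel values are normalized to [0, 1], PSNR falls straight out of the MSE we just computed. A minimal sketch (not part of the original code path):

#include <math.h>

/* PSNR in decibels from MSE, assuming pixel values normalized to [0, 1]
 * (so the peak signal value is 1.0). Identical images give infinite PSNR. */
double psnr_from_mse(double mse)
{
    const double peak = 1.0;
    if (mse == 0.0)
        return INFINITY;
    return 10.0 * log10((peak * peak) / mse);
}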