Hello,
We are using compute shaders to calculate fractal noise. The problem is that we can't compute inputs larger than 4096 vectors. Once we exceed that limit, the shader returns the same value for every remaining element, or skips them entirely.
We made a video that demonstrates this with a visual output:
(external link: amd compute shader bug - YouTube)
As you can see, as soon as the resolution exceeds 64x64 (4096 elements), it breaks.
When we use no input buffer at all (i.e. we just generate random test values inside the shader and write them into the output buffer), the problem never appears. So I'm fairly sure it is an input-buffer issue.
Here's how we run it from input to the finished result:
public float[] GetValues(Vector4[] input)
{
    // Takes a 1D Vector4 array as input; the finished noise comes back as a 1D float array.
    GL.UseProgram(program);

    // Generate input buffers
    int inBuffer = GL.GenBuffer(); // First buffer contains the vec4 data and is our "problem child"
    GL.BindBuffer(BufferTarget.ArrayBuffer, inBuffer); // No difference whether we use ArrayBuffer or ShaderStorageBuffer here
    GL.BufferData(BufferTarget.ArrayBuffer, new IntPtr(Vector4.SizeInBytes * input.Length), input, BufferUsageHint.StaticDraw);
    GL.BindBufferBase(BufferTarget.ShaderStorageBuffer, 0, inBuffer); // Bind buffer to shader binding 0

    int inPermBuffer = GL.GenBuffer(); // Second input is the permutation data; it is only about a KB and causes no problems
    GL.BindBuffer(BufferTarget.ArrayBuffer, inPermBuffer);
    GL.BufferData(BufferTarget.ArrayBuffer, new IntPtr(sizeof(int) * permutation.Length), permutation, BufferUsageHint.StaticDraw);
    GL.BindBufferBase(BufferTarget.ShaderStorageBuffer, 1, inPermBuffer); // Bind buffer to shader binding 1

    // Generate output buffer
    float[] result = new float[input.Length];
    int outBuffer = GL.GenBuffer(); // The buffer which receives the result
    GL.BindBuffer(BufferTarget.ArrayBuffer, outBuffer);
    GL.BufferData(BufferTarget.ArrayBuffer, new IntPtr(sizeof(float) * input.Length), result, BufferUsageHint.StaticCopy);
    GL.BindBufferBase(BufferTarget.ShaderStorageBuffer, 2, outBuffer); // Bind buffer to shader binding 2

    // Start compute
    GL.DispatchCompute((int)Math.Ceiling(input.Length / 256.0), 1, 1);
    GL.MemoryBarrier(MemoryBarrierMask.ShaderStorageBarrierBit);

    // Map the output buffer (outBuffer is still bound to the generic ShaderStorageBuffer target by the BindBufferBase call above)
    IntPtr outBufferPointer = GL.MapBuffer(BufferTarget.ShaderStorageBuffer, BufferAccess.ReadOnly);

    // Copy the result into our managed result array
    Marshal.Copy(outBufferPointer, result, 0, input.Length);

    // Release the mapping
    GL.UnmapBuffer(BufferTarget.ShaderStorageBuffer);

    // Clean up
    GL.DeleteBuffer(inBuffer);
    GL.DeleteBuffer(inPermBuffer);
    GL.DeleteBuffer(outBuffer);

    return result;
}
It's C# code, but it should be easy to read for C++ people anyway.
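In case it helps with reproducing: a quick way to rule out the upload itself is to read the input buffer straight back right after GL.BufferData and compare it with the source array. This is only a sketch — it assumes the generic GL.GetBufferSubData overload that takes a managed array is available in our OpenTK version:

    // Sketch: read the freshly uploaded input data back and compare it with the source.
    Vector4[] check = new Vector4[input.Length];
    GL.BindBuffer(BufferTarget.ShaderStorageBuffer, inBuffer);
    GL.GetBufferSubData(BufferTarget.ShaderStorageBuffer, IntPtr.Zero,
                        new IntPtr(Vector4.SizeInBytes * input.Length), check);
    Console.WriteLine(GL.GetError());                                      // expect NoError
    Console.WriteLine(check[input.Length - 1] == input[input.Length - 1]); // last element intact?

If the read-back already differs beyond index 4095, the upload is the culprit; if it matches, the shader-side read of the SSBO is the suspect.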
Here is how we implemented our buffers in GLSL and how we access them in the main function:
#version 430 core

struct vertex
{
    vec4 pos;
};

layout(std430, binding = 0) readonly buffer iBuffer
{
    vertex Vectors[];
};

layout(std430, binding = 1) readonly buffer pBuffer
{
    int Permutation[];
};

layout(std430, binding = 2) writeonly buffer oBuffer
{
    float Output[];
};

layout(local_size_x = 256) in;

void main()
{
    vec4 in_pos = Vectors[gl_GlobalInvocationID.x].pos;
    Output[gl_GlobalInvocationID.x] = /* Tons of instructions using in_pos here */ ;
}
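One more thing we can only offer as an assumption, not as the explanation for the 64x64 break (4096 is an exact multiple of 256): the dispatch rounds up to a multiple of 256 and the shader indexes the arrays with gl_GlobalInvocationID.x without a bounds check, so for element counts that are not a multiple of 256 the extra invocations of the last work group read and write past the end of the buffers. Padding the data on the host side is a cheap way to rule that out:

    // Sketch: pad the element count to the dispatched size so the last work group
    // stays inside the buffers. Variable names are only for illustration.
    int groups = (input.Length + 255) / 256;      // same as Math.Ceiling(input.Length / 256.0)
    int padded = groups * 256;                    // e.g. 4097 elements -> 4352
    Vector4[] paddedInput = new Vector4[padded];  // tail stays zero-initialised
    Array.Copy(input, paddedInput, input.Length);
    // Upload paddedInput instead of input, size the output buffer with 'padded' floats,
    // and still copy only input.Length results back at the end.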
We have been experimenting for two days now to get this working on AMD cards (on NVIDIA there are no problems at all).
Are we using shader storage buffers wrong, or is this really a memory bug on AMD cards?
Hardware: 7970
Driver: Tested with latest stable and latest beta driver.
I appreciate all comments