[OpenGL 4.3] Compute shader 4 times slower than equivalent fragment shader

Hi,

I'm trying to port a Gaussian blur fragment shader to a compute shader.

According to GPU PerfStudio, the original fragment shader is bandwidth-bound on an HD 7750, so such a port makes sense.

However, the new compute shader is 4 times slower than the fragment shader.

I use a local size of 8x8 with a shared memory array of 48x32 vec4 (am I reducing occupancy by allocating too much LDS?). The dispatch grid is 210x32.
According to GPU PerfStudio I'm ALU-bound, and reducing the loop count in the kernel reduces the execution time.
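
As a rough check of the LDS question, here is the arithmetic for the array actually declared in the compute shader, `local_src[8 + 2 * 8][32]`, i.e. 24x32 vec4. The hardware figures are assumptions taken from AMD's published GCN numbers (64 KiB of LDS per compute unit, 64-wide wavefronts, at most 40 resident wavefronts per compute unit), not values measured on this card:

```latex
\begin{align*}
\text{LDS per work group} &= 24 \times 32 \times 16\,\text{B} = 12288\,\text{B} = 12\,\text{KiB} \\
\text{work groups per CU} &\le \left\lfloor \frac{65536\,\text{B}}{12288\,\text{B}} \right\rfloor = 5 \\
\text{wavefronts per CU} &\le 5 \times \frac{8 \times 8}{64} = 5 \quad (\text{out of an assumed maximum of } 40)
\end{align*}
```

With a 48x32 array the same calculation gives 24 KiB per work group and at most 2 resident work groups per compute unit. Under these assumptions, LDS alone would cap occupancy well below the maximum in both cases; whether that explains the slowdown is a separate question.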

This is surprising, as the computations are the same as in the original fragment shader.

Here are the fragment shader and the compute shader.

I tried to output 4 pixels in the compute shader, but it didn't change anything.

Is there something I'm doing wrong?

________________________

uniform sampler2D tex;
uniform vec2 pixel;
uniform float sigma = 5.;

out vec4 FragColor;

void main()
{
    vec2 uv = gl_FragCoord.xy * pixel;
    float X = uv.x;
    float Y = uv.y;

    float g0, g1, g2;
    g0 = 1.0 / (sqrt(2.0 * 3.14) * sigma);
    g1 = exp(-0.5 / (sigma * sigma));
    g2 = g1 * g1;
    vec4 sum = texture(tex, vec2(X, Y)) * g0;
    g0 *= g1;
    g1 *= g2;
    for (int i = 1; i < 9; i++) {
        sum += texture(tex, vec2(X - i * pixel.x, Y)) * g0;
        sum += texture(tex, vec2(X + i * pixel.x, Y)) * g0;
        g0 *= g1;
        g1 *= g2;
    }
    FragColor = sum;
}

________________________

// Debug Name:
/*------------------- Shader 141 -------------------*/
#version 430
//C:\Users\vljn_000\Documents\GitHub\stk-code\bld\bin\Release/../../../data/shaders/gaussian.comp
#define VSLayer

uniform layout(size1x16) restrict readonly image2D source;
uniform layout(size1x16) volatile restrict writeonly image2D dest;
uniform vec2 pixel;
uniform float sigma = 5.;

layout (local_size_x = 8, local_size_y = 8) in;

shared vec4 local_src[8 + 2 * 8][32];

void main()
{
    int x = int(gl_LocalInvocationID.x), y = int(gl_LocalInvocationID.y);
    for (int i = 0; i < 4; i++)
    {
        ivec2 uv = ivec2(gl_GlobalInvocationID.x, gl_GlobalInvocationID.y * 4 + i);
        local_src[x][y + i * 8] = imageLoad(source, ivec2(uv) - ivec2(8, 0));
        local_src[x + 8][y + i * 8] = imageLoad(source, ivec2(uv));
        local_src[x + 16][y + i * 8] = imageLoad(source, ivec2(uv) + ivec2(8, 0));
    }

    barrier();

    for (int i = 0; i < 4; i++)
    {
        float g0, g1, g2;
        g0 = 1.0 / (sqrt(2.0 * 3.14) * sigma);
        g1 = exp(-0.5 / (sigma * sigma));
        g2 = g1 * g1;
        vec4 sum = local_src[x + 8][y + i * 8] * g0;
        g0 *= g1;
        g1 *= g2;
        for (int j = 1; j < 8; j++) {
            sum += local_src[8 + x - j][y + i * 8] * g0;
            sum += local_src[8 + x + j][y + i * 8] * g0;
            g0 *= g1;
            g1 *= g2;
        }
        ivec2 uv = ivec2(gl_GlobalInvocationID.x, gl_GlobalInvocationID.y * 4 + i);
        imageStore(dest, ivec2(uv), sum);
    }
}

________________________
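
Since the profiler reports the kernel as ALU-bound, one thing that could be tried is computing the Gaussian weights once per invocation instead of once per output row. The following is only a sketch under assumptions: the `rgba16f` format qualifier stands in for whatever format the images really use, `TAPS` and `w` are hypothetical names, and the driver's compiler may already hoist this work, so it may not change the timing at all.

```glsl
#version 430

// Sketch only: same 8x8 work group and 24x32 shared tile as the shader above.
// ASSUMPTION: rgba16f is a placeholder for the real image format.
layout(rgba16f) uniform restrict readonly image2D source;
layout(rgba16f) uniform restrict writeonly image2D dest;
uniform float sigma = 5.;

layout (local_size_x = 8, local_size_y = 8) in;

shared vec4 local_src[8 + 2 * 8][32];

// Hypothetical name: centre tap plus offsets 1..7, matching the `j < 8` loop above.
const int TAPS = 8;

void main()
{
    int x = int(gl_LocalInvocationID.x), y = int(gl_LocalInvocationID.y);
    ivec2 base = ivec2(gl_GlobalInvocationID.xy);

    // Fill the tile: 8 texels of halo on each side, 4 rows per invocation.
    for (int i = 0; i < 4; i++)
    {
        ivec2 uv = ivec2(base.x, base.y * 4 + i);
        int row = y + i * 8;
        local_src[x][row]      = imageLoad(source, uv - ivec2(8, 0));
        local_src[x + 8][row]  = imageLoad(source, uv);
        local_src[x + 16][row] = imageLoad(source, uv + ivec2(8, 0));
    }

    barrier();

    // Weights computed once, with the same incremental recurrence as the original.
    float w[TAPS];
    float g0 = 1.0 / (sqrt(2.0 * 3.14) * sigma);
    float g1 = exp(-0.5 / (sigma * sigma));
    float g2 = g1 * g1;
    for (int j = 0; j < TAPS; j++)
    {
        w[j] = g0;
        g0 *= g1;
        g1 *= g2;
    }

    for (int i = 0; i < 4; i++)
    {
        int row = y + i * 8;
        vec4 sum = local_src[x + 8][row] * w[0];
        for (int j = 1; j < TAPS; j++)
        {
            // Left and right taps share a weight, so add them before multiplying.
            sum += (local_src[8 + x - j][row] + local_src[8 + x + j][row]) * w[j];
        }
        imageStore(dest, ivec2(base.x, base.y * 4 + i), sum);
    }
}
```

Note that the compute loop (`j < 8`) samples offsets up to ±7, while the fragment shader's loop (`i < 9`) goes up to ±8, so the two shaders above do not do exactly the same amount of work; the 8-texel halo in `local_src` is wide enough for either.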

Vincent
