Hi Graham-san,
The following image and list show the results from GPU Perfstudio.
According to the results, glActiveTexture and glBindTexture appear to be the bottleneck.
Each function takes 16 to 32 milliseconds according to GPU Perfstudio's CPU time measurement.
The attached file contains the detailed capture data (CSV format).