False Dependency
int sum = 0;
for (auto i = 0u; i < arr.size(); ++i) {
    sum += __builtin_popcount(arr[i]);
}
^ Is this faster?
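int r0 = 0, r1 = 0, r2 = 0, r3 = 0;
auto i = 0u;
// Main loop: four independent accumulators, four elements per iteration.
for (; i + 4 <= arr.size(); i += 4) {
    r0 += __builtin_popcount(arr[i]);
    r1 += __builtin_popcount(arr[i + 1]);
    r2 += __builtin_popcount(arr[i + 2]);
    r3 += __builtin_popcount(arr[i + 3]);
}
// Tail: handle leftover elements when arr.size() is not a multiple of 4.
for (; i < arr.size(); ++i) {
    r0 += __builtin_popcount(arr[i]);
}
sum = r0 + r1 + r2 + r3;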
^ Is this faster?
* The benchmark was run on an AMD Ryzen 9.
* For the full benchmark code, please refer here.
* For illustration purposes only; see the FAQ for more details.
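
To try the comparison locally, the two loops can be wrapped in functions and timed. The following is a minimal sketch, not the benchmark code referenced above: the names `popcount_single`, `popcount_unrolled`, and `time_ns` are hypothetical, the element type `std::uint32_t` is an assumption (the original does not state the type of `arr`), and `__builtin_popcount` requires GCC or Clang. A single timed call is noisy, and the compiler may unroll or vectorize either loop on its own, so results are likely to depend on the optimization flags and the CPU.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Single accumulator: each addition depends on the previous value of sum.
int popcount_single(const std::vector<std::uint32_t>& arr) {
    int sum = 0;
    for (std::size_t i = 0; i < arr.size(); ++i) {
        sum += __builtin_popcount(arr[i]);
    }
    return sum;
}

// Four accumulators: the four additions in one iteration are independent.
int popcount_unrolled(const std::vector<std::uint32_t>& arr) {
    int r0 = 0, r1 = 0, r2 = 0, r3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= arr.size(); i += 4) {
        r0 += __builtin_popcount(arr[i]);
        r1 += __builtin_popcount(arr[i + 1]);
        r2 += __builtin_popcount(arr[i + 2]);
        r3 += __builtin_popcount(arr[i + 3]);
    }
    // Tail: leftover elements when the size is not a multiple of 4.
    for (; i < arr.size(); ++i) {
        r0 += __builtin_popcount(arr[i]);
    }
    return r0 + r1 + r2 + r3;
}

// Hypothetical helper: runs f(arr) once, stores its return value in result,
// and returns the elapsed wall-clock time in nanoseconds.
template <typename F>
double time_ns(F&& f, const std::vector<std::uint32_t>& arr, int& result) {
    const auto start = std::chrono::steady_clock::now();
    result = f(arr);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count();
}
```

Calling `time_ns(popcount_single, arr, result)` and `time_ns(popcount_unrolled, arr, result)` on the same large vector, repeated several times, gives a rough comparison; both functions should return the same count.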