False Dependency

int sum = 0;
for (auto i = 0u; i < arr.size(); ++i) {
  sum += __builtin_popcount(arr[i]);
}
^ Is this faster?
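int r0 = 0, r1 = 0, r2 = 0, r3 = 0;
auto i = 0u;
// Four independent accumulators; stop before a partial group of four
// so arr[i + 3] never reads past the end.
for (; i + 3 < arr.size(); i += 4) {
  r0 += __builtin_popcount(arr[i]);
  r1 += __builtin_popcount(arr[i + 1]);
  r2 += __builtin_popcount(arr[i + 2]);
  r3 += __builtin_popcount(arr[i + 3]);
}
// Handle the remaining 0-3 elements.
for (; i < arr.size(); ++i) {
  r0 += __builtin_popcount(arr[i]);
}
sum = r0 + r1 + r2 + r3;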
^ Is this faster?
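
To try the two loops outside the benchmark, they can be wrapped into functions roughly as follows. This is a minimal sketch under stated assumptions: the element type `unsigned`, the `std::vector` container, and the function names `popcount_single` and `popcount_unrolled` are hypothetical and not taken from the benchmark code, and `__builtin_popcount` requires GCC or Clang. It only checks that both variants return the same count. It says nothing about which one is faster, since that depends on the compiler, flags, and CPU.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical wrapper: one accumulator, so each add depends on the previous one.
int popcount_single(const std::vector<unsigned>& arr) {
  int sum = 0;
  for (std::size_t i = 0; i < arr.size(); ++i) {
    sum += __builtin_popcount(arr[i]);  // GCC/Clang builtin
  }
  return sum;
}

// Hypothetical wrapper: four independent accumulators, combined at the end.
int popcount_unrolled(const std::vector<unsigned>& arr) {
  int r0 = 0, r1 = 0, r2 = 0, r3 = 0;
  std::size_t i = 0;
  // Process full groups of four elements.
  for (; i + 3 < arr.size(); i += 4) {
    r0 += __builtin_popcount(arr[i]);
    r1 += __builtin_popcount(arr[i + 1]);
    r2 += __builtin_popcount(arr[i + 2]);
    r3 += __builtin_popcount(arr[i + 3]);
  }
  // Process the remaining 0-3 elements.
  for (; i < arr.size(); ++i) {
    r0 += __builtin_popcount(arr[i]);
  }
  return r0 + r1 + r2 + r3;
}
```

Both functions should agree for any input length, including lengths that are not a multiple of four.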

* The benchmark was run on an AMD Ryzen 9.

* For the full benchmark code, please refer here.

* For illustration purposes only; see the FAQ for more details.