True Sharing

std::atomic<int> counter{0};

// Run in 4 threads
for (int i = 0; i < 100'000; ++i) {
  counter.fetch_add(1);
}
^ Is this faster?

std::atomic<int> counter{0};

// Run in 4 threads
int local = 0;
for (int i = 0; i < 100'000; ++i) {
  local++;
}
counter.fetch_add(local);
^ Is this faster?

* The benchmark was run on an AMD Ryzen 9.

* For the full benchmark code, please refer here.

* For illustration purposes only; see the FAQ for more details.
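
The two snippets above can be wrapped into a small self-contained harness. This is only a minimal sketch and not the benchmark code referred to above: the function names count_shared, count_local, and elapsed_us are hypothetical, and the thread count and iteration count are passed in as parameters. In the first variant every increment is an atomic read-modify-write on the same variable, so all threads contend for one cache line. In the second variant each thread counts privately and touches the shared counter once.

```cpp
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

// Variant 1 (hypothetical name): every increment is an atomic
// read-modify-write on the one shared counter.
int count_shared(int num_threads, int iterations) {
  std::atomic<int> counter{0};
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&counter, iterations] {
      for (int i = 0; i < iterations; ++i) {
        counter.fetch_add(1);
      }
    });
  }
  for (auto& w : workers) w.join();
  return counter.load();
}

// Variant 2 (hypothetical name): each thread accumulates into a private
// variable and publishes its total with a single fetch_add.
int count_local(int num_threads, int iterations) {
  std::atomic<int> counter{0};
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&counter, iterations] {
      int local = 0;
      for (int i = 0; i < iterations; ++i) {
        local++;
      }
      counter.fetch_add(local);
    });
  }
  for (auto& w : workers) w.join();
  return counter.load();
}

// Hypothetical helper: wall-clock time of a callable, in microseconds.
// The measurement includes thread creation and join.
template <typename F>
long long elapsed_us(F&& f) {
  auto start = std::chrono::steady_clock::now();
  f();
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
      .count();
}
```

Calling elapsed_us([] { count_shared(4, 100'000); }) and elapsed_us([] { count_local(4, 100'000); }) from main gives a rough comparison. Both variants return the same total. Note that an optimizing compiler may collapse the private loop in the second variant into a single addition, so the measured gap reflects that as well as the reduced contention.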