True Sharing
std::atomic<int> counter{0};
// Run in 4 threads
for (int i = 0; i < 100'000; ++i) {
    counter.fetch_add(1);
}
^ Is this faster?
std::atomic<int> counter{0};
// Run in 4 threads
int local = 0;
for (int i = 0; i < 100'000; ++i) {
    local++;
}
counter.fetch_add(local);
^ Is this faster?
* The benchmark was run on an AMD Ryzen 9.
* For the full benchmark code, please refer here.
* For illustration purposes only; see the FAQ for more details.
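To try the comparison locally, the two snippets can be wrapped in a small harness like the one below. This is a minimal sketch, not the benchmark code referenced above: the helper names `run_in_threads`, `count_shared`, `count_local`, and `elapsed_ms` are hypothetical, and the thread and iteration counts are taken from the snippets. The first variant has every thread hit the same atomic on each iteration, while the second touches the shared atomic once per thread. An optimizing compiler may collapse the local loop entirely, so timings from this sketch are only indicative.

```cpp
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

// Assumed parameters, matching the snippets: 4 threads, 100'000 iterations each.
constexpr int kThreads = 4;
constexpr int kIterations = 100'000;

// Hypothetical helper: runs `work` on kThreads threads and joins them all.
template <typename Work>
void run_in_threads(Work work) {
    std::vector<std::thread> threads;
    threads.reserve(kThreads);
    for (int t = 0; t < kThreads; ++t) {
        threads.emplace_back(work);
    }
    for (auto& th : threads) {
        th.join();
    }
}

// Variant 1: every increment is an atomic read-modify-write on the shared counter.
int count_shared() {
    std::atomic<int> counter{0};
    run_in_threads([&counter] {
        for (int i = 0; i < kIterations; ++i) {
            counter.fetch_add(1);
        }
    });
    return counter.load();
}

// Variant 2: each thread counts in a local variable and publishes once.
int count_local() {
    std::atomic<int> counter{0};
    run_in_threads([&counter] {
        int local = 0;
        for (int i = 0; i < kIterations; ++i) {
            local++;
        }
        counter.fetch_add(local);
    });
    return counter.load();
}

// Hypothetical helper: wall-clock time of one call to `fn`, in milliseconds.
template <typename Fn>
double elapsed_ms(Fn fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling `elapsed_ms([] { count_shared(); })` and `elapsed_ms([] { count_local(); })` from `main` gives a rough comparison. Both variants should return the same total; only the number of accesses to the shared counter differs. On most toolchains the program needs C++14 or later and must be linked with thread support (for example `-pthread` with GCC or Clang).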