Misaligned Access
std::vector<int64_t> arr(1'000'000);
int64_t sum = 0;
for (std::size_t i = 0; i < arr.size(); ++i) {
    sum += arr[i];  // each element is naturally 8-byte aligned
}
^ Is this faster?
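// N is a compile-time constant; static keeps a large buffer off the stack
alignas(int64_t) static char buf[N * sizeof(int64_t) + 3];
char* p = buf + 3;  // 3 bytes past an int64_t boundary: every read is misaligned
int64_t sum = 0;
for (std::size_t i = 0; i < N; ++i) {
    int64_t v;
    memcpy(&v, p + i * sizeof(int64_t), sizeof(int64_t));  // well-defined unaligned load
    sum += v;
}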
^ Is this faster?
* The benchmark was run on an AMD Ryzen 9.
* For the full benchmark code, please refer here.
* For illustration purposes only; see the FAQ for more details.