Misalign

#include <cstdint>
#include <vector>
std::vector<int64_t> arr(1'000'000);
int64_t sum = 0;
for (std::size_t i = 0; i < arr.size(); ++i) {
  sum += arr[i];  // every element is 8-byte aligned
}
^ Is this faster?
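struct __attribute__((packed)) Record {
  char type;      // 1 byte; `packed` removes the padding that would follow
  int64_t value;  // starts at offset 1, so most values are not 8-byte aligned
};
static Record arr[1'000'000];  // about 9 MB: static storage, too large for a typical stack
int64_t sum = 0;
for (auto i = 0; i < 1'000'000; ++i) {
  sum += arr[i].value;
}
^ Or is this faster?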
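
To make the layout difference concrete, here is a minimal sketch that measures how many `value` fields end up off an 8-byte boundary. It assumes GCC or Clang on a 64-bit target. `PackedRecord` mirrors the packed `Record` above, while `PaddedRecord`, `count_misaligned`, and `print_layout_report` are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Mirrors the packed Record above (assumes GCC/Clang attribute syntax).
struct __attribute__((packed)) PackedRecord {
  char type;
  int64_t value;
};

// Hypothetical unpacked twin, added only for comparison.
struct PaddedRecord {
  char type;
  int64_t value;
};

// Counts how many of the first n array elements have a field that is not
// 8-byte aligned, given the element size (stride) and the field offset.
// Assumes the array itself starts at an 8-byte-aligned address.
std::size_t count_misaligned(std::size_t n, std::size_t stride,
                             std::size_t offset) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if ((i * stride + offset) % 8 != 0) ++count;
  }
  return count;
}

// Prints size, field offset, and misaligned-field count for both layouts.
void print_layout_report(std::size_t n) {
  std::printf("packed: size=%zu offset=%zu misaligned=%zu of %zu\n",
              sizeof(PackedRecord), offsetof(PackedRecord, value),
              count_misaligned(n, sizeof(PackedRecord),
                               offsetof(PackedRecord, value)),
              n);
  std::printf("padded: size=%zu offset=%zu misaligned=%zu of %zu\n",
              sizeof(PaddedRecord), offsetof(PaddedRecord, value),
              count_misaligned(n, sizeof(PaddedRecord),
                               offsetof(PaddedRecord, value)),
              n);
}
```

Calling `print_layout_report(1'000'000)` shows the trade-off: the packed layout uses 9 bytes per element instead of the padded layout's larger footprint, but only one `value` in every eight lands on an 8-byte boundary.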

* The benchmark was run on an AMD Ryzen 9.

* For the full benchmark code, please refer here.

* For illustration purposes only; see the FAQ for more details.
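
For readers who want to try the comparison themselves, the following is a rough timing sketch, not the benchmark referenced above. It assumes GCC or Clang; `PackedRecord`, `sum_aligned`, `sum_packed`, and `elapsed_ns` are hypothetical names chosen for this example, and a single unrepeated run like this is far noisier than a proper benchmark harness.

```cpp
#include <chrono>
#include <cstdint>
#include <vector>

// Same layout as the packed Record above; renamed to keep this sketch standalone.
struct __attribute__((packed)) PackedRecord {
  char type;
  int64_t value;
};

// Sums naturally aligned 64-bit integers.
int64_t sum_aligned(const std::vector<int64_t>& values) {
  int64_t sum = 0;
  for (int64_t v : values) sum += v;
  return sum;
}

// Sums the `value` fields of packed records, most of which are misaligned.
int64_t sum_packed(const std::vector<PackedRecord>& records) {
  int64_t sum = 0;
  for (const PackedRecord& r : records) sum += r.value;
  return sum;
}

// Runs f once and returns the elapsed wall-clock time in nanoseconds.
template <typename F>
int64_t elapsed_ns(F&& f) {
  const auto start = std::chrono::steady_clock::now();
  f();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
      .count();
}
```

A typical use is `elapsed_ns([&] { result = sum_packed(records); })`, with `result` stored or printed afterwards so the compiler cannot discard the loop. Results depend heavily on the CPU, the compiler flags, and whether the data fits in cache, so any numbers from this sketch should be read as indicative only.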