References
A list of every reference cited across the website, in no particular order.
- ARM - YIELD Instruction
- Agner Fog - Instruction Tables
- Agner Fog - Microarchitecture
- Agner Fog - Optimizing Software
- Algorithmica - Branchless Programming
- Algorithmica - Cache Associativity
- Algorithmica - Memory-Level Parallelism
- Algorithmica - Virtual Memory
- Berkeley - Cache Associativity
- C++ Core Guidelines - R.34
- CUTLASS - Swizzle
- FPGA4Fun - Pipelining
- GCC - Common Function Attributes
- GCC - Link Time Optimization
- Herb Sutter - GotW #91
- Inlining function
- Intel - Denormals Are Zeros
- Intel - Enhanced REP MOVSB/STOSB
- Intel - PAUSE Instruction
- Intel - Streaming Stores
- Intel 64 and IA-32 Architectures Optimization Reference Manual
- Intel Optimization Manual - Code Alignment
- Itanium C++ ABI
- LLVM - Link Time Optimization
- LWN - TLB Shootdowns
- NVIDIA - Data Transfer Optimization
- NVIDIA - Kernel Fusion
- NVIDIA - Memory Coalescing
- NVIDIA - Misaligned Access
- NVIDIA - Occupancy
- NVIDIA - Shared Memory Bank Conflicts
- No Branch
- Ridiculous Fish - Labor of Division
- Stack Overflow - POPCNT False Dependency
- Stack Overflow - Measuring Cache Latencies
- Stack Overflow - Why is processing a sorted array faster than processing an unsorted array?
- Stack Overflow - clang ignoring attribute noinline
- System V AMD64 ABI
- The Old New Thing - An informal comparison of the three major implementations of std::string
- Travis Downs - Non-Temporal Stores
- Travis Downs - Write-Combining
- What Every Programmer Should Know About Memory
- Wikipedia - Amortized Analysis
- Wikipedia - Branch Predictor
- Wikipedia - Cache Coherence
- Wikipedia - Cache Placement Policies
- Wikipedia - Cache Prefetching
- Wikipedia - DRAM
- Wikipedia - DRAM Row Buffer
- Wikipedia - Data Structure Alignment
- Wikipedia - Denormal Number
- Wikipedia - Division Algorithm
- Wikipedia - False Sharing
- Wikipedia - Inline Expansion
- Wikipedia - Instruction Cache
- Wikipedia - Instruction Pipelining
- Wikipedia - Loop Fission
- Wikipedia - Loop Fusion
- Wikipedia - Loop Nest Optimization
- Wikipedia - Loop Tiling
- Wikipedia - MESI Protocol
- Wikipedia - Memory-Level Parallelism
- Wikipedia - Minor Page Fault
- Wikipedia - Pipeline Stall
- Wikipedia - Pointer Chasing
- Wikipedia - Priority Encoder
- Wikipedia - Profile-Guided Optimization
- Wikipedia - Register Allocation
- Wikipedia - Resource Sharing
- Wikipedia - Restrict
- Wikipedia - Retiming
- Wikipedia - Row- and Column-Major Order
- Wikipedia - Superscalar Processor
- Wikipedia - TLB Shootdown
- Wikipedia - Translation Lookaside Buffer
- Wikipedia - Virtual Method Table
- Xilinx - Pipelining
- Xilinx - Retiming
- cppreference - Static Local Variables
- division by a constant into multiplication
- docs.kernel.org - False Sharing
- easyperf - Store forwarding by example
- libdivide
- perf c2c
- perf c2c man page
- subnormal number
- unsigned integer division