ORCID
- Vasilios Kelefouras: 0000-0001-9591-913X
Abstract
For the past several decades, optimizing compilers have been a primary area of focus in both industry and academia. This continued research interest is a testament to the complexity of this task, primarily stemming from the vast number of parameters that must be explored to attain near-optimal results. One of the key compiler optimizations is "Register Blocking (RB)" also known as "Register-level Tiling" or "unroll-and-jam". RB can strongly reduce the number of executed Load/Store (L/S) instructions, and as a consequence the number of data accesses in memory hierarchy, but due to its inherent complexities, fine-tuning is essential for its effective implementation. To address this problem, in this work a new methodology is proposed for RB. The RB factors, the loops to apply RB, the number of allocated variables/registers per array reference, and the loops’ ordering are generated by an analytical model, leveraging the target hardware (HW) architecture details and loop kernel characteristics. The proposed methodology has been evaluated on both embedded and general-purpose CPUs across seven well-known loop kernels, achieving high speedups and L/S instruction gains over GCC compiler, handwritten optimized codes, and the popular Pluto tool.
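As a rough illustration of what register blocking (unroll-and-jam) does, the sketch below applies it by hand to a matrix-vector product. It is not the paper's methodology: the kernel, the function names `mvm_baseline` and `mvm_rb2`, and the blocking factor of 2 are assumptions chosen for brevity, whereas the paper derives the factors, loops, and loop order from an analytical model. At the source level, the blocked version reads `x[j]` once and reuses it for two rows, so it issues 3 loads per two multiply-adds instead of 4, provided the compiler keeps `xj`, `acc0`, and `acc1` in registers.

```c
#include <stddef.h>

/* Baseline: y = A * x for a row-major n x n matrix A. */
void mvm_baseline(size_t n, const double *A, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        double acc = 0.0;
        for (size_t j = 0; j < n; j++)
            acc += A[i * n + j] * x[j];      /* loads A[i][j] and x[j] */
        y[i] = acc;
    }
}

/* Register-blocked variant (illustrative factor of 2, an assumption):
 * the i loop is unrolled by 2 and both copies are jammed into one j loop. */
void mvm_rb2(size_t n, const double *A, const double *x, double *y)
{
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        double acc0 = 0.0, acc1 = 0.0;       /* one accumulator per row */
        for (size_t j = 0; j < n; j++) {
            double xj = x[j];                /* loaded once, reused twice */
            acc0 += A[i * n + j] * xj;
            acc1 += A[(i + 1) * n + j] * xj;
        }
        y[i] = acc0;
        y[i + 1] = acc1;
    }
    for (; i < n; i++) {                     /* remainder row when n is odd */
        double acc = 0.0;
        for (size_t j = 0; j < n; j++)
            acc += A[i * n + j] * x[j];
        y[i] = acc;
    }
}
```

Larger factors increase reuse further but need more registers; once the accumulators and reused values no longer fit, the compiler spills them to memory and the gain is lost, which is why the choice of factor matters.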
DOI
10.1145/3649153.3649194
Publication Date
2024-07-02
Keywords
Compiler Optimization, Register Blocking, Register Tiling, Unroll-and-Jam, High Performance Computing, Data Reuse, CPUs, Compiler Optimizations
First Page
71
Last Page
79
Recommended Citation
Anthimopulos, T., Keramidas, G., Kelefouras, V., & Stamoulis, I. (2024) 'Register Blocking: An Analytical Modelling Approach for Affine Loop Kernels', Available at: https://doi.org/10.1145/3649153.3649194