ORCID
- Vasilios Kelefouras: 0000-0001-9591-913X
Abstract
Register Blocking (RB), also known as ‘Register-level Tiling’ or ‘unroll-and-jam,’ is a key compiler optimization for developing efficient micro-kernels. However, applying RB effectively is a complex task due to several challenges. First, the exploration space of possible RB configurations is vast. Second, RB and loop permutation are interdependent; therefore, addressing both optimizations simultaneously further inflates the exploration space. Third, the effectiveness of RB is highly dependent on the target hardware platform and the specific loop kernel being optimized. As a result, an extensive and time-consuming fine-tuning process is necessary for achieving an efficient implementation. To address these challenges, a source-to-source analytical modelling approach is proposed. The RB factors, the loops to apply RB, the number of allocated variables/registers per array reference, and the loops’ ordering are generated by an analytical model, leveraging the target hardware architecture details and loop kernel characteristics. The proposed methodology has been evaluated on both embedded and general-purpose CPUs, using seven well-known loop kernels and three machine learning applications. The results show significant speedups over the GCC compiler, the Pluto tool, and related work.
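As an illustration of what register blocking (unroll-and-jam) does to a loop kernel, the following is a minimal sketch on dense matrix multiplication. It is not taken from the paper: the kernel, the problem size `N`, the fixed 2x2 RB factors, and the function names `mmm_baseline` and `mmm_rb_2x2` are all assumptions chosen for illustration, whereas the proposed approach derives the RB factors and loop ordering from an analytical model.

```c
#include <stddef.h>

#define N 8 /* hypothetical problem size; assumed divisible by the RB factors */

/* Baseline kernel: C += A * B with a plain triple loop. */
void mmm_baseline(double A[N][N], double B[N][N], double C[N][N]) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            for (size_t k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
}

/* Register-blocked kernel: the i and j loops are unrolled by 2 (assumed
 * factors) and the unrolled bodies are jammed into a single k loop.
 * The 2x2 block of C lives in scalar variables, which a compiler can keep
 * in registers, and each loaded element of A and B is reused twice. */
void mmm_rb_2x2(double A[N][N], double B[N][N], double C[N][N]) {
    for (size_t i = 0; i < N; i += 2) {
        for (size_t j = 0; j < N; j += 2) {
            double c00 = C[i][j],     c01 = C[i][j + 1];
            double c10 = C[i + 1][j], c11 = C[i + 1][j + 1];
            for (size_t k = 0; k < N; k++) {
                double a0 = A[i][k], a1 = A[i + 1][k];
                double b0 = B[k][j], b1 = B[k][j + 1];
                c00 += a0 * b0;
                c01 += a0 * b1;
                c10 += a1 * b0;
                c11 += a1 * b1;
            }
            C[i][j] = c00;         C[i][j + 1] = c01;
            C[i + 1][j] = c10;     C[i + 1][j + 1] = c11;
        }
    }
}
```

In this sketch the blocked version issues four loads per four multiply-accumulates in the inner loop, compared with roughly two loads per multiply-accumulate in the baseline. Larger factors increase reuse further until the scalars no longer fit in the register file, which is one reason the best configuration depends on both the hardware and the kernel.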
DOI Link
Publication Date
2025-09-13
Publication Title
ACM Transactions on Embedded Computing Systems
Volume
24
Issue
5
ISSN
1539-9087
Acceptance Date
2025-01-01
Deposit Date
2025-12-10
Funding
This work is part of the R-PODID project, supported by the Chips Joint Undertaking and its members, including the top-up funding by National Authorities of Italy, Turkey, Portugal, The Netherlands, Czech Republic, Latvia, Greece, and Romania under grant agreement No 101112338.
Additional Links
https://dl.acm.org/doi/full/10.1145/3747183, https://www.scopus.com/pages/publications/105018922688
Keywords
CPUs, Compiler optimizations, data reuse, high performance computing, register blocking, register tiling, unroll-and-jam
First Page
1
Last Page
24
Recommended Citation
Anthimopoulos, T., Keramidas, G., Kelefouras, V., & Stamoulis, I. (2025) 'Register Blocking: A Source-to-Source Analytical Modelling Approach for Affine Loop Kernels', ACM Transactions on Embedded Computing Systems, 24(5), pp. 1-24. Available at: 10.1145/3747183
