ORCID

Vasilios Kelefouras: 0000-0001-9591-913X

Abstract

Register Blocking (RB), also known as ‘Register-level Tiling’ or ‘unroll-and-jam,’ is a key compiler optimization for developing efficient micro-kernels. However, applying RB effectively is a complex task due to several challenges. First, the exploration space of possible RB configurations is vast. Second, RB and loop permutation are interdependent; therefore, addressing both optimizations simultaneously further inflates the exploration space. Third, the effectiveness of RB is highly dependent on the target hardware platform and the specific loop kernel being optimized. As a result, an extensive and time-consuming fine-tuning process is necessary to achieve an efficient implementation.

To address these challenges, a source-to-source analytical modelling approach is proposed. The RB factors, the loops to which RB is applied, the number of variables/registers allocated per array reference, and the loop ordering are generated by an analytical model that leverages details of the target hardware architecture and the characteristics of the loop kernel. The proposed methodology has been evaluated on both embedded and general-purpose CPUs, using seven well-known loop kernels and three machine learning applications. The results show significant speedups over the GCC compiler, the Pluto tool, and related work.
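As a rough illustration of the transformation the abstract refers to, the C sketch below applies register blocking (unroll-and-jam) by hand to a matrix-multiplication loop nest. It is not the paper's analytical model or its generated code. It assumes a fixed 2x2 block on loops i and j, a square matrix size N = 8 that the block divides evenly, and hypothetical function names (mmm_ref, mmm_rb2x2, rb_matches_reference); selecting such factors, loops and the loop order automatically is what the proposed methodology addresses.

```c
/* Matrix size (assumption for illustration); must be even for the 2x2 block. */
#define N 8

/* Reference loop nest: C += A * B for row-major N x N matrices. */
static void mmm_ref(double C[N][N], double A[N][N], double B[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
}

/* Hypothetical register-blocked version: loops i and j are unrolled by 2
   and jammed into the k loop. The four C elements are held in scalar
   variables (which the compiler can keep in registers) for the whole
   k loop. Assumes N % 2 == 0, so no remainder loops are needed. */
static void mmm_rb2x2(double C[N][N], double A[N][N], double B[N][N]) {
    for (int i = 0; i < N; i += 2)
        for (int j = 0; j < N; j += 2) {
            double c00 = C[i][j],     c01 = C[i][j + 1];
            double c10 = C[i + 1][j], c11 = C[i + 1][j + 1];
            for (int k = 0; k < N; k++) {
                double a0 = A[i][k], a1 = A[i + 1][k]; /* each A value used twice */
                double b0 = B[k][j], b1 = B[k][j + 1]; /* each B value used twice */
                c00 += a0 * b0; c01 += a0 * b1;
                c10 += a1 * b0; c11 += a1 * b1;
            }
            C[i][j] = c00;     C[i][j + 1] = c01;
            C[i + 1][j] = c10; C[i + 1][j + 1] = c11;
        }
}

/* Runs both versions on the same small-integer inputs and returns 1 if
   every output element matches exactly, 0 otherwise. */
int rb_matches_reference(void) {
    static double A[N][N], B[N][N], C1[N][N], C2[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = (double)((i + 2 * j) % 5);
            B[i][j] = (double)((3 * i + j) % 7);
            C1[i][j] = C2[i][j] = (double)(i - j);
        }
    mmm_ref(C1, A, B);
    mmm_rb2x2(C2, A, B);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (C1[i][j] != C2[i][j])
                return 0;
    return 1;
}
```

Calling rb_matches_reference() checks that the blocked loop nest computes the same result as the original. In the blocked inner loop, four loads feed four multiply-adds, so loads per multiply-add drop from two (plus any C traffic) to one. Whether the scalar accumulators actually stay in registers is left to the compiler.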

DOI Link

10.1145/3747183

Publication Date

2025-09-13

Publication Title

ACM Transactions on Embedded Computing Systems

Volume

24

Issue

5

ISSN

1539-9087

Acceptance Date

2025-01-01

Deposit Date

2025-12-10

Funding

This work is part of the R-PODID project, supported by the Chips Joint Undertaking and its members, including top-up funding from the National Authorities of Italy, Turkey, Portugal, The Netherlands, Czech Republic, Latvia, Greece, and Romania, under grant agreement No 101112338.

Additional Links

https://dl.acm.org/doi/full/10.1145/3747183, https://www.scopus.com/pages/publications/105018922688

Keywords

CPUs, Compiler optimizations, data reuse, high performance computing, register blocking, register tiling, unroll-and-jam

First Page

1

Last Page

24

Recommended Citation

Anthimopoulos, T., Keramidas, G., Kelefouras, V., & Stamoulis, I. (2025) 'Register Blocking: A Source-to-Source Analytical Modelling Approach for Affine Loop Kernels', ACM Transactions on Embedded Computing Systems, 24(5), pp. 1-24. Available at: 10.1145/3747183

School of Engineering, Computing and Mathematics

Register Blocking: A Source-to-Source Analytical Modelling Approach for Affine Loop Kernels