A methodology for efficient tile size selection for affine loop kernels

Kelefouras, Vasileios; Djemame, K; Keramidas, G; Voros, N

dc.contributor.author	Kelefouras, Vasileios
dc.contributor.author	Djemame, K
dc.contributor.author	Keramidas, G
dc.contributor.author	Voros, N
dc.date.accessioned	2022-05-10T08:43:14Z
dc.date.issued	2022-05-23
dc.identifier.issn	0091-7036
dc.identifier.issn	1573-7640
dc.identifier.uri	http://hdl.handle.net/10026.1/19210
dc.description.abstract	Reducing the number of data accesses in the memory hierarchy is of paramount importance on modern computer systems. One of the key optimizations addressing this problem is loop tiling, a well-known loop transformation that enhances data locality in the memory hierarchy. The selection of an appropriate tile size is tackled using both static (analytical) and dynamic, empirical (auto-tuning) methods. Current analytical models are not accurate enough to effectively model complex modern memory hierarchies and loop kernels with diverse characteristics, while auto-tuning methods are either too time-consuming (due to the huge search space) or less accurate (when heuristics are used to reduce the search space). In this paper, we reveal two important inefficiencies of current analytical loop tiling methods and provide the theoretical background on how current methods can address these inefficiencies. To this end, we propose a new loop tiling method for affine loop kernels in which the cache size, cache line size and cache associativity are better utilized than in existing methods. Our evaluation results demonstrate the efficiency of the proposed method in terms of cache misses and execution time against related works, the icc/gcc compilers and the Pluto tool, on x86 and ARM-based platforms.
dc.format.extent	405-432
dc.language	en
dc.language.iso	en
dc.publisher	Springer
dc.rights	Attribution-ShareAlike 4.0 International
dc.rights.uri	http://creativecommons.org/licenses/by-sa/4.0/
dc.subject	Loop tiling
dc.subject	Data cache
dc.subject	Cache misses
dc.subject	Analytical model
dc.subject	Data reuse
dc.subject	Energy consumption
dc.title	A methodology for efficient tile size selection for affine loop kernels
dc.type	journal-article
dc.type	Journal Article
plymouth.author-url	https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000800989600001&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=11bb513d99f797142bcfeffcc58ea008
plymouth.issue	3-4
plymouth.volume	50
plymouth.publication-status	Published
plymouth.journal	International Journal of Parallel Programming
dc.identifier.doi	10.1007/s10766-022-00734-5
plymouth.organisational-group	/Plymouth
plymouth.organisational-group	/Plymouth/Faculty of Science and Engineering
plymouth.organisational-group	/Plymouth/Faculty of Science and Engineering/School of Engineering, Computing and Mathematics
plymouth.organisational-group	/Plymouth/REF 2021 Researchers by UoA
plymouth.organisational-group	/Plymouth/REF 2021 Researchers by UoA/UoA11 Computer Science and Informatics
plymouth.organisational-group	/Plymouth/Users by role
plymouth.organisational-group	/Plymouth/Users by role/Academics
dcterms.dateAccepted	2022-04-30
dc.rights.embargodate	2023-05-23
dc.identifier.eissn	1573-7640
dc.rights.embargoperiod	Not known
rioxxterms.versionofrecord	10.1007/s10766-022-00734-5
rioxxterms.licenseref.uri	http://creativecommons.org/licenses/by-sa/4.0/
rioxxterms.type	Journal Article/Review