A methodology for efficient tile size selection for affine loop kernels
dc.contributor.author | Kelefouras, Vasileios | |
dc.contributor.author | Djemame, K | |
dc.contributor.author | Keramidas, G | |
dc.contributor.author | Voros, N | |
dc.date.accessioned | 2022-05-10T08:43:14Z | |
dc.date.issued | 2022-05-23 | |
dc.identifier.issn | 0091-7036 | |
dc.identifier.issn | 1573-7640 | |
dc.identifier.uri | http://hdl.handle.net/10026.1/19210 | |
dc.description.abstract | Reducing the number of data accesses in the memory hierarchy is of paramount importance on modern computer systems. One of the key optimizations addressing this problem is loop tiling, a well-known loop transformation that enhances data locality in the memory hierarchy. The selection of an appropriate tile size is tackled by using both static (analytical) and dynamic, empirical (auto-tuning) methods. Current analytical models are not accurate enough to effectively model complex modern memory hierarchies and loop kernels with diverse characteristics, while auto-tuning methods are either too time-consuming (due to the huge search space) or less accurate (when heuristics are used to reduce the search space). In this paper, we reveal two important inefficiencies of current analytical loop tiling methods and provide the theoretical background on how current methods can address these inefficiencies. To this end, we propose a new loop tiling method for affine loop kernels in which the cache size, cache line size and cache associativity are better utilized than in existing methods. Our evaluation results demonstrate the efficiency of the proposed method in terms of cache misses and execution time against related works, the icc/gcc compilers and the Pluto tool, on x86 and ARM-based platforms. | |
dc.format.extent | 405-432 | |
dc.language | en | |
dc.language.iso | en | |
dc.publisher | Springer | |
dc.rights | Attribution-ShareAlike 4.0 International | |
dc.rights.uri | http://creativecommons.org/licenses/by-sa/4.0/ | |
dc.subject | Loop tiling | |
dc.subject | Data cache | |
dc.subject | Cache misses | |
dc.subject | Analytical model | |
dc.subject | Data reuse | |
dc.subject | Energy consumption | |
dc.title | A methodology for efficient tile size selection for affine loop kernels | |
dc.type | journal-article | |
dc.type | Journal Article | |
plymouth.author-url | https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000800989600001&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=11bb513d99f797142bcfeffcc58ea008 | |
plymouth.issue | 3-4 | |
plymouth.volume | 50 | |
plymouth.publication-status | Published | |
plymouth.journal | International Journal of Parallel Programming | |
dc.identifier.doi | 10.1007/s10766-022-00734-5 | |
plymouth.organisational-group | /Plymouth | |
plymouth.organisational-group | /Plymouth/Faculty of Science and Engineering | |
plymouth.organisational-group | /Plymouth/Faculty of Science and Engineering/School of Engineering, Computing and Mathematics | |
plymouth.organisational-group | /Plymouth/REF 2021 Researchers by UoA | |
plymouth.organisational-group | /Plymouth/REF 2021 Researchers by UoA/UoA11 Computer Science and Informatics | |
plymouth.organisational-group | /Plymouth/Users by role | |
plymouth.organisational-group | /Plymouth/Users by role/Academics | |
dcterms.dateAccepted | 2022-04-30 | |
dc.rights.embargodate | 2023-05-23 | |
dc.identifier.eissn | 1573-7640 | |
dc.rights.embargoperiod | Not known | |
rioxxterms.versionofrecord | 10.1007/s10766-022-00734-5 | |
rioxxterms.licenseref.uri | http://creativecommons.org/licenses/by-sa/4.0/ | |
rioxxterms.type | Journal Article/Review | |