
dc.contributor.author: Kelefouras, Vasileios
dc.contributor.author: Djemame, K
dc.contributor.author: Keramidas, G
dc.contributor.author: Voros, N
dc.date.accessioned: 2022-05-10T08:43:14Z
dc.date.issued: 2022-05-23
dc.identifier.issn: 0091-7036
dc.identifier.issn: 1573-7640
dc.identifier.uri: http://hdl.handle.net/10026.1/19210
dc.description.abstract:

Reducing the number of data accesses in the memory hierarchy is of paramount importance on modern computer systems. One of the key optimizations addressing this problem is loop tiling, a well-known loop transformation that enhances data locality in the memory hierarchy. The selection of an appropriate tile size is tackled with both static (analytical) and dynamic, empirical (auto-tuning) methods. Current analytical models are not accurate enough to model complex modern memory hierarchies and loop kernels with diverse characteristics, while auto-tuning methods are either too time-consuming (because of the huge search space) or less accurate (when heuristics are used to reduce the search space). In this paper, we reveal two important inefficiencies of current analytical loop tiling methods and provide the theoretical background on how current methods can address them. To this end, we propose a new loop tiling method for affine loop kernels in which the cache size, cache line size and cache associativity are better utilized than in existing methods. Our evaluation results demonstrate the efficiency of the proposed method in terms of cache misses and execution time against related works, the icc/gcc compilers and the Pluto tool, on x86 and ARM based platforms.
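
For readers unfamiliar with the transformation the abstract refers to, the following is a minimal sketch of loop tiling applied to a square matrix multiplication. It is not the tile size selection method proposed in the paper: the function names `matmul_naive` and `matmul_tiled` and the `tile` parameter are hypothetical, and the tile size is simply passed in by the caller rather than derived from cache size, line size or associativity. The sketch only shows how the iteration space is split into blocks while the result stays the same.

```python
def matmul_naive(a, b, n):
    """Reference triple loop: C = A * B for n x n matrices (lists of lists)."""
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, n, tile):
    """Same computation with the i, j and k loops tiled by `tile`.

    `tile` is an illustrative, caller-chosen block size (an assumption of
    this sketch), not a value computed by any analytical model.
    """
    c = [[0] * n for _ in range(n)]
    # Outer loops walk over tile origins.
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for kk in range(0, n, tile):
                # Inner loops stay inside one tile; min() handles the
                # partial tiles when n is not a multiple of `tile`.
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, n)):
                        acc = c[i][j]
                        for k in range(kk, min(kk + tile, n)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    n = 6
    a = [[i + j for j in range(n)] for i in range(n)]
    b = [[i * j + 1 for j in range(n)] for i in range(n)]
    # Tiling reorders the iterations but must not change the result.
    assert matmul_tiled(a, b, n, tile=4) == matmul_naive(a, b, n)
```

In a compiled language the tiled version keeps a block of each matrix resident in cache while it is reused, which is where the reduction in cache misses comes from; in pure Python the sketch illustrates only the loop structure, not the performance effect.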

dc.format.extent: 405-432
dc.language: en
dc.language.iso: en
dc.publisher: Springer
dc.rights: Attribution-ShareAlike 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-sa/4.0/
dc.subject: Loop tiling
dc.subject: Data cache
dc.subject: Cache misses
dc.subject: Analytical model
dc.subject: Data reuse
dc.subject: Energy consumption
dc.title: A methodology for efficient tile size selection for affine loop kernels
dc.type: journal-article
dc.type: Journal Article
plymouth.author-url: https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000800989600001&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=11bb513d99f797142bcfeffcc58ea008
plymouth.issue: 3-4
plymouth.volume: 50
plymouth.publication-status: Published
plymouth.journal: International Journal of Parallel Programming
dc.identifier.doi: 10.1007/s10766-022-00734-5
plymouth.organisational-group: /Plymouth
plymouth.organisational-group: /Plymouth/Faculty of Science and Engineering
plymouth.organisational-group: /Plymouth/Faculty of Science and Engineering/School of Engineering, Computing and Mathematics
plymouth.organisational-group: /Plymouth/REF 2021 Researchers by UoA
plymouth.organisational-group: /Plymouth/REF 2021 Researchers by UoA/UoA11 Computer Science and Informatics
plymouth.organisational-group: /Plymouth/Users by role
plymouth.organisational-group: /Plymouth/Users by role/Academics
dcterms.dateAccepted: 2022-04-30
dc.rights.embargodate: 2023-5-23
dc.identifier.eissn: 1573-7640
dc.rights.embargoperiod: Not known
rioxxterms.versionofrecord: 10.1007/s10766-022-00734-5
rioxxterms.licenseref.uri: http://creativecommons.org/licenses/by-sa/4.0/
rioxxterms.type: Journal Article/Review



Except where otherwise noted, this item's license is described as Attribution-ShareAlike 4.0 International

All items in PEARL are protected by copyright law.
Author manuscripts deposited to comply with open access mandates are made available in accordance with publisher policies. Please cite only the published version using the details provided on the item record or document. In the absence of an open licence (e.g. Creative Commons), permissions for further reuse of content should be sought from the publisher or author.