
dc.contributor.author: Kelefouras, V
dc.contributor.author: Keramidas, G
dc.date.accessioned: 2022-05-01T14:19:16Z
dc.date.available: 2022-05-01T14:19:16Z
dc.date.issued: 2022-04-29
dc.identifier.issn: 1045-9219
dc.identifier.issn: 1558-2183
dc.identifier.uri: http://hdl.handle.net/10026.1/19129
dc.description.abstract

In this paper, a new method for accelerating the 2D direct convolution operation on x86/x64 processors is presented. It includes efficient vectorization using SIMD intrinsics, bit-twiddling optimizations, optimization of the division operation, multi-threading using OpenMP, register blocking, and the shortest possible bit-width for the intermediate results. The proposed method, which is provided as open source, is general and can also be applied to other processor families, e.g., Arm. The method has been evaluated on two different multi-core Intel CPUs, using twenty different image sizes, 8-bit integer computations, and the most commonly used kernel sizes (3x3, 5x5, 7x7, 9x9). It achieves from 2.8× to 40× speedup over the Intel IPP library (OpenCV GaussianBlur and Filter2D routines), from 105× to 400× speedup over the gemm-based convolution method (using the Intel MKL int8 matrix multiplication routine), and from 8.5× to 618× speedup over the vslsConvExec Intel MKL direct convolution routine. The proposed method is superior because it executes far fewer arithmetic and load/store instructions.
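To make some of the ideas named in the abstract concrete, the following is a minimal scalar sketch in C, not the authors' implementation. It assumes a fixed 3x3 Gaussian kernel whose weights sum to 16, so the division becomes a right shift, and it keeps the accumulator in 16 bits because the largest possible sum (255 × 16 = 4080) fits. The function name `gaussian_blur_3x3_u8` and the choice to copy border pixels unchanged are hypothetical, chosen only for illustration. SIMD vectorization and register blocking are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the paper): 3x3 Gaussian blur on an 8-bit
 * grayscale image stored row-major. Border pixels are copied unchanged,
 * which is an assumption made only to keep the sketch short. */
void gaussian_blur_3x3_u8(const uint8_t *in, uint8_t *out,
                          size_t width, size_t height)
{
    /* Weights sum to 16, a power of two, so dividing is a shift by 4. */
    static const uint16_t k[3][3] = {{1, 2, 1}, {2, 4, 2}, {1, 2, 1}};

    assert(in != NULL && out != NULL && width >= 3 && height >= 3);

    /* Copy the first and last rows, then the first and last columns. */
    for (size_t x = 0; x < width; x++) {
        out[x] = in[x];
        out[(height - 1) * width + x] = in[(height - 1) * width + x];
    }
    for (size_t y = 0; y < height; y++) {
        out[y * width] = in[y * width];
        out[y * width + width - 1] = in[y * width + width - 1];
    }

    /* Rows are independent, so the outer loop can be split across threads
     * when the file is compiled with OpenMP support. */
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (ptrdiff_t y = 1; y < (ptrdiff_t)height - 1; y++) {
        for (ptrdiff_t x = 1; x < (ptrdiff_t)width - 1; x++) {
            /* 16-bit accumulator: the maximum value 255 * 16 = 4080 fits. */
            uint16_t acc = 0;
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    size_t idx = (size_t)(y + dy) * width + (size_t)(x + dx);
                    acc = (uint16_t)(acc + k[dy + 1][dx + 1] * in[idx]);
                }
            }
            /* Add 8 for rounding, then shift right by 4 instead of
             * dividing by 16. */
            out[(size_t)y * width + (size_t)x] = (uint8_t)((acc + 8) >> 4);
        }
    }
}
```

Calling the function on a uniform image leaves every pixel unchanged, which is a quick sanity check that the weights and the shift agree. For kernels whose weights do not sum to a power of two, the shift would have to be replaced by another division strategy, which this sketch does not cover.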

dc.format.extent: 3800-3815
dc.language.iso: en
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.subject: Convolution
dc.subject: gaussian blur
dc.subject: code optimization
dc.subject: vectorization
dc.subject: AVX
dc.subject: OpenMP
dc.subject: OpenCV
dc.subject: Intel MKL
dc.subject: Intel IPP
dc.subject: high performance computing (HPC)
dc.subject: image processing
dc.title: Design and Implementation of 2D Convolution on x86/x64 Processors
dc.type: journal-article
dc.type: Article
plymouth.author-url: https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000831139000004&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=11bb513d99f797142bcfeffcc58ea008
plymouth.issue: 12
plymouth.volume: 33
plymouth.publication-status: Published
plymouth.journal: IEEE Transactions on Parallel and Distributed Systems
dc.identifier.doi: 10.1109/tpds.2022.3171471
plymouth.organisational-group: /Plymouth
plymouth.organisational-group: /Plymouth/Faculty of Science and Engineering
plymouth.organisational-group: /Plymouth/Faculty of Science and Engineering/School of Engineering, Computing and Mathematics
plymouth.organisational-group: /Plymouth/REF 2021 Researchers by UoA
plymouth.organisational-group: /Plymouth/REF 2021 Researchers by UoA/UoA11 Computer Science and Informatics
plymouth.organisational-group: /Plymouth/Users by role
plymouth.organisational-group: /Plymouth/Users by role/Academics
dcterms.dateAccepted: 2022-04-27
dc.rights.embargodate: 2022-5-5
dc.identifier.eissn: 1558-2183
dc.rights.embargoperiod: Not known
rioxxterms.versionofrecord: 10.1109/tpds.2022.3171471
rioxxterms.licenseref.uri: http://www.rioxx.net/licenses/all-rights-reserved
rioxxterms.type: Journal Article/Review




All items in PEARL are protected by copyright law.
Author manuscripts deposited to comply with open access mandates are made available in accordance with publisher policies. Please cite only the published version using the details provided on the item record or document. In the absence of an open licence (e.g. Creative Commons), permissions for further reuse of content should be sought from the publisher or author.