Design and Implementation of 2D Convolution on x86/x64 Processors

Kelefouras, Vasileios; Keramidas, G

dc.contributor.author	Kelefouras, Vasileios
dc.contributor.author	Keramidas, G
dc.date.accessioned	2022-05-01T14:19:16Z
dc.date.available	2022-05-01T14:19:16Z
dc.date.issued	2022-04-29
dc.identifier.issn	1045-9219
dc.identifier.issn	1558-2183
dc.identifier.uri	http://hdl.handle.net/10026.1/19129
dc.description.abstract	In this paper, a new method for accelerating the 2D direct Convolution operation on x86/x64 processors is presented. It includes efficient vectorization by using SIMD intrinsics, bit-twiddling optimizations, the optimization of the division operation, multi-threading using OpenMP, register blocking and the shortest possible bit-width value of the intermediate results. The proposed method, which is provided as open-source, is general and can be applied to other processor families too, e.g., Arm. The proposed method has been evaluated on two different multi-core Intel CPUs, by using twenty different image sizes, 8-bit integer computations and the most commonly used kernel sizes (3x3, 5x5, 7x7, 9x9). It achieves from 2.8× to 40× speedup over the Intel IPP library (OpenCV GaussianBlur and Filter2D routines), from 105× to 400× speedup over the gemm-based convolution method (by using Intel MKL int8 matrix multiplication routine), and from 8.5× to 618× speedup over the vslsConvExec Intel MKL direct convolution routine. The proposed method is superior as it achieves far fewer arithmetical and load/store instructions.
dc.format.extent	3800-3815
dc.language.iso	en
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)
dc.subject	Convolution
dc.subject	gaussian blur
dc.subject	code optimization
dc.subject	vectorization
dc.subject	AVX
dc.subject	OpenMP
dc.subject	OpenCV
dc.subject	Intel MKL
dc.subject	Intel IPP
dc.subject	high performance computing (HPC)
dc.subject	image processing
dc.title	Design and Implementation of 2D Convolution on x86/x64 Processors
dc.type	journal-article
dc.type	Journal Article
plymouth.author-url	https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000831139000004&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=11bb513d99f797142bcfeffcc58ea008
plymouth.issue	12
plymouth.volume	33
plymouth.publication-status	Published
plymouth.journal	IEEE Transactions on Parallel and Distributed Systems
dc.identifier.doi	10.1109/tpds.2022.3171471
plymouth.organisational-group	/Plymouth
plymouth.organisational-group	/Plymouth/Faculty of Science and Engineering
plymouth.organisational-group	/Plymouth/Faculty of Science and Engineering/School of Engineering, Computing and Mathematics
plymouth.organisational-group	/Plymouth/REF 2021 Researchers by UoA
plymouth.organisational-group	/Plymouth/REF 2021 Researchers by UoA/UoA11 Computer Science and Informatics
plymouth.organisational-group	/Plymouth/Users by role
plymouth.organisational-group	/Plymouth/Users by role/Academics
dcterms.dateAccepted	2022-04-27
dc.rights.embargodate	2022-05-05
dc.identifier.eissn	1558-2183
dc.rights.embargoperiod	Not known
rioxxterms.versionofrecord	10.1109/tpds.2022.3171471
rioxxterms.licenseref.uri	http://www.rioxx.net/licenses/all-rights-reserved
rioxxterms.type	Journal Article/Review

Files in this item

Name: Design_and_Implementation_of_2 ...
Size: 5.217Mb
Format: PDF

Name: UoP_Deposit_Agreement v1.1 ...
Size: 125.4Kb
Format: PDF

This item appears in the following Collection(s)

School of Engineering, Computing and Mathematics