ORCID
- Vasilios Kelefouras: 0000-0001-9591-913X
Abstract
This paper presents a new method for accelerating the 2D direct convolution operation on x86/x64 processors. The method combines efficient vectorization with SIMD intrinsics, bit-twiddling optimizations, optimization of the division operation, multi-threading with OpenMP, register blocking, and the shortest possible bit width for the intermediate results. It is released as open source. It is also general and can be applied to other processor families, e.g., Arm. The method has been evaluated on two multi-core Intel CPUs using twenty image sizes, 8-bit integer computations, and the most commonly used kernel sizes (3×3, 5×5, 7×7, 9×9). It achieves speedups of 2.8× to 40× over the Intel IPP library (OpenCV GaussianBlur and Filter2D routines), 105× to 400× over the gemm-based convolution method (using the Intel MKL int8 matrix multiplication routine), and 8.5× to 618× over the Intel MKL direct convolution routine vslsConvExec. The proposed method is faster because it executes far fewer arithmetic and load/store instructions.
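The abstract names two scalar-level ideas without showing them: keeping intermediate results at the shortest possible bit width, and optimizing the division operation. The C sketch below illustrates both under assumptions of our own. It uses a hypothetical 3×3 box (mean) filter on an 8-bit image, and the helper names `div9_u16` and `box3x3_u8` are invented for illustration. It is not the authors' implementation and deliberately leaves out SIMD intrinsics, OpenMP, and register blocking. Because nine 8-bit pixels sum to at most 9 × 255 = 2295, the accumulator fits in a `uint16_t`. The final division by 9 is replaced with a multiply and a right shift by a precomputed reciprocal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: divide by 9 without a division instruction.
 * 7282 = ceil(2^16 / 9). For every x in [0, 2295] (the largest possible
 * 3x3 sum of 8-bit pixels), (x * 7282) >> 16 equals x / 9, because the
 * rounding error 2x / 589824 stays below 1/9. */
static inline uint16_t div9_u16(uint16_t x) {
    return (uint16_t)(((uint32_t)x * 7282u) >> 16);
}

/* Hypothetical 3x3 box filter over the interior of a rows x cols 8-bit
 * image stored row-major. Border pixels of dst are left untouched. */
static void box3x3_u8(const uint8_t *src, uint8_t *dst,
                      size_t rows, size_t cols) {
    assert(src != dst); /* in-place filtering would read updated pixels */
    for (size_t r = 1; r + 1 < rows; ++r) {
        for (size_t c = 1; c + 1 < cols; ++c) {
            uint16_t sum = 0; /* 16 bits suffice: max 9 * 255 = 2295 */
            for (size_t i = 0; i < 3; ++i)
                for (size_t j = 0; j < 3; ++j)
                    sum += src[(r - 1 + i) * cols + (c - 1 + j)];
            dst[r * cols + c] = (uint8_t)div9_u16(sum);
        }
    }
}
```

Narrow 16-bit accumulators matter for vectorization. A 128-bit register holds eight 16-bit lanes and a 256-bit AVX2 register holds sixteen, versus four or eight 32-bit lanes. A SIMD version of this sketch could therefore process twice as many pixels per instruction. Whether this matches the exact scheme in the paper is not claimed here.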
DOI Link
https://doi.org/10.1109/tpds.2022.3171471
Publication Date
2022-12-01
Publication Title
IEEE Transactions on Parallel and Distributed Systems
Volume
14
Issue
8
ISSN
1045-9219
Acceptance Date
2022-04-27
Deposit Date
2022-01-05
Embargo Period
2022-05-05
First Page
3800
Last Page
3815
Recommended Citation
Kelefouras, V., & Keramidas, G. (2022) 'Design and Implementation of 2D Convolution on x86/x64 Processors', IEEE Transactions on Parallel and Distributed Systems, 14(8), pp. 3800-3815. Available at: https://doi.org/10.1109/tpds.2022.3171471
