Design and Implementation of 2D Convolution on x86/x64 Processors

ORCID

Vasilios Kelefouras: 0000-0001-9591-913X

Abstract

This paper presents a new method for accelerating the direct 2D convolution operation on x86/x64 processors. The method combines efficient vectorization using SIMD intrinsics, bit-twiddling optimizations, optimization of the division operation, multi-threading using OpenMP, register blocking, and the shortest possible bit-width for intermediate results. The method, which is provided as open source, is general and can also be applied to other processor families, e.g., Arm. It has been evaluated on two different multi-core Intel CPUs, using twenty different image sizes, 8-bit integer computations and the most commonly used kernel sizes (3x3, 5x5, 7x7, 9x9). It achieves speedups from 2.8× to 40× over the Intel IPP library (OpenCV GaussianBlur and Filter2D routines), from 105× to 400× over the gemm-based convolution method (using the Intel MKL int8 matrix multiplication routine), and from 8.5× to 618× over the vslsConvExec Intel MKL direct convolution routine. The proposed method is superior because it executes far fewer arithmetic and load/store instructions.
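To make two of the ideas named in the abstract concrete, the following is a minimal scalar sketch in C, not the authors' released code. It assumes a 3x3 kernel with non-negative 8-bit coefficients whose sum is small enough that 255 times the sum fits in 16 bits, keeps the intermediate sum in a 16-bit accumulator, and replaces the per-pixel division with a multiplication by a precomputed reciprocal followed by a shift. The names `recip_u16`, `div_u16` and `conv3x3_u8` are hypothetical, border pixels are simply set to zero, and SIMD vectorization, register blocking and OpenMP threading are left out.

```c
#include <stdint.h>

/* Hypothetical helper: ceil(2^32 / d), valid for 1 <= d <= 65536. */
static uint64_t recip_u16(uint32_t d) {
    return ((UINT64_C(1) << 32) + d - 1) / d;
}

/* Hypothetical helper: computes x / d using a multiply and a shift.
   The result equals the integer quotient for every 16-bit x when
   recip = recip_u16(d). */
static uint16_t div_u16(uint16_t x, uint64_t recip) {
    return (uint16_t)(((uint64_t)x * recip) >> 32);
}

/* Direct 3x3 filtering of an 8-bit, row-major image.
   The kernel is applied without flipping, which matches true convolution
   for symmetric kernels such as Gaussian or box filters.
   Assumption: 255 * (sum of kernel coefficients) <= 65535, so the
   16-bit accumulator cannot overflow. */
void conv3x3_u8(const uint8_t *in, uint8_t *out, int rows, int cols,
                const uint8_t k[3][3], uint32_t divisor) {
    const uint64_t recip = recip_u16(divisor);

    /* Simplification: borders are zeroed rather than padded or mirrored. */
    for (int i = 0; i < rows * cols; i++) out[i] = 0;

    for (int r = 1; r < rows - 1; r++) {
        for (int c = 1; c < cols - 1; c++) {
            uint16_t acc = 0;  /* narrow intermediate result */
            for (int i = 0; i < 3; i++)
                for (int j = 0; j < 3; j++)
                    acc += (uint16_t)(in[(r + i - 1) * cols + (c + j - 1)] * k[i][j]);

            uint16_t q = div_u16(acc, recip);  /* no division instruction */
            out[r * cols + c] = (uint8_t)(q > 255 ? 255 : q);
        }
    }
}
```

For example, calling `conv3x3_u8(in, out, rows, cols, box, 9)` with a `const uint8_t box[3][3]` of all ones computes a 3x3 box blur of the interior pixels. A narrow accumulator matters in a vectorized version because more 16-bit lanes than 32-bit lanes fit in one SIMD register.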

DOI Link

10.1109/tpds.2022.3171471

Publication Date

2022-12-01

Publication Title

IEEE Transactions on Parallel and Distributed Systems

Volume

14

Issue

8

ISSN

1045-9219

Acceptance Date

2022-04-27

Deposit Date

2022-01-05

Embargo Period

2022-05-05

First Page

3800

Last Page

3815

Recommended Citation

Kelefouras, V., & Keramidas, G. (2022) 'Design and Implementation of 2D Convolution on x86/x64 Processors', IEEE Transactions on Parallel and Distributed Systems, 14(8), pp. 3800-3815. Available at: 10.1109/tpds.2022.3171471

School of Engineering, Computing and Mathematics
