Design and Implementation of Deep Learning 2D Convolutions on modern CPUs

Kelefouras, V; Keramidas, G

dc.contributor.author	Kelefouras, V
dc.contributor.author	Keramidas, G
dc.date.accessioned	2023-10-03T10:45:29Z
dc.date.available	2023-10-03T10:45:29Z
dc.date.issued	2023-10-04
dc.identifier.issn	1558-2183
dc.identifier.uri	https://pearl.plymouth.ac.uk/handle/10026.1/21352
dc.description.abstract	In this article, a new method is provided for accelerating the execution of convolution layers in Deep Neural Networks. This research work provides the theoretical background to efficiently design and implement the convolution layers on x86/x64 CPUs, based on the target layer parameters, quantization level and hardware architecture. The proposed work is general and can also be applied to other processor families, e.g., Arm. It achieves high speedups over the state of the art, the Intel oneDNN library, by applying compiler optimizations such as vectorization, register blocking and loop tiling more efficiently. This is achieved by developing an analytical modelling approach for finding the optimization parameters. A thorough experimental evaluation has been carried out on two Intel CPU platforms, for DenseNet-121, ResNet-50 and SqueezeNet (including 112 different convolution layers), and for both FP32 and int8 input/output tensors (quantization). The experimental results show that the convolution layers of the aforementioned models are executed from 1.1x up to 7.2x faster.
dc.format.extent	3104-3116
dc.publisher	Institute of Electrical and Electronics Engineers
dc.subject	Deep neural networks
dc.subject	convolution
dc.subject	oneDNN
dc.subject	optimization
dc.subject	analytical model
dc.subject	vectorization
dc.subject	register blocking
dc.subject	loop tiling
dc.title	Design and Implementation of Deep Learning 2D Convolutions on modern CPUs
dc.type	journal-article
dc.type	Article
plymouth.issue	12
plymouth.volume	34
plymouth.publication-status	Published
plymouth.journal	IEEE Transactions on Parallel and Distributed Systems
dc.identifier.doi	10.1109/tpds.2023.3322037
plymouth.organisational-group	|Plymouth
plymouth.organisational-group	|Plymouth|Faculty of Science and Engineering
plymouth.organisational-group	|Plymouth|Faculty of Science and Engineering|School of Engineering, Computing and Mathematics
plymouth.organisational-group	|Plymouth|REF 2021 Researchers by UoA
plymouth.organisational-group	|Plymouth|Users by role
plymouth.organisational-group	|Plymouth|Users by role|Academics
plymouth.organisational-group	|Plymouth|REF 2021 Researchers by UoA|UoA11 Computer Science and Informatics
dcterms.dateAccepted	2023-09-29
dc.date.updated	2023-10-03T10:45:29Z
dc.rights.embargodate	2023-10-25
dc.identifier.eissn	1558-2183
dc.rights.embargoperiod
rioxxterms.versionofrecord	10.1109/tpds.2023.3322037

Files in this item

Name: license.txt
Size: 5.295Kb
Format: Text file

Name: ver_3.pdf
Size: 3.024Mb
Format: PDF

This item appears in the following Collection(s)

School of Engineering, Computing and Mathematics