
dc.contributor.author: Kelefouras, V
dc.contributor.author: Keramidas, G
dc.date.accessioned: 2023-10-03T10:45:29Z
dc.date.available: 2023-10-03T10:45:29Z
dc.date.issued: 2023-10-04
dc.identifier.issn: 1558-2183
dc.identifier.uri: https://pearl.plymouth.ac.uk/handle/10026.1/21352
dc.description.abstract:

This article presents a new method for accelerating the execution of convolution layers in Deep Neural Networks. It provides the theoretical background needed to design and implement convolution layers efficiently on x86/x64 CPUs, based on the target layer parameters, quantization level and hardware architecture. The method is general and can also be applied to other processor families, e.g., Arm. It achieves high speedups over the state of the art, the Intel oneDNN library, by applying compiler optimizations such as vectorization, register blocking and loop tiling more efficiently. This is made possible by an analytical modelling approach for finding the optimization parameters. A thorough experimental evaluation was carried out on two Intel CPU platforms for DenseNet-121, ResNet-50 and SqueezeNet (covering 112 different convolution layers), with both FP32 and int8 input/output tensors (quantization). The experimental results show that the convolution layers of these models execute from 1.1x up to 7.2x faster.
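
As an illustration only of the three optimizations the abstract names, here is a minimal C sketch of a direct 2D convolution with loop tiling over input channels, a register block of output pixels, and a unit-stride inner loop that a compiler can vectorize. It is not the authors' implementation and does not use their analytical model. The function names, the tensor layout (`in[C][H][W]`, `wt[M][C][K][K]`, `out[M][OH][OW]`), stride 1 with no padding, and the block sizes `RB` and `TC` are all assumptions chosen for readability, not tuned values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block sizes: RB output pixels kept in registers,
   TC input channels per tile. Illustrative, not tuned. */
enum { RB = 4, TC = 8 };

/* Plain loop nest, used only as a reference for the blocked version.
   Assumed layout: in[C][H][W], wt[M][C][K][K], out[M][OH][OW]. */
void conv2d_ref(const float *in, const float *wt, float *out,
                int C, int H, int W, int M, int K)
{
    int OH = H - K + 1, OW = W - K + 1;
    for (int m = 0; m < M; m++)
        for (int y = 0; y < OH; y++)
            for (int x = 0; x < OW; x++) {
                float acc = 0.0f;
                for (int c = 0; c < C; c++)
                    for (int ky = 0; ky < K; ky++)
                        for (int kx = 0; kx < K; kx++)
                            acc += in[((size_t)c * H + y + ky) * W + x + kx]
                                 * wt[(((size_t)m * C + c) * K + ky) * K + kx];
                out[((size_t)m * OH + y) * OW + x] = acc;
            }
}

/* Same computation with loop tiling (c0 loop) and register blocking (acc[RB]). */
void conv2d_blocked(const float *in, const float *wt, float *out,
                    int C, int H, int W, int M, int K)
{
    assert(K >= 1 && H >= K && W >= K);
    int OH = H - K + 1, OW = W - K + 1;
    size_t n_out = (size_t)M * OH * OW;
    for (size_t i = 0; i < n_out; i++) out[i] = 0.0f;

    for (int c0 = 0; c0 < C; c0 += TC) {           /* tile over input channels */
        int c1 = (c0 + TC < C) ? c0 + TC : C;
        for (int m = 0; m < M; m++)
            for (int y = 0; y < OH; y++) {
                float *orow = out + ((size_t)m * OH + y) * OW;
                int x = 0;
                for (; x + RB <= OW; x += RB) {    /* RB outputs per iteration */
                    float acc[RB] = {0.0f};
                    for (int c = c0; c < c1; c++)
                        for (int ky = 0; ky < K; ky++)
                            for (int kx = 0; kx < K; kx++) {
                                /* one weight is loaded once and reused RB times */
                                float w = wt[(((size_t)m * C + c) * K + ky) * K + kx];
                                const float *p = in + ((size_t)c * H + y + ky) * W + x + kx;
                                for (int r = 0; r < RB; r++)   /* unit stride */
                                    acc[r] += p[r] * w;
                            }
                    for (int r = 0; r < RB; r++) orow[x + r] += acc[r];
                }
                for (; x < OW; x++) {              /* leftover output columns */
                    float acc = 0.0f;
                    for (int c = c0; c < c1; c++)
                        for (int ky = 0; ky < K; ky++)
                            for (int kx = 0; kx < K; kx++)
                                acc += in[((size_t)c * H + y + ky) * W + x + kx]
                                     * wt[(((size_t)m * C + c) * K + ky) * K + kx];
                    orow[x] += acc;
                }
            }
    }
}
```

Tiling changes the order in which partial sums are accumulated, so for general floating-point data the two functions can differ by rounding error. In practice the block sizes would be derived from the register count, vector width and cache sizes of the target CPU, which is the role the abstract assigns to the analytical model.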

dc.format.extent: 3104-3116
dc.publisher: Institute of Electrical and Electronics Engineers
dc.subject: Deep neural networks
dc.subject: convolution
dc.subject: oneDNN
dc.subject: optimization
dc.subject: analytical model
dc.subject: vectorization
dc.subject: register blocking
dc.subject: loop tiling
dc.title: Design and Implementation of Deep Learning 2D Convolutions on modern CPUs
dc.type: journal-article
dc.type: Article
plymouth.issue: 12
plymouth.volume: 34
plymouth.publication-status: Published
plymouth.journal: IEEE Transactions on Parallel and Distributed Systems
dc.identifier.doi: 10.1109/tpds.2023.3322037
plymouth.organisational-group: |Plymouth
plymouth.organisational-group: |Plymouth|Faculty of Science and Engineering
plymouth.organisational-group: |Plymouth|Faculty of Science and Engineering|School of Engineering, Computing and Mathematics
plymouth.organisational-group: |Plymouth|REF 2021 Researchers by UoA
plymouth.organisational-group: |Plymouth|Users by role
plymouth.organisational-group: |Plymouth|Users by role|Academics
plymouth.organisational-group: |Plymouth|REF 2021 Researchers by UoA|UoA11 Computer Science and Informatics
dcterms.dateAccepted: 2023-09-29
dc.date.updated: 2023-10-03T10:45:29Z
dc.rights.embargodate: 2023-10-25
dc.identifier.eissn: 1558-2183
dc.rights.embargoperiod:
rioxxterms.versionofrecord: 10.1109/tpds.2023.3322037




All items in PEARL are protected by copyright law.
Author manuscripts deposited to comply with open access mandates are made available in accordance with publisher policies. Please cite only the published version using the details provided on the item record or document. In the absence of an open licence (e.g. Creative Commons), permissions for further reuse of content should be sought from the publisher or author.