A Practical Approach for Employing Tensor Train Decomposition in Edge Devices

ORCID

Vasilios Kelefouras: 0000-0001-9591-913X

Abstract

Deep Neural Networks (DNNs) have made significant advances in various fields, including speech recognition and image processing. Modern DNNs are typically both compute and memory intensive, so deploying them on low-end devices is a challenging task. A well-known technique to address this problem is Low-Rank Factorization (LRF), where a weight tensor is approximated by one or more lower-rank tensors, reducing both the memory size and the number of executed tensor operations. However, employing LRF is a multi-parametric optimization process involving a huge design space, where different design points represent different solutions that trade off the number of FLOPs, the memory size, and the prediction accuracy of the DNN model. As a result, extracting an efficient solution is a complex and time-consuming process. In this work, a new methodology is presented that formulates the LRF problem as a (FLOPs vs. memory vs. prediction accuracy) Design Space Exploration (DSE) problem. Then, the DSE space is drastically pruned by removing inefficient solutions. Our experimental results show that the design space can be efficiently pruned, leaving only a limited set of solutions with improved accuracy, memory, and FLOPs compared to the original (non-factorized) model. Our methodology has been developed as a stand-alone, parameterized module integrated into the T3F library for TensorFlow 2.X.
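The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' methodology and does not use Tensor Train decomposition or the T3F API: as a simplifying assumption it factorizes a single dense weight matrix with a truncated SVD in NumPy, counts parameters and FLOPs for each candidate rank, and discards candidates that do not shrink the layer. The function names, the random weight matrix, and the rank list are all hypothetical, and the relative approximation error stands in for prediction accuracy, which the sketch does not measure.

```python
import numpy as np

def low_rank_factorize(weight, rank):
    """Approximate `weight` (m x n) by two rank-`rank` factors via truncated SVD."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    left = u[:, :rank] * s[:rank]   # shape (m, rank), singular values folded in
    right = vt[:rank, :]            # shape (rank, n)
    return left, right

def design_points(weight, ranks):
    """Evaluate each candidate rank and keep only those smaller than the dense layer."""
    m, n = weight.shape
    dense_params = m * n
    points = []
    for r in ranks:
        left, right = low_rank_factorize(weight, r)
        params = r * (m + n)        # stored values in the two factors
        flops = 2 * r * (m + n)     # one matrix-vector product, 2 FLOPs per multiply-add
        rel_err = np.linalg.norm(weight - left @ right) / np.linalg.norm(weight)
        points.append({"rank": r, "params": params, "flops": flops, "rel_err": rel_err})
    # Assumed pruning rule: drop any point that stores at least as much as the dense layer.
    return [p for p in points if p["params"] < dense_params]

rng = np.random.default_rng(0)
weight = rng.standard_normal((256, 128))   # hypothetical layer weights
kept = design_points(weight, ranks=[8, 32, 64, 96, 128])
for p in kept:
    print(p["rank"], p["params"], p["flops"], round(float(p["rel_err"]), 3))
```

For a 256 x 128 layer the factorized form is smaller than the dense one only when the rank is below 256·128/(256+128) ≈ 85, so the ranks 96 and 128 are pruned before any accuracy evaluation would be needed. A real exploration would additionally retrain or fine-tune each remaining candidate to measure prediction accuracy.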

DOI Link

10.1007/s10766-024-00762-3

Publication Date

2024-02-16

Publication Title

International Journal of Parallel Programming

Volume

52

Issue

1-2

ISSN

0885-7458

Acceptance Date

2024-01-19

Deposit Date

2024-06-12

Additional Links

https://www.scopus.com/pages/publications/85185122253

Keywords

Deep neural networks, Design space exploration, Low-rank factorization, Model compression, Tensor train decomposition

First Page

20

Last Page

39

Recommended Citation

Kokhazadeh, M., Keramidas, G., Kelefouras, V., & Stamoulis, I. (2024) 'A Practical Approach for Employing Tensor Train Decomposition in Edge Devices', International Journal of Parallel Programming, 52(1-2), pp. 20-39. Available at: 10.1007/s10766-024-00762-3

School of Engineering, Computing and Mathematics
