ORCID
- Vasilios Kelefouras: 0000-0001-9591-913X
Abstract
Deep neural networks (DNNs) have become indispensable in many real-life applications such as natural language processing and autonomous systems. However, deploying DNNs on resource-constrained devices, e.g., RISC-V platforms, remains challenging due to the high computational and memory demands of fully connected (FC) layers, which dominate resource consumption. Low-rank factorization (LRF) offers an effective approach to compressing FC layers, but the vast design space of LRF solutions involves complex tradeoffs among FLOPs, memory size, inference time, and accuracy, making the LRF process complex and time-consuming. This article introduces an end-to-end LRF design space exploration methodology and a specialized design tool for optimizing FC layers on RISC-V processors. Using Tensor Train Decomposition (TTD) offered by the TensorFlow T3F library, the proposed work prunes the LRF design space by excluding, first, inefficient decomposition shapes and, second, solutions with poor inference performance on RISC-V architectures. Compiler optimizations are then applied to enhance custom T3F layer performance, minimizing inference time and boosting computational efficiency. On average, our TT-decomposed layers run 3× faster than IREE and 8× faster than Pluto on the same compressed model. This work provides an efficient solution for deploying DNNs on edge and embedded devices powered by RISC-V architectures.
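To make the compression tradeoff mentioned in the abstract concrete, the following minimal sketch compares the parameter count of a dense FC layer with that of the same layer stored in the standard TT-matrix format, where core k has shape (r_{k-1}, m_k, n_k, r_k). The layer size, mode factorization, and TT-ranks used here are illustrative assumptions, not configurations taken from the article, and the helper names are hypothetical.

```python
from math import prod


def dense_param_count(in_modes, out_modes):
    """Weights in a dense FC layer of size prod(in_modes) x prod(out_modes)."""
    return prod(in_modes) * prod(out_modes)


def tt_param_count(in_modes, out_modes, ranks):
    """Weights in a TT-matrix whose k-th core has shape
    (ranks[k], in_modes[k], out_modes[k], ranks[k + 1])."""
    if len(in_modes) != len(out_modes):
        raise ValueError("in_modes and out_modes must have the same length")
    if len(ranks) != len(in_modes) + 1:
        raise ValueError("ranks must have one more entry than the number of cores")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary TT-ranks must be 1")
    return sum(
        ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1]
        for k in range(len(in_modes))
    )


# Assumed example: a 1024 x 1024 FC layer factorized as (4*8*8*4) x (4*8*8*4)
# with all internal TT-ranks set to 8.
in_modes = (4, 8, 8, 4)
out_modes = (4, 8, 8, 4)
ranks = (1, 8, 8, 8, 1)

dense = dense_param_count(in_modes, out_modes)
tt = tt_param_count(in_modes, out_modes, ranks)
print(f"dense: {dense}, TT: {tt}, compression ratio: {dense / tt:.1f}x")
```

Different mode factorizations and rank choices for the same layer give different parameter counts, which is one reason the design space of decomposition shapes grows quickly.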
DOI Link
https://doi.org/10.1145/3768624
Publication Date
2025-10-24
Publication Title
ACM Transactions on Embedded Computing Systems
Volume
24
Issue
6
ISSN
1539-9087
Acceptance Date
2025-08-31
Deposit Date
2025-12-09
Additional Links
https://dl.acm.org/doi/10.1145/3768624, https://dl.acm.org/doi/full/10.1145/3768624
First Page
1
Last Page
34
Recommended Citation
Anthimopoulos, T., Kokhazadeh, M., Kelefouras, V., Himpel, B., & Keramidas, G. (2025) 'Optimizing Tensor Train Decomposition in DNNs for RISC-V Architectures Using Design Space Exploration and Compiler Optimizations', ACM Transactions on Embedded Computing Systems, 24(6), pp. 1-34. Available at: 10.1145/3768624
