Enhanced Accelerator Design for Efficient CNN Processing with Improved Row-Stationary Dataflow

Lesniak, Fabian 1; Gutermann, Annina 1; Harbaum, Tanja 1; Becker, Jürgen 1
1 Institut für Technik der Informationsverarbeitung (ITIV), Karlsruher Institut für Technologie (KIT)


Efficient on-device inference of convolutional neural networks (CNNs) is becoming one of the key challenges for embedded systems, leading to the integration of specialized hardware accelerators in System-on-Chips (SoCs). Due to the memory-bound nature of convolution workloads, it is essential to optimize CNN accelerators for maximum data re-use to reduce memory bandwidth requirements. The row-stationary (RS) dataflow enhances data re-use in CNN processing by storing a subset of input activations, weights and partial sums locally within the Processing Elements (PEs). However, designs of RS accelerators are not publicly available, and many implementation details remain undisclosed. This paper introduces an open-source implementation of a CNN accelerator with RS dataflow. The complete VHDL source code is provided as well as a simulation environment that enables in-depth analysis of different workloads. We contribute an exploration of various design parameters and evaluate their impact on performance. Furthermore, we present an enhanced dataflow that is optimized for parallel processing of convolutions with a high number of channels. Our optimizations yield a performance improvement of up to 2.3x for convolutional layers of common neural networks.
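
As a rough illustration of the row-stationary idea mentioned in the abstract, the sketch below models each PE as holding one filter row locally and sliding it over one input row, with the per-row partial sums then accumulated into an output row. This is a simplified software model under stated assumptions (stride 1, no padding, a single channel, CNN-style cross-correlation); it is not the paper's VHDL design, and the function names `pe_row_conv` and `rs_conv2d` are hypothetical.

```python
def pe_row_conv(filter_row, input_row):
    """Model of one PE: the filter row stays local while it slides over the input row."""
    width = len(filter_row)
    out_len = len(input_row) - width + 1
    return [
        sum(w * x for w, x in zip(filter_row, input_row[i:i + width]))
        for i in range(out_len)
    ]


def rs_conv2d(kernel, image):
    """2D convolution (stride 1, no padding) assembled from per-row PE results."""
    k_rows = len(kernel)
    out_rows = len(image) - k_rows + 1
    output = []
    for out_r in range(out_rows):
        # One logical PE per filter row; each produces a row of partial sums.
        partial = [pe_row_conv(kernel[r], image[out_r + r]) for r in range(k_rows)]
        # Accumulate the partial sums across the PEs to form one output row.
        output.append([sum(col) for col in zip(*partial)])
    return output


result = rs_conv2d([[1, 0], [0, 1]], [[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

In this model, each filter row is reused for every position along an input row, and each input row is reused by several filter rows across neighbouring output rows, which is the kind of local re-use the abstract attributes to the RS dataflow.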

DOI: 10.5445/IR/1000172735
Published on: 25.07.2024
Affiliated institution(s) at KIT: Institut für Technik der Informationsverarbeitung (ITIV)
Publication type: Proceedings paper
Publication date: 12.06.2024
Language: English
Identifier: ISBN 979-84-00-70605-9
KITopen-ID: 1000172735
Published in: GLSVLSI '24: Proceedings of the Great Lakes Symposium on VLSI 2024
Event: Great Lakes Symposium on VLSI (GLSVLSI 2024), Clearwater, FL, USA, 12.06.2024 – 14.06.2024
Publisher: Association for Computing Machinery (ACM)
Pages: 151 – 157
Keywords: hardware acceleration, convolutional neural networks, embedded systems, dataflow optimization, row-stationary dataflow
