Abstract (English):
Energy consumption is the primary constraint on deploying artificial intelligence (AI) in the Internet of Things (IoT). Optimized hardware architectures are therefore essential to meet the requirements of such applications. In this paper, we present LOTTA, a novel low-power hardware accelerator for Temporal Convolutional Networks (TCNs) designed specifically for IoT applications. LOTTA uses FPGA resources efficiently and can be configured at runtime via an SPI interface, which makes it flexible across different workloads. It also provides a QSPI interface to expand the weight memory, and thus supports TCN architectures that exceed the FPGA’s internal memory. In addition, we propose a hardware-aware TCN hyperparameter search to find a TCN architecture that is well adapted to the hardware. Our evaluation on a low-power Lattice iCE40 UP5K FPGA shows that LOTTA requires only 3,990 logic cells and achieves a throughput of 0.12 GOP/s. Furthermore, our measurements show a low power consumption of 27.28 mW, including 4 Mbit of external non-volatile memory, which underlines that our accelerator design enables the use of TCNs in IoT devices with tightly constrained power budgets.
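
To put the two headline figures in relation, the following minimal sketch derives an energy efficiency and an energy per operation from the reported 0.12 GOP/s and 27.28 mW. It assumes that both values refer to the same operating point, which the abstract does not state explicitly; the function names are purely illustrative and not part of LOTTA.

```python
# Back-of-the-envelope figures derived from the two numbers reported in the
# abstract. ASSUMPTION: throughput and power describe the same operating
# point; the abstract does not state this explicitly.

THROUGHPUT_GOPS = 0.12  # reported throughput in GOP/s
POWER_MW = 27.28        # reported power in mW, including external memory


def efficiency_gops_per_watt(throughput_gops: float, power_mw: float) -> float:
    """Throughput per watt: GOP/s divided by power converted to W."""
    return throughput_gops / (power_mw / 1000.0)


def energy_per_op_pj(throughput_gops: float, power_mw: float) -> float:
    """Energy per operation in picojoules: power (W) / operations per second."""
    joules_per_op = (power_mw / 1000.0) / (throughput_gops * 1e9)
    return joules_per_op * 1e12


if __name__ == "__main__":
    eff = efficiency_gops_per_watt(THROUGHPUT_GOPS, POWER_MW)
    epo = energy_per_op_pj(THROUGHPUT_GOPS, POWER_MW)
    print(f"Efficiency: {eff:.2f} GOP/s/W")
    print(f"Energy per operation: {epo:.1f} pJ")
```

Under this assumption, the reported values correspond to roughly 4.4 GOP/s/W, or about 227 pJ per operation, with the external memory included in the power figure.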