Integrating Analog accelerators in RISC-V platforms to improve the efficiency of deep neural networks
April 2020 - October 2020
  • RISC-V
  • Deep Neural Networks
  • Analog accelerator
Hardware-oriented Efficient Information Processing

Deep Neural Networks (DNNs) are today at the core of a myriad of Artificial Intelligence applications and services, and are predicted to grow at a rate of 13% by 2021. As DNNs become deeper, they impose unsustainably high computational demands and power consumption. GPUs, which for years were the best platform for NN training, have hit a scalability wall, while edge devices also require accelerators to perform the inference of very deep and complex NNs.
To tackle both training and inference of current and future DNNs, keeping up with their growing computing needs, performance and efficiency need to keep scaling linearly. However, in the post-Dennard-scaling era, the only way to attain this efficiency is by relying on novel neuromorphic and analog computing.

Analog emerging Non-Volatile Memories (eNVMs) show great potential due to their 25x lower area and 1000x lower energy compared to traditional SRAM, and their capability for 3D integration with conventional CMOS. When arranged in a crossbar, they behave as resistors that can store the DNN weights. Inputs to the NN layers can then be mapped as voltages, and the multiplication of weights (conductances) by inputs (voltages) naturally builds up as currents out of the grid, thanks to Kirchhoff's law, dramatically accelerating the multiply-accumulate (MAC) operation, which can be performed in a single cycle, outperforming GPUs.
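The crossbar MAC described above can be sketched numerically: weights become conductances, inputs become voltages, and each output current is the dot product of one conductance row with the input vector, by Kirchhoff's current law. All values below are made-up illustrative numbers, not parameters from the project.

```python
# Sketch of an ideal resistive crossbar performing a matrix-vector
# multiply (no parasitics, no ADC/DAC effects; illustrative values only).
# Weights are stored as conductances G[i][j] (siemens); inputs are applied
# as voltages V[j] (volts); by Kirchhoff's current law, each output column
# collects the current I[i] = sum_j G[i][j] * V[j] in a single analog step.

def crossbar_mac(G, V):
    """Output currents (amperes) of an ideal eNVM crossbar."""
    return [sum(g * v for g, v in zip(row, V)) for row in G]

# A 2x3 crossbar with hypothetical conductances and input voltages
G = [[1e-6, 2e-6, 0.5e-6],
     [3e-6, 1e-6, 1e-6]]
V = [0.1, 0.2, 0.4]

I = crossbar_mac(G, V)  # two output currents, one per weight row
```

The whole multiply-accumulate happens "for free" in the analog domain; a digital design would instead iterate over every G[i][j] element sequentially or with many parallel MAC units.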
The goal of this project is to perform a system-level assessment of analog eNVM accelerators when deployed in a RISC-V platform, by means of full-system simulation.
To do so, we will use and extend the gem5-X full-system simulator in two ways:
- by adding support to simulate full-system RISC-V multi-core platforms running a Linux OS
- by incorporating analog eNVMs in a RISC-V multi-core architecture. 
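To give a flavor of what incorporating eNVM crossbars into a multi-core architecture involves, the sketch below estimates how many fixed-size crossbar tiles a DNN layer's weight matrix would occupy. The 128x128 tile size and the function name are hypothetical assumptions for illustration, not figures from this project or from gem5-X.

```python
# Sketch (hypothetical 128x128 tiles): partitioning a layer's weight
# matrix across eNVM crossbar tiles. A real design would also model
# ADC/DAC sharing and tile placement per core; this only counts tiles.

def tile_layer(rows, cols, tile=128):
    """Number of tile x tile crossbars needed for a rows x cols weight matrix."""
    return -(-rows // tile) * -(-cols // tile)  # ceiling division on each axis

# e.g. a 512x1000 fully-connected layer needs 4 x 8 = 32 tiles
n = tile_layer(512, 1000)
```

A system-level simulator such as gem5-X can then account for the traffic between such tiles and the RISC-V cores, which is precisely the kind of assessment this project targets.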
  • it has to be capable of handling a considerable amount of incoming data, which flows through 40Gb connections;
  • the synchronization is achieved via the best commercially available system (the WhiteRabbit protocol) to ensure that the largest amount of information is extracted from the data;
  • it has to rely on standard components and protocols to avoid vendor lock-in problems, reduce maintenance costs, increase efficiency and flexibility (using load balancing), and avoid the transmission of already-processed data on potentially insecure lines (see Fig. 1).