DNoC is a CTI project in partnership with IOxOS (http://www.ioxos.ch). The main goal of the project is to provide a development framework for FPGAs based on a Network on Chip (NoC) communication system. This framework will be part of IOxOS's future product lines. It mainly targets Xilinx FPGAs, more specifically the UltraScale family. The project further develops the Network on Chip concept to guarantee a high-bandwidth data flow inside the FPGA itself, offer PCI Express GEN3 support, and provide quality of service (QoS) facilities that address the specific real-time needs of critical applications in fields such as physics, energy, transport and aeronautics.
The FPGA design is fitted on a board plugged into a PCI Express GEN3 slot of a PC. This serial bus provides a bandwidth of up to 8 Gb/s per lane, and each card can use up to 16 lanes in parallel. The final implementation will provide users with the full communication framework needed to build their critical high-throughput applications.
The project is composed of the following steps:
Development of the central switch
The switch provides 4 to 8 ports for agents to connect to. Each port allows transfers of up to 4 GB/s over a 128-bit interface with an internal 250 MHz clock. Apart from the control logic, the switch consists of a large number of FIFO memory elements that allow optimal packet transfers between agents. The resource usage of this component is critical, since it has to be implemented side by side with the user applications that share the FPGA's limited resources. A balance therefore has to be found between performance and resource usage. To do so, users can employ an optimizer developed during this project to find the best trade-off between switch size and performance for their specific application.
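The per-port figure quoted above can be checked with a short calculation. The sketch below assumes that a port moves one full 128-bit word on every clock cycle; the function name is illustrative and not part of the project.

```python
def port_bandwidth_bytes_per_s(width_bits: int, clock_hz: float) -> float:
    """Peak bandwidth of a port, assuming one full word is moved per clock cycle."""
    return width_bits / 8 * clock_hz


# 128-bit interface clocked at 250 MHz, as described for each switch port.
per_port = port_bandwidth_bytes_per_s(128, 250e6)
print(f"per port: {per_port / 1e9} GB/s")

# Aggregate peak figure for the smallest and largest switch configurations.
for ports in (4, 8):
    print(f"{ports} ports: {ports * per_port / 1e9} GB/s")
```

Under that assumption, 16 bytes per cycle at 250 MHz gives the 4 GB/s stated for each port.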
Realization of a PCI Express GEN3 agent and memory agent
The central switch provides ports for the agents to connect to. Agents can be of any kind: a memory controller, a bridge to another bus (VME for example), user-defined logic, a PCI Express bridge, etc. One agent, however, is critical and has to be developed for this project, namely the PCI Express GEN3 agent in charge of communicating with the PC. It is critical because every application needs it to communicate with the system from the software running on the PC. The PCI Express GEN3 standard specifies a bandwidth of up to 8 Gb/s per lane and allows multiple lanes in parallel. It also offers mechanisms not present in the previous generation (GEN2). One interesting aspect is the possibility of atomic operations. In addition to the common "read" and "write" operations, it is now possible to perform operations such as "fetch and add", "unconditional swap" or "compare and swap" atomically. These new features need to be taken into account when developing the agents. The memory agent in particular needs to support all the specified atomic operations, and this support will be implemented during the project.
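The semantics of the atomic operations mentioned above can be illustrated with a small behavioral model. This is only a sketch: the class name, the word-addressed memory and the 32-bit operand width are assumptions made for the example and do not describe the actual memory agent. Each atomic operation returns the value that was stored before the operation.

```python
class MemoryAgentModel:
    """Hypothetical behavioral model of a memory agent with atomic operations."""

    MASK = 0xFFFFFFFF  # assumed 32-bit operands

    def __init__(self, size_words: int):
        self.mem = [0] * size_words  # word-addressed memory, an assumption

    def read(self, addr: int) -> int:
        return self.mem[addr]

    def write(self, addr: int, value: int) -> None:
        self.mem[addr] = value & self.MASK

    def fetch_add(self, addr: int, value: int) -> int:
        """Add value to the stored word and return the previous content."""
        old = self.mem[addr]
        self.mem[addr] = (old + value) & self.MASK
        return old

    def swap(self, addr: int, value: int) -> int:
        """Unconditionally store value and return the previous content."""
        old = self.mem[addr]
        self.mem[addr] = value & self.MASK
        return old

    def compare_swap(self, addr: int, compare: int, value: int) -> int:
        """Store value only if the stored word equals compare; return the previous content."""
        old = self.mem[addr]
        if old == compare:
            self.mem[addr] = value & self.MASK
        return old


agent = MemoryAgentModel(4)
agent.write(0, 5)
previous = agent.fetch_add(0, 3)   # memory now holds 8, previous is 5
print(previous, agent.read(0))
```

In hardware, the point of these operations is that the read, modify and write steps cannot be interleaved with another request to the same location; the sequential model above only shows the expected result of each operation.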
Provide monitoring capabilities for the switch's internal state
The switch will be used in applications that rely on the highest quality of service standards. To monitor the service, statistics on the switch's internal state have to be extracted. A new mechanism that monitors the usage rates of the switch's internal FIFOs and provides throughput and latency statistics is being developed for this project. The monitoring service has to be able to warn software when problems might occur (a full FIFO for example) and to extract relevant statistics on the system's behavior. Finding which statistics are relevant in the critical parts of the system, and how to extract them without degrading performance, is part of the exploratory phase of this project.
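As an illustration of the kind of information such a monitoring mechanism could expose, the following sketch models a FIFO that records its peak occupancy, the number of rejected writes, and per-packet latency. The class, its fields and the warning threshold are hypothetical; the statistics actually retained by the project are part of its exploratory phase.

```python
from collections import deque


class MonitoredFifo:
    """Hypothetical software model of a FIFO with usage and latency counters."""

    def __init__(self, depth: int, warn_ratio: float = 0.75):
        self.depth = depth
        self.warn_level = warn_ratio * depth  # assumed "almost full" threshold
        self.items = deque()
        self.high_water = 0   # highest occupancy observed so far
        self.rejected = 0     # writes refused because the FIFO was full
        self.latencies = []   # time each packet spent in the FIFO

    def push(self, packet, now: int) -> bool:
        if len(self.items) >= self.depth:
            self.rejected += 1
            return False
        self.items.append((packet, now))
        self.high_water = max(self.high_water, len(self.items))
        return True

    def pop(self, now: int):
        if not self.items:
            return None
        packet, t_in = self.items.popleft()
        self.latencies.append(now - t_in)
        return packet

    def warning(self) -> bool:
        """True when occupancy has reached the assumed warning threshold."""
        return len(self.items) >= self.warn_level

    def stats(self) -> dict:
        mean = sum(self.latencies) / len(self.latencies) if self.latencies else 0.0
        return {"high_water": self.high_water,
                "rejected": self.rejected,
                "mean_latency": mean}


fifo = MonitoredFifo(depth=2)
fifo.push("a", now=0)
fifo.push("b", now=1)
fifo.push("c", now=2)   # refused: the FIFO is already full
fifo.pop(now=5)
print(fifo.stats())
```

A hardware implementation would keep such counters in registers next to each FIFO, so the cost of every additional statistic has to be weighed against the resource budget.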
Optimization of the resource usage for the switch
The switch is composed of a considerable number of FIFOs that allow packet transfers between agents. The bigger the FIFOs, the lower the risk of a FIFO becoming full (which would degrade QoS). However, it is not acceptable for the switch to occupy too many resources in the FPGA, since these are needed for the user's application. A balance therefore has to be found by using the smallest FIFOs that still allow optimal quality of service.
Finding the best FIFO size by hand is not efficient, so an automatic optimizer is needed. The optimizer will take into account the specific needs and constraints of each application and provide the best FIFO size for the central switch. A possible approach is to use the test benches developed for the project.
To guarantee the correct behavior of all the hardware components developed in this project, test benches have been set up. They are written in SystemVerilog and use a subset of UVM (Universal Verification Methodology). To spare users the need for a full-blown simulator supporting every feature of SystemVerilog, the test benches use neither constrained randomization nor coverage functions. They still provide the high level of flexibility of UVM and allow fast and straightforward creation of new test cases and scenarios. Finally, a test bench for the central switch has also been developed in VHDL using UVVM (Universal VHDL Verification Methodology). It relies only on VHDL and offers advantages similar to those of UVM without requiring SystemVerilog support.
This test bench could be used to acquire data over multiple simulation runs with different FIFO sizes for the optimizer to work with. However, hardware simulation requires a lot of processing time, which makes this approach impractical. To solve this, a pure software version of the switch has been developed that emulates the real hardware version. This software-simulated switch allows fast and easy testing of multiple internal configurations (FIFO sizes, policies, etc.) in several user-defined scenarios (number of agents, behavior of the agents, type of transfers, etc.).
Based on the simulation results, the optimizer can determine the best internal FIFO size. When using the optimizer software, the user can specify agent behavior and characteristics in a straightforward manner from within the developed framework. Once the user-defined scenario is set, a genetic algorithm combined with simulation is used to find a good implementation, which makes it possible to cope with the enormous number of parameter combinations. In this way the optimizer can provide the user, in a reasonable amount of time, with one of the smallest possible switches (in terms of resource usage) that still guarantees the desired quality of service.
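To give an idea of what a software model of the switch can look like, here is a deliberately simplified cycle-based sketch. It is not the project's simulator: the function name, the single FIFO per output port, the random traffic pattern and the rule that a packet arriving at a full FIFO is counted as blocked and discarded are all assumptions made for the example.

```python
import random
from collections import deque


def simulate_switch(num_ports: int, fifo_depth: int, inject_prob: float,
                    cycles: int, seed: int = 0) -> dict:
    """Toy cycle-based model: one FIFO per output port, random destinations."""
    rng = random.Random(seed)
    fifos = [deque() for _ in range(num_ports)]
    delivered = 0
    blocked = 0
    for _ in range(cycles):
        # Each output port forwards at most one queued packet per cycle.
        for fifo in fifos:
            if fifo:
                fifo.popleft()
                delivered += 1
        # Each input agent may inject one packet towards a random output.
        for src in range(num_ports):
            # Both draws are always made so that the traffic pattern does
            # not depend on the FIFO depth being tested.
            wants_to_send = rng.random() < inject_prob
            dst = rng.randrange(num_ports)
            if not wants_to_send:
                continue
            if len(fifos[dst]) < fifo_depth:
                fifos[dst].append(src)
            else:
                blocked += 1  # full FIFO: counted as a QoS violation
    return {"delivered": delivered, "blocked": blocked}


# Compare several FIFO depths under the same traffic scenario.
for depth in (1, 4, 16):
    print(depth, simulate_switch(num_ports=4, fifo_depth=depth,
                                 inject_prob=0.9, cycles=1000))
```

Because each run is a plain loop over cycles, many configurations can be evaluated in the time a single HDL simulation would take, which is what makes this kind of model usable inside an optimization loop.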
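The general idea of a genetic search over FIFO sizes can be sketched as follows. This is a minimal illustration, not the project's optimizer: the `REQUIRED` table is a made-up stand-in for the result of a simulation run, and the cost function, population size, selection scheme and mutation step are all assumptions.

```python
import random

MAX_DEPTH = 64
# Hypothetical stand-in for the simulator: the minimum depth each port's FIFO
# would need in some scenario. A real optimizer would run a simulation instead.
REQUIRED = [12, 5, 30, 8]


def blocked_packets(depths):
    """Stand-in QoS measure: zero when every FIFO is deep enough."""
    return sum(max(0, need - d) for need, d in zip(REQUIRED, depths))


def cost(depths):
    """Total FIFO depth, plus a penalty large enough to rule out QoS violations."""
    penalty = len(depths) * MAX_DEPTH + 1
    return sum(depths) + penalty * blocked_packets(depths)


def optimize(generations=200, pop_size=30, seed=1):
    rng = random.Random(seed)
    n = len(REQUIRED)
    pop = [[rng.randint(1, MAX_DEPTH) for _ in range(n)]
           for _ in range(pop_size - 1)]
    pop.append([MAX_DEPTH] * n)  # always start with one valid configuration
    for _ in range(generations):
        pop.sort(key=cost)
        nxt = pop[:2]  # elitism: keep the two best configurations
        while len(nxt) < pop_size:
            # Tournament selection of two parents.
            a = min(rng.sample(pop, 3), key=cost)
            b = min(rng.sample(pop, 3), key=cost)
            # Uniform crossover, then a small mutation on one FIFO depth.
            child = [rng.choice(pair) for pair in zip(a, b)]
            i = rng.randrange(n)
            child[i] = min(MAX_DEPTH, max(1, child[i] + rng.randint(-3, 3)))
            nxt.append(child)
        pop = nxt
    return min(pop, key=cost)


best = optimize()
print(best, "total depth:", sum(best), "blocked:", blocked_packets(best))
```

Seeding the population with the largest configuration and keeping the best individuals in every generation ensures that the result never violates the stand-in QoS constraint, while the search gradually shrinks the FIFOs.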
Visualization of the internal FIFOs states
During simulation, log files are generated. These files contain all the necessary information, including the FIFO contents. A graphical user interface is provided to display this information and ease analysis, which will greatly help in debugging communication problems. The graphical user interface will also serve as a monitoring interface for the integrated switch, allowing the user to check the switch's behavior after integration on real hardware and compare it with the simulation results.