Manufacturers of CubeSats prefer commercial off-the-shelf (COTS) electronic components such as microcontrollers (MCUs) and SDRAM, but these components are vulnerable to radiation-induced errors. One class of such errors, soft errors, can be corrected by Error Detection and Correction (EDAC) methods. Unfortunately, it is uncommon for MCUs to have an (optimal) EDAC solution integrated in their memory controller. In these cases, an external solution is required.
This work proposes a generic methodology for transparent FPGA-based correction of soft errors that can be applied to a given microcontroller and its off-chip main memory. To determine how powerful the EDAC solution must be, the methodology uses reliability models to evaluate the mean time to failure (MTTF) of the memory system. The methodology then describes how the EDAC logic must be designed in the FPGA fabric so that the FPGA and its EDAC logic are transparent to the MCU. A method to periodically scrub the memory, also transparently to the processor, is included.
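To illustrate how a reliability model can relate EDAC strength and scrubbing to MTTF, the sketch below uses a deliberately simplified model that is not taken from this work: bit upsets are assumed to arrive as a Poisson process, each codeword corrects exactly one error, and scrubbing clears all accumulated errors once per interval. The function names and every numeric value are hypothetical.

```python
import math


def word_failure_prob(bit_upset_rate, bits_per_word, scrub_interval):
    """Probability that one codeword collects >= 2 upsets between two scrubs.

    Assumes upsets are Poisson with rate `bit_upset_rate` (per bit per second)
    and that a single-error-correcting code fails on the second upset.
    """
    mu = bit_upset_rate * bits_per_word * scrub_interval
    # P(N >= 2) = 1 - e^-mu - mu * e^-mu, written with expm1 for small mu.
    return -math.expm1(-mu) - mu * math.exp(-mu)


def memory_mttf(bit_upset_rate, bits_per_word, n_words, scrub_interval):
    """Approximate MTTF in seconds of a scrubbed, SEC-protected memory."""
    p_word = word_failure_prob(bit_upset_rate, bits_per_word, scrub_interval)
    # Probability that at least one of the n_words codewords fails per interval.
    p_any = -math.expm1(n_words * math.log1p(-p_word))
    if p_any == 0.0:
        return math.inf
    # Scrub intervals are independent trials, so the expected number of
    # intervals until the first failure is 1 / p_any.
    return scrub_interval / p_any


# Hypothetical inputs: upset rate, codeword size and memory size are assumptions.
RATE = 1e-9        # upsets per bit per second
BITS = 12          # bits per codeword (8 data + 4 check)
WORDS = 2 ** 24    # number of codewords in memory

mttf_fast_scrub = memory_mttf(RATE, BITS, WORDS, scrub_interval=33.0)
mttf_slow_scrub = memory_mttf(RATE, BITS, WORDS, scrub_interval=3300.0)
```

Under these assumptions the MTTF scales roughly inversely with the scrub interval, which is why the achievable scrub period matters when sizing the EDAC solution.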
A major trade-off of the FPGA-based design is that the clock of the MCU memory controller must be lowered to meet the timing requirements of the SDRAM interface. For some designs, it is also necessary to reduce the memory capacity or to use more SDRAM devices in order to store the ECC parity bits.
Lastly, a hardware demonstrator was built to provide EDAC capabilities to the onboard computer developed by ISISPACE. The design, consisting of an ARM9 microcontroller, an FPGA and two SDRAM devices, is capable of correcting a single error in each byte of the 32-bit SDRAM interface of the MCU and can scrub the entire memory in 33 seconds. Software benchmarks showed that computing performance is about half that of the original board as a result of halving the memory controller clock.
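Correcting one error per byte of a 32-bit word can be illustrated with a single-error-correcting Hamming code applied independently to each byte lane. The sketch below assumes a Hamming(12,8) code, which this abstract does not specify; the bit layout and all function names are hypothetical and are not the demonstrator's FPGA logic.

```python
PARITY_POS = (1, 2, 4, 8)                  # check-bit positions (1-based)
DATA_POS = (3, 5, 6, 7, 9, 10, 11, 12)     # data-bit positions (1-based)


def encode_byte(byte):
    """Encode 8 data bits into a 12-bit Hamming codeword (position p -> bit p-1)."""
    code = 0
    for i, pos in enumerate(DATA_POS):
        if (byte >> i) & 1:
            code |= 1 << (pos - 1)
    for p in PARITY_POS:
        parity = 0
        for pos in range(1, 13):
            if pos & p and (code >> (pos - 1)) & 1:
                parity ^= 1
        if parity:                          # make each parity group even
            code |= 1 << (p - 1)
    return code


def decode_byte(code):
    """Return (byte, syndrome); a syndrome in 1..12 marks a corrected bit position."""
    syndrome = 0
    for pos in range(1, 13):
        if (code >> (pos - 1)) & 1:
            syndrome ^= pos                 # XOR of set positions is 0 if valid
    if 1 <= syndrome <= 12:
        code ^= 1 << (syndrome - 1)         # flip the single erroneous bit
    byte = 0
    for i, pos in enumerate(DATA_POS):
        if (code >> (pos - 1)) & 1:
            byte |= 1 << i
    return byte, syndrome


def encode_word(word):
    """Encode a 32-bit word as four independent 12-bit codewords, one per byte lane."""
    return [encode_byte((word >> (8 * lane)) & 0xFF) for lane in range(4)]


def decode_word(codes):
    """Decode four codewords back into a 32-bit word, correcting one bit per lane."""
    word = 0
    for lane, code in enumerate(codes):
        byte, _ = decode_byte(code)
        word |= byte << (8 * lane)
    return word
```

Because each byte lane is protected separately, one flipped bit in every lane of the same word remains correctable. The cost in this sketch is 4 check bits per 8 data bits, that is, 48 stored bits per 32-bit word.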