Modern implementations of encryption algorithms on CPUs that make frequent memory lookups of precomputed functions are vulnerable to cache-based side-channel attacks. The ρ-VEX processor, a runtime-reconfigurable VLIW processor developed at the Computer and Quantum Engineering department at TU Delft, was identified as a possible platform for special countermeasures against cache attacks because of its unique cache architecture, hardware context switching and timing behaviour. This thesis investigates whether the runtime reconfigurability of the ρ-VEX processor can be used against cache-based side-channel attacks. We explore the ability of the ρ-VEX to alter its execution time based on configuration size, to interfere with its own cache state based on configuration, and to dynamically switch contexts between caches. Simple variants of the Prime+Probe, Evict+Time and Final Round Collision attacks are implemented and run on two setups that simulate practical scenarios for the ρ-VEX: a standalone setup, and a setup in which a second context shares the processor while running an arbitrary workload based on the PowerStone benchmarks. This workload causes noise by itself and can amplify the countermeasures. The ρ-VEX was instantiated on the Genesys2 FPGA development board. We show that nLane, an implementation of timing noise through configuration size variations, increased the number of traces required for the timing attacks Evict+Time and Final Round Collision by around 800x. CacheSwap, an implementation of access noise through swapping contexts, required 800x more traces for Evict+Time and 225x more traces for Final Round Collision when executed on a shared processor. Its effect on Prime+Probe was strong only at higher swap probabilities, and the overhead at these percentages was considered too high. ScatterRound, an implementation that isolates lookups over private caches, had additional benefits aside from preventing collisions: it made our Evict+Time attack take 160x, our Prime+Probe attack 175x and our Final Round Collision attack 225x as many samples to succeed. We show that the overhead associated with 10% nLane and 10% CacheSwap was reasonable, but conclude that ScatterRound requires a specific execution setup to achieve acceptable performance costs.