Magnetic Resonance (MR)-guided online Adaptive RadioTherapy (MRgoART) utilises the excellent soft-tissue contrast of MR images acquired just before treatment to quickly update and personalise the patient's radiotherapy plan. Four-dimensional (4D) MR Imaging (MRI) can resolve variations in respiratory motion patterns, and 4D MRI data can be used to adapt the radiation beams so that they target the tumour as closely as possible while sparing as much healthy tissue as possible. However, 4D MRI reconstruction is computationally demanding, and current state-of-the-art implementations cannot meet MRgoART time requirements. This study bridges the gap between high-performance computing and medical applications by developing and implementing a parallel, heterogeneous architecture for the XD-GRASP algorithm that is capable of meeting these time requirements. Our architecture exploits long-vector instructions and utilises all available resources, while minimising and hiding the communication overhead when external GPUs are used. As a result, the reconstruction time was reduced from 994 seconds to just 90 seconds, a speedup of more than 11x. We also evaluated the impact of the emerging Processing-in-Memory (PIM) technology. Our simulation results show that 16 low-power, in-order PIM cores with no SIMD unit are 2.7x faster than an 8-core Intel Core™ i7-9700 CPU equipped with AVX512 SIMD units. Furthermore, 40 PIM cores match the performance of two AMD EPYC 7551 CPUs with 32 cores each, and just 87 PIM cores match the performance of an NVIDIA Tesla V100 GPU equipped with 5,120 CUDA cores.
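As a quick check of the reported figure, the speedup can be read as the ratio of the baseline to the optimised reconstruction time (the symbols $S$ and $T$ are notation introduced here for illustration, with $T$ denoting wall-clock reconstruction time):

```latex
S = \frac{T_{\text{baseline}}}{T_{\text{optimised}}}
  = \frac{994\,\text{s}}{90\,\text{s}}
  \approx 11.04
```

This is consistent with the "more than 11x" stated above.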