Reinforcement Learning for Multi-Rendezvous Mission Design

Abstract

The design of multi-target rendezvous trajectories, in which a spacecraft approaches a sequence of objects in orbit as efficiently as possible by some chosen metric, is a challenging problem of critical importance for Active Debris Removal (ADR), On-Orbit Servicing (OOS), and cislunar logistics more widely. This thesis investigates two primary challenges in space Vehicle Routing Problems (VRPs): the application of Neural Combinatorial Optimization (NCO) methods to ADR missions, and the integration of verifiable trajectory optimization techniques for Orbital Transfer Vehicle (OTV) payload deployment.
The first research focus assesses the efficacy of NCO methods in designing multi-target rendezvous trajectories for ADR missions. An attention-based routing policy, comprising a Graph Attention Network and a Pointer Network, was developed and trained using Reinforcement Learning (RL) algorithms, including REINFORCE, Advantage Actor-Critic (A2C), and Proximal Policy Optimization (PPO). A hyperparameter analysis using analysis of variance (ANOVA) identified embedding dimension and the number of encoder layers as the critical factors influencing model performance. The trained policy was evaluated on scenarios involving 10, 30, and 50 transfers based on the Iridium 33 debris cloud. In missions with 10 transfers, the NCO policy achieved a mean optimality gap of 32%, outperforming the Dynamic RAAN Walk (DRW) heuristic in both mission cost and runtime. However, performance degraded in the more complex scenarios with 30 and 50 transfers, indicating limited generalization beyond the training conditions. Grid-search hyperparameter optimization revealed that model performance improves with increased model complexity, but only marginally, and that larger training datasets speed up convergence while yielding only slight improvements in final performance. These findings demonstrate that NCO methods are effective for ADR missions with a limited number of targets but face scalability and generalization challenges in more complex scenarios.
The second research focus involves the design and optimization of multi-rendezvous trajectories for the UARX Space OSSIE mission using a modular framework that integrates Heuristic Combinatorial Optimization (HCO) with Sequential Convex Programming (SCP). This framework successfully determined optimal target sequences and generated near-fuel-optimal trajectories for OSSIE, a translational and mass-dynamic payload delivery platform. An attention-based routing policy trained with RL was integrated into the combinatorial optimization process, enhancing the efficiency of mission planning. Applied to the OSSIE mission, the framework effectively explored the mission design space, optimizing 5000 mission scenarios and affirming the vehicle’s capability to fulfill its advertised services. The modularity of the framework ensures adaptability to mission-specific constraints and facilitates future extensions, such as the incorporation of low-thrust propulsion profiles.
Overall, this thesis confirms that NCO methods are applicable and effective for specific instances of space VRPs, particularly in optimizing ADR missions with a limited number of targets and in near-static mission scenarios where convergence of the right ascension of the ascending node (RAAN) is not required. The integration of verifiable trajectory optimization techniques with advanced routing policies presents a viable approach for efficient and adaptable mission planning. However, scalability and generalization remain challenges that necessitate further research. Recommendations include refining NCO model architectures to enhance scalability and generalization, exploring hybrid approaches that combine NCO with traditional heuristics, and developing automated machine learning frameworks to optimize model performance and robustness.
The project achieved its primary objectives: developing and implementing heuristic and neural combinatorial optimization solvers for space VRPs, designing a modular trajectory optimization framework, and conducting comprehensive mission analyses for the OSSIE OTV. In doing so, it has strengthened the mission design capabilities for space logistics missions at SENER Aerospace & Defence and provided a strong foundation for future research and development aimed at addressing the increasing complexity of space operations.

Files

MSc_Thesis_Antonio_L_pez_River... (pdf)
Unknown license

File under embargo until 31-10-2026