Optimizing Matching Time Intervals in Ride-Hailing and Ride-Pooling Services Using Reinforcement Learning

Abstract

Efficiently matching passengers with drivers is crucial to the performance of ride-hailing and ride-pooling services, particularly the choice of matching time intervals. Traditional approaches rely on fixed-interval or real-time (instantaneous) matching and do not dynamically optimize these intervals. This study introduces a dynamic optimization strategy based on reinforcement learning, applying the Proximal Policy Optimization (PPO) algorithm to adjust matching intervals flexibly under varying supply and demand conditions, with the aim of reducing passenger waiting times and improving vehicle utilization.
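
The abstract itself contains no implementation detail; purely as an illustration of the approach it describes, the sketch below shows how a PPO agent could be trained to pick the next matching-interval length. Everything here is an assumption for illustration: the toy environment, its observation features (normalized counts of waiting requests and idle vehicles plus time of day), the candidate interval set, and the placeholder reward. It is not the simulator or the agent developed in the thesis.

```python
# Illustrative sketch only: a toy environment in which the agent chooses
# how long to accumulate requests before running a batch matching round.
# The dynamics below are placeholders, not the thesis simulator.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

class MatchingIntervalEnv(gym.Env):
    """At each decision point the agent picks the next matching interval."""

    INTERVALS = [5, 10, 20, 30, 60]  # hypothetical candidate intervals (s)

    def __init__(self):
        super().__init__()
        self.action_space = spaces.Discrete(len(self.INTERVALS))
        # Observation: [waiting requests, idle vehicles, time of day], all scaled to [0, 1].
        self.observation_space = spaces.Box(0.0, 1.0, shape=(3,), dtype=np.float32)
        self.rng = np.random.default_rng(0)

    def _obs(self):
        return np.array(
            [self.waiting / 50.0, self.idle / 50.0, self.t / 3600.0],
            dtype=np.float32,
        )

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0.0
        self.waiting = int(self.rng.integers(0, 50))
        self.idle = int(self.rng.integers(0, 50))
        return self._obs(), {}

    def step(self, action):
        dt = self.INTERVALS[action]
        self.t += dt
        # Placeholder dynamics: demand arrives during the interval, then one
        # batch match clears min(waiting, idle) pairs. Longer intervals pool
        # more requests but make everyone in the batch wait longer.
        self.waiting += int(self.rng.poisson(0.5 * dt))
        matched = min(self.waiting, self.idle)
        self.waiting -= matched
        # Reward matches made, penalize requests still left waiting.
        reward = matched - 0.01 * dt * self.waiting
        self.idle = int(self.rng.integers(0, 50))
        terminated = self.t >= 3600.0
        return self._obs(), float(reward), terminated, False, {}

model = PPO("MlpPolicy", MatchingIntervalEnv(), verbose=0)
model.learn(total_timesteps=10_000)
```

At deployment time, `model.predict(obs)` would return the index of the interval to use for the next matching round.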

An efficient simulator is developed in this research to model the spatiotemporal matching process of ride-hailing and ride-pooling services. In experiments, the reinforcement learning-based matching strategy is compared against fixed-interval and real-time matching baselines; the RL-based strategy outperforms both on key metrics such as total passenger waiting time and detour delay. In addition, potential-based reward shaping effectively mitigates the sparse-reward problem, further improving the model's learning efficiency. The findings confirm that dynamically optimizing matching intervals through reinforcement learning can improve the overall efficiency of ride-hailing services in dynamic supply-demand environments, and that the approach is scalable and adaptable.
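
For context on the last point: potential-based reward shaping (Ng et al., 1999) adds to the environment reward a term of the form γΦ(s') − Φ(s) for some potential function Φ over states, which provably leaves the optimal policy unchanged while densifying the learning signal. The abstract does not say which potential the thesis uses; the sketch below is a generic version in which the potential (negative number of waiting requests) and the discount factor are illustrative assumptions.

```python
# Generic potential-based reward shaping: r' = r + gamma * phi(s') - phi(s).
GAMMA = 0.99  # discount factor (assumed)

def phi(state):
    """Potential function: fewer waiting requests -> higher potential.
    This choice is an illustrative assumption, not the thesis's potential."""
    waiting_requests, _idle_vehicles = state
    return -float(waiting_requests)

def shaped_reward(reward, state, next_state, done, gamma=GAMMA):
    # Conventionally the potential of a terminal state is taken to be 0,
    # so that the shaping term preserves the optimal policy.
    next_phi = 0.0 if done else phi(next_state)
    return reward + gamma * next_phi - phi(state)
```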
