Reinforcement learning for order distribution in self-organizing logistics
Abstract
With the increasing global demand for logistics, supply chains have grown substantially in volume over the last decades. Operating effectively within the capacity constraints of carriers requires proper collaboration and optimization of order allocation. Van Berkel Logistics facilitates the transport of containers by truck from sea terminals in Rotterdam to inland customers and back. Planners currently solve this logistical planning problem manually on a daily basis. This research investigates to what extent reinforcement learning can be applied to solve this container planning problem in an automated way. A simulation environment was constructed that represents the container planning dynamics; historical data was used to make it as accurate as reasonably possible. Three reinforcement learning models, the OnePass, Iterative, and Attention models, were developed and tested for their ability to learn to choose orders so that orders are delivered as punctually as possible. A main challenge in constructing these models was designing them to cope with a varying state and action space. An experimental evaluation found that the models learn to make better decisions over time and eventually perform similarly to the tested heuristic baseline in terms of total observed lateness. In terms of distance driven and the fraction of on-time orders, the OnePass and Iterative models outperformed the heuristic. Overall, the Iterative model showed the best performance and is able to learn scenarios as large as the real-life scenarios Van Berkel Logistics deals with. However, it also tends to be slower than the other models because of its iterative approach.
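The varying action space mentioned above can be illustrated with a small sketch. It is not the architecture of the OnePass, Iterative, or Attention model, whose details the abstract does not give. It only shows one common way to handle a changing number of candidate orders: score every order with the same shared parameters and normalize over whichever orders are currently available. The feature choice (hours until deadline, distance in km), the weights, and all function names are assumptions made for illustration.

```python
import math
import random


def score_order(features, weights):
    """Linear score for one candidate order.

    The same weights are reused for every order, so the number of
    parameters does not depend on how many orders are available.
    """
    return sum(f * w for f, w in zip(features, weights))


def action_probabilities(candidate_orders, weights):
    """Softmax over however many candidate orders are currently available."""
    scores = [score_order(f, weights) for f in candidate_orders]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def choose_order(candidate_orders, weights, rng):
    """Sample the index of the next order to plan from the softmax distribution."""
    probs = action_probabilities(candidate_orders, weights)
    return rng.choices(range(len(candidate_orders)), weights=probs, k=1)[0]


# Hypothetical features per order: (hours until deadline, distance in km).
# Negative weights make urgent, nearby orders more likely to be chosen.
weights = [-0.5, -0.01]
small_state = [(2.0, 40.0), (8.0, 15.0)]
large_state = [(2.0, 40.0), (8.0, 15.0), (1.0, 60.0), (5.0, 30.0)]

rng = random.Random(0)
small_probs = action_probabilities(small_state, weights)
large_probs = action_probabilities(large_state, weights)
small_choice = choose_order(small_state, weights, rng)
large_choice = choose_order(large_state, weights, rng)
```

The same two weights serve both the two-order and the four-order state, which is the property a model needs when the set of open orders changes from step to step. In a learned model the linear score would be replaced by a trained network, and the weights would be updated from the observed lateness.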