Service provisioning systems assign users to service providers according to allocation criteria that strike an optimal trade-off between users' Quality of Experience (QoE) and the operation cost endured by providers. These systems have been leveraging Smart Contracts (SCs) to add trust and transparency to their criteria. However, deploying fixed allocation criteria in SCs does not necessarily lead to the best performance over time since the blockchain participants join and leave flexibly, and their load varies with time, making the original allocation sub-optimal. Furthermore, updating the criteria manually at every variation in the blockchain jeopardizes the autonomous and independent execution promised by SCs. Thus, we propose a set of light-weight agents for SCs that are capable of optimizing the performance. We also propose using online learning SCs, empowered by Deep Reinforcement Learning (DRL) agent, that leverage the chained data to continuously self-tune its allocation criteria. We show that the proposed learning-assisted method achieves superior performance on the combinatorial multi-stage allocation problem while still being executable in real-time. We also compare the proposed approach with standard heuristics as well as planning methods. Results show a significant performance advantage over heuristics and better adaptability to the dynamic nature of blockchain networks.
@en