Probabilistic reach-avoid for Bayesian neural networks

Abstract

Model-based reinforcement learning seeks to simultaneously learn the dynamics of an unknown stochastic environment and synthesise an optimal policy for acting in it. Ensuring the safety and robustness of sequential decisions made through a policy in such an environment is a key challenge for policies intended for safety-critical scenarios. In this work, we investigate two complementary problems: first, computing reach-avoid probabilities for iterative predictions made with dynamical models whose dynamics are described by a Bayesian neural network (BNN); second, synthesising control policies that are optimal with respect to a given reach-avoid specification (reaching a “target” state, while avoiding a set of “unsafe” states) and a learned BNN model. Our solution leverages interval propagation and backward recursion techniques to compute lower bounds on the probability that a policy's sequence of actions satisfies the reach-avoid specification. These lower bounds serve as safety certificates for the given policy and BNN model. We then introduce control synthesis algorithms that derive policies maximising these lower bounds on the safety probability. We demonstrate the effectiveness of our method on a series of control benchmarks characterised by learned BNN dynamics models. On our most challenging benchmark, compared to purely data-driven policies, the optimal synthesis algorithm provides more than a four-fold increase in the number of certifiable states and more than a three-fold increase in the average guaranteed reach-avoid probability.
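To illustrate the backward-recursion structure referenced in the abstract, the sketch below performs reach-avoid value iteration over a discretised one-dimensional state space, approximating the expectation over BNN weight uncertainty with posterior-style samples. This is a minimal illustration, not the paper's method: the names and values (`dynamics_sample`, `policy`, `TARGET`, `SAFE`, the grid and horizon) are hypothetical, and the paper replaces the sampled expectation with certified lower bounds obtained via interval propagation through the BNN.

```python
import numpy as np

# Hypothetical setup: 1-D state space discretised into a grid, with a
# target interval and a safe interval (all names and values illustrative).
GRID = np.linspace(-2.0, 2.0, 201)   # discretised states
TARGET = (0.8, 1.2)                  # "reach" set
SAFE = (-1.5, 1.5)                   # complement of the "avoid" set
HORIZON = 10                         # number of time steps
N_WEIGHT_SAMPLES = 50                # samples standing in for the BNN posterior

def in_set(x, bounds):
    lo, hi = bounds
    return (x >= lo) & (x <= hi)

def dynamics_sample(x, u, rng):
    """Stand-in for one step of learned stochastic dynamics under a sampled
    weight vector; here just a noisy linear map for illustration."""
    return 0.9 * x + 0.3 * u + 0.05 * rng.standard_normal(np.shape(x))

def policy(x):
    """Stand-in policy: push the state towards the target interval."""
    return np.clip(1.0 - x, -1.0, 1.0)

rng = np.random.default_rng(0)

# Backward recursion: V[i] approximates the probability of reaching TARGET
# while remaining in SAFE, starting from GRID[i] at the current time step.
V = np.where(in_set(GRID, TARGET), 1.0, 0.0)  # value at the final step
for _ in range(HORIZON):
    next_vals = np.zeros_like(GRID)
    for _ in range(N_WEIGHT_SAMPLES):
        x_next = dynamics_sample(GRID, policy(GRID), rng)
        # Interpolate V at successor states (np.interp clamps to grid ends).
        next_vals += np.interp(x_next, GRID, V)
    next_vals /= N_WEIGHT_SAMPLES
    # Reach-avoid backup: in target -> 1; safe but not yet in target ->
    # expected value at the next step; unsafe -> 0.
    V = np.where(in_set(GRID, TARGET), 1.0,
                 np.where(in_set(GRID, SAFE), next_vals, 0.0))

print("Estimated reach-avoid probability from x0 = 0:",
      np.interp(0.0, GRID, V))
```

In the certified setting described in the abstract, the sampled average over successor values would be replaced by a lower bound computed by propagating state and weight intervals through the BNN, so that the recursion yields a sound lower bound rather than a Monte Carlo estimate.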

Files

1-s2.0-S0004370224000687-main.... (pdf | 6.93 MB)
- Embargo expired on 17-10-2024
- Unknown license