In 2005, Jordan showed how to estimate the gradient of a real-valued function with a high-dimensional domain on a quantum computer. Subsequently, in 2017, Gilyén et al. showed how to do this with a different input model, and proved optimality of their algorithm for $\ell^\infty$-approximations of functions satisfying certain smoothness conditions.
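To fix notation, the input model referred to here can be sketched as follows; the normalization is illustrative rather than quoted from the text. A phase oracle for $f : \mathbb{R}^d \to \mathbb{R}$ acts as
\[
  O_f : |x\rangle \mapsto e^{i f(x)} |x\rangle ,
\]
and when $f$ is approximately linear around the origin, $f(x) \approx f(0) + \nabla f(0) \cdot x$, applying a suitably rescaled $O_f$ to a uniform superposition over a grid $G \subset \mathbb{R}^d$ yields, up to a global phase,
\[
  \frac{1}{\sqrt{|G|}} \sum_{x \in G} e^{2\pi i \, g \cdot x} |x\rangle , \qquad g \approx \nabla f(0),
\]
a product of Fourier basis states from which the coordinates of $g$ can be read off with an inverse quantum Fourier transform.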
In this text, we expand on the ideas of Gilyén et al. and extend their algorithm so that functions with fewer regularity constraints can be used as input. Moreover, we show that their algorithm is essentially optimal in the query complexity to the phase oracle, even for classes of functions that satisfy more stringent smoothness conditions. Finally, we prove that their algorithm is optimal for approximating gradients with respect to general $\ell^p$-norms, where $p \in [1,\infty]$.
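For concreteness, an $\varepsilon$-approximation of a gradient in the $\ell^p$-norm is understood in the usual sense; the display below is the standard definition, not a quotation from the text. An output $\tilde{g} \in \mathbb{R}^d$ is an $\varepsilon$-approximation of $\nabla f(x)$ in $\ell^p$ if
\[
  \|\tilde{g} - \nabla f(x)\|_p = \Big( \sum_{j=1}^{d} |\tilde{g}_j - \partial_j f(x)|^p \Big)^{1/p} \le \varepsilon ,
\]
with the convention $\|v\|_\infty = \max_{j} |v_j|$, which recovers the $\ell^\infty$ case considered by Gilyén et al.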
Furthermore, we investigate how Gilyén et al.'s algorithm can be used to perform reinforcement learning on a quantum computer. We elaborate on Montanaro's ideas for quantum Monte Carlo simulation, and show how they can be used to implement quantum value estimation of Markov reward processes. We also show that this algorithm is essentially optimal in the query complexity to all of its oracles. Next, we show how to construct a quantum policy evaluation algorithm, and how to use these algorithms as subroutines in Gilyén et al.'s quantum gradient estimation algorithm to perform quantum policy optimization.
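The quantity being estimated can be stated in standard notation; the discounting convention below is the usual one and is assumed here rather than quoted from the text. For a Markov reward process with state sequence $(S_t)_{t \ge 0}$, reward function $R$, and discount factor $\gamma \in [0,1)$, the value of a state $s$ is
\[
  V(s) = \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} R(S_t) \,\middle|\, S_0 = s \right],
\]
and Montanaro's quantum Monte Carlo mean estimator approximates such expectations to additive error $\varepsilon$ with roughly $\widetilde{O}(1/\varepsilon)$ queries, a quadratic improvement over the classical $O(1/\varepsilon^2)$ sample complexity.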
The most important open question remains whether it is possible to improve the query complexity of our extension of Gilyén et al.'s algorithm when function classes containing functions of Gevrey type 1 are used as input, as for this specific parameter setting the algorithm currently performs no better than a very simple classical gradient estimation procedure. An improvement of this result would automatically improve the quantum policy optimization routine as well.
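For reference, the Gevrey classes referred to above are commonly defined as follows; the exact constants and normalization used in the text may differ. A smooth function $f : \mathbb{R}^d \to \mathbb{R}$ is of Gevrey type $\sigma$ with parameters $c, M > 0$ if, for every multi-index $\alpha$,
\[
  |\partial_\alpha f(x)| \le c \, M^{|\alpha|} \, (|\alpha|!)^{\sigma} \quad \text{for all } x ,
\]
so type $\sigma = 1$ includes the analytic functions, while smaller $\sigma$ imposes stronger smoothness and is the regime in which the quantum speedup of Gilyén et al. manifests.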