Reward Based Program Synthesis for Minecraft

Adapting Program Synthesizers for Reward Evaluation and Leveraging Discovered Programs

Abstract

Program synthesis is the task of constructing a program that provably satisfies a given high-level specification. A specification can be described in various ways. This research focuses on adapting the Probe synthesizer, which traditionally relies on input-output examples, to reward-based synthesis. Generalizing Probe makes it possible to plug in different search, selection, and updating algorithms, which broadens its applicability. By modifying the Probe algorithm to learn from rewards, we explore how exploiting existing programs as partial solutions affects synthesis performance. We tested different ways of exploitation, specifically how strongly the probabilities are updated and how the starting probabilities affect synthesis. Exploiting programs can speed up synthesis, but depending on the world environment it can also lead to no solution being found.
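
To make the idea of exploiting partial solutions concrete, the following is a minimal sketch of a reward-weighted probability update. It is an illustration under stated assumptions, not the implementation used in this work: the update is modeled on a Probe-style rule that raises each starting probability to the power (1 - fit), the reward is assumed to be normalized by a known positive `best_reward`, and the function name, the `strength` parameter, and the rule names in the usage note are all hypothetical.

```python
def update_rule_probabilities(start_probs, partial_programs, best_reward, strength=1.0):
    """Shift probability mass toward grammar rules used by rewarded programs.

    start_probs:      dict mapping rule name -> starting probability (assumed to sum to 1)
    partial_programs: list of (rules_used, reward) pairs, where rules_used is a set
                      of rule names appearing in a discovered program
    best_reward:      assumed positive reward of a full solution, used to normalize
    strength:         hypothetical knob in [0, 1]; 0 disables exploitation entirely
    """
    if best_reward <= 0:
        raise ValueError("best_reward must be positive")
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")

    # fit[rule] = highest normalized reward of any program that uses the rule.
    fit = {rule: 0.0 for rule in start_probs}
    for rules_used, reward in partial_programs:
        normalized = max(0.0, min(1.0, reward / best_reward))
        for rule in rules_used:
            if rule in fit:
                fit[rule] = max(fit[rule], normalized)

    # Probabilities are below 1, so a smaller exponent yields a larger value:
    # rules with a high fit gain probability mass relative to the others.
    unnormalized = {
        rule: prob ** (1.0 - strength * fit[rule])
        for rule, prob in start_probs.items()
    }
    total = sum(unnormalized.values())
    return {rule: value / total for rule, value in unnormalized.items()}
```

Calling this with a uniform distribution over four hypothetical rules and a single partial program that uses only `move_forward` raises the probability of `move_forward` and lowers the others equally. A `strength` close to 1 exploits discovered programs aggressively, which may speed up the search when the partial solution lies on the path to the goal and may steer it away from any solution when it does not.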