Program Synthesis from Rewards using Probe and FrAngel
Impact of Exploration-Exploitation Configurations on Probe and FrAngel in Minecraft
Abstract
Program synthesis involves finding a program that meets the user's intent, typically provided as input/output examples or formal mathematical specifications. This paper explores a novel kind of specification in program synthesis: learning from rewards.
We apply two existing synthesizers, Probe and FrAngel, to navigation tasks inside the popular game Minecraft. The problem formulation is inspired by reinforcement learning but is adapted to program synthesis. As in reinforcement learning, balancing exploration and exploitation is essential for solving the task efficiently. Excessive exploration can prevent the synthesizer from finding the correct program because the feedback from the environment is not used. Excessive exploitation is also not ideal, as seemingly promising programs might not lead to the actual solution. This work compares different trade-offs between exploration and exploitation in Probe and FrAngel when applied to Minecraft environments.
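
To make the trade-off concrete, the following is a minimal toy sketch of reward-guided program search. It is not the Probe or FrAngel algorithm and does not use Minecraft; the grid world, the negative-Manhattan-distance reward, and the names `run`, `reward`, `synthesize`, and `explore_prob` are all assumptions introduced here for illustration. A program is a string of moves, and `explore_prob` sets how often the search ignores the feedback gathered so far.

```python
import random

# Hypothetical toy setting: a program is a string of moves on an open grid.
MOVES = {"U": (0, -1), "D": (0, 1), "L": (-1, 0), "R": (1, 0)}


def run(program, start=(0, 0)):
    """Execute a move string and return the final position."""
    x, y = start
    for step in program:
        dx, dy = MOVES[step]
        x, y = x + dx, y + dy
    return x, y


def reward(program, goal):
    """Assumed reward: negative Manhattan distance to the goal (0 means solved)."""
    x, y = run(program)
    return -(abs(goal[0] - x) + abs(goal[1] - y))


def synthesize(goal, explore_prob, max_iters=10000, seed=0):
    """Search for a program that reaches the goal.

    With probability explore_prob a fresh random program is sampled
    (exploration, ignores feedback); otherwise the best program found so
    far is extended by one move (exploitation, uses the reward feedback).
    Returns (program, iterations_used), or (None, max_iters) on failure.
    """
    rng = random.Random(seed)
    best, best_reward = "", reward("", goal)
    for i in range(max_iters):
        if best_reward == 0:
            return best, i
        if rng.random() < explore_prob:
            length = rng.randint(1, 8)
            candidate = "".join(rng.choice("UDLR") for _ in range(length))
        else:
            candidate = best + rng.choice("UDLR")
        r = reward(candidate, goal)
        if r > best_reward:
            best, best_reward = candidate, r
    return (best, max_iters) if best_reward == 0 else (None, max_iters)


if __name__ == "__main__":
    for p in (0.0, 0.2, 1.0):
        program, iters = synthesize((3, 2), explore_prob=p)
        print(f"explore_prob={p}: program={program!r}, iterations={iters}")
```

In this open grid, pure exploitation works well because the reward is never misleading. In environments with obstacles or deceptive rewards, greedily extending the best program can stall, which is where some exploration becomes necessary.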