Reward Based Program Synthesis for Minecraft

Adapting Program Synthesizers for Reward Evaluation and Leveraging Discovered Programs

Bachelor thesis (2024)

Faculty

Electrical Engineering, Mathematics and Computer Science

Exploitation, Probe, Minecraft, Rewards, Reward-based games, MineRL, Program-Synthesis, Inductive program

To reference this document use:

http://resolver.tudelft.nl/uuid:49e7fbbe-cec7-4ed5-8c52-bd1bb850e917

Published Date

21-06-2024

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Program synthesis is the task of constructing a program that provably satisfies a given high-level specification. A specification can be described in various ways. This research focuses on adapting the Probe synthesizer, which traditionally relies on input-output examples, to reward-based synthesis. The generalized version of Probe allows different search, selection, and updating algorithms to be used, which makes it applicable to a more general case. By modifying the Probe algorithm to learn from rewards, we explore how exploiting existing programs as partial solutions impacts synthesis performance. Different ways of exploitation were tested, specifically how much the probabilities change and how the starting probabilities affect the synthesis. Exploiting programs can lead to faster synthesis, but depending on the world environment it can also lead to no solution being found.
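
As an illustration of what "exploiting partial solutions" could look like, the sketch below re-weights grammar rule probabilities from the rewards of previously found programs. It is not taken from the thesis: the function name, the rule names, the normalization by `max_reward`, and the exponent-based update (loosely modeled on the kind of update a Probe-style synthesizer performs) are all assumptions made for the example.

```python
def update_rule_probabilities(prior_probs, partial_programs, max_reward):
    """Re-weight grammar rules using rewards of partial solutions.

    prior_probs: dict mapping rule name -> starting probability (0 < p < 1).
    partial_programs: list of (rules_used, reward) pairs, where rules_used
        is a set of rule names appearing in that program.
    max_reward: hypothetical upper bound used to scale rewards into [0, 1].
    """
    if max_reward <= 0:
        raise ValueError("max_reward must be positive")

    # For each rule, remember the best normalized reward of any program using it.
    fit = {rule: 0.0 for rule in prior_probs}
    for rules_used, reward in partial_programs:
        normalized = min(max(reward / max_reward, 0.0), 1.0)
        for rule in rules_used:
            if rule in fit:
                fit[rule] = max(fit[rule], normalized)

    # Raising a prior below 1 to the power (1 - fit) increases its weight
    # as fit grows, so rules from well-rewarded programs become more likely.
    weights = {rule: p ** (1.0 - fit[rule]) for rule, p in prior_probs.items()}

    # Renormalize so the result is again a probability distribution.
    total = sum(weights.values())
    return {rule: w / total for rule, w in weights.items()}
```

In this sketch, the starting probabilities and the strength of the update are exactly the two knobs the abstract mentions: a stronger update speeds up search when earlier programs are useful, but can starve rules that a different environment would need.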

Files

Research_paper_final_2.pdf

(pdf | 0.561 Mb)

Unknown license