This thesis investigates the performance of several bandit algorithms in non-stationary contextual environments, where reward functions change unpredictably over time. Traditional bandit algorithms, designed for stationary settings, often fail in such dynamic real-world scenarios. This research evaluates the adaptability and computational performance of widely used algorithms, namely UCB, LinUCB, and LinEXP3, using a self-implemented bandit framework. The empirical results clarify the trade-offs between these algorithms and indicate which strategies suit non-stationary conditions. Notably, LinEXP3 achieved the best performance in complex environments, which this work attributes to its ability to incorporate Bayesian posteriors, despite its higher computational cost. The key contributions of this thesis are the implementations of these algorithms and their empirical evaluation in tailored environment settings. The results suggest several directions for further research, including extending the evaluation to a broader range of algorithms, such as Contextual Thompson Sampling and other reinforcement learning algorithms adapted to linear contextual settings. Future work should also validate these algorithms on real-world datasets and introduce covariance matrices for the context vectors to simulate more realistic learning processes. These findings could inform the design and implementation of bandit algorithms in practical applications such as recommendation systems and financial portfolio management.
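To make the notion of a non-stationary contextual environment concrete, the following is a minimal sketch, not the thesis's own framework. It assumes a piecewise-stationary linear setting: each arm has a hidden parameter vector, the expected reward is the inner product of that vector with a unit-norm context, and every arm's parameter is redrawn once at a single change point. The learner is the standard disjoint LinUCB update, with the inverse design matrix maintained by Sherman-Morrison updates. The names `LinUCBArm` and `run`, and all parameter values, are hypothetical choices for illustration only.

```python
import math
import random


class LinUCBArm:
    """One arm of disjoint LinUCB; keeps A^{-1} up to date via Sherman-Morrison."""

    def __init__(self, d: int, alpha: float) -> None:
        self.alpha = alpha
        # A starts as the identity (ridge regularisation), so A^{-1} is the identity too.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
        self.b = [0.0] * d

    def _a_inv_times(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.a_inv]

    def ucb(self, x):
        theta_hat = self._a_inv_times(self.b)  # ridge estimate A^{-1} b
        mean = sum(t * xi for t, xi in zip(theta_hat, x))
        a_inv_x = self._a_inv_times(x)
        width = math.sqrt(max(sum(xi * v for xi, v in zip(x, a_inv_x)), 0.0))
        return mean + self.alpha * width

    def update(self, x, reward):
        # (A + x x^T)^{-1} = A^{-1} - (A^{-1} x)(A^{-1} x)^T / (1 + x^T A^{-1} x); A is symmetric.
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def run(horizon=2000, change_point=1000, d=3, n_arms=3, alpha=1.0, noise=0.1, seed=0):
    """Return (regret before change, regret after change) for LinUCB in a
    hypothetical piecewise-stationary linear contextual bandit."""
    rng = random.Random(seed)

    def draw_thetas():
        return [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n_arms)]

    thetas = draw_thetas()
    arms = [LinUCBArm(d, alpha) for _ in range(n_arms)]
    regret = [0.0, 0.0]
    for t in range(horizon):
        if t == change_point:
            thetas = draw_thetas()  # abrupt change: every arm's parameter is redrawn
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in x)) or 1.0
        x = [v / norm for v in x]  # unit-norm context shared by all arms
        expected = [sum(th * xi for th, xi in zip(theta, x)) for theta in thetas]
        choice = max(range(n_arms), key=lambda k: arms[k].ucb(x))
        reward = expected[choice] + rng.gauss(0.0, noise)
        arms[choice].update(x, reward)
        regret[0 if t < change_point else 1] += max(expected) - expected[choice]
    return regret[0], regret[1]
```

Calling `run()` returns the cumulative regret accumulated before and after the change point. Because plain LinUCB weights all past observations equally, its estimates after the change are still anchored to the old parameters, which is the kind of slow adaptation that motivates evaluating algorithms specifically under non-stationarity.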