In recent years, immersive media, particularly Virtual Reality (VR) technology, has seen significant growth. VR technology immerses users in fully digital environments, offering interactive experiences beyond traditional media. However, delivering high-quality 360° videos over the internet poses challenges such as bandwidth constraints and latency issues. Latency, in particular, can disrupt the sense of presence by causing unrealistic interactions that break the illusion of being in a virtual environment. One promising solution is to predict a user’s head pose trajectory so that content delivery can be adapted preemptively and delays minimized. Head pose prediction enables adaptive streaming systems to prioritize and deliver only the relevant portions of 360° videos, significantly reducing bandwidth requirements while ensuring a smooth user experience. Despite advances in predictive modeling, existing approaches often struggle with accuracy when user behavior is unpredictable, influenced by content characteristics and individual differences. To address these challenges, this thesis investigates the potential of leveraging entropy metrics, such as Actual Entropy (AE) and Instantaneous Entropy (IE), as measures of user predictability to improve head pose prediction. Through an exploratory analysis of 360° video datasets and existing state-of-the-art prediction models, we identify a linear correlation between prediction errors and entropy metrics, highlighting the potential of entropy-driven approaches. We develop two adaptive attention-based models: an LSTM-based model with entropy-modulated attention and a multi-head adaptive attention model. In addition, we explore entropy-augmented baseline approaches. While the adaptive models achieve mixed results, a baseline model combining head pose and instantaneous entropy was found to be more stable, demonstrating the utility of even straightforward entropy integration.
Although the entropy-based models did not consistently outperform state-of-the-art methods, our findings demonstrate that entropy augmentation offers a promising avenue for improving the stability and robustness of head pose prediction in specific scenarios. This thesis highlights that understanding dataset characteristics and how entropy is incorporated into model architectures is crucial for optimizing performance. These insights suggest that future work should focus on adapting model designs to better account for user predictability, which could lead to more adaptive and responsive VR systems.