Performance of Objective Speech Quality Metrics on Languages Beyond Validation Data: A Study of Turkish and Korean
Abstract
This study investigates how well two objective speech quality metrics, Perceptual Evaluation of Speech Quality (PESQ) and Virtual Speech Quality Objective Listener (ViSQOL), predict human-rated speech quality scores, which are essential for assessing Quality of Experience (QoE) in telecommunication systems. Because labeled data with human-rated scores are scarce, these metrics have been validated on only a limited number of languages. This research examines the applicability of PESQ and ViSQOL to Turkish and Korean, two languages that were not part of the validation data used to calibrate these metrics.
The experiment used English as the baseline language for comparison. Turkish samples had higher average ViSQOL scores than English samples, and the difference was statistically significant. Furthermore, the correlation between PESQ and ViSQOL scores was highest for Turkish male speakers, and ViSQOL rated speech higher than PESQ did, especially under babble noise degradation. Future research should extend this study by exploring biases across additional metrics and languages, and by constructing a dataset with labeled subjective scores for more languages to improve the calibration of these metrics.
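The two comparisons summarized above, a difference in mean scores between languages and a correlation between the two metrics, can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the abstract does not say which significance test or correlation coefficient the study used, so Welch's t-test and Pearson's r are assumed here, the function names are hypothetical, and the score lists are made-up placeholders rather than data from the study.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Assumption: the two groups are independent samples with possibly
    unequal variances. The study's actual test is not stated.
    """
    se_a = statistics.variance(a) / len(a)  # squared standard error, group a
    se_b = statistics.variance(b) / len(b)  # squared standard error, group b
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    return t, df


if __name__ == "__main__":
    # Placeholder scores for illustration only; NOT results from the study.
    visqol_english = [3.1, 3.4, 2.9, 3.6, 3.2]
    visqol_turkish = [3.5, 3.8, 3.3, 3.9, 3.6]
    pesq_turkish = [2.6, 3.0, 2.4, 3.1, 2.7]

    t, df = welch_t(visqol_turkish, visqol_english)
    r = pearson(pesq_turkish, visqol_turkish)
    print(f"Welch t = {t:.3f} with df = {df:.1f}")
    print(f"Pearson r (PESQ vs ViSQOL, Turkish) = {r:.3f}")
```

A p-value would follow from the t statistic and degrees of freedom via the Student t distribution, which the standard library does not provide; a statistics package would be needed for that step.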