Multimodal Context Informed Machine Translation of Manga Using LLMs


Abstract

Large language models have achieved breakthroughs in many natural language processing tasks. One of their main appeals is the ability to tackle problems that lack sufficient training data for a dedicated solution. Manga translation is one such task: a still budding and underdeveloped field that at the same time deals with a problem even more complex than standard translation. One of its biggest challenges is successfully incorporating the visual modality into translation in order to resolve ambiguities. Recently, a new type of LLM -- the multimodal LLM -- has emerged and shown potential in understanding visual symbolism in narrative pieces such as memes and comics. In this work, we investigate whether these models can be useful for manga translation, ultimately helping manga authors and publishers make their works available to wider audiences. Specifically, we evaluate a number of methods based on GPT-4-Turbo for text-only translation, image-informed translation, and volume-level translation. We perform both automatic and human evaluation. Moreover, we contribute new evaluation data -- the first parallel Japanese-Polish manga translation benchmark. Our findings show that our proposed methods achieve state-of-the-art results for English and set a new standard for Polish. We conclude that while this is not a sufficient replacement for a professional human translator, it could help speed up the translation process or serve as a learning aid.
