Climate change is a heated discussion topic in public arenas such as social media. Both texts and visuals play key roles in the debate, as they can complement, contradict, or reinforce each other in nuanced ways. Messages therefore need to be studied as multimodal objects to better understand the polarized debate about climate change impacts and policies. Multimodal representation models such as CLIP can transfer knowledge across domains and modalities, enabling textual and visual semantics to be investigated together. Yet they cannot directly distinguish the nuances between supportive and sceptic climate change stances. This paper explores a simple but effective strategy that combines modality fusion and domain-knowledge enhancement to equip CLIP-based models with knowledge of climate change stances. A multimodal Dutch Twitter dataset is collected, and the proposed strategy evaluated on it, increasing the macro-average F1 score across stances from 51% to 86%. The outcomes can be applied in both data science and public policy studies to better analyse how the combined use of texts and visuals generates meaning during debates, in the context of climate change and beyond.
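To make the abstract's "modality fusion" idea concrete, the sketch below shows one common way to fuse CLIP-style text and image embeddings for stance classification: L2-normalise each modality, concatenate, and apply a linear stance head. This is a minimal illustration under stated assumptions, not the paper's actual method; the embedding dimension (512), the three-way label set, and the random untrained head are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: assume CLIP-style encoders have already produced
# 512-dim embeddings for a tweet's text and image (not computed here).
EMB_DIM = 512
STANCES = ["supportive", "sceptic", "neutral"]  # assumed label set

def fuse(text_emb: np.ndarray, image_emb: np.ndarray) -> np.ndarray:
    """Late fusion: L2-normalise each modality, then concatenate."""
    t = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=-1, keepdims=True)
    return np.concatenate([t, v], axis=-1)  # shape (..., 2 * EMB_DIM)

def classify(fused: np.ndarray, W: np.ndarray, b: np.ndarray) -> int:
    """Linear stance head: index of the largest logit."""
    logits = fused @ W + b
    return int(np.argmax(logits))

# Toy usage with random embeddings and an untrained head.
text_emb = rng.normal(size=EMB_DIM)
image_emb = rng.normal(size=EMB_DIM)
W = rng.normal(size=(2 * EMB_DIM, len(STANCES)))
b = np.zeros(len(STANCES))

fused = fuse(text_emb, image_emb)
print(fused.shape)  # (1024,)
print(STANCES[classify(fused, W, b)])
```

In practice the head would be trained on labelled stance data, and the domain-knowledge enhancement the abstract mentions would go beyond this plain fusion step.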