Towards creating a conversational memory for long-term meeting support

predicting memorable moments in multi-party conversations through eye-gaze

Conference paper (2022)

Authors

Maria Tsfasman

Kristian Fenech (Eötvös University)

M. Tarvirdians (Interactive Intelligence)

Andras Lorincz (Eötvös University)

C.M. Jonker (Universiteit Leiden, Interactive Intelligence)

Catharine Oertel (Interactive Intelligence)

Research Group

Interactive Intelligence (TU Delft)

DOI: https://doi.org/10.1145/3536221.3556613

Keywords

Conversational memory, Multi-modal corpora, Multi-party interaction, Social signals

To reference this document use:

http://resolver.tudelft.nl/uuid:668a53e9-1688-433b-9934-eb0de73dc89f

Published Date

2022

Language

English

Faculty

Electrical Engineering, Mathematics and Computer Science

Department

Intelligent Systems

Research Group

Interactive Intelligence

Abstract

When working in a group, it is essential to understand each other's viewpoints to increase group cohesion and meeting productivity. This can be challenging in teams: participants might be left misunderstood and the discussion could go around in circles. To tackle this problem, previous research on group interactions has addressed topics such as dominance detection, group engagement, and group creativity. Conversational memory, however, remains a largely unexplored area in the field of multimodal analysis of group interaction. The ability to track what each participant, or a group as a whole, finds memorable from each meeting would allow a system or agent to continuously optimise its strategy to help a team meet its goals. In the present paper, we therefore investigate what participants take away from each meeting and how it is reflected in group dynamics. As a first step toward such a system, we recorded a multimodal longitudinal meeting corpus (MEMO), which comprises first-party annotations of what participants remember from a discussion and why they remember it. We investigated whether participants in group interactions encode what they remember non-verbally and whether such non-verbal multimodal features can be used to automatically predict what groups are likely to remember. We devise a coding scheme to cluster participants' memorisation reasons into higher-level constructs. We find that low-level multimodal cues, such as gaze and speaker activity, can predict conversational memorability. We also find that non-verbal signals can indicate when a memorable moment starts and ends. We predicted four levels of conversational memorability with an average accuracy of 44%. We also show that reasons related to participants' personal feelings and experiences are the most frequently mentioned grounds for remembering meeting segments.

Files

3536221.3556613.pdf

(pdf | 4.26 Mb)