Endowing machines with social competence is not only a science fiction theme; it is also a long-held goal in computer science. Machines have changed how we work, communicate, and practice art, science, and engineering, but they have had little effect on one of our core human needs: social interaction. Although digital communication has changed the way we interact with others, machines have arguably done little to enhance the quality of our face-to-face interactions, and they are seldom seen as tools to help us improve the way we interact with others. This is due in part to their lack of social competence thus far.
A crucial stepping stone towards social competence, and towards the ability to display empathy, is the ability to assess social experience. Social experience refers to internal states reflecting an individual's perception of a social situation, such as enjoying a conversation or feeling attracted to an interaction partner. Social experience variables are hard to study because they are not directly observable and change over time, so researchers must rely on self-reports or third-party assessments (annotations). Algorithms for the assessment of social experience generally take one of two approaches: 1) direct modeling of the relationship between raw or derived signals (sensor readings, or the outputs of detectors and feature extractors) and experience variables; and 2) intermediate modeling or detection of discrete actions performed during interactions (e.g., speaking, laughter, gesturing), from which experience is then predicted.
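To make the distinction concrete, the following minimal sketch contrasts the two pipelines. All names and the choice of a scikit-learn-style logistic regression are illustrative assumptions, not methods used in the thesis:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Approach 1: direct modeling -- map windowed sensor features
# (e.g., per-window accelerometer statistics) straight to a
# self-reported experience variable such as enjoyment.
def direct_model(features, experience_labels):
    # features: (n_windows, n_features); experience_labels: (n_windows,)
    return LogisticRegression().fit(features, experience_labels)

# Approach 2: intermediate modeling -- first detect discrete social
# actions (speaking, laughter, gesturing), then predict experience
# from the detected action streams (e.g., turn-taking statistics).
def intermediate_model(features, action_labels, experience_labels):
    action_clf = LogisticRegression().fit(features, action_labels)
    actions = action_clf.predict_proba(features)  # per-window action scores
    exp_clf = LogisticRegression().fit(actions, experience_labels)
    return action_clf, exp_clf
```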
In this dissertation, we focus on the in-the-wild mingling setting, where subjects stand and are free to form and leave conversation groups as they please. We pay special attention to data collection and annotation, given their importance in a nascent field and the nuance involved in capturing and annotating social signals. Because the goal is to study machine social perception in real-life settings, interactions are not scripted and instrumentation is kept to a minimum.
We start with work on the direct assessment of social experience, in this case attraction, by exploring the predictive power of body acceleration. By analyzing accelerometer data from speed-dating interactions, we investigate how the intensity of and variation in body movement relate to self-reported attraction. This study sheds light on the predictive power of synchrony, mimicry, and convergence estimates for attraction, and potentially for other constructs related to affiliation.
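As a rough illustration of the kind of interpersonal coordination feature involved, here is a hedged sketch of a windowed lagged-correlation synchrony estimate between two partners' acceleration magnitudes. The window and lag parameters are arbitrary placeholders, and this is not the thesis's actual feature set:

```python
import numpy as np

def acc_magnitude(acc):
    # acc: (n_samples, 3) accelerometer axes -> movement intensity per sample
    return np.linalg.norm(acc, axis=1)

def lagged_corr(a, b, lag):
    # Pearson correlation between a and b shifted by `lag` samples
    if lag > 0:
        a, b = a[lag:], b[:-lag]
    elif lag < 0:
        a, b = a[:lag], b[-lag:]
    return np.corrcoef(a, b)[0, 1]

def windowed_synchrony(sig_a, sig_b, win=100, max_lag=20):
    # For each window, take the strongest correlation over small lags:
    # a crude estimate of how closely one partner's movement tracks the other's.
    scores = []
    for start in range(0, len(sig_a) - win + 1, win):
        a = sig_a[start:start + win]
        b = sig_b[start:start + win]
        scores.append(max(abs(lagged_corr(a, b, lag))
                          for lag in range(-max_lag, max_lag + 1)))
    return np.array(scores)
```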
We then address the detection of speaking, an action of wide interest in social signal processing due to the relevance of turn-taking to social experience. Here we tackle the limitations posed by visual cross-contamination in crowded mingling settings, introducing a model that combines accelerometer readings and body poses to improve the robustness of speaking status detection in complex scenes with multiple simultaneous interactions.
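A minimal sketch of the idea follows: encode each modality separately, then classify speaking status from the fused representation. The late-fusion design, GRU encoders, and dimensions (e.g., 17 keypoints x 2 coordinates for pose) are assumptions for illustration, not the thesis model:

```python
import torch
import torch.nn as nn

class MultimodalSpeakingDetector(nn.Module):
    # Hypothetical late-fusion sketch: one encoder per modality,
    # concatenated and classified per time window.
    def __init__(self, acc_dim=3, pose_dim=34, hidden=64):
        super().__init__()
        self.acc_enc = nn.GRU(acc_dim, hidden, batch_first=True)
        self.pose_enc = nn.GRU(pose_dim, hidden, batch_first=True)
        self.head = nn.Linear(2 * hidden, 1)  # binary speaking status

    def forward(self, acc, pose):
        # acc: (batch, time, acc_dim); pose: (batch, time, pose_dim)
        _, h_acc = self.acc_enc(acc)
        _, h_pose = self.pose_enc(pose)
        fused = torch.cat([h_acc[-1], h_pose[-1]], dim=-1)
        return self.head(fused)  # logits; apply sigmoid for probability
```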
The dissertation also presents two novel datasets, ConfLab and REWIND, each serving a unique purpose. ConfLab, collected during a conference, is notable for its body-joint annotations and for improvements to the sensor setup that increase data fidelity. Such methodological contributions enabling efficient, high-quality data collection are increasingly valuable given the scarcity of social interaction datasets, particularly in mingling settings. REWIND, gathered at a business networking event, stands out for its high-quality individual audio recordings, which are useful for the cross-modal study of multimodal signals such as speaking or laughter.
In a similar vein, we present the Covfee software framework. Covfee challenges existing annotation methodologies by introducing and studying interfaces for the continuous annotation of keypoints and actions. The framework was instrumental in efficiently processing the large amounts of data collected in studies like ConfLab by streamlining the annotation process.
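Continuous annotation replaces frame-by-frame labeling with signals sampled while the media plays. The generic sketch below (not Covfee's actual API) shows how a stream of timestamped annotator samples, such as cursor positions tracking a keypoint, might be aligned to video frames:

```python
import numpy as np

def resample_to_frames(timestamps, values, fps, n_frames):
    """Align a continuously sampled annotation signal to video frames.

    timestamps: (n_samples,) seconds (monotonically increasing) at which
        the annotator's input was sampled during playback.
    values: (n_samples, d) sampled annotation values (e.g., x/y cursor position).
    Returns an (n_frames, d) array, linearly interpolated at frame times.
    """
    frame_times = np.arange(n_frames) / fps
    return np.stack([np.interp(frame_times, timestamps, values[:, k])
                     for k in range(values.shape[1])], axis=1)
```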
Also building on the Covfee framework, the dissertation culminates in an exploration of laughter annotation across modalities. By comparing laughter annotations acquired under different conditions, we highlight the complexities and nuances involved in interpreting social signals across sensory inputs. We challenge the assumption that laughter intensity is a property of the laughter episode itself: we find evidence that laughter evaluations differ significantly depending on the modalities available to the observer, and that the modalities with the highest annotator agreement do not necessarily yield the best model performance. These results contribute to the study of laughter detection and provide valuable insights for future research on multimodal social signal processing.
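One way such per-modality agreement could be quantified is sketched below, using mean pairwise Spearman correlation between annotators as the agreement measure; this is an illustrative choice of statistic, not necessarily the one used in the thesis:

```python
from itertools import combinations
import numpy as np
from scipy.stats import spearmanr

def mean_pairwise_agreement(ratings):
    """ratings: (n_annotators, n_laughter_episodes) intensity ratings
    collected under one modality condition (e.g., audio-only).
    Returns the mean pairwise Spearman correlation across annotators."""
    corrs = [spearmanr(a, b).correlation
             for a, b in combinations(ratings, 2)]
    return float(np.mean(corrs))

# Comparing conditions: higher agreement in one modality does not
# guarantee that models trained on its labels perform best.
# agreement = {cond: mean_pairwise_agreement(r)
#              for cond, r in ratings_by_condition.items()}
```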
In summary, this dissertation weaves together a series of methodological contributions and novel findings, often derived from those new methods, each furthering our understanding of how best to train machines for social understanding and competence.