Are We All in a Truman Show? Spotting Instagram Crowdturfing through Self-Training

Conference paper (2023)

Authors

Pier Paolo Tricomi (Chisito S.r.l.; Università degli Studi di Padova)

Sousan Tarahomi (University of Twente)

Christian Cattai (Università degli Studi di Padova)

Francesco Martini (Università degli Studi di Padova)

M. Conti (Università degli Studi di Padova; Chisito S.r.l.)

Affiliation

External organisation

Instagram, Collusion, Bot Detection, Crowdturfing Detection, Fake Accounts, Fake Engagement, Fake Profiles, Self-Training, Semi-Supervised Learning

To reference this document use:

http://resolver.tudelft.nl/uuid:63784ee8-9417-4ca0-bb56-2c8c1360c114

More Info

Published Date

2023

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Influencer Marketing generated $16 billion in 2022. Usually, the more popular influencers are paid more for their collaborations. Thus, many services were created to boost profiles' popularity metrics through bots or fake accounts. However, real people recently started participating in such boosting activities using their real accounts for monetary rewards, generating inauthentic content that is extremely difficult to detect. To date, no works have attempted to detect this new phenomenon, known as crowdturfing (CT), on Instagram. In this work, we propose the first Instagram CT engagement detector. Our algorithm leverages profiles' characteristics through semi-supervised learning to spot accounts involved in CT activities. Compared to the supervised approaches used so far to identify fake accounts, semi-supervised models can exploit huge quantities of unlabeled data to increase performance. We purchased and studied 1293 CT profiles from 11 providers to build our self-training classifier, which reached a 95% F1-score. We tested our model in the wild by detecting and analyzing CT engagement from 20 mega-influencers (i.e., with more than one million followers), and discovered that more than 20% of it was artificial. We analyzed the CT profiles and comments, showing that it is difficult to detect these activities based solely on their generated content.
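For readers unfamiliar with self-training, the general idea can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the abstract does not specify the base classifier, the profile features, or the confidence threshold, so the nearest-centroid model, the two-dimensional feature vectors, the labels "genuine" and "ct", and the 0.9 threshold below are all assumptions made for the example.

```python
import math
import random


def fit(X, y):
    """Fit a nearest-centroid model: one mean vector per label (assumed base model)."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def predict_proba(x, model):
    """Return (label, confidence) from a softmax over negative distances to centroids."""
    weights = {label: math.exp(-math.dist(x, c)) for label, c in model.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=10):
    """Repeatedly pseudo-label confident unlabeled samples and refit the model."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        model = fit(X_lab, y_lab)
        remaining, added = [], 0
        for x in pool:
            label, confidence = predict_proba(x, model)
            if confidence >= threshold:
                # Confident prediction: treat it as a label in the next round.
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0 or not pool:
            break  # nothing new was learned, or the unlabeled pool is exhausted
    return fit(X_lab, y_lab)


# Synthetic stand-in data: two hypothetical profile features per account.
random.seed(0)
X_labeled = [[0.0, 0.0], [0.5, -0.5], [6.0, 6.0], [5.5, 6.5]]
y_labeled = ["genuine", "genuine", "ct", "ct"]
X_unlabeled = (
    [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(50)]
    + [[random.gauss(6, 1), random.gauss(6, 1)] for _ in range(50)]
)

model = self_train(X_labeled, y_labeled, X_unlabeled)
print(predict_proba([5.0, 5.5], model))
```

The point of the loop is that a handful of labeled accounts seeds the model, and each round enlarges the training set with unlabeled accounts the model already classifies confidently. In practice the threshold trades off how much unlabeled data is absorbed against the risk of reinforcing early mistakes.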