Investigating Data Collection and Reporting Practices of Human Annotations in Societally Impactful Machine Learning Applications
A Systematic Review of Top-Cited IEEE Access Papers
Abstract
This systematic review investigates the practices and implications of human annotation in machine learning (ML) research. Analyzing a selection of 100 top-cited papers from the journal IEEE Access, the study examines the data collection and reporting methods they employ. The findings reveal a prevalent lack of standardization and formalization in the annotation process: key details such as annotation sources, the number of annotators, and formal annotation instructions are frequently unreported, potentially compromising the quality and effectiveness of the resulting ML algorithms. Domain-specific implications are discussed, highlighting the need for rigorous annotation practices in areas such as medical diagnostics, language processing, and intelligent vehicle systems. The study contributes to the field by emphasizing the importance of standardized procedures and transparency in ML research. Future research is recommended to develop systematic annotation methodologies and to examine the impact of subpar annotation practices on data quality.