A Machine Learning-Based Approach to Enhancing Label Aggregation in Crowdsourcing

Himanshu Suyal; Avtar Singh

doi:10.14201/adcaij.32757

Abstract

Crowdsourcing is the most effective means of obtaining labelled data for supervised machine learning. However, the varying expertise of crowd workers often results in noisy annotations. While traditional label aggregation methods attempt to handle label noise, they typically overlook the relationships between different data instances. Moreover, crowdsourced datasets often suffer from class imbalance, in which majority classes dominate minority classes, further degrading label accuracy. This paper proposes Reliability-Weighted Bayesian Label Aggregation (RWBLA) to address both challenges. First, the K-nearest neighbours (KNN) method is used to enrich the label set of each instance by augmenting it with labels from its closest neighbours, resulting in multiple noisy label sets. Next, RWBLA refines the aggregation by assigning weights to the neighbouring labels based on worker reliability, adaptive distance, and label similarity for each instance. Worker reliability is determined dynamically from neighbourhood information, and the label similarity measure is modified to handle class imbalance. Finally, a weighted Bayesian inference method is used to infer the correct label for each instance. The proposed approach is evaluated on 20 synthetic and three real-world crowdsourcing datasets. The results show that RWBLA consistently outperforms eight baseline label aggregation methods, improving aggregation accuracy by 3% to 13%. Moreover, in analyses on imbalanced real-world crowdsourcing data, RWBLA surpasses state-of-the-art aggregation algorithms by 2% to 6%, underscoring its efficacy on imbalanced data.
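The pipeline described in the abstract (neighbour label augmentation, reliability and distance weighting, weighted Bayesian inference) can be illustrated with a minimal Python sketch. This is not the authors' RWBLA implementation: the abstract gives no formulas, so every concrete choice here is an assumption made for illustration. The function name `knn_weighted_bayes`, the agreement-with-majority reliability estimate, the `exp(-d / scale)` distance weight, and the symmetric worker noise model are all hypothetical, and the label-similarity and class-imbalance terms are omitted.

```python
import numpy as np

def knn_weighted_bayes(X, labels, n_classes, k=2):
    """Illustrative sketch only (hypothetical, not the paper's RWBLA formulas).

    X:      (n, d) feature matrix.
    labels: (n, w) integer matrix of worker labels, -1 where a worker gave none.
    Returns the inferred label per instance and the estimated worker reliabilities.
    """
    n, w = labels.shape

    # Pairwise Euclidean distances between instances.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

    # Initial estimate by plain majority voting.
    counts = np.stack([(labels == c).sum(axis=1) for c in range(n_classes)], axis=1)
    majority = counts.argmax(axis=1)

    # ASSUMPTION: reliability = Laplace-smoothed agreement with the majority vote.
    answered = labels >= 0
    agree = (labels == majority[:, None]) & answered
    reliability = (agree.sum(axis=0) + 1) / (answered.sum(axis=0) + 2)

    log_post = np.zeros((n, n_classes))  # uniform prior over classes
    for i in range(n):
        # The instance itself (distance 0) plus its k nearest neighbours.
        neighbours = np.argsort(dist[i])[: k + 1]
        # ASSUMPTION: "adaptive" distance = scale by the farthest chosen neighbour.
        scale = dist[i, neighbours].max() or 1.0
        for j in neighbours:
            dist_weight = np.exp(-dist[i, j] / scale)
            for worker in range(w):
                lab = labels[j, worker]
                if lab < 0:
                    continue
                r = reliability[worker]
                # Symmetric noise model: correct with prob. r, else uniform error.
                lik = np.full(n_classes, (1.0 - r) / (n_classes - 1))
                lik[lab] = r
                # Weighted Bayesian update: the weight tempers the log-likelihood.
                log_post[i] += dist_weight * np.log(lik)

    return log_post.argmax(axis=1), reliability


# Toy data: two well-separated clusters, three workers, the third one noisy.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels = np.array([[0, 0, 1], [0, 0, 0], [0, 1, 1],
                   [1, 1, 0], [1, 1, 1], [1, 1, 0]])
pred, reliability = knn_weighted_bayes(X, labels, n_classes=2, k=2)
print(pred, reliability)
```

In this toy data, majority voting alone assigns class 1 to the third instance because two of its three labels say so. The sketch assigns class 0 instead, since the labels borrowed from its two nearest neighbours outweigh the instance's own noisy labels. How closely this mirrors the behaviour of RWBLA depends on the paper's actual weighting formulas, which the abstract does not state.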

Publication Info

Year: 2025
Type: article
Volume: 14
Pages: e32757-e32757
Citations: 0
Access: Closed

Cite This

APA Style

Himanshu Suyal, Avtar Singh (2025). A Machine Learning-Based Approach to Enhancing Label Aggregation in Crowdsourcing. ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal, 14, e32757-e32757. https://doi.org/10.14201/adcaij.32757
