Learning with Long-Tailed Noisy Labels
Date
2024-06
Authors
Publisher
Indian Statistical Institute, Kolkata
Abstract
Deep neural networks (DNNs) have shown exceptionally good performance on a variety
of tasks when trained on correctly labelled, ‘good’ datasets. These remarkable
results, however, are mostly observed on datasets that are carefully curated and
precisely structured. Data obtained from real-world applications, by contrast,
frequently exhibits substantial problems that are not commonly found in these ‘good’
datasets. Two common biases in real-world data are: (i) long-tailed class
distribution, where a small number of classes have a large number of instances while
the rest have only a few, and (ii) label noise, which refers to errors in the
assigned data labels.
When learning models are built to address only one of these biases, either the
long-tailed nature of the data or the noise in the labels, their performance declines
on data that has both a long-tailed distribution and noisy labels, which is a very
common occurrence in real-world applications.
This work investigates the problem of learning from long-tailed datasets with noisy
labels. Data from real-world problems such as autonomous driving, medical diagnosis,
and large-scale user-generated content platforms frequently shows both properties.
It is therefore essential to develop robust learning algorithms that can address
both problems at the same time.
Our objective is to study and contribute to deep learning methods that handle these
real-world data difficulties robustly and dependably. We study current methods for
these learning problems, focus on their shortcomings, and try to improve on them.
We propose Median of Means for centroid estimation on a clean subset of the
dataset. We then use the SFA framework and semi-supervised learning for the
classification task on imbalanced data with noisy labels.
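The Median of Means idea for centroid estimation can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names, the number of blocks, the random block assignment, and the use of a coordinate-wise median are all assumptions made for illustration, and the abstract does not specify these details.

```python
import numpy as np


def median_of_means_centroid(features, num_blocks=5, seed=0):
    """Estimate one class centroid with a median-of-means scheme.

    Hypothetical helper: the samples are shuffled, split into blocks,
    each block is averaged, and the coordinate-wise median of the block
    means is returned. The choice of a coordinate-wise median is an
    assumption of this sketch.
    """
    features = np.asarray(features, dtype=float)
    n = features.shape[0]
    if n == 0:
        raise ValueError("need at least one sample to estimate a centroid")
    # Never use more blocks than samples, so no block is empty.
    k = max(1, min(num_blocks, n))
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(n), k)
    block_means = np.stack([features[idx].mean(axis=0) for idx in blocks])
    return np.median(block_means, axis=0)


def class_centroids(features, labels, num_blocks=5, seed=0):
    """Apply the estimator per class on a (presumed clean) labelled subset."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    return {
        int(c): median_of_means_centroid(features[labels == c], num_blocks, seed)
        for c in np.unique(labels)
    }
```

In this sketch a single corrupted sample can distort at most one block mean, so the median over blocks stays close to the bulk of the class, whereas the plain mean is pulled towards the outlier. For tail classes with very few samples the number of blocks is capped at the sample count.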
Description
Dissertation under the supervision of Dr. Swagatam Das
Keywords
Classification, Class imbalance, Noisy labels, SFA, Median of Means
Citation
52p.
