Dissertation and Thesis
Permanent URI for this community: http://164.52.219.250:4000/handle/10263/2146
Search Results: 5 items
Item: Enhancing Confidence Calibration in Long-Tailed Recognition (Indian Statistical Institute, Kolkata, 2024-06)
Author: Jana, Sasanka
Abstract: Deep neural networks often struggle with heavily class-imbalanced training datasets. Recently, two-stage methods have been developed that separate representation learning from classifier learning, aiming to enhance performance. However, the crucial issue of miscalibration remains. To tackle this, we introduce novel methods to improve both calibration and performance in such scenarios. Recognizing that the predicted probability distributions over classes are closely tied to the number of class instances, we propose label-aware smoothing with balanced softmax. This strategy tackles the differing levels of over-confidence among categories, thereby improving the learning of the classifier. Furthermore, to counteract potential dataset bias between the two stages caused by different sampling techniques, we incorporate shifted batch normalization into the decoupling framework. The proposed methods set new benchmarks on several widely used long-tailed recognition datasets, such as CIFAR10-LT and CIFAR100-LT.

Item: Learning with Long-Tailed Noisy Labels (Indian Statistical Institute, Kolkata, 2024-06)
Author: Dey, Sarbajit
Abstract: Deep neural networks (DNNs) have shown exceptionally good performance on a variety of tasks when trained on correctly labelled, 'good' datasets. These remarkable results, however, are mostly observed with datasets that are carefully curated and precisely structured. Conversely, data obtained from real-world applications frequently exhibit substantial problems that are not found in such 'good' datasets. Two biases commonly found in real-world data are: (i) long-tailed class distribution, where a small number of classes have a significant number of instances while the rest have only a few, and (ii) label noise, i.e., inaccuracies and errors in the assigned data labels.
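The balanced softmax and label-aware smoothing named in the first abstract (Enhancing Confidence Calibration in Long-Tailed Recognition) can be sketched roughly as follows. This is a minimal illustration of the common formulations, not the thesis's exact method; the function names and the linear per-class smoothing schedule are assumptions.

```python
import numpy as np

def balanced_softmax(logits, class_counts):
    """Balanced softmax: shift each logit by the log of its class count,
    so head classes do not dominate the resulting probabilities."""
    adjusted = logits + np.log(np.asarray(class_counts, dtype=float))
    adjusted -= adjusted.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(adjusted)
    return exp / exp.sum(axis=-1, keepdims=True)

def label_aware_smoothing(labels, class_counts, eps_head=0.1, eps_tail=0.0):
    """Per-class label smoothing: frequent (head) classes receive more
    smoothing than rare (tail) classes, countering their over-confidence.
    The linear interpolation of eps by relative frequency is illustrative."""
    counts = np.asarray(class_counts, dtype=float)
    num_classes = len(counts)
    rel = (counts - counts.min()) / max(counts.max() - counts.min(), 1e-12)
    eps = eps_tail + (eps_head - eps_tail) * rel  # one eps per class
    targets = np.zeros((len(labels), num_classes))
    for i, y in enumerate(labels):
        targets[i, :] = eps[y] / num_classes          # uniform mass
        targets[i, y] = 1.0 - eps[y] + eps[y] / num_classes  # peak on label
    return targets
```

With equal logits, the balanced softmax assigns probability proportional to class counts; the smoothed targets for a tail class stay peakier than those of a head class.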
When learning models are built to address only one of these biases, either the long-tailed nature of the data or the noise in the labels, their performance declines on data that exhibits both a long-tailed distribution and noisy labels, which is a very common occurrence in real-world applications. This work investigates the complex problem of learning from datasets with long-tailed label noise. In real-world settings such as autonomous driving, medical diagnosis, and large-scale user-generated content platforms, the collected data frequently shows both properties. Therefore, it is essential to create robust learning algorithms that can address both problems at the same time. Our objective is to study and make meaningful contributions to the progress of deep learning methods that can handle such real-world data difficulties while remaining robust and dependable. We review the current methods for these learning problems, focus on their shortcomings, and propose improvements. We propose a Median-of-Means estimator for centroid estimation on a clean subset of the dataset. We then use the SFA framework and semi-supervised learning for the classification task on imbalanced noisy labels.

Item: Dealing with classification irregularities in real-world scenarios (Indian Statistical Institute, Kolkata, 2020-07)
Author: Sadhukhan, Payel
Abstract: Classification of objects is a fundamental task of machine intelligence. Over the years, a number of classifiers from different genres have been developed by the machine learning community. To increase the pertinence of machine learning algorithms in human lives, we have to work at the interface of algorithm design and its utility. Traditional classifiers are designed on the basis of a number of assumptions, such as (i) well-balanced class cardinalities, (ii) membership of each instance to exactly one class, and (iii) an equal number of classes in the training and test phases, among others.
A classifier fails to perform optimally or meaningfully, or both, whenever one or more of these assumptions is breached. Interestingly, datasets from a number of real-world domains have been shown to possess many of these irregularities. This dissertation addresses the three above-mentioned assumptions in order to accomplish purposeful learning of the data.

Class imbalance is the quantitative disproportion between the cardinalities of some or all classes of a dataset. In a two-class scenario, the class with a significantly higher number of instances is termed the majority class, whereas the other is the minority class. When a traditional classifier is trained on class-imbalanced data, it usually becomes biased towards the quantitatively abundant class. In the first work of this thesis, we handle the class imbalance problem by estimating the minority set and then adding synthetic minority points from the estimated set to decrease the difference in cardinality.

The multi-label nature of data, i.e., the membership of a feature vector to two or more labels, is another recent development. Though the instance set (feature values) is the same across all labels, the partition into positive and negative classes varies from label to label. Extraction of a label-specific feature set is an efficacious solution to this problem, and this is the motivation of the second work of this thesis. Furthermore, multi-label datasets also suffer from class imbalance, and the degree of imbalance varies from label to label, which further aggravates the problem. In the third work of this thesis, we address the class imbalance aspect of multi-label datasets.

Lastly, we handle open set classification. In open set classification, we have to correctly classify the instances belonging to the known classes (classes seen during training) besides detecting the instances belonging to unknown classes (classes unseen during training).
On encountering such a problem, extant classifiers assign the unknown class instances to one of the training classes, which they should not. We propose a scheme where our classifier rejects a test instance as unknown besides performing the usual known-class classifications, the two happening simultaneously as a consequence of the scheme itself.

Item: Classification of normal and fatty liver ultrasound images (Indian Statistical Institute, Kolkata, 2009)
Author: Nanda, Manoj Kumar

Item: Distribution theory of some classification statistics (Indian Statistical Institute, Calcutta, 1961-12)
Author: John, S
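The Median-of-Means centroid estimation mentioned in the second abstract (Learning with Long-Tailed Noisy Labels) can be sketched as below. This is a generic Median-of-Means estimator, assumed here for illustration; the bucket count and function name are not taken from the thesis.

```python
import numpy as np

def median_of_means_centroid(features, num_buckets=5, rng=None):
    """Median-of-Means centroid: randomly split the samples into buckets,
    average each bucket, then take the coordinate-wise median of the
    bucket means. A few outlying (e.g. noisily labelled) points can
    corrupt only the buckets they fall into, so the median stays robust."""
    rng = np.random.default_rng(rng)
    features = np.asarray(features, dtype=float)
    idx = rng.permutation(len(features))
    buckets = np.array_split(idx, num_buckets)
    means = np.stack([features[b].mean(axis=0) for b in buckets])
    return np.median(means, axis=0)
```

For example, fifty points at the origin plus one extreme outlier leave the Median-of-Means centroid at the origin, while the plain mean is pulled far away.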
