Browsing by Author "Panda, Aditya"

Discriminative Dictionary Learning by Exploiting Inter-Class Similarity for HEp-2 Cell Classification
(Indian Statistical Institute, Kolkata, 2019-07) Panda, Aditya
In this work, we present an algorithm for automatic classification of IIF images of HEp-2 cells into relevant classes. Our algorithm is based mainly on the "Dictionary Learning" algorithm, and we have redefined its objective function to suit our purpose. The major difficulty in HEp-2 cell image classification lies in its low inter-class variability and substantial intra-class variation. To address these issues, we have modified the objective function of "Dictionary Learning" to learn inter-class features. Moreover, we use a pre-processing stage based on a local feature extractor and a "spatial decomposition" classifier set-up to better classify test images. We evaluated our algorithm on the three most widely accepted benchmark data-sets for HEp-2 cell classification: the ICPR 2012, ICIP 2013 and SNP data-sets. The proposed algorithm achieves better results than other popular dictionary learning algorithms for HEp-2 cell classification. Moreover, when compared with other algorithms for HEp-2 cell classification, including the winners on the ICPR 2012, ICIP 2013 and SNP data-sets, the proposed algorithm reports very competitive results. Though the proposed algorithm is designed specifically for HEp-2 cells, we also evaluated its performance on another popular benchmark data-set, the "Diabetic Retinopathy" data-set. Our algorithm provided higher accuracy than other state-of-the-art algorithms on that data-set too.
On Zero-Shot Recognition of Unseen State-Object Composition
(Indian Statistical Institute, Kolkata, 2024-09) Panda, Aditya
Compositional Zero-Shot Learning (CZSL) attempts to recognise images of new (unseen) compositions of states and objects when images of only a subset of state-object compositions are available as training data. Thus a CZSL model should recognise a young dog when the model has seen images of the state-object compositions young bear, old bear and old dog. Solving the CZSL problem poses multiple challenges. It is difficult to disentangle the visual features of the object dog and its state young from the compositional image young dog. The visual features of a state are observed to vary widely across compositions. For example, the state sliced has different visual features in the compositions sliced apple and sliced tomato. In the second chapter of the thesis, we attempt to disentangle the visual features of state and object using a two-stage sequential recognition approach. In the next chapter of the thesis, we work on the open-world CZSL problem, where no prior information about the feasibility of a state-object composition is available. We use a Graph Convolutional Network based architecture along with a frequency-based feasibility prediction approach for the open-world CZSL problem. Another challenge in CZSL lies in the fact that the extent of association between the features of a state and an object varies significantly across different images of the same composition. For example, in different images of peeled orange, the oranges may be peeled to a different extent. Thus the visual features of images of peeled orange may vary. In the fourth chapter, a novel Knowledge-guided Transformer Network is proposed to better process the partial association between the visual features of state and object. In the fifth chapter, we attempt the partially supervised CZSL (pCZSL) problem, where for each state-object compositional image, either the state or the object annotation is available.
We propose a novel vision transformer based architecture with a Locality Preserving Neighbourhood Aggregation approach in the fifth chapter. Effective identification of the discriminative features of state and object often depends on the scale of the object in the image. For example, in images of the two compositions young bear and old bear, the identification of the states young and old may depend on recognising the scale (or size) of the object bear in the image. In the sixth chapter, we leverage a Vision Language Model (VLM) to estimate the scale-aware features in CZSL. Extensive experiments on the C-GQA, MIT-States and UT-Zappos50k datasets demonstrate the effectiveness of the approaches in this thesis when compared to the state-of-the-art in the closed-world CZSL, open-world CZSL and pCZSL settings. As concluding remarks, we discuss the future scope of research in CZSL.
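As a rough illustration of the problem setting only (not of any method in the thesis), the young dog example from the abstract can be written as a small set computation. The function name `unseen_compositions` is hypothetical, and the sketch assumes the open-world label space is simply every state-object pair that can be formed from the seen states and objects.

```python
from itertools import product


def unseen_compositions(seen):
    """Return the state-object pairs not present in the training set.

    Assumption: the candidate label space is the full Cartesian product
    of the states and objects that occur in `seen` (the open-world view,
    with no feasibility filtering).
    """
    states = {state for state, _ in seen}
    objects = {obj for _, obj in seen}
    all_pairs = set(product(states, objects))
    return all_pairs - seen


# Training compositions taken from the abstract's example.
seen = {("young", "bear"), ("old", "bear"), ("old", "dog")}
print(unseen_compositions(seen))
```

In this toy case the only unseen composition is young dog. With realistic vocabularies the product grows much faster than the set of seen pairs, and many of the extra pairs are implausible, which is why the open-world setting described in the abstract involves predicting feasibility.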