Dissertation and Thesis
Permanent URI for this community: http://164.52.219.250:4000/handle/10263/2146
12 results
Search Results
Item Development of Some Scalable Pattern Recognition Algorithms for Real Life Data Analysis (2017-11-20) Garai, Partha
A huge amount of data is being generated continuously as a result of recent advancement and wide use of high-throughput technologies. With the rapid increase in size of data distributed worldwide, understanding the data has become critical. In this regard, dimensionality reduction and clustering have become the necessary preprocessing steps of multiple research areas and applications. One of the important problems of real life large data sets is uncertainty. Some of the sources of this uncertainty include imprecision in computation and vagueness in class definitions. The uncertainty may also be present in the definition of class membership function. In this background, the thesis addresses the problem of dimensionality reduction and clustering of real life data sets, in the presence of noise and uncertainty. The thesis first presents the problem of feature selection using both type-1 and interval type-2 fuzzy-rough sets, which are effective for dimensionality reduction of real life data sets when uncertainty is present in the data set. The properties of fuzzy-rough sets allow greater flexibility in handling noisy and real valued data. While the concept of lower approximation and boundary region of rough sets deals with uncertainty, incompleteness, and vagueness in class definition, the use of either type-1 or interval type-2 fuzzy sets enables efficient handling of overlapping classes in uncertain environment. Moreover, a new concept of "simultaneous attribute selection and feature extraction" is introduced for dimensionality reduction, integrating judiciously the merits of both feature selection and extraction.
A scalable rough-fuzzy clustering algorithm is introduced for large real life data sets, where the theory of rough hypercuboid approach, interval type-2 fuzzy sets, and c-means algorithm are integrated judiciously to handle the uncertainty present in a data set. While the concept of rough hypercuboid approach deals with uncertainty, incompleteness, and vagueness in cluster definition, the use of fuzzy membership of interval type-2 fuzzy sets in the boundary region of a cluster enables efficient handling of overlapping partitions in uncertain environment. Finally, the application of both clustering and feature selection algorithms is demonstrated by grouping functionally similar microRNAs from microarray data. The proposed approach can automatically select the optimum set of features while clustering the microRNAs, making the complexity of the algorithm lower.

Item Development of some Neural Network Models for Non-negative Matrix Factorization: Dimensionality Reduction (Indian Statistical Institute, Kolkata, 2025-01) Dutta, Prasun
Recent research has been driven by the abundance of data, leading to the development of systems that enhance understanding across various fields. Effective machine learning algorithms are crucial for managing high-dimensional data, with dimension reduction being a key strategy to improve algorithm efficiency and decision-making. Non-negative Matrix Factorization (NMF) stands out as a method that transforms large datasets into interpretable, lower-dimensional forms by decomposing a matrix with non-negative elements into a pair of non-negative factors. This approach addresses the curse of dimensionality by dimensionally reducing data while preserving meaningful information. Dimension reduction techniques rely on extracting high-quality features from large datasets. Machine learning algorithms offer a solution by learning and optimizing feature representations, which often outperform manually crafted ones.
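The decomposition of a non-negative matrix into a pair of non-negative factors can be sketched with conventional NMF. The following is a minimal sketch only, assuming NumPy and the classical multiplicative-update rules for the Frobenius-norm objective; it is not one of the neural network models developed in the thesis. The function name `nmf_multiplicative`, its parameters, and the synthetic matrix `V` are hypothetical and chosen purely for illustration.

```python
import numpy as np


def nmf_multiplicative(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Approximate a non-negative matrix V (m x n) as W @ H with W, H >= 0.

    Illustrative sketch: classical multiplicative updates for the
    squared Frobenius error ||V - W H||_F^2 (an assumption, not the
    method proposed in the thesis).
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Strictly positive random initialisation keeps the updates well defined.
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # The factors are updated alternately, one block at a time.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical data: a 20 x 15 non-negative matrix of rank at most 4.
data_rng = np.random.default_rng(1)
V = data_rng.random((20, 4)) @ data_rng.random((4, 15))

W, H = nmf_multiplicative(V, rank=4)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Here the rows of `W` (20 x 4) serve as the lower-dimensional representation of the rows of `V`, and both factors stay non-negative because each update multiplies a non-negative entry by a non-negative ratio.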
Artificial Neural Networks (ANNs) emulate human brain processing and excel in handling complex and nonlinear data relationships. Deep neural network models learn hierarchical patterns from data without explicit human intervention, making them ideal for large datasets. The traditional NMF technique employs block coordinate descent to update the factors of the input matrix, whereas we aim to update them simultaneously. Our research work attempts to combine the strengths of NMF and neural networks to develop novel architectures that optimize low-dimensional data representation. In this thesis, we introduce five novel neural network architectures for NMF, accompanied by tailored objective functions and learning strategies to enhance the low rank approximation of input matrices. First, n2MFn2, a model based on a shallow neural network architecture, has been developed. An approximation of the input matrix has been ensured by the formulation of an appropriate objective function and adaptive learning scheme. Activation functions and weight initialization strategies have also been adjusted to adapt to the circumstances. On top of this shallow model, two deep neural network models, named DN3MF and MDSR-NMF, have been designed. To achieve the robustness of the deep neural network framework, the models have been designed as a two stage architecture, viz., pre-training and stacking. To find the closest realization of the conventional NMF technique as well as the closest approximation of the input, a novel neural network architecture has been proposed in MDSR-NMF. Finally, two deep learning models, named IG-MDSR-NMF and IG-MDSR-RNMF, have been developed to imitate the human-centric learning strategy while guaranteeing a distinct pair of factor matrices that yields a better approximation of the input matrix.
In IG-MDSR-NMF and IG-MDSR-RNMF, the layers not only receive the hierarchically processed input from the previous layer but also refer to the original data whenever needed to ensure that the learning path is correct. A novel kind of non-negative matrix factorization technique known as Relaxed NMF has been developed for IG-MDSR-RNMF, in which only one factor matrix meets the non-negativity requirements while the other one does not. This novel NMF technique allows the model to generate the best possible low dimensional representation of the input matrix while the requirement of maintaining a pair of non-negative factors is removed.

Item Video Summarization using Neural Networks (Indian Statistical Institute, Kolkata, 2020-07) Kumar, Manish
With the rapid increase in video data, the need for video summarization has increased. It allows us to search and retrieve video data easily. In this work, we have tried to solve the problem of video summarization using unsupervised learning. Our method is based on the extraction of key frames. We first segmented the shots and extracted key frames from each shot, and these key frames were combined to generate a storyboard. We have experimented with different clustering techniques such as k-means, k-medoids and self-organizing maps.
We evaluated our results using a human-generated summary.

Item Semi-supervised clustering of stable instances (Indian Statistical Institute, Kolkata, 2018) Sanyal, Deepayan

Item On supervised and unsupervised methodologies for mining of text data (Indian Statistical Institute, Kolkata, 2014) Basu, Tanmay

Item Hierarchical approach to document classification of 20 newsgroup dataset (Indian Statistical Institute, Kolkata, 2016) Dhamija, Kanishka

Item Clustering and ranking unit bugs using dynamic invariants (Indian Statistical Institute, Kolkata, 2013) Jain, Nehul

Item Clustering of web search queries to identify users' intent (Indian Statistical Institute, Kolkata, 2014) Sarkar, Sayantan

Item Feature sensitive level set for clustering (Indian Statistical Institute, Kolkata, 2004) Gupta, Ashish

Item Clustering of gene expression data (Indian Statistical Institute, Kolkata, 2004) Venkatraman, V
