Comparative Analysis on Different Feature Selection
Date
2024-07
Publisher
Indian Statistical Institute, Kolkata
Abstract
In this research, we propose a comprehensive framework for uncovering hidden patterns, selecting optimal features, and reducing dimensionality in large datasets, focusing in particular on data of size 10K × 10K. Traditional methods often struggle to handle such datasets efficiently because of computational constraints and information overload. To address this challenge, we introduce several approaches that use deep neural networks (DNNs) and recurrent neural networks (RNNs) to improve pattern identification, feature selection, and dimensionality reduction.
Firstly, we develop a DNN-based framework tailored to identifying hidden patterns within extensive datasets. By harnessing the representational power of deep neural networks, our framework systematically uncovers intricate relationships and structures among observations, allowing for the extraction and preservation of unique patterns for future use.
Secondly, we propose an optimal feature selection framework designed to efficiently navigate through the entire feature set and identify the most informative subset. Leveraging advanced optimization techniques, our approach intelligently selects features that maximize predictive performance while minimizing redundancy, thus enhancing model interpretability and computational efficiency.
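As a rough illustration of the relevance-versus-redundancy trade-off described above, the following Python sketch greedily picks features that correlate strongly with the target while correlating weakly with features already chosen. It is a simple stand-in, not the optimization procedure used in the dissertation; the function names (`pearson`, `rank_by_relevance`, `greedy_select`) and the toy data are assumptions made for this example.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def rank_by_relevance(features, target, k):
    """Baseline: keep the k features with the highest |correlation| to the target."""
    ranked = sorted(features, key=lambda name: -abs(pearson(features[name], target)))
    return ranked[:k]

def greedy_select(features, target, k):
    """Greedy selection: relevance to the target minus mean redundancy with chosen features."""
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        best_name, best_score = None, -math.inf
        for name, col in features.items():
            if name in selected:
                continue
            redundancy = 0.0
            if selected:
                redundancy = sum(abs(pearson(col, features[s])) for s in selected) / len(selected)
            score = relevance[name] - redundancy
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
    return selected

# Hypothetical toy data: "a_near" is a near-duplicate of "a"; "b" is uncorrelated with both.
features = {
    "a":      [-3.0, -1.0, 1.0, 3.0, -3.0, -1.0, 1.0, 3.0],
    "a_near": [-3.0, -1.0, 1.0, 3.0, -3.0, -1.0, 3.0, 1.0],
    "b":      [-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0],
}
target = [2 * x + 3 * y for x, y in zip(features["a"], features["b"])]

by_relevance = rank_by_relevance(features, target, 2)   # ['a', 'a_near']
by_greedy = greedy_select(features, target, 2)          # ['a', 'b']
print(by_relevance, by_greedy)
```

On this toy data, ranking by relevance alone keeps both near-duplicates, whereas the redundancy penalty swaps the duplicate for the less correlated but complementary feature.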
Thirdly, we introduce an autoencoder-based dimension reduction method aimed at effectively reducing the dimensionality of the dataset without sacrificing crucial information. By employing the encoding phase of an autoencoder architecture, we compress the input data into a lower-dimensional latent space, significantly reducing the number of features. Notably, our approach preserves the essential characteristics of the original data, ensuring minimal information loss.
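To make the encoding step concrete, here is a minimal sketch of a linear autoencoder in plain Python, trained by full-batch gradient descent to compress three features into one latent value. It is a toy-scale illustration under stated assumptions, not the architecture used in the dissertation: the layer sizes, learning rate, initialization, and data are all hypothetical, and a practical model would normally use nonlinear layers and a deep learning library.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def encode(enc, x):
    """Encoder: project x onto each encoder row to obtain the latent code."""
    return [dot(row, x) for row in enc]

def decode(dec, z):
    """Decoder: map a latent code back to input space (one dec row per input feature)."""
    return [dot(row, z) for row in dec]

def mse(enc, dec, data):
    """Mean squared reconstruction error over the dataset."""
    total = 0.0
    for x in data:
        x_hat = decode(dec, encode(enc, x))
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(data)

def init_weights(n_in, latent_dim):
    """Deterministic small initial weights (an assumption chosen for reproducibility)."""
    enc = [[0.1 * (k + 1)] * n_in for k in range(latent_dim)]
    dec = [[0.1 * (k + 1) for k in range(latent_dim)] for _ in range(n_in)]
    return enc, dec

def train(data, latent_dim, lr=0.1, epochs=1000):
    """Full-batch gradient descent on the mean squared reconstruction error."""
    n_in, n = len(data[0]), len(data)
    enc, dec = init_weights(n_in, latent_dim)
    for _ in range(epochs):
        g_enc = [[0.0] * n_in for _ in range(latent_dim)]
        g_dec = [[0.0] * latent_dim for _ in range(n_in)]
        for x in data:
            z = encode(enc, x)
            r = [a - b for a, b in zip(decode(dec, z), x)]  # reconstruction residual
            for i in range(n_in):
                for k in range(latent_dim):
                    g_dec[i][k] += 2 * r[i] * z[k] / n
            for k in range(latent_dim):
                back = sum(r[i] * dec[i][k] for i in range(n_in))  # error sent back to latent unit k
                for j in range(n_in):
                    g_enc[k][j] += 2 * back * x[j] / n
        for k in range(latent_dim):
            for j in range(n_in):
                enc[k][j] -= lr * g_enc[k][j]
        for i in range(n_in):
            for k in range(latent_dim):
                dec[i][k] -= lr * g_dec[i][k]
    return enc, dec

# Hypothetical data: 3 features that all depend on one hidden factor t.
data = [[0.5 * t, 1.0 * t, -0.5 * t] for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]

enc0, dec0 = init_weights(3, 1)
loss_before = mse(enc0, dec0, data)
enc, dec = train(data, latent_dim=1)
loss_after = mse(enc, dec, data)
codes = [encode(enc, x) for x in data]   # the reduced, 1-dimensional representation
print(loss_before, loss_after)
```

After training, only `encode` is needed to obtain the reduced representation; the decoder exists to measure how much information the compression loses.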
Lastly, we propose utilizing RNNs/LSTMs as an alternative to Markovian transition models, particularly addressing the limitations associated with the "memoryless" property. By harnessing the sequential nature of RNNs, our framework enables the generation of state transition probabilities with greater user control and flexibility, making it well-suited for real-life applications where memory and context play crucial roles.
Overall, our proposed framework offers a comprehensive solution for efficiently analyzing large-scale datasets, empowering researchers and practitioners to extract meaningful insights, make informed decisions, and advance various domains, including finance, healthcare, and engineering.
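The "memoryless" limitation of Markovian transition models mentioned above can be seen in a small count-based sketch. This is not an RNN or LSTM; it only contrasts transition probabilities conditioned on the last state with probabilities conditioned on a longer history, which is the kind of context a recurrent model carries in its hidden state. The function name `transition_probs` and the example sequence are assumptions made for this illustration.

```python
from collections import Counter, defaultdict

def transition_probs(seq, order):
    """Estimate P(next state | previous `order` states) from observed counts."""
    counts = defaultdict(Counter)
    for i in range(order, len(seq)):
        context = tuple(seq[i - order:i])
        counts[context][seq[i]] += 1
    return {
        ctx: {state: c / sum(nxt.values()) for state, c in nxt.items()}
        for ctx, nxt in counts.items()
    }

# Hypothetical sequence with a strict repeating pattern: A, A, B, A, A, B, ...
seq = list("AABAABAABAAB")

first_order = transition_probs(seq, order=1)    # memoryless: only the last state is seen
second_order = transition_probs(seq, order=2)   # two states of history

print(first_order[("A",)])        # {'A': 0.5, 'B': 0.5}
print(second_order[("A", "A")])   # {'B': 1.0}
print(second_order[("B", "A")])   # {'A': 1.0}
```

With only the current state, the model cannot tell whether an "A" is the first or second in the pattern, so it assigns 0.5 to each successor; with two states of history the next state is fully determined. Enumerating longer histories explicitly grows quickly with the history length, which is one motivation for learning the dependence with a recurrent model instead.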
Description
Dissertation under the guidance of Prof. Indranath Chatterjee and Prof. Debrup Chakroborty
Keywords
Comparative Analysis, LSTM
Citation
15p.
