Comparative Analysis on Different Feature Selection

Date

2024-07

Publisher

Indian Statistical Institute, Kolkata

Abstract

In this research, we propose a comprehensive framework for uncovering hidden patterns, selecting optimal features, and reducing dimensionality in large datasets, focusing in particular on 10K x 10K dimensional data. Traditional methods often struggle to handle such vast datasets efficiently because of computational constraints and information overload. To address this challenge, we introduce three innovative approaches leveraging deep neural networks (DNNs) and recurrent neural networks (RNNs) to enhance pattern identification, feature selection, and dimensionality reduction. First, we develop a DNN-based framework tailored to identifying hidden patterns within extensive datasets. By harnessing the representational power of deep neural networks, our framework systematically uncovers intricate relationships and structures among observations, allowing unique patterns to be extracted and preserved for future use. Second, we propose an optimal feature selection framework designed to navigate the entire feature set efficiently and identify the most informative subset. Leveraging advanced optimization techniques, our approach selects features that maximize predictive performance while minimizing redundancy, thus enhancing model interpretability and computational efficiency. Third, we introduce an autoencoder-based dimension reduction method aimed at reducing the dimensionality of the dataset without sacrificing crucial information. By employing the encoding phase of an autoencoder architecture, we compress the input data into a lower-dimensional latent space, significantly reducing the number of features. Notably, our approach preserves the essential characteristics of the original data, ensuring minimal information loss. Lastly, we propose utilizing RNNs/LSTMs as an alternative to Markovian transition models, particularly addressing the limitations associated with the "memoryless" property. By harnessing the sequential nature of RNNs, our framework enables the generation of state transition probabilities with greater user control and flexibility, making it well suited for real-life applications where memory and context play crucial roles. Overall, our proposed framework offers a comprehensive solution for efficiently analyzing large-scale datasets, empowering researchers and practitioners to extract meaningful insights, make informed decisions, and advance various domains, including finance, healthcare, and engineering.
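
The "memoryless" limitation mentioned in the abstract can be illustrated with a small sketch. This is not the dissertation's method: it only contrasts a first-order Markov estimate with an estimate conditioned on a longer history, as a rough stand-in for the context an RNN/LSTM could learn. The toy sequence and the function names `first_order_transitions` and `history_transitions` are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def first_order_transitions(sequence):
    """Estimate P(next | current) from consecutive pairs (memoryless model)."""
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


def history_transitions(sequence, order=2):
    """Estimate P(next | last `order` states), a crude proxy for context."""
    counts = defaultdict(Counter)
    for i in range(len(sequence) - order):
        context = tuple(sequence[i:i + order])
        counts[context][sequence[i + order]] += 1
    return {
        context: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for context, followers in counts.items()
    }


# Hypothetical toy data: what follows "B" depends on what came before "B".
sequence = list("ABCDBE" * 50)

first = first_order_transitions(sequence)
second = history_transitions(sequence, order=2)

# The memoryless model cannot tell the two situations apart.
print(first["B"])
# With one extra step of history the ambiguity disappears.
print(second[("A", "B")], second[("D", "B")])
```

A fixed-length history like this grows exponentially in the number of contexts as `order` increases, which is one reason a learned sequence model may be preferable when long-range context matters.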

Description

Dissertation under the guidance of Prof. Indranath Chatterjee and Prof. Debrup Chakroborty

Keywords

Comparative Analysis, LSTM

Citation

15p.
