Development of Some Scalable Pattern Recognition Algorithms for Real Life Data Analysis
Date
2017-11-20
Abstract
A huge amount of data is being generated continuously as a result of recent advances in,
and wide use of, high-throughput technologies. With the rapid increase in the size of data
distributed worldwide, understanding the data has become critical. In this regard, dimensionality
reduction and clustering have become necessary preprocessing steps in
many research areas and applications. One of the important problems of large real life
data sets is uncertainty. Some of the sources of this uncertainty include imprecision in
computation and vagueness in class definitions. The uncertainty may also be present in
the definition of class membership function.
Against this background, the thesis addresses the problem of dimensionality reduction and
clustering of real life data sets in the presence of noise and uncertainty. The thesis first
presents the problem of feature selection using both type-1 and interval type-2 fuzzy-rough
sets, which are effective for dimensionality reduction of real life data sets when
uncertainty is present in the data set. The properties of fuzzy-rough sets allow greater
flexibility in handling noisy and real valued data. While the concept of lower approximation
and boundary region of rough sets deals with uncertainty, incompleteness, and vagueness
in class definition, the use of either type-1 or interval type-2 fuzzy sets enables efficient
handling of overlapping classes in an uncertain environment. Moreover, a new concept of
"simultaneous attribute selection and feature extraction" is introduced for dimensionality
reduction, judiciously integrating the merits of both feature selection and extraction.
A scalable rough-fuzzy clustering algorithm is introduced for large real life data sets,
where the rough hypercuboid approach, interval type-2 fuzzy sets, and the c-means
algorithm are integrated judiciously to handle the uncertainty present in a data set. While
the rough hypercuboid approach deals with uncertainty, incompleteness, and
vagueness in cluster definition, the use of fuzzy membership of interval type-2 fuzzy sets
in the boundary region of a cluster enables efficient handling of overlapping partitions in
an uncertain environment. Finally, the application of both clustering and feature selection
algorithms is demonstrated by grouping functionally similar microRNAs from microarray
data. The proposed approach can automatically select the optimum set of features while
clustering the microRNAs, which lowers the complexity of the algorithm.
Description
This thesis was carried out under the supervision of Prof. Pradipta Maji.
Keywords
Dimensionality Reduction, Feature Selection, Clustering, Fuzzy-Rough Sets, Interval Type-2 Fuzzy Sets, Uncertainty Handling, Rough Hypercuboid Clustering, MicroRNA Microarray Analysis
Citation
292p.
