Browsing by Author "Bose, Kushal"

Now showing 1 - 2 of 2

Estimation of Error Bound for k-Nearest Neighbor Classi er on Multi Class Data Sets
(Indian Statistical Institute, Kolkata, 2019-07) Bose, Kushal
A motivational problem that arises in machine learning is to estimate out-of-sample error rate of a k-nearest neighbor classi er. Without having any prior knowledge of distribution or any assumption of distribution it is required to estimate the maximum probability of misclassi cation of an unlabeled sample. Previous works include the assumption on data distribution as identical and independent distribution (i.i.d.). This method works for binary classi cation only. Our proposed algorithm is applicable for any data sets without having any knowledge of the underlying data distribution. Our algorithm will search the misclassi cation region in the data set and calculate the bound for an unlabeled test sample. Our method will always detect the class overlapping region within the data set irrespective of balanced and imbalanced. Our method is also designed for both two-class and multi-class data sets. Our experiments includes the bound validation for di erent scenarios. We have tested for random k value and xed range of k. Also veri ed for balanced and imbalanced data sets. We also demonstrated to nd optimal sets of k values where classi er error will be minimized.
Innovations in Graph Neural Network Design: Addressing Oversmoothing, Heterophily, and Information Propagation
(Indian Statistical Institute, 2026-04-13) Bose, Kushal
In an unstructured learning paradigm, Graph Neural Networks (GNNs) adeptly tackle graph data like social networks, molecules, transaction networks, etc. In the primitive stage, GNNs are designed to be shallow, comprising two or three layers. Emulating the success of deep CNNs, deep GNNs are also proposed by stacking multiple layers. Those multi-layered GNNs are pivotal in enabling long-range interactions where multi-hop neighbors carry significant information, like molecular property prediction. Yet, the multi-layered GNNs face challenges of Oversmoothing, where node features become indistinguishable due to the recursive nature of message passing. In the second chapter of the thesis, we propose a non-recursive message passing technique to address oversmoothing. Our method explores random paths and computes path features, and those are subsequently aggregated to update the node features. The multi-hop message passing also depends on the homophily or heterophily settings of the network. GNNs typically perform better in homophilic settings where adjacent nodes share identical class labels. Conversely, the performance of GNNs is exacerbated in the heterophilic networks where adjacent nodes may have different class labels. In the third chapter of the thesis, we address the challenges of graph heterophily by rewiring the graph topology. We learn the similarity scores of the edges obtained from the autoencoder-based class representations. The impressive performances on heterophilic benchmarks reaffirm the superiority of our approach. We also study the effects of rewiring special edges like self-loops and parallel edges. In the fourth chapter of the thesis, we investigate the effects of the addition of self-loops and parallel edges on the eigenvalues of the graph Laplacian. Empirically, we observe that the gradual addition of self-loops or parallel edges generates performance trends (either increasing or decreasing) on the heterophilic graphs. This work offers insights into the graph spectrum based on the observed performance trends, bypassing the need to execute expensive eigenvalue decomposition. The deep GNNs also suffer from Oversquashing, an information bottleneck arises due to the requirement of storing exponentially growing information into fixed capacity channels. In the fifth chapter of the thesis, we propose asynchronous message passing to utilize fixed-capacity channels in a time-dependent access. This prevents the capacity constraints and ultimately overcomes oversquashing. We achieved commendable performances on the REDDIT-BINARY and Peptides-struct datasets. To mitigate both oversmoothing and oversquashing, Graph Transformers (GTs) come into the scenario to enable pair-wise message passing across the network. Precisely, GT incorporates structural information of the underlying graph datasets via positional encodings. In the sixth chapter of the thesis, we designed a novel and efficient positional encoding that is learnable and maps the encodings into hyperbolic spaces. Our positional encodings are expressive and efficiently capture hierarchical structures embedded in the molecular graphs, which is validated by extensive theoretical underpinnings. We further demonstrate that hyperbolic positional encodings, when added with features in final layers, diminish the effects of oversmoothing. We achieved superior performance on MNIST and OGBG-MOLHIV graphs by employing hyperbolic positional encodings. In the seventh chapter of the thesis, we shed light on the potential future research avenues and scope in the domain of GNN.