Dissertation and Thesis
Permanent URI for this community: http://164.52.219.250:4000/handle/10263/2146
Item 2.5D Dual-Encoder U-Net for Lesion Segmentation in Chest CT Scans (Indian Statistical Institute, Kolkata, 2025-06) Mukkara, Jagannath
Accurate segmentation of lesions in chest CT scans plays a vital role in diagnosing and monitoring pulmonary diseases such as COVID-19. In this work, we introduce a novel 2.5D[1] dual-encoder U-Net model[2] that uses both the central slice and its neighboring slices to improve segmentation accuracy while keeping computational demands manageable. Our model incorporates residual connections[3] and feature fusion[4] to merge multi-slice contextual information effectively, overcoming limitations of traditional 2D and 3D methods. To ensure a reliable evaluation and avoid data leakage, we used patient-level data splitting. We validate our approach on a carefully curated chest CT dataset, showing improved segmentation performance and better generalization than standard U-Net models. Through extensive experiments, including ablation studies and visualizations, we demonstrate the advantages of combining 2.5D learning with a dual-encoder architecture for medical image segmentation tasks.
Item A Study of the SHA-2 Cryptographic Hash Family (Indian Statistical Institute, Kolkata, 2009-02-01) Sanadhya, Somitra Kumar
Item A^1-homotopy types of A^2 and A^2 \ {(0, 0)} (Indian Statistical Institute, Kolkata, 2024-12) Roy, Biman
Morel and Voevodsky developed A^1-homotopy theory, which is a bridge between algebraic geometry and algebraic topology. In this thesis we study the A^1-connected components of a smooth variety in detail. We show that the A^1-connected components of a smooth variety contain information about the existence of affine lines in the variety. Using this and Miyanishi-Sugie's algebraic characterisation, we show that the affine plane is the only A^1-contractible smooth affine surface over a field of characteristic zero. In the other part of the thesis, we study the A^1-homotopy type of A^2 \ {(0, 0)}.
We show that, over a field of characteristic zero, if an open subvariety of a smooth affine surface is A^1-weakly equivalent to A^2 \ {(0, 0)}, then it is isomorphic to A^2 \ {(0, 0)}.
Item ABO blood-group gene frequencies in the Indian sub-continent: a statistical study of patterns of variation (Indian Statistical Institute, Kolkata, 1980) Majumder, Partha P
Item Access structures for an image database (Indian Statistical Institute, Kolkata, 1992) Kuila, Sudhansu Sekhar
Item Acyclicity Tests in Classes of Dense Digraphs in Streaming Model (Indian Statistical Institute, Kolkata, 2020-07) Kundu, Madhumita
A graph is a popular model for highly structured data consisting of entities with pairwise relations between them. In many applications, modelling the entire dataset as a graph and computing its graph-theoretic properties yields useful insights about the whole dataset. In practice, however, the datasets can be too large to store in main memory, and they may even be dynamic (changing over time). Many applications today require algorithms that take massive datasets as input while being constrained in both running time and the space used to store information. These constraints have driven the development of new techniques. The streaming model of computation takes all these challenges into account and provides solutions with limited resources at the cost of accuracy. A graph stream is a sequence of incoming edges that may only be inserted (insertion-only model), or both inserted and deleted (dynamic model), into an initially empty graph. The objective is to determine certain properties of the graph at the end of the stream while minimizing the space the algorithm uses. Sometimes such an algorithm must also trade off space usage against running time.
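To make the insertion-only model concrete, here is a minimal Python sketch of a one-pass test for one class of dense digraphs, tournaments, in which every pair of distinct vertices is joined by exactly one directed edge. It is an illustration under stated assumptions, not the algorithm developed in this thesis. It relies on the classical fact that a tournament is acyclic if and only if its in-degrees are exactly 0, 1, ..., n-1, so it stores only n counters (O(n log n) bits), well within the semi-streaming budget. The function name and the promise that the stream really is a tournament are assumptions of the sketch.

```python
from typing import Iterable, Tuple

# Illustrative sketch only; not the method of the thesis.


def tournament_is_acyclic(n: int, edge_stream: Iterable[Tuple[int, int]]) -> bool:
    """Decide acyclicity of a tournament on vertices 0..n-1 in a single pass.

    Assumption (a promise, not checked): for every pair u != v exactly one of
    the edges (u, v) or (v, u) appears in the stream, and it appears once.
    """
    # Only n in-degree counters are kept: O(n log n) bits in total.
    indegree = [0] * n
    for _source, target in edge_stream:
        # Each edge source -> target is read once and then discarded.
        indegree[target] += 1
    # A tournament is acyclic (transitive) iff its in-degrees are exactly
    # 0, 1, ..., n-1, i.e. no two vertices share the same in-degree.
    return sorted(indegree) == list(range(n))
```

For instance, the stream (0, 1), (1, 2), (2, 0) is a directed 3-cycle: all three in-degrees equal 1, so the test reports that the tournament is not acyclic, whereas the edges (i, j) for all i < j form a transitive tournament and pass. The in-degree criterion depends on every pair being joined by an edge, so it does not extend directly to arbitrary digraphs.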
There is a large body of work on undirected graphs in the streaming model, but directed graph streams remain largely unexplored. In this project, we study the problem of testing acyclicity of dense digraphs in the semi-streaming model. Here, a graph on n vertices is presented as a stream of edges and, using O(n polylog(n)) space, we must determine whether or not it is acyclic.
Item Adaptation-Based Classifiers for Handling Some Problems with Multi-Label Data (Indian Statistical Institute, Kolkata, 2022-06) Law, Anwesha
The concept of multi-label (ML) data generalizes the association of instances to classes by labelling each data sample with more than one class simultaneously. Since such an instance can belong to several classes at once, it should not be forcibly assigned a single label; it needs to be handled in its original form. However, various problems arise when dealing with multi-label data. In this thesis, four such issues are highlighted and addressed. The first problem is the large input dimension that sometimes occurs in multi-label data. Dimensionality reduction of the features helps to strike a balance between the feature size, the number of samples and the output dimension. The second is a complex decision space with overlapping class boundaries, which arises because instances belong to multiple classes simultaneously. Various approaches, such as improving the feature-to-class mapping, increasing class separability and simplifying the decision space, have been implemented. The third drawback stems from the large number of classes and label-sets in multi-label data, most of which are under-represented. This highlights the problem of class imbalance that widely prevails in multi-label data, which has been handled by using customized classifiers suited to the data at hand. Finally, the thesis addresses the problem of class correlation.
Because multiple classes are assigned to every instance simultaneously, some classes may co-occur on numerous occasions. Such frequently co-occurring classes may be correlated; these correlations have been identified and used to improve multi-label classification performance. This thesis addresses the above-mentioned issues to perform efficient multi-label classification. Smaller components that target individual issues have been combined to build larger classification models. The first work aims to reduce the feature dimension and learn a better feature-to-class mapping for the complex decision space. A shallow but fast network known as the extreme learning machine (ELM) has been cascaded with autoencoders (AEs) to obtain a network that handles both issues; two variants of this network have been proposed. To further explore the overlapping boundaries of ML data, the second contribution increases the separability of the complex decision space while also incorporating dimensionality reduction. A functional link artificial neural network (FLANN) has been adopted here for its functional expansion capability, which transforms the features to a higher dimension and thus makes them considerably more separable. After the best configuration of this network was identified, it was integrated with autoencoders to reduce the functionally expanded feature dimension and apply an additional transformation to the multi-label data. While these classifiers display improved performance, they do not consider the problems of class imbalance or label correlation. Hence, the third work builds a tree of classifiers that handles class imbalance, simplifies the decision space for ease of learning and preserves label correlations. A novel label-set proximity-based technique has been devised that simplifies boundaries and splits the data while preserving label correlations.
Every split is learned by a classifier suited to the balanced or imbalanced data at hand. Although it handles multiple issues together successfully, this classifier tree only preserves label correlations and does not explicitly use them to improve classification performance. To address this, the final contribution explicitly extracts underlying label correlations from the data and associates them with the predictions of existing multi-label classifiers to improve overall performance. A novel frequent label-set mining technique generates rules that help improve the scores predicted by existing multi-label algorithms. This thesis incorporates various elements to handle the problems of multi-label data and combines them into cohesive models for multi-label classification.
Item Addressing class imbalance problems to improve animal detection through aerial image data (Indian Statistical Institute, Kolkata, 2025-06) Koushal, Suryang
Monitoring animal populations in wildlife reserves is essential for conservation, especially of endangered species, but manual censuses are costly, risky, and logistically challenging because of vast, inaccessible terrain. Unmanned Aerial Vehicles (UAVs) with digital cameras provide a safer, scalable way to collect aerial imagery for estimating animal populations. However, semi-automated processing of these images faces significant challenges due to class imbalance in the datasets. This includes foreground-background imbalance, where background terrain dominates over sparse animal instances, and inter-class imbalance arising from uneven species representation and varied visual appearance (e.g., species, sizes, fur patterns) against diverse backgrounds such as deserts or forests. These imbalances hinder the Convolutional Neural Networks (CNNs) used for object detection, leading to inaccurate population estimates.
This project addresses these issues using a dataset of 561 aerial images from Tsavo National Parks (March 2014) and the Laikipia-Samburu Ecosystem (May 2015), collected by the Kenya Wildlife Service. We propose a clustering-based approach that categorizes background terrain into distinct classes (e.g., desert, grassland), aiming to mitigate these imbalances and improve animal detection accuracy in UAV imagery in support of reliable, data-driven conservation strategies.
Item Administrative document processing (Indian Statistical Institute, Kolkata, 2016) Chandra, Satish
Item Advanced Techniques in Symmetric Key Cryptanalysis (Indian Statistical Institute, Kolkata, 2024-07) Chakraborty, Debasmita
Symmetric key cryptographic primitives are essential tools used extensively in everyday digital interactions. These primitives are mainly designed to provide three key services: ensuring data confidentiality, maintaining data integrity, and verifying the authenticity of data sources. The primary types of symmetric key primitives delivering these services include block ciphers, stream ciphers, hash functions, message authentication codes, and authenticated encryption with associated data. This thesis mainly explores the security analysis of hash functions, several block ciphers, and stream ciphers using advanced cryptanalytic techniques. We begin by examining the collision security of a hash function under the assumption that the underlying compression functions are collision-resistant. This characteristic is termed the collision-resistance preserving property of a hash function. Notably, both the Merkle-Damgård and Merkle tree hash structures exhibit this property, prompting the question of whether it is possible to reduce the number of underlying compression function calls while maintaining the collision-resistance preserving property.
In pursuit of this question, we prove that for an ℓn-to-sn-bit collision-preserving hash function designed using r calls to tn-to-n-bit compression functions, it must hold that r ≥ ⌈(ℓ − s)/(t − 1)⌉, assuming that all operations other than the compression function calls are linear. Shifting our focus, we turn to advanced techniques for improved cryptanalysis of block and stream ciphers. First, we concentrate on impossible differential (ID) and zero correlation (ZC) attacks, which are pivotal cryptanalytic methods for block ciphers. We introduce an advanced, unified satisfiability-based constraint programming (CP) approach for identifying ID distinguishers in ARX and AndRX ciphers, along with a similar method for identifying ZC distinguishers. Furthermore, we extend our model to a unified optimization problem that incorporates both the distinguisher and the key recovery for AndRX designs. Our approach not only enhances ID attacks but also reveals new distinguishers for various ciphers, including SIMON, SPECK, Simeck, ChaCha, Chaskey, LEA, and SipHash. Another significant cryptanalytic technique, particularly applicable to block and stream ciphers, is the division property, an advanced form of integral cryptanalysis. Here, we explore the feasibility of MILP-based propagation of the bit-based division property using three subsets (BDPT) in ciphers with complex linear layers. We apply our method to find BDPT-based integral distinguishers for the SIMON, SIMON(102), PRINCE, MANTIS, PRIDE, and KLEIN block ciphers. The integral distinguishers found by our method match or improve upon the longest existing distinguishers. Finally, we investigate the cube attack, a powerful cryptanalytic technique against stream ciphers. We study Grain-128AEAD, a third-round candidate in the NIST lightweight cryptography process, through the lens of division property-based cube attacks.
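The basic step of a cube attack can be illustrated on a toy Boolean function. The Python sketch below is illustrative only: `toy_output_bit` is a hypothetical degree-3 function, not Grain-128AEAD, and all helper names are invented for the example. XOR-ing the output over all 2^k assignments of the chosen cube variables, with the remaining public and key variables held fixed, yields the cube's superpoly evaluated at those fixed values. For the cube {x0, x1}, the superpoly of this toy function is simply k0.

```python
from itertools import product
from typing import Callable, Dict, Sequence

# Illustrative sketch only; a toy function, not the cipher studied in the thesis.
Assignment = Dict[str, int]


def cube_sum(f: Callable[[Assignment], int], cube: Sequence[str], fixed: Assignment) -> int:
    """XOR f over all 2^len(cube) assignments of the cube variables.

    Variables outside the cube keep the values given in `fixed`. Over GF(2),
    the result equals the superpoly of the cube evaluated at those values.
    """
    total = 0
    for bits in product((0, 1), repeat=len(cube)):
        point = dict(fixed)
        point.update(zip(cube, bits))
        total ^= f(point) & 1
    return total


def toy_output_bit(v: Assignment) -> int:
    """Hypothetical output bit: x0*x1*k0 + x0*k1 + x1*x2 + k0*k1 + 1 over GF(2)."""
    x0, x1, x2, k0, k1 = (v[name] for name in ("x0", "x1", "x2", "k0", "k1"))
    return (x0 & x1 & k0) ^ (x0 & k1) ^ (x1 & x2) ^ (k0 & k1) ^ 1
```

Because an attacker chooses the public variables x0, x1, x2 and observes outputs under an unknown key, summing over the cube {x0, x1} reveals k0 for this toy function regardless of x2 and k1. Every other monomial appears an even number of times in the sum and cancels. For a real cipher, the hard part is finding cubes whose superpolies are simple and recovering those superpolies exactly, which is where division-property techniques come in.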
First, we introduce some effective cubes and construct an algorithm to identify conditional key bits for these cubes in Grain-128AEAD. Subsequently, we employ cube attacks based on the three-subset division property without unknown subsets to recover exact superpolies for Grain-128AEAD in the weak-key setting, yielding improved results.
Item Adversarial Attack on Neural Machine Translation System (Indian Statistical Institute, Kolkata, 2019-06) Abijith, K P
Deep neural network based solutions are now deployed for numerous tasks, so it has become essential to study the robustness of these systems. Machine translation is one of the popular applications of deep neural networks. This thesis studies the robustness of neural machine translation systems by generating adversarial examples designed to fool the model. Whenever the source changes, for example when a word in the input sentence is replaced by an unrelated word, the translation system is expected to reflect that change in its translation. Invariance to such changes, when learned by the model, is therefore undesirable. To exploit this undesirable property of a neural machine translation system, we design an attack called the Invariance-based targeted attack. This attack introduces multiple changes (word replacements) to the original input sentence while keeping the translation unchanged. To explain the design of the attack, we introduce two methods: (i) the Min-Grad method, which identifies the position where replacing the word causes the least change in the translation, and (ii) the Soft-Attn method, which searches for a new word to substitute, given a list of candidates. The initial part of the report describes our preliminary explorations, which gave insight into how to formulate the problem. These experiments are run on LSTM based models with a single-replacement policy.
Using what we learned in the first part, we extend the experiments to Transformer and BLSTM based models, which are considered state-of-the-art systems for machine translation.
Item Agricultural tenancy in Palanpur (Indian Statistical Institute, Delhi, 1992) Sharma, Naresh Kumar
Item Agriculture trade and protectionism (Indian Statistical Institute, Kolkata, 2013-07) Basu, Debasmita
Item Alfsen-Effros structure topology in the theory of complex L^1-preduals (Indian Statistical Institute, Kolkata, 1981) Rao, T S S R K
Item Algorithm for mapping boolean network to LUT based FPGAs (Indian Statistical Institute, Kolkata, 2001) Bhattacharyya, Jayasri
Item Algorithms and Applications in Complex Network Representation, Classification and Manipulation (Indian Statistical Institute, Kolkata, 2024-07) Chowdhury, Anjan
A complex network is a useful model for many real-world systems, and much recent effort has gone into understanding such networks. This thesis is devoted to the study of complex networks and can be broadly divided into three parts. The first involves analysing a complex network to find a crucial network structure called a constant community by extracting and applying features called graph representations. The second studies the quality of these graph representations on a downstream task, namely node classification. In the third part, we apply handcrafted and automatically learned graph features to a real-world setting, namely brain networks. To detect constant communities, we developed two strategies for constructing and using graph representations: semi-supervised and unsupervised. In the semi-supervised approach, we converted the original graph to its corresponding line graph, in which each node represents an edge of the original graph.
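The line-graph conversion used in the semi-supervised approach is easy to state in code. The following minimal Python sketch is illustrative: the function and type names are invented here, and the input is assumed to be a simple undirected graph given as an edge list. It turns each edge {u, v} of G into a node of the line graph L(G) and joins two such nodes whenever the corresponding edges share an endpoint.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Illustrative sketch; assumes a simple undirected graph given as an edge list.
EdgeNode = FrozenSet[int]


def line_graph(edges: Iterable[Tuple[int, int]]) -> Dict[EdgeNode, Set[EdgeNode]]:
    """Build the line graph L(G) of a simple undirected graph G.

    Each edge {u, v} of G becomes a node of L(G); two nodes of L(G) are
    adjacent exactly when the corresponding edges of G share an endpoint.
    """
    nodes = {frozenset(e) for e in edges}  # undirected: (u, v) and (v, u) coincide
    # Group the edges of G by the vertices they touch.
    incident: Dict[int, List[EdgeNode]] = {}
    for e in nodes:
        for endpoint in e:
            incident.setdefault(endpoint, []).append(e)
    adjacency: Dict[EdgeNode, Set[EdgeNode]] = {e: set() for e in nodes}
    # All edges meeting at a common vertex are pairwise adjacent in L(G).
    for edges_at_vertex in incident.values():
        for a, b in combinations(edges_at_vertex, 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency
```

For example, the path 0-1-2-3 becomes a path of three nodes, and a star with three leaves becomes a triangle. After the conversion, an edge-level question such as "does this edge belong to a constant community?" turns into a node-level label on L(G), which is what lets a node classifier such as a GNN be applied to it.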
We then applied a graph neural network (GNN) as a graph representation learning tool to classify the nodes of the line graph, and used this classification to capture the constant communities in the original graph. In the unsupervised approach, we computed hand-crafted features for each edge of the original network and developed novel algorithms, inspired by image thresholding algorithms, that filter out edges not belonging to constant communities and hence find the constant communities. In the semi-supervised approach, we noticed that as the number of training nodes was reduced, the representational capability of the GNN decreased and, as a result, its classification accuracy dropped drastically. This led us to develop input and output intervention methods to improve the accuracy of the GNN. In the input intervention, we extend the set of training nodes using random walks and machine learning methods to agnostically capture similar nodes from various non-contiguous sub-networks of the whole network. In the output intervention, we use random walk methods to correct the labels of nodes that the GNN has possibly misclassified. The last part of the thesis applies network representation, classification, and manipulation to complex human brain networks. Brain regions and their interrelationships can be modelled as a complex network. Using this network and its representation, we contribute to neuroscience in two ways. First, we devised a methodology to diagnose a neurodevelopmental disorder, Attention Deficit Hyperactivity Disorder (ADHD), by extracting network features and feeding them to various deep learning-based models. Second, we built a probabilistic model based on anatomical and topological similarities to generate synthetic brain networks and track the progression of a neurodegenerative disease, Alzheimer's disease (AD), in human brains.
The results are promising enough to establish the use of complex network analysis in computational neurology.
Item Algorithms and bounds in online learning (Indian Statistical Institute, Kolkata, 2016) Sharma, Ankit
Item Algorithms for biological cell storing (Indian Statistical Institute, Kolkata, 2010) Chatterjee, Soumyottam
Item Algorithms for Boundary Labeling of Horizontal Line Segments (Indian Statistical Institute, Kolkata, 2019-06) Kurmi, Abhilash
In the boundary labeling problem, the goal is to label a set P of n points in the plane with labels aligned to a side of the bounding box of P. In this work, we investigate a variant of this problem. We consider a set of sites inside a rectangle R, with labels placed in the complement of R and touching its left boundary. Labels are axis-parallel rectangles of the same size, and no two labels overlap. We introduce a set V, called the visibility, consisting of subsets of labels that correspond to points of the sites. Before connecting a site p at a point p1 ∈ p to some label l, we first check whether the subset of labels corresponding to p1 is in V. If it is, we check whether l belongs to that subset; if it does, the site can be joined to the label, otherwise it cannot. We use po-leaders, which start at the site, run parallel to the side of R on which the label resides, and then turn orthogonal to that side. We consider various geometric objects as sites: points, horizontal segments of equal length, and horizontal segments of different lengths. As a solution, we derive a dynamic programming algorithm that minimizes an arbitrary cost function and yields a planar solution in which sites are connected to labels by po-leaders. The leaders induce a matching in which no two po-leaders intersect, no two leaders share a common site (or label), and every leader satisfies the visibility V.
For points as sites, our dynamic programming algorithm optimizes the cost function in O(n^3) time. The running time is the same when the sites are unit-length horizontal line segments. For horizontal segments of arbitrary length, the algorithm runs in O(n^4) time. We assume that only one endpoint of any horizontal line segment can be used to connect it to a label (by a po-leader).
Item Algorithms for Feature Selection (Indian Statistical Institute, Kolkata, 2022-12) Lall, Snehalika
With the advancement of science and technology, data has grown in both sample size and dimension. Examples of high-dimensional data include genomic data, text data, and data from image retrieval and bioinformatics. One of the major problems in handling such data is that not all features are equally important. Hence, feature engineering, feature selection and feature reduction are considered important pre-processing tasks that discard redundant and irrelevant features while preserving the prominent features of the data as much as possible. In practice, feature selection often improves the accuracy of downstream machine learning tasks, including clustering and classification. In this thesis, we aim to devise novel and robust feature selection mechanisms for diverse application domains, with a special focus on high-dimensional biological data such as gene expression and single-cell transcriptomic data. We develop a series of feature selection techniques with structure-aware data sampling at their core. We adopt several concepts from statistics (e.g., copulas and their variants), information theory (entropy), and advanced machine learning (variational graph autoencoders, generative adversarial networks and their variants) to design feature selection models for high-dimensional and noisy data. The proposed models perform extremely well in both supervised and unsupervised settings, even when the sample size is very low.
Important outcomes of all the proposed methods are discussed in the respective chapters. Moreover, an overall discussion of the applicability of the methods, together with a brief account of their shortcomings, is provided. Suggestions for overcoming these shortcomings are also given, outlining the future scope for improving the devised methods.
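Of the ingredients listed in the feature-selection abstract above, the information-theoretic one is the simplest to illustrate. The Python sketch below is a generic entropy-based filter and not any of the models developed in that thesis. It scores each discrete feature by its mutual information with the class label, I(X; Y) = H(Y) - H(Y | X), and ranks the features by that score. All function names are illustrative.

```python
from collections import Counter
from math import log2
from typing import Hashable, List, Sequence

# Generic illustrative filter; not the feature selection models of the thesis.


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy H, in bits, of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """I(X; Y) = H(Y) - H(Y | X) for a discrete feature X and class label Y."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        # Labels of the samples whose feature takes this value.
        labels_given_value = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(labels_given_value)
    return entropy(labels) - conditional


def rank_features(columns: List[Sequence[Hashable]], labels: Sequence[Hashable]) -> List[int]:
    """Return feature indices ordered from most to least informative about the labels."""
    scores = [mutual_information(column, labels) for column in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
```

With labels [0, 0, 1, 1], a feature identical to the labels scores 1 bit, while the feature [0, 1, 0, 1] scores 0 bits, so the first is ranked ahead of the second. Continuous measurements such as gene expression would have to be discretized first, and a plain filter of this kind ignores redundancy between features.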
