Dissertation and Thesis
Permanent URI for this communityhttp://164.52.219.250:4000/handle/10263/2146
Browse
511 results
Search Results
Item Development of Some Scalable Pattern Recognition Algorithms for Real Life Data Analysis(2017-11-20) Garai, ParthaA huge amount of data is being generated continuously as a result of recent advancement and wide use of high-throughput technologies. With the rapid increase in size of data distributed worldwide, understanding the data has become critical. In this regard, dimensionality reduction and clustering have become the necessary preprocessing steps of multiple research areas and applications. One of the important problems of real life large data sets is uncertainty. Some of the sources of this uncertainty include imprecision in computation and vagueness in class denitions. The uncertainty may also be present in the denition of class membership function. In this background, the thesis addresses the problem of dimensionality reduction and clustering of real life data sets, in the presence of noise and uncertainty. The thesis rst presents the problem of feature selection using both type-1 and interval type-2 fuzzyrough sets, which are eective for dimensionality reduction of real life data sets when uncertainty is present in the data set. The properties of fuzzy-rough sets allow greater exibility in handling noisy and real valued data. While the concept of lower approximation and boundary region of rough sets deals with uncertainty, incompleteness, and vagueness in class denition, the use of either type-1 or interval type-2 fuzzy sets enables ecient handling of overlapping classes in uncertain environment. Moreover, a new concept of \simultaneous attribute selection and feature extraction" is introduced for dimensionality reduction, integrating judiciously the merits of both feature selection and extraction. A scalable rough-fuzzy clustering algorithm is introduced for large real life data sets, where the theory of rough hypercuboid approach, interval type-2 fuzzy sets, and c-means algorithm are integrated judiciously to handle the uncertainty present in a data set. While the concept of rough hypercuboid approach deals with uncertainty, incompleteness, and vagueness in cluster denition, the use of fuzzy membership of interval type-2 fuzzy sets in the boundary region of a cluster enables ecient handling of overlapping partitions in uncertain environment. Finally, the application of both clustering and feature selection algorithms is demonstrated by grouping functionally similar microRNAs from microarray data. The proposed approach can automatically select the optimum set of features while clustering the microRNAs, making the complexity of the algorithm lower.Item A Study of the SHA-2 Cryptographic Hash Family(Indian Statistical Institute, Kolkata, 2009-02-01) Sanadhya Somitra KumarItem Adversarial Attack on Neural Machine Translation System(Indian Statistical Institute, Kolkata, 2019-06) Abijith, K PNowadays Deep Neural Network based solutions are deployed to solve numerous tasks. Thus, it has become absolutely important to study the robustness of these systems. Machine Translation is one of the popular applications of Deep Neural Networks. This thesis studies the robustness of Neural Machine Translation systems by generating adversarial examples with the objective to fool the model. Whenever there is a change in the source, i.e. when a word in the input sentence is replaced by an unrelated word, the translation system is supposed to re ect the changes while doing translation. These unwanted invariance learned by the model is undesirable. With intention to exploit this undesirable property learned by a Neural Machine Translation system we design an attack called: Invariance-based targeted attack. This attack introduces multiple changes(replacement of words) to the original input sentence, keeping the translation unchanged. In-order to facilitate the explanation of the design of the attack we introduce two methods: (i) Min-Grad method: To identify the position where a replacement of the word makes the least change in the translation, and (ii) Soft-Attn method: To search for a new word to replace, given a list of choices. The initial part of the report explain the preliminary explorations we did in-order to get some insights on how to do the problem formulation. These experiments are run on LSTM based models with single replacement policy. Using the learning from the rst part we extend the experiments to Transformer and BLSTM based models, which are considered as the state-of-the-art systems for machine translation.Item Some Results on Combinatorial Batch Codes and Permutation Binomials over Finite Fields(Indian Statistical Institute,Kolkata, 2015) Bhattacharya, SrimantaIn this thesis,we study combinatorial batch codes (CBCs) and permutation binomials (PBs) over �nite �elds of even characteristic. Our primary motivation for considering these problems comes from their importance in cryptography. CBCs are replication based variants of batch codes, which were introduced in [IKOS04a] as a tool for reducing the computational overhead of private information retrieval protocols (a cryptographic primitive). On the other hand, permutation polynomials, with favourable cryptographic properties, have applications in symmetric key encryption schemes, especially in block ciphers. Moreover, these two objects are interesting in their own right, and they have connections with other important combinatorial objects. CBCs are much similar to unbalanced expanders, a much studied combinatorial object having numerous applications in theoretical computer science. On the other hand, the speci�c class of PBs that we consider in this work, are intimately related to orthomorphisms. Orthomorphisms are relevant in the construction of mutually orthogonal latin squares, a classical combinatorial objects having applications in design of statistical experiments. These aspects motivate us to explore theoretical properties of CBCs and PBs over �nite �elds. However, these two objects are inherently widely di�erent; CBCs are purely combinatorial objects, and PBs are algebraic entities. So, we explore these two objects independently in two di�erent parts, where our entire focus lies in exploring theoretical aspects of these objects. In Part I, we consider CBCs. There, we provide bounds on the parameters of CBCs and obtain explicit constructions of optimal CBCs. In Part II, we consider PBs over �nite �elds. There, we obtain explicit characterization and enumeration of subclasses of PBs under certain restrictions. Next, we describe these two parts in more detail.Item Discriminative Dictionary Learning by Exploiting Inter-Class Similarity for HEp-2 Cell Classi cation(Indian Statistical Institute,Kolkata, 2019-07) Panda, AdityaIn this literature we present an algorithm for automatic classi cation of IIF images of HEp-2 cells into relevant classes. Our algorithm is majorly based on the \Dictionary Learning" algorithm and we have rede ned it's objective function to suit our purpose. The major di culty in HEp-2 cell image classi cation lies in it's low inter-class variability and substantial intra-class variations. To address these issues, we have modi ed the objective function of \Dictionary Learning" to learn inter-class features. Moreover, we used a local feature extractor based pre-processing stage and also a \spatial decomposition" classi er set-up for better classifying test images. We evaluated our algorithm on three most widely accepted bamechmark data-sets for HEp-2 cell classi cation, ICPR 2012, ICIP 2013 and SNP data-sets. Proposed algorithm has achieved superior results than other popular dictionary learning algorithms for HEp-2 cell classi cation. Moreover, when comparing with other algorithms for HEp-2 cell classi cation, including the winners of ICPR 2012, ICIP 2013 and SNP data-set, we show that proposed algorithm reports very competitive result. Though our proposed algorithm is designed to be application speci c to HEp-2 cell, still we evaluated its performance on another popular benchmark data-set, \Diabetic Retinopathy" data-set. Our algorithm provided higher accuarcy than other state-ofthe- art algorithms on that data-set too.Item BlockV: A Blockchain Enabled Peer-Peer Ride Sharing Service(Indian Statistical Institute,Kolkata, 2019-07) Pal, PanchalikaToday's ride sharing is a centralized trust based system where users trust the service providers for the ride set up, tracking, cancellation, fare calculation etc. Any malicious activity in the centralized server based system or a malicious driver or a malicious rider destroys the fairness involved in the ride and causes inconvenience to the parties. After the completion of the ride, the drivers are rated by the riders. There are possibilities that, a malicious rider can claim the refund with a fake complain and give the driver poor rating intentionally or a malicious driver follows a longer path unnecessarily and charges the rider more. Current system is not capable of deciding the correctness of the objections raised by either parties regarding the ride and provides a biased outcome of each objections as per the centralized company's marketing strategies. In this context, we present BlockV, a blockchain enabled anonymous permissionless solution to ensure end to end fairness of the ride. The creation, completion, dissatisfaction or abortion of any ride will be written in the blockchain ledger, hence will be available to all participants in the peer to peer network. This simultaneously ensures the fairness in maintenance of the inbuilt reputation system. We have implemented a prototype in Ethereum private network and KOVAN test network and analyzed the security.Item Recognition of Strokes in Tennis Videos Using Deep Learning(Indian Statistical Institute,Kolkata, 2019-07) Singanporia, KushalPrior introduction of neural nets to domain of computer vision, action recognition requires specific domain knowledge. Still domain knowledge is useful in action recognition but with availability of huge data and neural nets, data-driven feature learning methods have emerged as an alternative. Recent trends in action recognition uses LSTM and its various modifications, as LSTM have memory retaining capability which other architectures lake. In this work we performed action recognition on different tennis strokes. Our work relay on architecture proposed By Husain, Dellen, and Torras, 2016. Architecture is comprised of various modified VGG-nets connected in parallel. As it doesn’t include LSTM, which makes it different than other works.Item Classi cation of Micro-Blog Texts(Indian Statistical Institute,Kolkata, 2019-07) Sen, BihanClassi cation of micro-blog texts is a very common task for sentiment analysis, user opinion mining, product review analysis, crisis managements, identifying ofensive and hate speech propagation across social media, restricting unnecessary expansion of fake news and rumors etc. In this dissertation, we consider two problems from this domain: (i) classi cation of tweets during crisis scenarios like natural disasters, terrorist attacks etc and (ii) identifying o ensive tweets. We tried both statistical and deep learning approaches. Datasets from the TREC-IS 2018 and 2019 tasks, and OLID from O enseEval workshop were used for our experiments. The rst task is formulated as a multi-label classi cation task, while the second is a binary classi cation problem. Our results suggest that preprocessing of social media text is very crucial for classi cation. We also conclude that Deep Learning approaches do not always outperform traditional learning. We also took part as an active participant in the TREC-IS 2019A task. Out of all 34 submissions from across the world, one of our submissions achieved the highest macro-averaged F-1 score on this task (0.1969) and outperformed the second highest score (0.1556) by a substantial margin.Item On Testing of Samplers and Related Problems(Indian Statistical Institute,Kolkata, 2019-07) Chakraborty, ShayakThe last decade has seen an unprecedented adoption of Arti cial Intelligence and Machine Learning Algorithms in various engineering applications such as medical imaging, autonomous vehicles, web services, etc. Probabilistic Reasoning lies at the heart of such algorithms. Probabilistic reasoning techniques rely highly on sampling techniques and thus samplers that are able to generate quality samples from a discrete distribution uniformly at random, are in heavy demand. This has created a need for e cient samplers that come with some correctness and performance guarantees, and indeed, a number of proposals to this e ect have been developed in literature. This has also created a need to nd an e cient and scalable way for testing the reliability of a sampler with respect to the quality of the samples it generates. Unlike in the eld of program testing, in which nding a single trace of execution is su cient to prove the existence of a bug, the number of samples needed for sampler testing is not one, in fact, most of the testing mechanisms existing in literature, use exponential or sub-exponential number of samples to test for uniformity of a given sampler. The objective of this work is to propose techniques that can cut down on this complexity for certain classes of samplers. To overcome the high sample complexity for property testing of distributions, a framework for conditional sampling was proposed for checking properties of distributions, with polynomial complexity in the number of dimensions of the sample space. A recent framework named Barbarik that works on the guidelines of conditional sampling has suggested a way to test uniformity of samplers sampling from Boolean domains, using a constant number of samples. Barbarik has been proposed in the context of uniformity testing of samplers which claim to provide uniform witnesses for arbitrary Boolean formulae. The objective of this thesis is to examine ways to develop similar algorithms for testing other kind of samplers. In the rst result presented in this thesis we propose a framework to harness Barbarik to test uniformity of state-of-art samplers which are used in the context of Horn formulae. Horn formulae is a subclass of general Boolean formulae and have been quite popular in a variety of domains, due to their simple yet powerful expressiveness, while being able to entail polynomial procedures for solving. In this thesis, we propose an e cient procedure to test samplers for Horn clause, and we demonstrate the e cacy of our testing framework with experiments on large scale benchmarks. In the second part of this thesis, using a di erent form of the conditional sampling model, we investigate the problem of testing samplers in the context of uniform Perfect Matching in a Graph. The problem of nding uniform perfect matching in a graph is of importance in computer science and has a large volume of research associated with it. Modern complex networks often need to nd a uniform perfect matching. The problem of nding a uniform perfect matching nds it's root in physical sciences like statistical mechanics and chemistry. This has inspired research on methods for testing whether the algorithm in use generates a perfect matching uniformly at random. Our contribution in this thesis is a randomized algorithm using the framework of sub-cube conditional sampling to test samplers for uniform perfect matching.Item Event Timeline Generation and Summarization using PageRank(Indian Statistical Institute,Kolkata, 2019-07) Das, PabaniGiven a search query our system retrieves most significant events related to the query and display them in user friendly timeline. We are using graph structure to compute relative importance of events. We are representing event and its elements as complete graph. To rank events we are linearly combining basic IR ranking and ranking of event elements. evaluation on DUC Data experimental results shows that our system performs at par with the NLP based summarization techniques and yet works as fast as a search engine
