Robust Android Malware Detection with CTGAN
Date
2024-06
Authors
Publisher
Indian Statistical Institute, Kolkata
Abstract
In this paper, our main objective is to build a robust malware detection system by enhancing
its ability to detect malicious applications. Machine-learning-based models
have been used to classify samples as malware or benign, while malware
attackers have strong motivation to attack such ML-based algorithms. Malware
attackers usually have no access to the detailed structures and parameters of the
machine learning models used by malware detection systems, and therefore they
can only perform black-box attacks. With the proliferation of malware threats, the
development of robust detection methods is imperative. Generative Adversarial Networks
(GANs) have recently emerged as a promising avenue for generating synthetic
data, offering potential applications in augmenting datasets for malware detection.
This paper presents a comparative analysis of contemporary GANs with Conditional
Tabular GANs (CTGAN) in the context of detecting malware and benign samples
generated through GANs. Through extensive experimentation on diverse datasets,
including both benign and malicious samples, we demonstrate that CTGAN outperforms
contemporary GAN architectures in generating synthetic data that closely
resembles real-world malware behaviors. Our evaluation metrics encompass various
aspects of detection accuracy, including precision, recall, F1-score, TPR, AUC-ROC,
the confusion matrix, and generator and discriminator losses. Additionally, we analyze
the robustness of the generated samples against state-of-the-art malware detection
techniques. The results indicate that CTGAN exhibits superior performance in producing
synthetic malware instances that challenge existing detection methods like
MaLGAN and LSGAN, thereby showcasing its potential for enhancing the efficacy
of malware detection systems. Adversarial training with CTGAN improves the detector by around
20% compared with an untrained detector. This study contributes to the advancement
of GAN-based approaches in cybersecurity and underscores the significance of
leveraging synthetic data generation techniques for improving malware detection
capabilities.
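
The detection metrics named in the abstract can be illustrated with a minimal sketch, assuming binary labels (1 = malware, 0 = benign) and a detector that outputs a score per sample. The function names, the 0.5 threshold, and the sample data below are hypothetical and are not taken from the dissertation. Note that recall and TPR are the same quantity, so the sketch reports them under one key.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = malware, 0 = benign."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def detection_metrics(y_true, y_pred):
    """Compute precision, recall (identical to TPR), and F1 from the confusion matrix."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall_tpr": recall, "f1": f1}


def auc_roc(y_true, scores):
    """Rank-based AUC: the probability that a randomly chosen malware sample
    receives a higher score than a randomly chosen benign one (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative data only (not results from the dissertation).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed decision threshold

metrics = detection_metrics(y_true, y_pred)
auc = auc_roc(y_true, scores)
```

Running the same functions on a detector before and after adversarial training, with the same held-out samples, would give a like-for-like comparison of the kind the abstract describes.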
Description
Dissertation under the guidance of Sc. Shri Sanchit Gupta and Shri Debrup Chakroborty
Keywords
Generative Adversarial Networks (GANs), Machine learning based model, Conditional Tabular GANs (CTGAN), MaLGAN, LSGAN
Citation
24p.
