Exploring Resource-Efficient Deep Learning for Medical Image Segmentation

dc.contributor.author: Dutta, Pallabi
dc.date.accessioned: 2026-05-20T05:40:21Z
dc.date.issued: 2026-05-19
dc.description: This thesis has been completed under the supervision of Prof. Sushmita Mitra
dc.description.abstract: Automated medical image segmentation improves diagnostic accuracy by automating the precise delineation of target anatomical structures in the input images. Artificial Intelligence (AI), and specifically Deep Learning (DL), has emerged as a state-of-the-art approach for this task. However, the significant computational demands of DL approaches often hinder their deployment. Advanced models, including Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), require substantial processing power and a large memory footprint, limiting their use in resource-constrained settings. This thesis addresses this challenge by developing a series of novel, resource-efficient DL models that achieve high segmentation accuracy at reduced computational cost. The research follows a logical progression of architectural novelty. First, global context-aware attention frameworks, FuDSA-Net and VoCANet, are introduced; they leverage multi-scalar features and global context-aware attention for efficient 2D/3D segmentation. The spatial and spectral domains are then integrated using a novel hybrid CNN-ViT framework, WaveCoformer, for learning a robust representation of the target structure. The developed model achieves high segmentation accuracy with a lower parameter count. Subsequently, the research investigates Vision-xLSTM, a computationally efficient alternative to ViTs for segmentation, by developing the U-VixLSTM model. This is extended to the Rot-UViL architecture, capable of modeling cross-dimensional dependencies in volumetric inputs with its novel rotational attention. Finally, the thesis presents a prompt-driven pruning framework for ViT-based segmentation models, called PrATo, which dynamically prunes irrelevant ViT tokens with a parameter-free prompt-driven scoring mechanism. The framework achieves a reduction of approximately 35-55% in processed tokens.
The frameworks developed in this thesis are validated across multiple publicly available datasets, demonstrating their high segmentation accuracy along with computational efficiency.
dc.identifier.citation: 165p.
dc.identifier.uri: http://hdl.handle.net/10263/7685
dc.language.iso: en
dc.relation.ispartofseries: ISI Ph.D Thesis; TH685
dc.subject: Medical Image Segmentation
dc.subject: Efficient Deep Learning
dc.subject: Vision Transformers
dc.subject: CNN
dc.subject: Vision-xLSTM
dc.subject: Model compression
dc.title: Exploring Resource-Efficient Deep Learning for Medical Image Segmentation
dc.type: Thesis

Files

Original bundle

Name:
Form-17 Pallabi Dutta.pdf
Size:
334.62 KB
Format:
Adobe Portable Document Format
Name:
Thesis-Pallabi Dutta.pdf
Size:
4.07 MB
Format:
Adobe Portable Document Format

License bundle

Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission
