Understanding Batch-Normalization in Deep Neural Networks
Date
2025-06
Publisher
Indian Statistical Institute, Kolkata
Abstract
Batch Normalization (BN) is a commonly used technique in various deep learning
architectures for tasks such as image classification and object detection. It stabilizes
and accelerates training by normalizing the activations of intermediate layers using
the mean and variance of the batch, allowing the use of higher learning rates and often
improving generalization through implicit regularization.
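The training-time normalization described above can be sketched as follows; this is a minimal NumPy illustration of standard BN (function and variable names are illustrative, not taken from the dissertation):

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Normalize activations over the batch dimension (training mode).

    x: array of shape (batch, features); gamma and beta are the
    learnable scale and shift parameters, each of shape (features,).
    """
    mu = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                     # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta, mu, var
```

With `gamma = 1` and `beta = 0`, the output of each feature has (approximately) zero mean and unit variance over the batch, which is what stabilizes the distribution of layer inputs during training.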
During inference, BN uses running estimates of batch statistics accumulated
during training. However, if individual batches are not representative of the overall
data distribution, these accumulated statistics may not accurately approximate
the population statistics. This discrepancy can lead to a phenomenon known as
**estimation shift**, which impairs the model’s generalization performance.
In this project, we study the behavior of estimation shift in deep learning models
using BN and explore techniques to mitigate its effects. Specifically, we introduce
**dynamicity** in the momentum parameter of the BN layer (DMBN) while computing
exponential moving averages and evaluate its impact under various architectural configurations.
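The exponential moving average (EMA) update and the layer-wise momentum selection can be sketched as below. The EMA follows the common PyTorch convention `new = (1 - momentum) * old + momentum * batch`; the selection routine is only an illustrative proxy for the dissertation's criterion, scoring each candidate momentum by how close the resulting running mean ends up to the population mean (all names here are hypothetical):

```python
import numpy as np

def update_running_stats(run_mean, run_var, batch_mean, batch_var, momentum):
    """One EMA update of BN's running statistics
    (convention: new = (1 - momentum) * old + momentum * batch)."""
    new_mean = (1 - momentum) * run_mean + momentum * batch_mean
    new_var = (1 - momentum) * run_var + momentum * batch_var
    return new_mean, new_var

def select_layer_momentum(batch_means, pop_mean,
                          candidates=(0.01, 0.1, 0.3, 0.5)):
    """Illustrative per-layer momentum search: run the EMA over a
    sequence of observed batch means and pick the candidate whose
    final running mean is closest to the population mean (a proxy
    for estimation shift; the dissertation's criterion may differ)."""
    best, best_err = None, np.inf
    for m in candidates:
        running = batch_means[0]
        for bm in batch_means[1:]:
            running = (1 - m) * running + m * bm
        err = np.abs(running - pop_mean).mean()
        if err < best_err:
            best, best_err = m, err
    return best
```

A momentum of 1 would make the running statistics track only the most recent batch, while a momentum near 0 averages over many batches; when individual batches are noisy, a smaller momentum typically reduces the gap between running and population statistics.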
We use MNIST, FashionMNIST, and CIFAR-10/100 datasets to train
and test both simple Deep Neural Networks (DNNs) and deeper Convolutional
Neural Networks (CNNs) such as ResNet-50.
Our experiments are conducted in two phases: first, by varying the static momentum
parameter across different values, and second, by introducing layer-wise
dynamic momentum where each layer is assigned the momentum (or equivalently, β)
that minimizes estimation shift. The performance of the proposed method, DMBN,
is evaluated using various performance metrics such as sensitivity, specificity, accuracy,
and F-score. DMBN is compared with the existing BN-BFN method and is
observed to perform better in most cases. For example, for the FashionMNIST
data, the accuracy values achieved by DMBN and BN-BFN are 0.889 and 0.853,
respectively.
Description
Dissertation under the supervision of Dr. Sasanka Roy and Dr. Shubhra Sankar Ray
Keywords
Batch Normalization (BN), Convolutional Neural Networks (CNNs), DMBN, Deep Neural Networks
Citation
41p.
