Understanding Batch-Normalization in Deep Neural Networks
Date
2025-06
Publisher
Indian Statistical Institute, Kolkata
Abstract
Batch Normalization (BN) is a commonly used technique in various deep learning
architectures for tasks such as image classification and object detection. It stabilizes
and accelerates training by normalizing the activations of intermediate layers using
the mean and variance of the batch, allowing the use of higher learning rates and often
improving generalization through implicit regularization.
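The training-time normalization described above can be sketched as follows; this is a minimal NumPy illustration of standard BN (function and variable names are illustrative, not taken from the dissertation):

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Normalize activations over the batch dimension (training mode).

    x: array of shape (batch, features); gamma and beta are the
    learnable scale and shift parameters, each of shape (features,).
    """
    mu = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                     # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta, mu, var
```

With `gamma = 1` and `beta = 0`, the output of each feature has (approximately) zero mean and unit variance over the batch, which is what stabilizes the distribution of layer inputs during training.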
During inference, BN uses running estimates of batch statistics accumulated
during training. However, if individual batches are not representative of the overall
data distribution, these accumulated statistics may not accurately approximate
the population statistics. This discrepancy can lead to a phenomenon known as
**estimation shift**, which impairs the model’s generalization performance.
In this project, we study the behavior of estimation shift in deep learning models
using BN and explore techniques to mitigate its effects. Specifically, we introduce
**dynamicity** in the momentum parameter of the BN layer (DMBN) while computing
exponential moving averages and evaluate its impact under various architectural configurations.
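The exponential moving average (EMA) update and the layer-wise momentum selection can be sketched as below. The EMA follows the common PyTorch convention `new = (1 - momentum) * old + momentum * batch`; the selection routine is only an illustrative proxy for the dissertation's criterion, scoring each candidate momentum by how close the resulting running mean ends up to the population mean (all names here are hypothetical):

```python
import numpy as np

def update_running_stats(run_mean, run_var, batch_mean, batch_var, momentum):
    """One EMA update of BN's running statistics
    (convention: new = (1 - momentum) * old + momentum * batch)."""
    new_mean = (1 - momentum) * run_mean + momentum * batch_mean
    new_var = (1 - momentum) * run_var + momentum * batch_var
    return new_mean, new_var

def select_layer_momentum(batch_means, pop_mean,
                          candidates=(0.01, 0.1, 0.3, 0.5)):
    """Illustrative per-layer momentum search: run the EMA over a
    sequence of observed batch means and pick the candidate whose
    final running mean is closest to the population mean (a proxy
    for estimation shift; the dissertation's criterion may differ)."""
    best, best_err = None, np.inf
    for m in candidates:
        running = batch_means[0]
        for bm in batch_means[1:]:
            running = (1 - m) * running + m * bm
        err = np.abs(running - pop_mean).mean()
        if err < best_err:
            best, best_err = m, err
    return best
```

A momentum of 1 would make the running statistics track only the most recent batch, while a momentum near 0 averages over many batches; when individual batches are noisy, a smaller momentum typically reduces the gap between running and population statistics.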
We use MNIST, FashionMNIST, and CIFAR-10/100 datasets to train
and test both simple Deep Neural Networks (DNNs) and deeper Convolutional
Neural Networks (CNNs) such as ResNet-50.
Our experiments are conducted in two phases: first, by varying the static momentum
parameter across different values, and second, by introducing layer-wise
dynamic momentum where each layer is assigned the momentum (or equivalently, β)
that minimizes estimation shift. The performance of the proposed method, DMBN,
is evaluated using various performance metrics such as sensitivity, specificity, accuracy,
and F-score. DMBN is compared with the existing BN-BFN method and is
observed to perform better in most cases. For example, for the FashionMNIST
data, the accuracy values achieved by DMBN and BN-BFN are 0.889 and 0.853,
respectively.
Description
Dissertation under the supervision of Dr. Sasanka Roy and Dr. Shubhra Sankar Ray
Keywords
Batch Normalization (BN), Convolutional Neural Networks (CNNs), DMBN, Deep Neural Networks
Citation
41p.
