Scene Text Detection
Date
2024-06
Authors
Publisher
Indian Statistical Institute, Kolkata
Abstract
Scene text detection is crucial for numerous applications, including autonomous driving
and assistive technology for visually impaired individuals. This project leverages
various versions of the You Only Look Once (YOLO) model to achieve efficient and
accurate text detection in natural scenes. Because YOLO balances speed
and accuracy, it is well suited to real-time text detection tasks. Throughout
this project, we compare different versions of YOLO, evaluating their performance
on various multilingual datasets. These datasets comprise diverse scene text images
with varying backgrounds, lighting conditions, and font styles. Each model
is assessed using precision, recall, and mean Average Precision
(mAP). Text detection performance improves with newer YOLO versions.
Additionally, transfer learning is applied to datasets with common root
languages, such as Hindi and Bengali or Telugu and Kannada. Our approach involves
training these models in different ways and analyzing their performance on
datasets with shared linguistic roots. Experimental results demonstrate that later
YOLO versions significantly enhance text detection capabilities. This comparative
analysis provides valuable insights into selecting the most suitable YOLO version for
specific real-time text detection applications and highlights the benefits of transfer
learning in multilingual contexts.
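The precision and recall metrics named in the abstract can be illustrated with a minimal sketch. This is not the dissertation's evaluation code: the function names, the axis-aligned `(x1, y1, x2, y2)` box format, the greedy confidence-ordered matching, and the IoU threshold of 0.5 are all assumptions made for illustration. mAP, which additionally averages precision over recall levels, is not computed here.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Precision and recall for one image.

    predictions: list of (box, confidence) pairs (hypothetical format).
    ground_truth: list of boxes.
    Each ground-truth box can be matched by at most one prediction.
    """
    matched = set()
    true_positives = 0
    # Assumption: match greedily, highest-confidence prediction first.
    for box, _ in sorted(predictions, key=lambda p: p[1], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt_box in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            true_positives += 1
    false_positives = len(predictions) - true_positives
    false_negatives = len(ground_truth) - true_positives
    precision = true_positives / (true_positives + false_positives) if predictions else 0.0
    recall = true_positives / (true_positives + false_negatives) if ground_truth else 0.0
    return precision, recall
```

With two ground-truth text boxes and two predictions of which only one overlaps a ground-truth box, one detection is correct, one is spurious, and one text region is missed.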
Description
Dissertation under the supervision of Dr. Ujjwal Bhattacharya
Keywords
You Only Look Once (YOLO), Feature Pyramid Network (FPN), PANet
Citation
40p.
