Scene Text Detection

dc.contributor.author: Ghorai, Sugata
dc.date.accessioned: 2025-02-07T12:07:07Z
dc.date.available: 2025-02-07T12:07:07Z
dc.date.issued: 2024-06
dc.description: Dissertation under the supervision of Dr. Ujjwal Bhattacharya [en_US]
dc.description.abstract: Scene text detection is crucial for numerous applications, including autonomous driving and assistive technology for visually impaired individuals. This project leverages various versions of the You Only Look Once (YOLO) model to achieve efficient and accurate text detection in natural scenes. Given YOLO’s balance between speed and accuracy, it is an ideal candidate for real-time text detection tasks. Throughout this project, we compare different versions of YOLO, evaluating their performance on various multilingual datasets. These datasets comprise diverse scene text images with varying backgrounds, lighting conditions, and font styles. Each model is assessed based on metrics such as precision, recall, and mean Average Precision (mAP) score. As the YOLO versions are updated, their capability to detect text improves. Additionally, transfer learning is applied to datasets with common root languages, such as Hindi and Bengali or Telugu and Kannada. Our approach involves training these models in different ways and analyzing their performance on datasets with shared linguistic roots. Experimental results demonstrate that later YOLO versions significantly enhance text detection capabilities. This comparative analysis provides valuable insights into selecting the most suitable YOLO version for specific real-time text detection applications and highlights the benefits of transfer learning in multilingual contexts. [en_US]
dc.identifier.citation: 40p. [en_US]
dc.identifier.uri: http://hdl.handle.net/10263/7518
dc.language.iso: en [en_US]
dc.publisher: Indian Statistical Institute, Kolkata [en_US]
dc.relation.ispartofseries: MTech(CS) Dissertation;22-32
dc.subject: You Only Look Once (YOLO) [en_US]
dc.subject: Feature Pyramid Network (FPN) [en_US]
dc.subject: PANet [en_US]
dc.title: Scene Text Detection [en_US]
dc.type: Other [en_US]

Files

Original bundle

Name: Sugata_Ghorai-cs2232.pdf
Size: 842.43 KB
Format: Adobe Portable Document Format
Description: Dissertations - M Tech (CS)
