Classification of Micro-Blog Texts

dc.contributor.author: Sen, Bihan
dc.date.accessioned: 2022-02-08T05:42:24Z
dc.date.available: 2022-02-08T05:42:24Z
dc.date.issued: 2019-07
dc.description: Dissertation under the supervision of Dr. Mandar Mitra [en_US]
dc.description.abstract: Classification of micro-blog texts is a common task in sentiment analysis, user opinion mining, product review analysis, crisis management, identifying the spread of offensive and hate speech across social media, and limiting the spread of fake news and rumors. In this dissertation, we consider two problems from this domain: (i) classification of tweets during crisis scenarios such as natural disasters and terrorist attacks, and (ii) identification of offensive tweets. We tried both statistical and deep learning approaches. Datasets from the TREC-IS 2018 and 2019 tasks, and OLID from the OffenseEval workshop, were used for our experiments. The first task is formulated as a multi-label classification task, while the second is a binary classification problem. Our results suggest that preprocessing of social media text is crucial for classification. We also conclude that deep learning approaches do not always outperform traditional learning. We also participated in the TREC-IS 2019A task. Out of all 34 submissions from across the world, one of our submissions achieved the highest macro-averaged F1 score on this task (0.1969), outperforming the second-highest score (0.1556) by a substantial margin. [en_US]
dc.identifier.citation: 45p. [en_US]
dc.identifier.uri: http://hdl.handle.net/10263/7272
dc.language.iso: en [en_US]
dc.publisher: Indian Statistical Institute, Kolkata [en_US]
dc.relation.ispartofseries: Dissertation;;2019-23
dc.subject: Text Classification [en_US]
dc.subject: Classification Algorithms [en_US]
dc.title: Classification of Micro-Blog Texts [en_US]
dc.type: Other [en_US]

Files

Original bundle

Name: ClassificationMBlog_Bihan.pdf
Size: 434.6 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission