Classification of Disaster Tweets Using Natural Language Processing Pipeline
S Deepa Lakshmi¹ and T Velmurugan²*
¹Assistant Professor, PG and Research Department of Computer Science, Dwaraka Doss Goverdhan Doss Vaishnav College, Chennai, India
²Associate Professor, PG and Research Department of Computer Science, Dwaraka Doss Goverdhan Doss Vaishnav College, Chennai, India
*Corresponding Author: T Velmurugan, Associate Professor, PG and Research Department of Computer Science, Dwaraka Doss Goverdhan Doss Vaishnav College, Chennai, India.
Received: January 31, 2023; Published: February 28, 2023
Abstract
Many methods are used to analyse tweets and extract information from them. Natural Language Processing (NLP) is a branch of artificial intelligence that enables computers to understand human language. NLP combines rule-based modelling of human language with statistical, machine learning and deep learning models. This research work uses an NLP pipeline to classify disaster tweets. Tweets are highly unstructured, so text pre-processing, which removes unwanted and irrelevant words from the tweets, is an important phase. An NLP pipeline is the sequence of steps followed to build end-to-end NLP software, covering text pre-processing, feature extraction and modelling. Here, pre-processing consists of tokenization, stop-word removal and lemmatization, and features are extracted with a TF-IDF transformer. Classification algorithms are then applied to these features: Support Vector Machine (SVM), Multilayer Perceptron (MLP), AdaBoost and Multinomial Naive Bayes (NB) are used to classify the tweets, and the best-performing classifier is identified.
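The pipeline summarised in the abstract (pre-processing, TF-IDF feature extraction, classification) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list, the lookup-table lemmatizer (`LEMMAS`), the example tweets, the 1/0 labels and all function names are hypothetical. The abstract does not specify the TF-IDF variant; the sketch assumes the smoothed, L2-normalised weighting that scikit-learn's `TfidfTransformer` applies by default, idf(t) = ln((1 + n) / (1 + df(t))) + 1. Only the Multinomial NB classifier is shown; SVM, MLP and AdaBoost are omitted.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "in", "on", "at", "of",
              "to", "and", "with", "my", "i", "this", "it", "so", "just"}

# Hypothetical lookup-table lemmatizer: inflected form -> lemma.
# Words missing from the table are passed through unchanged.
LEMMAS = {"fires": "fire", "floods": "flood", "flooding": "flood",
          "burning": "burn", "evacuated": "evacuate",
          "homes": "home", "people": "person"}


def preprocess(tweet):
    """Lowercase, strip URLs and @mentions, tokenize, drop stop words, lemmatize."""
    text = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    tokens = re.findall(r"[a-z]+", text)
    return [LEMMAS.get(t, t) for t in tokens if t not in STOP_WORDS]


def fit_idf(docs):
    """Smoothed IDF per term: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def tfidf(doc, idf):
    """Term counts times IDF, L2-normalised; terms unseen in training are dropped."""
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


class MultinomialNB:
    """Multinomial Naive Bayes over (possibly fractional) term weights."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing

    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        vocab = {t for v in vectors for t in v}
        self.log_prior, self.log_theta = {}, {}
        for c in self.classes:
            rows = [v for v, y in zip(vectors, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / len(labels))
            totals = defaultdict(float)  # summed TF-IDF weight per term in class c
            for v in rows:
                for t, w in v.items():
                    totals[t] += w
            denom = sum(totals.values()) + self.alpha * len(vocab)
            self.log_theta[c] = {t: math.log((totals[t] + self.alpha) / denom)
                                 for t in vocab}
        return self

    def predict(self, vec):
        def log_posterior(c):
            return self.log_prior[c] + sum(w * self.log_theta[c][t]
                                           for t, w in vec.items()
                                           if t in self.log_theta[c])
        return max(self.classes, key=log_posterior)


# Hypothetical toy data: 1 = disaster tweet, 0 = not a disaster tweet.
train = [
    ("Forest fires burning near the highway!", 1),
    ("Floods destroyed homes in the valley", 1),
    ("Earthquake hits the coast, people evacuated", 1),
    ("Just finished my lunch, so good", 0),
    ("I love this new song", 0),
    ("Great game tonight with friends", 0),
]
docs = [preprocess(text) for text, _ in train]
idf = fit_idf(docs)
model = MultinomialNB().fit([tfidf(d, idf) for d in docs],
                            [label for _, label in train])

for tweet in ["Flooding near my home", "Had a good lunch with friends"]:
    print(tweet, "->", model.predict(tfidf(preprocess(tweet), idf)))
```

Run as-is, the sketch predicts 1 for "Flooding near my home" and 0 for "Had a good lunch with friends". Six training tweets say nothing about real accuracy; the example only shows how the stages connect. Words never seen in training (here "had") are dropped by `tfidf`, and the classifier treats TF-IDF weights as fractional pseudo-counts, which is also how scikit-learn's `MultinomialNB` behaves when fed TF-IDF features.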
Keywords: Natural Language Processing Pipeline; Feature Extraction; Classification of Tweets; Multinomial NB