Classification of Disaster Tweets Using Natural Language Processing Pipeline

S Deepa Lakshmi¹ and T Velmurugan²*

¹Assistant Professor, PG and Research Department of Computer Science, Dwaraka Doss Goverdhan Doss Vaishnav College, Chennai, India
²Associate Professor, PG and Research Department of Computer Science, Dwaraka Doss Goverdhan Doss Vaishnav College, Chennai, India

*Corresponding Author: T Velmurugan, Associate Professor, PG and Research Department of Computer Science, Dwaraka Doss Goverdhan Doss Vaishnav College, Chennai, India.

Received: January 31, 2023; Published: February 28, 2023


A number of methods are used to extract information from tweets. Natural Language Processing (NLP) is a branch of artificial intelligence that enables machines to process and understand human words and sentences. NLP combines rule-based modelling of human language with statistical, machine learning and deep learning models. This research work applies NLP to the classification of disaster tweets using pipelines. Because tweets are highly unstructured, text pre-processing, which removes unwanted and irrelevant words from the tweets, is an important phase. An NLP pipeline is the sequence of steps followed to build end-to-end NLP software, covering text pre-processing, feature extraction and modelling. Here, pre-processing consists of tokenization, stop-word removal and lemmatization, and features are extracted with a TF-IDF transformer. Classification algorithms are then used to analyse the information in the tweets: Support Vector Machine (SVM), Multilayer Perceptron (MLP), AdaBoost and Multinomial Naive Bayes (Multinomial NB) classify the tweets, and the best-performing classifier is identified.
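The pipeline summarised above (tokenization, stop-word removal, lemmatization, TF-IDF weighting, then a classifier) can be illustrated with a minimal, dependency-free sketch. It is not the authors' code or data: the stop-word list, the suffix-stripping `lemmatize` helper (a crude stand-in for a dictionary lemmatizer), the toy tweets and the `TinyMultinomialNB` class are all hypothetical. The TF-IDF step assumes the smoothed IDF, ln((1 + n) / (1 + df)) + 1, with L2 normalisation, which is what scikit-learn's `TfidfTransformer` applies by default.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (the paper's list is not given).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "in", "on", "at", "of",
              "to", "and", "or", "my", "i", "this", "that", "it", "for", "so"}


def lemmatize(token):
    """Hypothetical suffix-stripping stand-in for a dictionary lemmatizer."""
    if len(token) > 4 and token.endswith("ies"):
        return token[:-3] + "y"          # cities -> city
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]                # floods -> flood
    return token


def preprocess(tweet):
    """Tokenize, drop stop words, lemmatize."""
    text = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())  # strip URLs and @mentions
    tokens = re.findall(r"[a-z]+", text)                      # simple alphabetic tokenizer
    return [lemmatize(t) for t in tokens if t not in STOP_WORDS]


def fit_idf(docs):
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def tfidf(doc, idf):
    """Raw term counts times IDF, then L2-normalised; unseen terms are ignored."""
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


class TinyMultinomialNB:
    """Multinomial Naive Bayes with additive (Laplace) smoothing over TF-IDF weights."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, vectors, labels, vocab):
        self.classes = sorted(set(labels))
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            rows = [v for v, y in zip(vectors, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / len(labels))
            totals = Counter()
            for v in rows:
                totals.update(v)  # sum feature weights per class
            denom = sum(totals.values()) + self.alpha * len(vocab)
            self.log_lik[c] = {t: math.log((totals[t] + self.alpha) / denom) for t in vocab}
        return self

    def predict(self, vec):
        # Log prior plus weighted log likelihoods; pick the highest-scoring class.
        scores = {c: self.log_prior[c] + sum(w * self.log_lik[c][t] for t, w in vec.items())
                  for c in self.classes}
        return max(scores, key=scores.get)


# Hypothetical toy data: 1 = disaster tweet, 0 = not a disaster.
TRAIN = [
    ("Forest fire near the town, residents evacuated", 1),
    ("Flood waters rising, families evacuated from homes", 1),
    ("Earthquake damage reported across the city", 1),
    ("Loving this sunny day at the beach", 0),
    ("Watching a great movie with friends tonight", 0),
    ("This pizza is amazing, best dinner ever", 0),
]

docs = [preprocess(text) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
idf = fit_idf(docs)
model = TinyMultinomialNB().fit([tfidf(d, idf) for d in docs], labels, vocab=set(idf))

for tweet in ["Massive fire forces residents to evacuate", "Best pizza ever with friends"]:
    print(tweet, "->", model.predict(tfidf(preprocess(tweet), idf)))
```

On this toy data the first tweet is labelled 1 (disaster) and the second 0. In scikit-learn the same chain is commonly written as `Pipeline([("vect", CountVectorizer()), ("tfidf", TfidfTransformer()), ("clf", MultinomialNB())])`, and the final step can be swapped for, for example, `SVC`, `MLPClassifier` or `AdaBoostClassifier` to compare classifiers in the way the abstract describes.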

Keywords: Natural Language Processing Pipeline; Feature Extraction; Classification of Tweets; Multinomial NB


