EFFECTIVE OFFENSIVE LANGUAGE DEDUCTION USING DEEP LEARNING IN SOCIAL MEDIA

KALAIVANI ADAIKKAN; DURAIRAJ THENMOZHI

doi:10.59277/RRST-EE.2024.2.14

Authors

KALAIVANI ADAIKKAN, Sri Sivasubramaniya Nadar College of Engineering, Tamil Nadu, India
DURAIRAJ THENMOZHI, Sri Sivasubramaniya Nadar College of Engineering, Tamil Nadu, India

Keywords:

Offensive language detection, Graph-based deep learning (GDL), Red fox optimization (RFO), Term frequency-inverse document frequency (TF-IDF), Lexicon-based feature

Abstract

Offensive language detection is the task of identifying user-generated offensive comments, such as insults, pain, profanity, and racism, that target a specific individual or group on social media. As social media platforms become more prominent, offensive language is used more frequently and has become a major challenge in modern society. A novel effective offensive language classification (EOLC) technique is proposed to address this challenge. English-language tweets and comments from X (Twitter) and YouTube, labeled as offensive, mild, swear, or non-offensive, are used in this paper. The tweets and comments are first pre-processed, and features are then extracted using three techniques: term frequency-inverse document frequency (TF-IDF), Word2Vec, and lexicon-based features. The extracted features are classified using the graph-based deep learning (GDL) method for numerical representation and decision-making. The GDL network is optimized with red fox optimization (RFO), which tunes the weights and biases of the network to achieve better accuracy. The proposed GDL model achieves classification accuracies of 95.5 % and 96.8 % on the X (Twitter) and YouTube datasets, respectively. The results obtained from GDL are more accurate and of higher quality than those obtained from traditional classifiers. The proposed EOLC method improves overall accuracy by 5.56 %, 7.4 %, 7.7 %, and 10.2 % over Text CNN, CNN-LSTM, DRNN, and LogitBoost, respectively.
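
To make the feature-extraction step concrete, the following is a minimal sketch of pre-processing followed by TF-IDF and lexicon-based features. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the pre-processing rules, the TF-IDF variant, or the lexicon, so the function names, the cleaning rules, the weighting tf * (ln(N/df) + 1), and the three-word lexicon below are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon; the lexicon used in the paper is not given in the abstract.
OFFENSIVE_LEXICON = {"idiot", "stupid", "trash"}


def tokenize(text):
    """Assumed pre-processing: lowercase, drop URLs and @mentions, keep word tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"[a-z']+", text)


def tfidf_vectors(docs):
    """Return (vocabulary, one TF-IDF vector per document).

    Weight = relative term frequency * (ln(N / df) + 1), one common variant.
    """
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        vectors.append(
            [(counts[t] / total) * (math.log(n_docs / df[t]) + 1.0) for t in vocab]
        )
    return vocab, vectors


def lexicon_features(text):
    """Return [number of lexicon hits, fraction of tokens that are lexicon hits]."""
    toks = tokenize(text)
    hits = sum(1 for t in toks if t in OFFENSIVE_LEXICON)
    return [hits, hits / len(toks) if toks else 0.0]


if __name__ == "__main__":
    docs = ["You are an idiot @bob http://x.co", "Have a nice day", "nice nice idiot"]
    vocab, vectors = tfidf_vectors(docs)
    for doc, vec in zip(docs, vectors):
        # Concatenate both feature groups into one vector per comment.
        print(doc, "->", len(vec + lexicon_features(doc)), "features")
```

In a pipeline like the one described, the concatenated vector for each comment would be passed to the classifier; the GDL network and its RFO tuning are outside the scope of this sketch.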

