Authors :
Sahana. P. Shankar; Pranathi Hegde; Nidhi N P; Sanjana N; Sree Bhanu Mukkamala
Volume/Issue :
Volume 9 - 2024, Issue 12 - December
Google Scholar :
https://tinyurl.com/5edmtm7x
Scribd :
https://tinyurl.com/4uxzzwvp
DOI :
https://doi.org/10.5281/zenodo.14613807
Abstract :
With more than 100 million active users on
social media today, the average user is almost certain
to be exposed to some form of cyberbullying.
Toxic and hateful comments have become a critical
challenge that calls for efficient tools for their detection
and mitigation. In this study, we propose a novel ensemble
approach that combines a context-free model and a
context-aware model to detect toxic comments. From the Civil
Comments dataset, we curated two distinct datasets, one
with conversational context and one without, both of which
had to be extensively processed and augmented before
use. The two models were built on the
RoBERTa architecture, which was fine-tuned and
modified to suit this task. Finally, the
classification outputs of both models were
integrated using equal weights. The context-free model
achieved an accuracy of 94.87% and an F1 score of 0.95 for
both labels, toxic and non-toxic. The context-aware model
achieved an accuracy of 87.82%, with an F1 score of
0.91 for non-toxic comments and 0.80 for toxic comments.
This work underscores the importance of incorporating
conversational context and ensemble techniques in
developing robust toxicity detection systems.
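
The equal-weight integration step described above can be illustrated with a minimal sketch. The abstract does not specify which outputs are combined or how the final label is chosen, so this example assumes that each model emits a probability that the comment is toxic, that the two probabilities are averaged, and that a threshold of 0.5 decides the label. The function names, the threshold, and the example probabilities are hypothetical and are not taken from the study.

```python
from typing import Tuple


def combine_toxicity_scores(
    p_context_free: float,
    p_context_aware: float,
    weights: Tuple[float, float] = (0.5, 0.5),
) -> float:
    """Weighted average of two toxic-class probabilities.

    Assumption: each model outputs P(toxic) in [0, 1]. The default
    weights (0.5, 0.5) correspond to the equal weighting mentioned
    in the abstract.
    """
    w_free, w_aware = weights
    total = w_free + w_aware
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalise so the result stays in [0, 1] even if the weights
    # do not sum to exactly 1.
    return (w_free * p_context_free + w_aware * p_context_aware) / total


def classify(
    p_context_free: float,
    p_context_aware: float,
    threshold: float = 0.5,  # hypothetical decision threshold
) -> Tuple[str, float]:
    """Return the ensemble label and the combined score."""
    score = combine_toxicity_scores(p_context_free, p_context_aware)
    label = "toxic" if score >= threshold else "non-toxic"
    return label, score


if __name__ == "__main__":
    # Illustrative probabilities only, not results from the study.
    print(classify(0.9, 0.3))
    print(classify(0.2, 0.4))
```

With equal weights, neither model can override the other on its own: a comment is labelled toxic only when the average of the two scores reaches the threshold. Unequal weights could be passed through the `weights` argument, although the study reports using equal ones.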
Keywords :
Social Media, Cyberbullying, Toxicity, Ensemble Approach, Civil Comments Dataset, Data Augmentation, RoBERTa.