A Novel Ensemble Approach for Toxic Comment Detection Using Context-Free and Context-Aware Models


Authors : Sahana. P. Shankar; Pranathi Hegde; Nidhi N P; Sanjana N; Sree Bhanu Mukkamala

Volume/Issue : Volume 9 - 2024, Issue 12 - December

Google Scholar : https://tinyurl.com/5edmtm7x

Scribd : https://tinyurl.com/4uxzzwvp

DOI : https://doi.org/10.5281/zenodo.14613807

Abstract : With more than 100 million active users on social media today, it has become inevitable that the average user is exposed to some form of cyberbullying. Toxic and hateful comments have become a critical challenge, necessitating efficient tools for their detection and mitigation. In this study, we propose a novel ensemble approach that combines context-free and context-aware models to detect toxic comments. Using the Civil Comments dataset, we curated two distinct datasets, one with conversational context and one without, both of which were extensively processed and augmented before use. The two models were built on the RoBERTa architecture, which was fine-tuned and modified to suit this task. Lastly, the classification outputs of the two models were combined using equal weights. The context-free model achieved an accuracy of 94.87% and an F1 score of 0.95 for both labels (toxic and non-toxic). The context-aware model achieved an accuracy of 87.82%, with an F1 score of 0.91 for non-toxic comments and 0.80 for toxic comments. This work underscores the importance of incorporating conversational context and ensemble techniques in developing robust toxicity detection systems.

Keywords : Social Media, Cyberbullying, Toxicity, Ensemble Approach, Civil Comments Dataset, Data Augmentation, RoBERTa.
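
The equal-weight integration step described in the abstract can be illustrated with a minimal sketch. The abstract does not state whether the two models' outputs are combined as probabilities, logits, or hard labels, so this example assumes each model emits softmax probabilities over the two labels; the function name, label order, and input values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Assumed label order; the paper does not specify one.
LABELS = ("non-toxic", "toxic")


def combine_equal_weights(
    probs_context_free: Sequence[float],
    probs_context_aware: Sequence[float],
) -> Tuple[str, List[float]]:
    """Average two models' class probabilities with equal weights (0.5 each).

    Both inputs are assumed to be probability vectors ordered as LABELS.
    Returns the predicted label and the averaged probabilities.
    """
    if len(probs_context_free) != len(LABELS) or len(probs_context_aware) != len(LABELS):
        raise ValueError("each input must hold one probability per label")

    # Equal weighting: each model contributes half of the final score.
    averaged = [
        0.5 * p_free + 0.5 * p_aware
        for p_free, p_aware in zip(probs_context_free, probs_context_aware)
    ]

    # Pick the label with the highest averaged probability
    # (on an exact tie, the first label in LABELS is returned).
    best_index = max(range(len(averaged)), key=averaged.__getitem__)
    return LABELS[best_index], averaged


if __name__ == "__main__":
    # Hypothetical outputs: the context-free model leans toxic,
    # the context-aware model leans non-toxic.
    label, scores = combine_equal_weights([0.30, 0.70], [0.60, 0.40])
    print(label, scores)
```

With equal weights, a confident prediction from one model can outvote a weaker opposing prediction from the other, which is one plausible reason to pair a context-free model with a context-aware one.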
