Spam Detection Using Large Datasets with Multilingual Support


Authors : Anil Kumar Jatra; Kusum Sharma

Volume/Issue : Volume 9 - 2024, Issue 12 - December

Google Scholar : https://tinyurl.com/44yhrepb

Scribd : https://tinyurl.com/yxtmfhru

DOI : https://doi.org/10.5281/zenodo.14609293

Abstract : Spam detection in the era of big data requires scalable and efficient techniques, particularly for large datasets that span many languages. Traditional methods struggle with multilingual spam because language-specific approaches often fail to generalize to other languages. This paper explores the development of a spam-blocking method that leverages large, diverse, multilingual datasets. We employ machine-learning techniques designed to handle linguistic variation. By combining cross-lingual embeddings, transfer learning, and ensemble models, our system aims to accurately detect spam across languages. We highlight the importance of feature extraction, text preprocessing, and model adaptation in achieving robust multilingual spam detection. The proposed approach demonstrates improved performance in detecting spam messages while remaining scalable and adaptable to new languages, providing a foundational framework for combating spam globally.

Keywords : Spam Detection, Multilingual Spam, Machine Learning, Cross-Lingual Embeddings, Transfer Learning, Ensemble Methods, Feature Extraction, Text Preprocessing, Model Adaptation, Large Datasets, Language-Independent Spam Detection.
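
The abstract names text preprocessing, feature extraction, and ensemble models without giving implementation details. The following is a minimal sketch of what language-independent versions of those steps could look like, not the authors' actual pipeline. It assumes Unicode normalization plus character n-grams as features and a simple majority vote as the ensemble rule; the function names `normalize`, `char_ngrams`, and `majority_vote` are hypothetical and chosen only for illustration.

```python
import unicodedata
from collections import Counter


def normalize(text):
    """Language-agnostic cleanup (an assumed preprocessing step).

    NFKC folds compatibility characters (for example fullwidth letters)
    into their standard forms, casefold() lowercases aggressively across
    scripts, and split/join collapses runs of whitespace.
    """
    folded = unicodedata.normalize("NFKC", text).casefold()
    return " ".join(folded.split())


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of the normalized text.

    Character n-grams need no language-specific tokenizer, which is why
    they are a common choice for multilingual features.
    """
    cleaned = normalize(text)
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def majority_vote(predictions):
    """Return the label predicted by the most base classifiers."""
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    # Fullwidth letters and extra spaces normalize to plain lowercase text.
    print(normalize("  ＦＲＥＥ   Prize "))
    print(char_ngrams("abcd"))
    # Labels here stand in for the outputs of three hypothetical classifiers.
    print(majority_vote(["spam", "ham", "spam"]))
```

In a fuller system the n-gram counts would feed trained classifiers, and cross-lingual embeddings could replace or complement them; this sketch only covers the preprocessing and voting steps.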

