Authors :
Rotimi E. Ajigboye
Volume/Issue :
Volume 11 - 2026, Issue 1 - January
Google Scholar :
https://tinyurl.com/5463v8c2
Scribd :
https://tinyurl.com/yrth7mhn
DOI :
https://doi.org/10.38124/ijisrt/26jan1337
Abstract :
The rapid growth of digital infrastructure has intensified the scale and sophistication of cyber threats, demanding
more adaptive and intelligent detection mechanisms. Natural language processing (NLP) has emerged as a critical enabler
for cybersecurity threat detection by transforming unstructured textual data—such as threat intelligence reports, security
logs, phishing messages, and malware descriptions—into actionable security insights. Recent advances in transformer-based
architectures and large language models have significantly improved the automated identification of malicious patterns,
threat actor behaviors, indicators of compromise, and tactics, techniques, and procedures embedded within heterogeneous
cyber data sources. NLP-driven systems enable semantic understanding, contextual reasoning, and cross-document
knowledge extraction, overcoming the limitations of rule-based and signature-driven approaches that struggle with
zero-day and polymorphic attacks. By integrating named entity recognition, relation extraction, text classification, and knowledge
graph construction, modern NLP frameworks support end-to-end threat intelligence pipelines that enhance situational
awareness and accelerate incident response. Moreover, the convergence of NLP with cybersecurity facilitates scalable
phishing detection, malware classification, and automated threat intelligence structuring, enabling interoperability with
standardized security platforms. Despite these advances, challenges remain in model robustness, explainability, domain
adaptation, and resistance to adversarial manipulation. This abstract highlights the role of NLP-driven approaches in
advancing cybersecurity threat detection, emphasizing their capacity to automate intelligence extraction, improve detection
accuracy, and support proactive defense strategies in an evolving threat landscape.
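To make the extraction step concrete, the following is a minimal sketch of pulling indicators of compromise and technique identifiers out of unstructured report text. It is not the transformer-based method the abstract describes: it swaps in plain regular expressions as a rule-based stand-in for a trained named entity recognition model, purely to show the shape of the input and output. The pattern set `IOC_PATTERNS`, the function `extract_iocs`, and the sample report sentence are all hypothetical names and data introduced for illustration.

```python
import re

# Hypothetical, illustrative patterns. A real pipeline of the kind described
# in the abstract would use a trained NER model rather than regexes.
IOC_PATTERNS = {
    # Dotted-quad IPv4 address with each octet limited to 0-255.
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    # 64 hexadecimal characters, the length of a SHA-256 digest.
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    # CVE identifier, e.g. CVE-2021-44228.
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
    # MITRE ATT&CK technique or sub-technique ID, e.g. T1566 or T1566.001.
    "attack_technique": re.compile(r"\bT\d{4}(?:\.\d{3})?\b"),
}


def extract_iocs(text):
    """Return a dict mapping each indicator type to its sorted unique matches."""
    found = {}
    for label, pattern in IOC_PATTERNS.items():
        matches = sorted({m.group(0) for m in pattern.finditer(text)})
        if matches:
            found[label] = matches
    return found


# Made-up report sentence; 203.0.113.45 is from a documentation-only IP range.
report = (
    "The actor used spearphishing (T1566.001) to exploit CVE-2021-44228 "
    "and beaconed to 203.0.113.45."
)
iocs = extract_iocs(report)
print(iocs)
```

A rule-based extractor like this only finds indicators with a fixed surface form, which is the limitation the abstract attributes to signature-driven approaches. Entities such as threat actor names or free-text descriptions of behavior are where a learned model would be needed.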
Keywords :
Natural Language Processing; Cybersecurity Threat Detection; Threat Intelligence Extraction; Transformer-Based Models; Large Language Models.