Enhanced Legal Information Access in Nigeria: A Novel Retrieval Augmented Generation (RAG) Approach


Authors : Echeonwu, Emmanuel Chinyere; Bolou, Dickson Bolou; Omonijo, Oluwaseyi Oluwatola; Ugbogbo, Mike Johnon; Omejieke, Chinenye Ekene

Volume/Issue : Volume 10 - 2025, Issue 12 - December


Google Scholar : https://tinyurl.com/4xms2y6v

Scribd : https://tinyurl.com/2s4ft6t7

DOI : https://doi.org/10.38124/ijisrt/25dec1333



Abstract : This study presents a novel Retrieval Augmented Generation (RAG) system, a text-based query system for efficient access to Nigerian legal information. Using the Nigerian Constitution and Criminal Code as its knowledge base, the system employs a pipeline of semantic segmentation, Sentence Transformer embeddings, and vector database indexing for optimized information retrieval. User queries are refined by a Google Gemini large language model, trained as a Nigerian legal expert, to identify key terms and intent before the database is searched for the ten most relevant document chunks. These chunks, along with the refined query and keywords, are then fed back into Gemini to generate a detailed, referenced answer. The current implementation is evaluated using precision, recall, F1 score, perplexity, and diversity metrics; the mean values (0.65, 0.73, 0.68, 14.42, and 0.87, respectively) fall within acceptable benchmarks, representing a significant advancement in making complex legal big data accessible.

Keywords : Retrieval Augmented Generation, Embeddings, Big Data, Vector Database, Large Language Model.
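
Illustrative sketch : The retrieval and evaluation steps described in the abstract can be pictured with a small, self-contained Python example. It is only a sketch under stated assumptions, not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for Sentence Transformer embeddings, the in-memory dictionary stands in for a vector database, the chunk texts are hypothetical placeholders rather than statutory text, and all function names are invented for illustration. The example retrieves three chunks to keep it short, whereas the abstract describes retrieving ten.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a Sentence Transformer embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, chunks, k=10):
    """Return the ids of the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(
        chunks.items(),
        key=lambda item: cosine(q, embed(item[1])),
        reverse=True,
    )
    return [chunk_id for chunk_id, _ in ranked[:k]]


def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and F1 for one query."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical placeholder chunks, not actual legal text.
chunks = {
    "c1": "punishment for stealing property",
    "c2": "right to fair hearing in court",
    "c3": "definition of stealing and theft",
    "c4": "freedom of movement of citizens",
}

retrieved = top_k("what is the punishment for stealing", chunks, k=3)
p, r, f1 = precision_recall_f1(retrieved, relevant={"c1", "c3"})
print(retrieved, round(p, 2), round(r, 2), round(f1, 2))
```

In a fuller pipeline, the retrieved chunks would be passed to the language model together with the refined query and keywords; that generation step is omitted here because it depends on an external service.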

