Comprehending and Reducing LLM Hallucinations


Authors : Harsh; Dr. Shobha T

Volume/Issue : Volume 9 - 2024, Issue 7 - July

Google Scholar : https://tinyurl.com/2um8vcxc

Scribd : https://tinyurl.com/mryw2244

DOI : https://doi.org/10.38124/ijisrt/IJISRT24JUL882

Abstract : The integration of large language models (LLMs) into many artificial intelligence applications has produced strong performance on tasks such as text summarization, text generation, and question answering. Despite this success, a major concern with LLMs is the emergence of so-called "hallucinations", especially in text-generation and question-answering systems that rely on LLMs. These hallucinations can lead to the spread of misinformation. This article explains the fundamentals of hallucinations in AI and highlights their significance for AI systems. The work surveys hallucinations across a variety of tasks, including machine translation, question answering, dialogue, content generation, knowledge graph construction with LLMs, and visual question answering. Additionally, the article explores potential strategies for mitigating hallucinations in order to increase the overall trustworthiness of LLMs.

Keywords : LLMs, Hallucination, Artificial Intelligence, Hallucination Mitigation, Factualness.
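
To illustrate the general idea behind the evidence-grounding mitigation strategies the article surveys, the sketch below is a minimal, hypothetical Python example (not a method from the paper): it flags generated sentences whose content words are poorly supported by retrieved source passages, using a crude lexical-overlap check in place of the retrieval-augmented generation or entailment models a real system would use. The function names and the 0.5 threshold are illustrative assumptions.

    # Illustrative sketch only: flag answer sentences poorly supported by
    # retrieved source passages via a naive lexical-overlap check.
    import re
    from typing import List

    STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "or", "to",
                 "is", "are", "was", "were"}

    def content_words(text: str) -> set:
        # Lowercase alphabetic tokens with common stopwords removed.
        return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

    def flag_unsupported(answer: str, sources: List[str], threshold: float = 0.5) -> List[str]:
        # Return answer sentences whose content-word overlap with the sources
        # falls below the (hypothetical) support threshold.
        source_vocab = set()
        for s in sources:
            source_vocab |= content_words(s)
        flagged = []
        for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
            words = content_words(sentence)
            if not words:
                continue
            support = len(words & source_vocab) / len(words)
            if support < threshold:
                flagged.append(sentence)
        return flagged

    if __name__ == "__main__":
        sources = ["The Eiffel Tower was completed in 1889 and stands in Paris."]
        answer = "The Eiffel Tower was completed in 1889. It was designed by Leonardo da Vinci."
        # Flags the second, unsupported sentence.
        print(flag_unsupported(answer, sources))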

