Comparative Study on Accuracy of Responses by Select AI Tools: ChatGPT and Perplexity AI Vis-à-Vis Human Responses


Authors : Salmon Oliech Owidi; Joanne Nabwire Lyanda; Eric W. Wangila

Volume/Issue : Volume 9 - 2024, Issue 11 - November


Google Scholar : https://tinyurl.com/mr2yep3h

Scribd : https://tinyurl.com/muvvkv6

DOI : https://doi.org/10.5281/zenodo.14274466


Abstract : This study examined questions whose solutions were provided by human experts, ChatGPT, and Perplexity AI. The responses were triangulated in discussion to identify oversights, alternative framings, and biases relative to the human-generated insights. ChatGPT and Perplexity AI were selected because of their popularity, ChatGPT having gained over 100 million users and Perplexity AI 87 million within a year. Educational specialists submitted questions from various fields together with their own answers, and the same questions were then posed to the AI tools. All responses were coded and evaluated by twelve educational specialists and twelve subject-matter experts (N = 24) on scientific accuracy, actionability, and comprehensibility. Descriptive statistics showed that Human Experts achieved significantly higher mean scores in both Scientific Accuracy (M = 7.42, SD = 0.65) and Actionability (M = 7.25, SD = 0.77) than ChatGPT (M = 6.25, SD = 0.71; M = 5.42, SD = 0.99) and Perplexity AI (M = 4.33, SD = 0.79; M = 4.17, SD = 1.06), respectively. For Comprehensibility, ChatGPT (M = 6.58, SD = 0.99) outscored Perplexity AI (M = 5.43, SD = 0.55) but remained below Human Experts (M = 7.08, SD = 1.24). Kruskal-Wallis tests revealed significant differences across all three dimensions (p < 0.001 for Scientific Accuracy and Actionability; p = 0.015 for Comprehensibility), and post-hoc Dunn's tests confirmed that Human Experts outperformed both AI tools, while ChatGPT was significantly more comprehensible than Perplexity AI. These findings highlight the limitations of current AI tools in delivering scientifically accurate and actionable insights, attributable to factors such as a lack of emotional intelligence and common sense. The study recommends careful evaluation of AI integration in academic and research contexts to better understand the roles and limitations of these tools.

Keywords : Artificial Intelligence Tools, ChatGPT, Perplexity AI, Comparative Study.
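
The analysis reported in the abstract follows a standard non-parametric workflow: per-group means and standard deviations, a Kruskal-Wallis omnibus test across the three responder groups, and Dunn's post-hoc pairwise comparisons. The short Python sketch below illustrates that workflow; it is not the authors' analysis code. The score vectors are simulated placeholders loosely matching the reported means, the per-group sample size of 24 is an assumption, and Dunn's test is computed directly from pooled ranks with a Bonferroni correction (tie correction omitted).

```python
# Illustrative sketch only: Kruskal-Wallis omnibus test plus Dunn's post-hoc
# comparisons on simulated rating data (placeholders, not the study's data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Placeholder ratings on one dimension (e.g., Scientific Accuracy);
# 24 evaluators per group is an assumption for illustration.
scores = {
    "Human Experts": rng.normal(7.4, 0.7, 24).clip(1, 10),
    "ChatGPT": rng.normal(6.3, 0.7, 24).clip(1, 10),
    "Perplexity AI": rng.normal(4.3, 0.8, 24).clip(1, 10),
}

# Descriptive statistics: mean and sample standard deviation per group.
for name, vals in scores.items():
    print(f"{name}: M = {vals.mean():.2f}, SD = {vals.std(ddof=1):.2f}")

# Kruskal-Wallis omnibus test across the three groups.
h_stat, p_value = stats.kruskal(*scores.values())
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_value:.4f}")

# Dunn's post-hoc test: pairwise z statistics on mean ranks of the pooled
# data, with a Bonferroni adjustment (tie correction omitted for brevity).
groups = list(scores)
pooled = np.concatenate([scores[g] for g in groups])
labels = np.concatenate([[g] * len(scores[g]) for g in groups])
ranks = stats.rankdata(pooled)
n_total = len(pooled)
mean_rank = {g: ranks[labels == g].mean() for g in groups}
n_comparisons = len(groups) * (len(groups) - 1) // 2

for i in range(len(groups)):
    for j in range(i + 1, len(groups)):
        a, b = groups[i], groups[j]
        n_a, n_b = len(scores[a]), len(scores[b])
        se = np.sqrt(n_total * (n_total + 1) / 12.0 * (1.0 / n_a + 1.0 / n_b))
        z = (mean_rank[a] - mean_rank[b]) / se
        p_adj = min(2 * stats.norm.sf(abs(z)) * n_comparisons, 1.0)
        print(f"{a} vs {b}: z = {z:.2f}, adjusted p = {p_adj:.4f}")
```

Running the sketch prints the group descriptives, the Kruskal-Wallis H statistic and p value, and the three Bonferroni-adjusted pairwise p values, mirroring the quantities reported in the abstract.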

