Authors :
Kato Samuel Namuene; Ndinge Nadia Mbella
Volume/Issue :
Volume 11 - 2026, Issue 4 - April
Google Scholar :
https://tinyurl.com/ycw3bv5w
Scribd :
https://tinyurl.com/hppr4cya
DOI :
https://doi.org/10.38124/ijisrt/26apr077
Abstract :
This study presents a reproducible, AI-assisted framework for descriptive and univariate statistical analysis of
ecological count data, integrating vibe data analysis with conventional manual methods using snapping frequency
observations from 94 tree species in Korup National Park, Cameroon. Using Claude.ai to generate R statistical code
through structured prompt engineering, we systematically compare classical parametric approaches (t-test, Z-test) with
non-parametric alternatives (Wilcoxon signed-rank test, sign test, bootstrap confidence intervals) to determine the most
appropriate analytical framework for forestry count data across four stages: exploratory data analysis, normality
assessment, hypothesis testing, and outlier detection. Snapping frequency exhibited extreme positive skewness (5.087) and
a leptokurtic distribution (kurtosis = 36.725); outlier detection flagged Protomegabaria stapfiana (8 snappings; z = 3.41). Comparison of vibe analysis and manual
analysis across 16 statistical outputs revealed complete numerical equivalence, with the AI demonstrating autonomous
assumption-aware method selection without explicit instruction. While vibe analysis completed all stages within a single
iterative session, validation through executed R code and analyst oversight remain essential. This framework
provides forestry researchers with accessible, validated tools for rigorous, reproducible statistical analysis of non-normal
count data.
Keywords :
Descriptive Statistics; Univariate Analysis; Parametric Tests; Non-Parametric Tests; Normality Testing; Artificial Intelligence; R Programming; Data Visualization; Ecological Data Analysis.
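The four stages named in the abstract can be illustrated with a minimal base-R sketch. It is not the code generated in the study: the `snaps` vector is made-up, right-skewed count data rather than the Korup observations, the reference value `mu0` is hypothetical, the skewness and kurtosis use simple moment estimators (other estimators give slightly different values), and the |z| > 3 outlier cutoff is an assumed convention.

```r
# Illustrative data only: a made-up, right-skewed count vector,
# NOT the Korup snapping-frequency observations.
snaps <- c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 8)

# Stage 1: exploratory summaries
m <- mean(snaps)
s <- sd(snaps)
# Moment-based skewness and kurtosis (assumed estimator choice)
skew <- mean((snaps - m)^3) / mean((snaps - m)^2)^1.5
kurt <- mean((snaps - m)^4) / mean((snaps - m)^2)^2

# Stage 2: normality assessment (Shapiro-Wilk)
sw <- shapiro.test(snaps)

# Stage 3: one-sample tests against a hypothetical reference value
mu0 <- 1
tt <- t.test(snaps, mu = mu0)                      # parametric
wt <- wilcox.test(snaps, mu = mu0, exact = FALSE)  # signed-rank; ties rule out the exact p-value

# Sign test as a binomial test on observations above vs. below mu0
above <- sum(snaps > mu0)
below <- sum(snaps < mu0)
st <- binom.test(above, above + below, p = 0.5)

# Percentile bootstrap confidence interval for the mean
set.seed(1)
boot_means <- replicate(2000, mean(sample(snaps, replace = TRUE)))
boot_ci <- quantile(boot_means, c(0.025, 0.975))

# Stage 4: z-score outlier screen (assumed cutoff |z| > 3)
z <- (snaps - m) / s
outliers <- which(abs(z) > 3)

print(c(skewness = skew, kurtosis = kurt))
print(c(shapiro = sw$p.value, t = tt$p.value,
        wilcoxon = wt$p.value, sign = st$p.value))
print(boot_ci)
print(snaps[outliers])
```

On skewed counts like these, the t-test and the rank-based tests can disagree, which is the kind of comparison the abstract describes; any such script should be executed and checked by the analyst rather than taken on trust.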