


Descriptive and Univariate Vibe Analysis of Forestry Data with AI and R Statistics


Authors : Kato Samuel Namuene; Ndinge Nadia Mbella

Volume/Issue : Volume 11 - 2026, Issue 4 - April


Google Scholar : https://tinyurl.com/ycw3bv5w

Scribd : https://tinyurl.com/hppr4cya

DOI : https://doi.org/10.38124/ijisrt/26apr077



Abstract : This study presents a reproducible, AI-assisted framework for descriptive and univariate statistical analysis of ecological count data. It integrates vibe data analysis with conventional manual methods, using snapping frequency observations from 94 tree species in Korup National Park, Cameroon. Using Claude.ai to generate R code through structured prompt engineering, we systematically compare classical parametric approaches (t-test, Z-test) with non-parametric alternatives (Wilcoxon signed-rank test, sign test, bootstrap confidence intervals) to determine the most appropriate analytical framework for forestry count data across four stages: exploratory data analysis, normality assessment, hypothesis testing, and outlier detection. Snapping frequency exhibited extreme positive skewness (5.087) and a leptokurtic distribution (kurtosis = 36.725), with Protomegabaria stapfiana (8 snappings; z = 3.41) singled out at the outlier detection stage. The vibe and manual analyses agreed numerically on all 16 statistical outputs compared, and the AI selected assumption-appropriate methods without explicit instruction. Although vibe analysis completed all stages within a single iterative session, validation through executed R code and analyst oversight remain essential. This framework provides forestry researchers with accessible, validated tools for rigorous, reproducible statistical analysis of non-normal count data.

Keywords : Descriptive Statistics; Univariate Analysis; Parametric Tests; Non-Parametric Tests; Normality Testing; Artificial Intelligence; R Programming; Data Visualization; Ecological Data Analysis.
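
The four stages named in the abstract can be sketched in a few lines. The paper's own analysis is in R; the block below is only an illustrative sketch in Python (standard library only), and everything in it is an assumption rather than the authors' code: the counts are invented and are not the Korup data, the hypothesized median of 1 and the |z| > 3 outlier cutoff are arbitrary choices, and the kurtosis is reported in the "excess" convention, which may differ from the one behind the published value.

```python
import math
import random
import statistics

# Hypothetical snapping counts for 20 species (NOT the study data).
counts = [0] * 10 + [1] * 6 + [2] * 3 + [8]

def central_moment(xs, k):
    """k-th central moment using the population (divide-by-n) form."""
    m = statistics.fmean(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

def skewness(xs):
    """Moment-based skewness: m3 / m2^(3/2)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def excess_kurtosis(xs):
    """Moment-based kurtosis minus 3 (a normal distribution gives 0)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0

def z_scores(xs):
    """Standardize each value with the sample mean and sample SD."""
    m = statistics.fmean(xs)
    s = statistics.stdev(xs)
    return [(x - m) / s for x in xs]

def sign_test(xs, m0):
    """Exact two-sided sign test of H0: median == m0 (ties are dropped)."""
    pos = sum(x > m0 for x in xs)
    neg = sum(x < m0 for x in xs)
    n = pos + neg
    if n == 0:
        return pos, neg, 1.0
    k = min(pos, neg)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return pos, neg, min(1.0, 2 * tail)

def bootstrap_median_ci(xs, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for the median."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    meds = sorted(
        statistics.median(rng.choices(xs, k=len(xs))) for _ in range(n_boot)
    )
    lo = meds[int(alpha / 2 * n_boot)]
    hi = meds[round((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Stage 1-2: shape of the distribution (strong right skew argues against
# relying on normal-theory tests).
print("skewness:", round(skewness(counts), 3))
print("excess kurtosis:", round(excess_kurtosis(counts), 3))

# Stage 3: distribution-free inference about the median.
print("sign test (pos, neg, p):", sign_test(counts, 1))
print("bootstrap 95% CI for median:", bootstrap_median_ci(counts))

# Stage 4: flag values more than 3 sample SDs from the mean.
flagged = [x for x, z in zip(counts, z_scores(counts)) if abs(z) > 3]
print("flagged by |z| > 3:", flagged)
```

With heavily skewed counts like these, a single large value inflates both the mean and the standard deviation, which is one reason the sign test and the bootstrap interval (both of which target the median) are natural companions to the z-score screen.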


