Descriptive and Univariate Vibe Analysis of Forestry Data with AI and R Statistics | International Journal of Innovative Science and Research Technology

Descriptive and Univariate Vibe Analysis of Forestry Data with AI and R Statistics

Authors : Kato Samuel Namuene; Ndinge Nadia Mbella

Volume/Issue : Volume 11 - 2026, Issue 4 - April

Google Scholar : https://tinyurl.com/ycw3bv5w

Scribd : https://tinyurl.com/hppr4cya

DOI : https://doi.org/10.38124/ijisrt/26apr077


Abstract : This study presents a reproducible, AI-assisted framework for descriptive and univariate statistical analysis of ecological count data, integrating vibe data analysis with conventional manual methods and applying both to snapping frequency observations from 94 tree species in Korup National Park, Cameroon. Using Claude.ai to generate R statistical code through structured prompt engineering, we systematically compare classical parametric approaches (t-test, Z-test) with non-parametric alternatives (Wilcoxon signed-rank test, sign test, bootstrap confidence intervals) to determine the most appropriate analytical framework for forestry count data across four stages: exploratory data analysis, normality assessment, hypothesis testing, and outlier detection. Snapping frequency exhibited extreme positive skewness (5.087) and a leptokurtic distribution (kurtosis = 36.725), with Protomegabaria stapfiana (8 snappings; z = 3.41) standing out as an extreme observation. Comparison of vibe analysis and manual analysis across 16 statistical outputs revealed complete numerical equivalence, with the AI selecting assumption-appropriate methods without explicit instruction. Although vibe analysis completed all stages within a single iterative session, validation through executed R code and analyst oversight remains essential. This framework provides forestry researchers with accessible, validated tools for rigorous, reproducible statistical analysis of non-normal count data.

Keywords : Descriptive Statistics; Univariate Analysis; Parametric Tests; Non-Parametric Tests; Normality Testing; Artificial Intelligence; R Programming; Data Visualization; Ecological Data Analysis.
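
The quantities named in the abstract (skewness, kurtosis, a z-score outlier check, a sign test, and a bootstrap confidence interval) can be illustrated with a minimal sketch. The paper's own analysis was run in R; the Python below is only a standard-library approximation under stated assumptions. The `counts` vector is hypothetical and is not the Korup data, the helper names are invented for illustration, and the moment-based skewness and kurtosis formulas are one common definition that may differ from the estimator the authors used. The Wilcoxon signed-rank test is omitted for brevity.

```python
import math
import random
import statistics

# Hypothetical snapping counts for 20 species; NOT the Korup National Park data.
counts = [0, 0, 0, 0, 1, 0, 2, 0, 1, 0, 0, 3, 0, 1, 0, 0, 8, 0, 1, 0]


def central_moment(x, k):
    """k-th central moment, dividing by n (population form)."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** k for v in x) / len(x)


def moment_skewness(x):
    """Skewness as m3 / m2^1.5; positive values indicate a long right tail."""
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5


def moment_kurtosis(x):
    """Kurtosis as m4 / m2^2 (a normal distribution gives 3; above 3 is leptokurtic)."""
    return central_moment(x, 4) / central_moment(x, 2) ** 2


def z_scores(x):
    """Standardise each value with the sample mean and sample standard deviation."""
    mean, sd = statistics.mean(x), statistics.stdev(x)
    return [(v - mean) / sd for v in x]


def sign_test_p(x, mu0):
    """Exact two-sided sign test of median == mu0; values equal to mu0 are dropped."""
    above = sum(v > mu0 for v in x)
    below = sum(v < mu0 for v in x)
    n, k = above + below, min(above, below)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def bootstrap_median_ci(x, reps=2000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for the median."""
    rng = random.Random(seed)
    meds = sorted(statistics.median(rng.choices(x, k=len(x))) for _ in range(reps))
    return meds[int(reps * alpha / 2)], meds[int(reps * (1 - alpha / 2)) - 1]


skew = moment_skewness(counts)
kurt = moment_kurtosis(counts)
z = z_scores(counts)
outliers = [(v, round(s, 2)) for v, s in zip(counts, z) if abs(s) > 3]
p_sign = sign_test_p(counts, mu0=1)
ci_low, ci_high = bootstrap_median_ci(counts)

print(f"skewness={skew:.3f} kurtosis={kurt:.3f}")
print(f"values with |z| > 3: {outliers}")
print(f"sign test p-value (median = 1): {p_sign:.4f}")
print(f"95% bootstrap CI for the median: ({ci_low}, {ci_high})")
```

The |z| > 3 cut-off and the hypothesised median of 1 are illustrative choices, not values taken from the paper. With strongly right-skewed counts like these, the rank-based and resampling summaries depend far less on the single large value than the mean and standard deviation do, which is the motivation the abstract gives for comparing parametric and non-parametric approaches.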

