Comparing Multiple Imputation and Machine Learning Techniques for Longitudinal Data


Authors : Sanjana Rajamani; Seena Thomas

Volume/Issue : Volume 8 - 2023, Issue 10 - October

Google Scholar : https://tinyurl.com/267js72n

Scribd : https://tinyurl.com/y6r3c2w4

DOI : https://doi.org/10.5281/zenodo.10053050

Abstract : Finding ways to impute missing values in a data set is an essential part of research. Missingness is often unavoidable and can arise from natural or non-natural causes. It is especially common in longitudinal or multilevel studies, where it can lead to biased estimates, loss of power, increased variability, and inaccurate results. This study used a complete data set of resistance scores of intellectually disabled children who received behavioral skill training in order to compare various imputation techniques. The secondary data were longitudinal: the resistance score was recorded before the training and at four time points after it. Missing values were then introduced at random into the complete data at varying rates (5%, 10%, 15%, 20%, 30%) under the missing at random (MAR) mechanism. The values obtained after imputation were compared with the full data using a linear mixed model. Various models were built under multiple imputation and machine learning techniques to impute the different features used to predict the resistance score, using coefficients taken from the real data, and the same procedure was applied to simulated data. The machine learning methods were the most suitable for imputing missing values and led to a significant improvement in prognosis accuracy compared with multiple imputation techniques using linear mixed models.

Keywords : Multiple Imputation, MAR Mechanisms, Machine Learning Techniques, Linear Mixed Effect Model.
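
The abstract describes deleting values from a complete longitudinal data set at fixed rates under MAR. The following is a minimal sketch of one way such a step could look, not the authors' procedure. The function names `simulate_scores` and `make_mar`, the simulated score distribution, and the choice to let the missingness probability depend on the always-observed baseline score are all assumptions made for illustration; the rate is applied to the post-training measurements only.

```python
import numpy as np

def simulate_scores(n_children=100, n_times=5, seed=0):
    """Hypothetical stand-in data: column 0 is the pre-training baseline,
    columns 1..4 are the post-training time points."""
    rng = np.random.default_rng(seed)
    intercepts = rng.normal(50.0, 8.0, size=(n_children, 1))  # per-child level
    trend = -2.0 * np.arange(n_times)                           # assumed decline over time
    noise = rng.normal(0.0, 3.0, size=(n_children, n_times))
    return intercepts + trend + noise

def make_mar(scores, rate, seed=0):
    """Set a fraction `rate` of follow-up cells to NaN.

    Assumption: a cell's chance of deletion depends only on the child's
    baseline score, which stays fully observed, so the pattern is MAR.
    """
    rng = np.random.default_rng(seed)
    out = scores.astype(float).copy()
    n_rows, n_times = out.shape
    baseline = out[:, 0]
    z = (baseline - baseline.mean()) / baseline.std()
    weight = 1.0 / (1.0 + np.exp(-z))        # higher baseline -> more likely missing
    cell_w = np.repeat(weight, n_times - 1)  # one weight per follow-up cell, row by row
    k = int(round(rate * cell_w.size))       # exact number of cells to delete
    idx = rng.choice(cell_w.size, size=k, replace=False, p=cell_w / cell_w.sum())
    rows, cols = np.divmod(idx, n_times - 1)
    out[rows, cols + 1] = np.nan             # shift by 1 to skip the baseline column
    return out

if __name__ == "__main__":
    full = simulate_scores()
    for rate in (0.05, 0.10, 0.15, 0.20, 0.30):
        masked = make_mar(full, rate)
        print(rate, int(np.isnan(masked).sum()))
```

Sampling a fixed number of cells without replacement hits each target rate exactly, which keeps the five scenarios comparable; drawing each cell independently from a logistic model would be an equally valid MAR design with a rate that varies from run to run.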

