Abstract:
Missing data is a common problem in most pavement management system (PMS) databases, affecting the accuracy of pavement performance predictions and the quality of pavement management decisions. A potential solution to this problem is the imputation of missing values. This study analyses the suitability and performance of four imputation methods as a solution to the missing data problem in PMS databases: mean imputation, multivariate imputation by chained equations (MICE), k-nearest neighbors (KNN), and nonparametric missing value imputation using random forest (MissForest). A case study based on data from the Long-Term Pavement Performance (LTPP) database illustrates the application of the imputation methods and compares their performance. The results show that machine learning methods, in particular MissForest, outperform the other methods. They also demonstrate the merits of imputation as a solution for missing values. The findings of this study are primarily of interest to road agencies, which can use imputation to complete their PMS databases and improve their management practice.
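
As a rough illustration of how such a comparison can be set up, the sketch below hides a share of known values, imputes them, and scores each method by the error on the hidden cells. It is not the study's code. It uses synthetic data rather than LTPP records, covers only mean and KNN imputation (MICE and MissForest are omitted for brevity), and every function name, the 15% missing rate, and the choice of RMSE as the score are assumptions made for the example.

```python
import math
import random


def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def knn_impute(rows, k=5):
    """Replace each None with the column mean over the k nearest donor rows.

    Distance is Euclidean over the columns observed in both rows, rescaled by
    the share of usable columns. Columns are not standardised here; real data
    with mixed units would normally need scaling first.
    """
    n_cols = len(rows[0])

    def distance(a, b):
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf  # no common observed column, so not comparable
        return math.sqrt(sum(shared) * n_cols / len(shared))

    fallback = mean_impute(rows)  # used when no comparable donor exists
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d < math.inf]
            filled[i][j] = sum(nearest) / len(nearest) if nearest else fallback[i][j]
    return filled


def rmse_on_masked(truth, imputed, masked):
    """Root mean squared error over the cells that were artificially hidden."""
    errors = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in masked]
    return math.sqrt(sum(errors) / len(errors))


def run_demo(seed=0, n_rows=200, missing_rate=0.15):
    """Hide random cells in synthetic correlated data and score both methods."""
    rng = random.Random(seed)
    truth = []
    for _ in range(n_rows):
        x = rng.uniform(0.0, 10.0)
        # Two further columns that depend on x, plus noise.
        truth.append([x, 2.0 * x + rng.gauss(0.0, 0.5), -x + rng.gauss(0.0, 0.5)])

    incomplete = [list(r) for r in truth]
    masked = []
    for i in range(n_rows):
        for j in range(3):
            if rng.random() < missing_rate:
                incomplete[i][j] = None
                masked.append((i, j))

    mean_rmse = rmse_on_masked(truth, mean_impute(incomplete), masked)
    knn_rmse = rmse_on_masked(truth, knn_impute(incomplete), masked)
    return mean_rmse, knn_rmse


mean_rmse, knn_rmse = run_demo()
print(f"mean imputation RMSE: {mean_rmse:.3f}")
print(f"KNN imputation RMSE:  {knn_rmse:.3f}")
```

Because the synthetic columns are strongly correlated, a method that borrows information from similar rows has an advantage over a column mean here. How the methods rank on real PMS data depends on the data and on the missingness pattern.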