Abstract:
This paper compares K-Nearest Neighbors (K-NN), Support Vector Regression (SVR), Decision Trees (DT), and Random Forests (RF) for estimating loss values under varying rates of missing data (5%, 10%, 15%) and varying correlation coefficients (ρ). The aim is to determine which method performs best under each combination of data sparsity and correlation. For each method, we compute the average absolute error (AAE) at every missing-data rate and ρ value. SVR achieves the lowest AAE at lower missing-data rates and lower ρ values, whereas RF achieves the lowest AAE as the missing-data rate and ρ increase, making it the most reliable method overall. The discussion highlights the robustness of RF on incomplete and correlated datasets and its consistent performance relative to the other methods. We conclude by suggesting directions for future research, including hybrid models that combine the strengths of SVR and RF and the evaluation of different imputation techniques to improve model performance. These findings are relevant to loss estimation and decision-making in fields such as finance, healthcare, and engineering.
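
As an illustration of the evaluation metric, the following is a minimal sketch of how an AAE could be computed at the three missing-data rates. It is not the paper's implementation. The helper names (mask_at_random, average_absolute_error), the toy data, and the observed-mean estimator are assumptions made only for this example; the mean stands in for K-NN, SVR, DT, or RF, and the sketch assumes values are hidden uniformly at random.

```python
import random


def mask_at_random(values, rate, seed=0):
    """Pick the indices to hide, uniformly at random without replacement.

    Hypothetical helper: assumes values go missing completely at random.
    """
    rng = random.Random(seed)
    n_missing = round(len(values) * rate)
    return sorted(rng.sample(range(len(values)), n_missing))


def average_absolute_error(true_values, estimates):
    """Mean of |true - estimate| over paired entries."""
    if len(true_values) != len(estimates):
        raise ValueError("inputs must have the same length")
    if not true_values:
        raise ValueError("inputs must not be empty")
    total = sum(abs(t - e) for t, e in zip(true_values, estimates))
    return total / len(true_values)


if __name__ == "__main__":
    # Toy column of 100 values; real experiments would use the study's data.
    column = [float(i) for i in range(100)]

    for rate in (0.05, 0.10, 0.15):
        hidden = set(mask_at_random(column, rate))
        observed = [v for i, v in enumerate(column) if i not in hidden]

        # Placeholder estimator (assumption): fill every hidden entry with
        # the observed mean. A fitted regressor would be substituted here.
        fill_value = sum(observed) / len(observed)

        truth = [column[i] for i in sorted(hidden)]
        estimates = [fill_value] * len(truth)
        aae = average_absolute_error(truth, estimates)
        print(f"missing rate {rate:.0%}: AAE = {aae:.3f}")
```

The error is computed only over the hidden entries, since those are the values each method has to estimate. Repeating the loop for each ρ value would give the full grid of AAEs described in the abstract.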