Abstract:
This study demonstrates how a fitting graph can enhance explainability and robustness during the model development phase of a machine learning project. The approach is illustrated with a ridge regression task, where the goal is to select the best-fitting regularization parameter, λ, from a range of candidate values. A simple scatterplot of λ (indexing model complexity) against average mean squared error (MSE, representing predictive accuracy) gives the model developer a visual check on whether sufficient repetitions of k-fold cross-validation have been performed. In addition, this study shows how fitting graph curves can be estimated from noisy scatterplots using regression splines. Rather than increasing the number of cross-validation repetitions, a regression spline can estimate the fitting graph from far fewer iterations, saving computation time.
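As a minimal sketch of this workflow, assuming a Python setting: scikit-learn's `Ridge`, `RepeatedKFold`, and `cross_val_score` supply the repeated k-fold MSE estimates, and scipy's `LSQUnivariateSpline` (a least-squares spline with fixed knots) stands in for the regression spline. The synthetic data, the λ grid, and the knot placement are illustrative assumptions, not details taken from the study.

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import RepeatedKFold, cross_val_score

# Hypothetical data stand-in; the study's actual dataset is not given here.
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

lambdas = np.logspace(-3, 3, 30)                              # candidate lambda values
cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=0)   # deliberately few repeats

# Average MSE across all folds and repeats, one value per candidate lambda.
avg_mse = [
    -cross_val_score(Ridge(alpha=lam), X, y,
                     scoring="neg_mean_squared_error", cv=cv).mean()
    for lam in lambdas
]

# Estimate the fitting graph from the noisy scatterplot with a regression
# spline: a least-squares cubic spline with fixed interior knots.
log_lam = np.log10(lambdas)                  # log scale gives a better-behaved x-axis
knots = np.linspace(-2, 2, 4)                # interior knot placement is an assumption
spline = LSQUnivariateSpline(log_lam, avg_mse, knots, k=3)

best_lam = lambdas[np.argmin(spline(log_lam))]   # minimum of the estimated curve
print(f"lambda at the fitting-graph minimum: {best_lam:.4g}")
```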
The fitting graph is also presented as a tool to promote model robustness, defined as the model's ability to maintain performance across variations in the hyperparameter λ. This concept is demonstrated through a case study on an unstable polynomial regression model. The simulation study reveals that standard k-fold cross-validation, even when repeated 5 or 10 times, selects an incorrect and unstable λ in the overwhelming majority of cases. In contrast, the fitting graph method reliably selects a λ that is both well-fitting and stable. Without the fitting graph, the model developer is more likely to choose a highly unstable λ, leading to suboptimal model performance.
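One plausible way to operationalize this stability criterion, sketched below under stated assumptions: among λ values whose fitted MSE lies within a tolerance of the estimated curve's minimum, prefer the one where the fitting graph is flattest. The 5% tolerance and the flatness measure (smallest absolute slope of the smoothed curve) are assumptions for illustration; the abstract does not specify the study's exact selection rule.

```python
import numpy as np

def select_stable_lambda(lambdas, fitted_mse, tol=0.05):
    """Pick a lambda that is both near-optimal and on a flat stretch
    of the estimated fitting graph.

    `fitted_mse` is the spline-smoothed MSE curve evaluated at `lambdas`.
    The 5% tolerance and the flatness criterion are illustrative
    assumptions, not the study's stated rule.
    """
    fitted_mse = np.asarray(fitted_mse, dtype=float)
    # Indices whose smoothed MSE is within `tol` of the curve's minimum.
    near_opt = np.where(fitted_mse <= (1.0 + tol) * fitted_mse.min())[0]
    # Numerical slope of the curve with respect to log10(lambda).
    slopes = np.abs(np.gradient(fitted_mse, np.log10(lambdas)))
    # Among near-optimal candidates, take the flattest (most stable) one.
    return lambdas[near_opt[np.argmin(slopes[near_opt])]]

# Usage with a synthetic U-shaped stand-in for the smoothed MSE curve:
lams = np.logspace(-3, 3, 30)
curve = (np.log10(lams) - 0.5) ** 2 + 1.0
print(select_stable_lambda(lams, curve))
```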