Abstract:
Heart disease remains a leading cause of mortality worldwide, underscoring the importance of accurate and transparent methods for early diagnosis. While many machine learning models have demonstrated strong predictive performance, their limited interpretability poses challenges for clinical adoption. In this study, we evaluate three interpretable linear classification models for heart disease prediction using the Cleveland Heart Disease dataset: Generalized Linear Model (GLM) logistic regression, L1-regularized (Lasso) logistic regression, and Linear Discriminant Analysis (LDA). Following data preprocessing, the models are assessed on a held-out test set using standard evaluation metrics, including accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (ROC-AUC). The results show that all three models achieve strong discriminative performance. Among them, Lasso logistic regression attains the highest accuracy and F1-score, reflecting a favorable balance between precision and recall, while GLM and LDA exhibit comparable performance with slightly lower recall. In addition, the GLM framework enables identification of clinically meaningful predictors, reinforcing its interpretability and relevance for medical decision-making. These findings indicate that interpretable linear models can provide reliable and transparent tools for heart disease prediction, offering a practical alternative to more complex black-box approaches in clinical settings.
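For readers less familiar with the evaluation metrics named above, the following is a minimal sketch of how accuracy, precision, recall, F1-score, and ROC-AUC can be computed for a binary classifier. The function names (`classification_metrics`, `roc_auc`), the 0.5 decision threshold, and the toy labels and scores are illustrative assumptions; they are not the study's data, code, or results.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = disease present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also called sensitivity
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a randomly chosen positive case is
    scored above a randomly chosen negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, not taken from the Cleveland dataset).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]  # predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Precision, recall, and F1 depend on the chosen decision threshold, whereas ROC-AUC is computed from the ranking of the predicted scores and does not depend on any single threshold.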