Published 31.07.2024
Keywords
- heart disease
- medical diagnosis support system (MDSS)
- clinical data
- machine learning
Copyright (c) 2024 Nidhal Hazzaa; Oktay Yıldız (Co-Author)
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Abstract
Early prediction and diagnosis of cardiovascular disease (CVD) are crucial for effective management and for preventing advanced cases. This study proposes a diagnosis system that uses supervised machine learning (ML) to predict CVD. The system employs five ML classifiers to predict atherosclerosis: random forest (RF), decision tree (DT), support vector machine (SVM), logistic regression (LR), and multilayer perceptron (MLP). The Z-Alizadeh Sani dataset from the UCI repository was used. The dataset is imbalanced, meaning that one class has significantly more instances than the other; this was addressed by resampling with the Synthetic Minority Oversampling Technique (SMOTE). Ten-fold cross-validation was used to split the dataset into training and test folds. The five classifiers were evaluated using standard performance metrics, and the evaluation showed that all classifiers achieved a performance improvement of at least 2%. The proposed model has potential applications in healthcare and could improve the clinical diagnosis of CVD, leading to optimized diagnosis, prevention of advanced cases, and lower treatment expenses.
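The two data-handling steps named in the abstract, SMOTE resampling and ten-fold cross-validation, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `smote_oversample` and `stratified_folds`, the toy data, and the parameter values are all assumptions made for illustration, and a real study would more likely rely on an established library. The sketch assumes at least two minority samples and numeric features only, and it does not claim to reproduce the order in which the study applied resampling and splitting.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """SMOTE-style sketch: build n_new synthetic points by interpolating
    between a minority sample and one of its k nearest minority neighbours.
    Assumes len(minority) >= 2 and purely numeric features."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign every sample index to one of n_folds folds so that each fold
    keeps roughly the same class proportions as the full label list."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds


# Hypothetical toy data: 4 minority points, to be balanced against 10 majority points.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
majority_count = 10
new_points = smote_oversample(minority, majority_count - len(minority), k=3)
balanced_minority = minority + new_points

# Hypothetical labels: 30 samples of class 0 and 10 of class 1, split into 10 folds.
labels = [0] * 30 + [1] * 10
folds = stratified_folds(labels, n_folds=10)
```

Because each synthetic point lies on a line segment between two existing minority samples, it stays inside the region the minority class already occupies. Each fold would then serve once as the test set while the remaining nine are used for training.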