Utilising Exploratory Data Analysis and Machine Learning Algorithms for Heart Disease Analysis and Prediction


  • Humra Khan Department of Computer Science and Engineering, Amity University Uttar Pradesh, Lucknow, India
  • P. Singh Department of Computer Science and Engineering, Amity University Uttar Pradesh, Lucknow Campus, India




Machine Learning, Statistical-based, Heart Disease Analysis, Heart Disease Prediction


Heart disease is among the most common and potentially fatal diseases worldwide, and early detection is essential for effective treatment. This study combines exploratory data analysis (EDA) with machine learning classifiers to investigate the factors that contribute to heart disease and to support prompt diagnosis and risk mitigation. The exploratory analysis identifies several features that strongly influence diagnosis: the number of major vessels coloured by fluoroscopy, the type of chest pain, the maximum heart rate achieved, exercise-induced angina, the slope of the peak exercise ST segment, and the ST depression induced by exercise relative to rest. Examining these features can help clinicians assess a patient's risk of heart disease. For the predictive component, classifiers are trained on the UCI heart disease dataset, which records key variables related to cardiac health. Six methods are compared: Random Forest (RF), Gradient Boosting (GB), K-Nearest Neighbour (KNN), Decision Tree (DT), Support Vector Machine (SVM), and Logistic Regression (LR). Of these, the Random Forest classifier achieves the highest accuracy, 85.25%.
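The six-way comparison described in the abstract can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the authors' pipeline: the synthetic data from `make_classification` is a stand-in for the UCI heart disease dataset, and the 80/20 stratified split, the feature scaling, the default hyperparameters, and the helper names `build_classifiers` and `compare_classifiers` are all assumptions. The scores it prints therefore say nothing about the reported 85.25%.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_classifiers(seed=0):
    """Return the six model families named in the abstract (default settings)."""
    return {
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
        "GB": GradientBoostingClassifier(random_state=seed),
        # Distance- and margin-based models are scaled first (an assumption).
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "DT": DecisionTreeClassifier(random_state=seed),
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    }


def compare_classifiers(X, y, test_size=0.2, seed=0):
    """Fit each classifier on one train split and return test accuracy per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    scores = {}
    for name, model in build_classifiers(seed).items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Synthetic placeholder data; swap in the real feature matrix and labels.
X, y = make_classification(
    n_samples=303, n_features=13, n_informative=6, random_state=0
)
scores = compare_classifiers(X, y)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.2%}")
```

A single hold-out split gives a noisy estimate on a dataset of this size, so cross-validation (for example `cross_val_score`) would be a reasonable alternative when ranking the models.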





JMSS 056




How to Cite

H. Khan and P. Singh, “Utilising Exploratory Data Analysis and Machine Learning Algorithms for Heart Disease Analysis and Prediction”, J. Manage. Serv. Sci., vol. 4, no. 1, pp. 1–9, Apr. 2024.




Research Article