CARDIOPREN: An Explainable Autoencoder–RNN Ensemble Framework for Accurate Cardiovascular Disease Prediction
Harpreet Kaur1, Amlan Kumar Sarkar2, Arun Singh3*, Jalakanuri Gnaneswar Raju4, Md Khadimul Islam Zim5, Raunak Raj6*
1,3,4 Department of Computer Science and Engineering, Lovely Professional University, Phagwara, Punjab, India
2 School of Computational Sciences, GNA University, Phagwara, Punjab, India
5 Faculty of Mathematics and Institute of Computer Science, Czech Academy of Sciences, Pod Vodárenskou věží 271/2, 18200 Praha 8, Czech Republic
6 Parul Institute of Engineering and Technology, Parul University, Gujarat, India
1 drharpreetarora81@gmail.com, 2amlansarkarbd@gmail.com, 3 arunmandiarun2001@gmail.com, 4 jalakanuri.12326750@lpu.in, 5 mekuzim@gmail.com, 6 raunak.raj40706@paruluniversity.ac.in
Corresponding Author: raunak.raj40706@paruluniversity.ac.in, arunmandiarun2001@gmail.com
Abstract—
Cardiovascular diseases are a leading cause of death worldwide, so there is a pressing need for models that predict them early and reliably from symptoms and clinical records. This paper proposes a deep learning system for heart attack prediction that combines temporal modeling with unsupervised feature learning to improve diagnostic accuracy. To support model validity, the methodology includes careful data preprocessing, namely handling missing values and cleaning contradictory records, and draws on established patient health datasets such as the UCI Cleveland Heart Disease Dataset and the Kaggle Heart Failure Clinical Records Dataset. Four deep learning frameworks are compared: Feedforward Neural Networks (FNNs), Recurrent Neural Networks (RNNs), Autoencoders, and a hybrid ensemble model referred to as CARDIOPREN. Autoencoders perform feature extraction and dimensionality reduction, while RNNs augmented with Long Short-Term Memory (LSTM) units identify sequential patterns in patient histories. The FNN, RNN, and Autoencoder-classifier models achieved accuracies of 84%, 85.25%, and 84%, respectively. The proposed CARDIOPREN model, which combines the Autoencoder and RNN modules, outperformed all of them with an accuracy of 93.4%, an F1-score of 0.91, and an AUC-ROC of 0.95. The study shows that ensemble strategies which pair temporal modeling with feature compression improve the accuracy of heart attack prediction. The CARDIOPREN architecture thus offers a robust and broadly applicable solution for clinical decision support and early cardiovascular disease diagnosis.
Index Terms—
Cardiovascular Disease
Ensemble Learning
RNN
LSTM
Autoencoder
Feature Extraction
Pattern Analysis
Dimensionality Reduction
AUC-ROC Score
I. INTRODUCTION
Cardiovascular disease is the leading cause of death worldwide. This study introduces an approach to forecasting cardiac events by applying deep learning to large volumes of health-related data. Such algorithms can be used to build a model that predicts heart attacks from a patient's history. Alongside established deep learning classifiers such as the Recurrent Neural Network (RNN), the Feedforward Neural Network (FNN), and the Autoencoder, we introduce CARDIOPREN (Cardiovascular Prediction using AE and RNN), for which several patient datasets comprising medical histories and clinical evaluations were prepared. Data pre-processing is essential because the performance of a deep learning model depends heavily on the quality of its training data [1]. Cleaning methods, including techniques for treating missing values, were therefore applied to improve model performance. The deep learning models are then trained on the processed data [2]. The findings show that the FNN reached 84% accuracy, the RNN 85.25%, the autoencoder with a classifier 84%, and CARDIOPREN 90% [3]. In summary, this paper compares the RNN, FNN, Autoencoder, and hybrid CARDIOPREN models for heart attack prediction and stresses the importance of data pre-processing.
Cardiovascular diseases, particularly heart attacks, are a major cause of death in both men and women. More than 17.9 million deaths each year are attributed to cardiovascular diseases worldwide [4]. Predicting heart attacks remains challenging because of the complex interplay of risk factors such as age, lifestyle, genetics, and underlying health conditions. Deaths from heart attacks continue to rise, which may be linked to the food we eat, the drinks we consume, and many other factors. Predicting heart-related diseases requires an understanding of human health and the underlying biological processes. Damage to the heart can be visualized through different imaging modalities, one of which is shown in Figure 1.
Many of these heart attacks occur because the underlying condition went undetected. The disease may not always be curable, but heart attacks can often be prevented through early prediction with machine learning and deep learning. In recent years, machine learning and deep learning have opened new avenues for predictive analytics in healthcare, enabling automated systems for early disease detection. Deep learning algorithms can estimate a patient's risk of a future heart attack so that preventive measures can be taken early and improvements in health can be monitored.
Fig. 1. Heart image visualization for muscle damage
A heart attack occurs when blood flow to a part of the heart stops suddenly, usually because of plaque buildup inside the coronary arteries. The blockage cuts off the supply of oxygen-rich blood to the heart muscle, and the muscle cells begin to die if it is not cleared quickly. Today, heart attacks are one of the main causes of death worldwide and a major manifestation of cardiovascular disease. According to the World Health Organization, cardiovascular diseases are responsible for about 32% of all deaths globally [5]. The most common signs of a heart attack are chest pain or discomfort and difficulty breathing. Cold sweats, nausea or vomiting, and feeling faint or dizzy are further warning signs.
The main risk factors for heart attacks include an unhealthy diet, physical inactivity, obesity, and an unhealthy lifestyle. Consumption of fast food, sugar, unhealthy fats, and alcohol is associated with high cholesterol levels, which are themselves a major risk factor. Genetic predisposition and diabetes also increase the risk of heart attacks across all age groups, and stressful professions and imbalanced work schedules can contribute as well. Statistics show that people above 60 years of age have the most heart attacks, owing to aging arteries, long-term exposure to risk factors, and underlying conditions.
This paper focuses on deep learning algorithms for predicting heart attacks, namely the Feedforward Neural Network, the Recurrent Neural Network, and Autoencoders, and compares their accuracy. Because model results depend largely on the training data, data pre-processing plays a vital role: the datasets must be cleaned and pre-processed so that the models can be trained properly and predict accurately. This improves both the capability and the efficiency of the models.
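As an illustration of the kind of cleaning step described above, the following is a minimal sketch that fills missing values with per-column medians and then standardizes each column. It is not the paper's actual pipeline: the function name `impute_and_scale`, the choice of median imputation, and the toy records (age, resting blood pressure, cholesterol) are assumptions made for this example.

```python
import numpy as np

def impute_and_scale(X):
    """Fill NaNs with per-column medians, then standardize each column."""
    X = np.asarray(X, dtype=float)
    medians = np.nanmedian(X, axis=0)            # medians computed over observed values only
    filled = np.where(np.isnan(X), medians, X)   # medians broadcast across rows
    mean = filled.mean(axis=0)
    std = filled.std(axis=0)
    std[std == 0] = 1.0                          # guard against constant columns
    return (filled - mean) / std

# Hypothetical toy records: age, resting blood pressure, cholesterol.
records = [
    [63.0, 145.0, 233.0],
    [37.0, np.nan, 250.0],
    [41.0, 130.0, np.nan],
    [56.0, 120.0, 236.0],
]
clean = impute_and_scale(records)
```

Median imputation is used here only because it is simple and robust to outliers; other strategies (mean imputation, model-based imputation, or dropping incomplete records) fit the same slot.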
The dataset contains clinical features of patients, and the target variable indicates the presence or absence of heart disease. Feature extraction is important in model training because the models learn from these features to predict whether a patient is at risk of a heart attack. By evaluating the accuracy, precision, recall, and F1-score of each model, we aim to identify the most effective approach for heart attack prediction.
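The four evaluation metrics named above follow directly from the confusion-matrix counts. The sketch below shows the standard definitions for a binary target (1 = disease present); the function name `classification_metrics` and the toy labels are hypothetical and do not reproduce any result reported in this paper.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical labels for eight patients.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

On imbalanced clinical data, recall on the positive class and the F1-score are usually more informative than accuracy alone, which is why all four are reported together.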
A. Background on Cardiovascular Diseases
Cardiovascular diseases are the leading cause of death worldwide and a severe threat to human health. Timely intervention can substantially improve patient outcomes and reduce the pressure on healthcare systems, so accurate prediction and early detection are essential. Electrocardiography has been the primary conventional method for identifying heart abnormalities over the years. An electrocardiogram (ECG) is a recording of the electrical activity of the heart and is one of the primary inputs in the diagnosis of conditions such as cardiomyopathy, arrhythmias, myocardial ischemia, and other heart diseases. Its availability, cost-effectiveness, and non-invasiveness make it an indispensable tool in medicine. Despite its significance, ECG interpretation can be challenging. Manual interpretation is laborious and depends on the expertise of trained physicians, which leads to inter-rater variability. In addition, even with standard ECG machines, rare or non-persistent arrhythmias may go unrecorded because they are absent during the recording period, which is a major drawback of the method. The chronic nature of the patient's condition and the effect of electrode placement on recording precision must also be taken into consideration [16].
Moreover, ECG signals must be processed before they can be interpreted accurately, because they are inherently susceptible to noise from motion artifacts, muscle activity, and powerline interference. Conventional diagnostic methods therefore leave much to be desired, and there is a clear need for automated, reliable, and scalable solutions, ranging from simple augmentation of existing practice to potentially transformative applications, that assist in early detection and prevention. The complexity of real medical data, which involves multiple biological predictors and risk factors such as age, gender, high blood pressure, diabetes, and cholesterol level, complicates accurate prediction even further [17].
B. Overview of Deep Learning's Potential in Healthcare
Deep learning (DL) architectures have become highly effective tools, outperforming conventional machine learning (ML) techniques in a variety of healthcare applications such as knowledge discovery and disease classification. DL models can automatically extract advanced features from multi-modal patient data, improving risk assessment and facilitating individualized health care. The primary driver of the shift from classical machine learning to deep learning in healthcare is DL's capacity to handle unstructured data and perform feature engineering automatically. Conventional machine learning models frequently depend on manual feature selection and extraction, which is labor-intensive, highly domain-specific, and sometimes biased. Deep learning techniques overcome these drawbacks by automatically identifying complex patterns and representations in raw data. Autonomous feature learning is therefore valuable for complex, high-dimensional medical data, where manual methods are often expensive and perform poorly. To achieve scalability and efficiency, especially in healthcare settings, current practice should move toward analytical pipelines that are more autonomous and less reliant on human labor [16–18].
C. Motivation for the Proposed Ensemble Approach
Individual deep learning models have great potential for predicting cardiac diseases but often struggle with complex medical datasets. Class imbalance is a common characteristic of medical data: the samples representing different types of heart disease, or healthy versus diseased cases, are highly disproportionate [17]. This imbalance can lead to biased predictions and to poor performance on the minority class, which is typically the clinically most important disease state. Ensemble learning, which combines multiple models rather than relying on a single one, is generally considered an effective way to improve robustness, alleviate dataset imbalance, and increase overall prediction accuracy [19].
The suggested approach leverages the complementary capabilities of two powerful deep learning models, RNNs and Autoencoders (AEs). Autoencoders are well suited to high-dimensional and potentially noisy medical data because they perform feature extraction, dimensionality reduction, and denoising without supervision. RNNs, in contrast, are well suited to temporal data such as ECG signals and longitudinal Electronic Health Records (EHRs), since they detect sequential dependencies and long-term correlations. In an ensemble scheme such as stacking, different but complementary models can compensate for one another's weaknesses. The aim of this integration is to construct a stronger and more precise cardiac disease prediction system that can cope with the sequential and imbalanced nature of clinical information [20–22].
II. RELATED WORK
Several researchers have developed machine learning and deep learning tools for predicting cardiovascular illness, with the aim of enabling early diagnosis and reducing mortality. Kundavaram Joseph Sujith Kumar [1] showed that ensemble learning can work effectively by proposing a multi-model strategy for cardiac stroke prediction that combines Logistic Regression, Random Forest, and SVC, reaching a testing accuracy of 99%. S. Durai [2] compared Naive Bayes, KNN, SVM, Random Forest, and Decision Tree with a view to improving prediction through feature selection and achieved 84% accuracy. Among deep learning approaches, Ganesh C. [3] applied a recurrent neural network-based model to early detection and obtained 90% precision, outperforming several earlier models by 4.4%. Gurpreet Singh [4] compared ML and DL models and found that Decision Trees had the best accuracy, 98.04%. Kamal Kumar Gola [5] demonstrated the promise of wearable health monitoring devices with a deep learning model tuned by the Satin Bowerbird algorithm, which reached 90% accuracy. In Sachin R. Jadhav's study of several machine learning approaches [6], Random Forest had the highest accuracy (99.49%), followed by Decision Tree (98.99%) and KNN (98.91%). Using WEKA and hybrid machine learning techniques, Mahaveer [7] improved accuracy by 3.33% over previous models, with Random Forest achieving 86.89%. Paranthaman M [8] proposed a multi-layer perceptron-based deep learning model aimed at better healthcare analytics and prompt diagnosis. To improve predictive accuracy, V. Kannagi [9] introduced Intelligent Learning Assisted SVM, a system that combines data mining with DL. Using eight machine learning algorithms on several datasets, Bassam A. Abdelghani [10] found that Random Forest identified important predictive variables and performed best, with an accuracy of 89.9%.
In a review of machine learning applications in healthcare, Deepak Kumar Rathore [11] concentrated on feature extraction from medical imaging and electronic health data to enhance results. Muhammad Nabeel [12] analyzed the usefulness of machine learning and data analytics in heart attack prediction, focusing on variables such as blood pressure and nutrition; both Random Forest and KNN achieved 90.1% accuracy. To improve classification performance, Wang Fangyu [13] used a deep learning-based oversampling model with variational autoencoders to address class imbalance. On the Cleveland dataset, Dr. M. Kavitha [14] proposed a hybrid model of Random Forest and Decision Tree that reached an accuracy of 88.7%. Outside the cardiovascular domain, Youness Khourdifi [15] applied machine learning (ML) to breast cancer prediction and achieved an accuracy of 97.9% using SVM, underlining the importance of ML for early detection in various medical domains.
Heart disease prediction has also been advanced by deep learning through architectures tailored to the different modalities of medical data. Convolutional Neural Networks (CNNs) are mostly used to analyze static or image-like signals in medical data, such as electrocardiograms (ECGs). They can recognize heartbeat types and learn hierarchical features to classify ECG data into normal and abnormal conditions automatically. Hybrid models that pair CNNs with recurrent layers, such as CNN-Long Short-Term Memory (LSTM) and CNN-Gated Recurrent Unit (GRU) networks, are among the most common and combine temporal and spatial information to enhance predictive capability [23]. RNNs suit sequential and time-series data, so they can learn long-term dependencies and temporal correlations in medical records, physiological parameters, and activity data [24].
For example, RNN extensions such as GRU and LSTM outperform traditional methods in predicting future disease diagnoses from historical electronic health records (EHRs) because they can encode time-stamped events and learn latent representations for classification tasks. This ability is required for understanding disease courses and predicting further events from a patient's health trajectory [25]. Furthermore, autoencoders (AEs) are effective at unsupervised extraction of useful features, which makes them particularly suitable for denoising and dimensionality reduction of high-dimensional medical data such as ECG waveforms and EHRs [26]. Sparse Autoencoders (SAEs), for instance, can be used to differentiate between Atrial Fibrillation (AFib) and Normal Sinus Rhythm (NSR) by extracting morphological information from the ECG signal. For larger and more complex clinical datasets, AEs are often used to project the input data onto a lower-dimensional latent representation that preserves the relevant information. Autoencoders can also be trained with a classification loss to fine-tune the features for a specific classification task. Beyond this, AEs may offer a simpler representation of temporal information, and their pretrained weights may reduce training time and improve predictive performance [27]. The progress of deep learning in heart disease prediction has tracked the growing complexity of the data that the models address. Early work applied CNNs to images and static features. RNNs were then adopted because of the temporal nature of physiological signals and medical histories. Autoencoders followed to address high dimensionality, noise, and robust feature learning from complex inputs. This direction marks a transition to a more comprehensive analysis of patient data, that is, analysis of data types and temporal patterns rather than anticipation of individual symptoms [28].
It is well known that ensemble techniques can improve the overall robustness and prediction performance of a set of classifiers [29]. Among the most effective is the stacking ensemble, in which a meta-learner takes the predictions of several base models as input and produces a refined final decision [30]. Ensemble models have been tested in various combinations for heart disease prediction. Some researchers have combined hybrid deep learning architectures such as CNN-GRU and CNN-LSTM with an SVM as the meta-learner to enhance predictive performance [31]. To improve cardiovascular disease diagnosis, other researchers have combined deep learning models such as GRU and LSTM with traditional machine learning classifiers such as Random Forest, Gradient Boosting, SVM, and K-Nearest Neighbors (KNN) in voting or stacking models [32].
Additionally, it has been demonstrated that by utilizing the combined capabilities of various models, ensemble techniques like bagging and boosting greatly improve the accuracy of forecasts for cardiovascular disease [33]. Despite the proven benefits of ensemble methods and the individual strengths of Autoencoders (AEs) and Recurrent Neural Networks (RNNs) in handling medical data, a significant gap exists in the current literature regarding direct ensemble architectures that specifically combine Autoencoders and RNNs for heart disease prediction within a unified framework. Most existing ensemble work focuses on combining different types of classifiers, such as CNNs with LSTMs, or various traditional machine learning algorithms [34]. While Autoencoders are frequently employed for anomaly detection or as a preliminary step for feature engineering prior to classification, there is limited explicit discussion of an ensemble where AE-derived features are directly integrated with RNN processing in a combined, stacked ensemble specifically for heart disease prediction [35]. For instance, one study described an LSTM-Autoencoder model for noise signal detection in time-series data, followed by a separate ensemble for classification; however, this does not constitute a direct ensemble of AE-RNN components for the primary predictive task itself [27]. This identified gap presents an opportunity for the proposed research to contribute a novel and potentially more powerful predictive system by combining the unsupervised feature learning capabilities of Autoencoders with the sequential data processing strengths of RNNs within a unified ensemble framework for heart disease prediction.
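The stacking scheme discussed in this section, in which a meta-learner aggregates the predictions of several base models, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the configuration of any cited study: the meta-learner is a logistic regression fitted by plain gradient descent, the function names `fit_meta_learner` and `predict_meta` are hypothetical, and the base-model probabilities are invented toy values standing in for held-out predictions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_meta_learner(base_probs, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights on base-model probabilities."""
    n, k = base_probs.shape
    w = np.zeros(k)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(base_probs @ w + b)
        grad = p - y                          # gradient of log-loss w.r.t. the logit
        w -= lr * (base_probs.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def predict_meta(base_probs, w, b):
    """Return the meta-learner's probability for each sample."""
    return sigmoid(base_probs @ w + b)

# Hypothetical held-out probabilities from two base models (one column each).
base_probs = np.array([
    [0.9, 0.8],
    [0.2, 0.4],
    [0.7, 0.9],
    [0.1, 0.3],
    [0.6, 0.7],
    [0.3, 0.2],
])
y = np.array([1, 0, 1, 0, 1, 0])

w, b = fit_meta_learner(base_probs, y)
final = (predict_meta(base_probs, w, b) >= 0.5).astype(int)
```

In practice the meta-learner should be trained on out-of-fold predictions, so that it never sees base-model outputs for samples those base models were trained on.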
III. LITERATURE REVIEW
Table I summarizes the ML and DL approaches taken by different researchers, together with their key findings.
Year | Title | Authors | Findings
2024 | Multi-Model Supervised Machine Learning Techniques for Heart Stroke Prediction | Kundavaram Joseph Sujith Kumar, Stewart Kirubakaran S, M. Roshni Thanka, V. Ebenezer, E. Bijolin Edwin, Priscilla Joy | Proposes a multi-model approach (Logistic Regression, Random Forest, SVC) for heart stroke prediction.
2024 | Revolutionizing Cardiovascular Attack Prediction: A Comprehensive Machine Learning Approach for Accurate and Timely Detection | S. Durai, Prabhu V, D. Jaganathan, R. Harini | Uses ML algorithms (Naive Bayes, KNN, SVM, Random Forest, Decision Tree, Logistic Regression) for cardiovascular attack prediction.
2023 | Heart Disease Diagnosis Using Deep Learning | Ganesh C, Lordson Gnana Durai A, Baavana Bandarupalli, Pavithra S, Prabanjan R | Proposes an RNN deep learning approach for heart disease prediction.
2023 | Machine Learning and Deep Learning Models for Early Detection of Heart Disease | Gurpreet Singh, Kalpna Guleria, Shagun Sharma | Compares Naive Bayes, MLP, Decision Tree, and Logistic Regression for heart disease prediction.
2023 | Satin Bowerbird Optimization-Based Classification Model for Heart Disease Prediction Using Deep Learning in E-Healthcare | Kamal Kumar Gola, Shikha Arya | Proposes a deep learning model optimized by the Satin Bowerbird algorithm for heart disease prediction.
2023 | Monitoring and Predicting Heart Diseases Using Machine Learning Techniques | Sachin R. Jadhav, Prajot Pujari, Rohan Kulkarni, Swapnil Patwari, Aditya Yendralwar | Reviews ML techniques for heart disease prediction using healthcare data.
2022 | Cardiovascular Disease Prediction Analysis Using Classification Techniques | Mahaveer, Puneet | Uses WEKA and hybrid ML techniques to predict cardiovascular disease.
2022 | Cardiovascular Disease Prediction Using Deep Learning | Paranthaman M, Santhosh S, Sanjairam M, Yaathash B | Proposes a deep learning model using a Multi-Layer Perceptron for heart disease prediction.
2022 | Logical Mining Assisted Heart Disease Prediction Scheme in Association with Deep Learning Principles | V. Kannagi, M. Rajkumar, I. Chandra, K. Sangeethalakshmi, V. Mohanavel | Develops a heart disease prediction scheme based on data mining and deep learning.
2022 | Prediction of Heart Attacks Using Data Mining Techniques | Bassam A. Abdelghani, Sophia Fadal, Shadi Bedoor, Shadi Banitaan | Combines multiple datasets to predict heart attacks using eight ML algorithms.
2021 | A Review of Machine Learning Techniques and Applications for Health Care | Deepak Kumar Rathore | Reviews ML and deep learning techniques in healthcare, focusing on medical imaging and electronic health records.
2021 | Heart Attack Disease Data Analytics and Machine Learning | Muhammad Nabeel, Hooria Muslih-ud-Din, Mazhar Javed Awan, Shumaila Majeed, Mohsin Raza | Analyzes heart attack disease using machine learning and data analytics.
2021 | Research on Imbalanced Data Set Preprocessing Based on Deep Learning | Wang Fangyu, Zhang Jianhui, Bu Youjun, Chen Bo | Proposes a deep learning oversampling model to address imbalanced datasets.
2021 | Heart Disease Prediction Using Hybrid Machine Learning Model | Dr. M. Kavitha, G. Gnaneswar, R. Dinesh, Y. Rohith Sai, R. Sai Suraj | Proposes a hybrid model combining Random Forest and Decision Tree for heart disease prediction.
2018 | Applying Best Machine Learning Algorithms for Breast Cancer Prediction and Classification | Youness Khourdifi | Applies machine learning algorithms to predict breast cancer.
2023 | Morphological autoencoders for beat-by-beat atrial fibrillation detection using single-lead ECG | R. Silva, A. Fred, and H. Plácido da Silva | AEs for unsupervised feature learning and dimensionality reduction are well established for heart disease applications.
2023 | Autoencoder-based feature learning for predicting cardiovascular disease | A. P. Giovani, H. F. Pardede, and A. Subekti | Individual strengths of AEs (unsupervised feature learning and dimensionality reduction) are well established for heart disease applications.
2023 | Sequential autoencoders for feature engineering and pretraining in major depressive disorder risk prediction | B. W. Jones, W. D. Taylor, and C. G. Walsh | Individual strengths of AEs (unsupervised feature learning and dimensionality reduction) are well established for heart disease applications.
2018 | An autoencoder-based deep learning approach for heart disease prediction | H. Li, Y. Zhang, Y. Wang, and S. Li | Individual strengths of AEs (unsupervised feature learning and dimensionality reduction) are well established for heart disease applications.
2018 | Heart disease prediction using sparse autoencoder and support vector machine | K. B. Singh and S. S. Choudhary | Individual strengths of AEs (unsupervised feature learning and dimensionality reduction) are well established for heart disease applications.
2019 | ECG signal classification using autoencoder for anomaly detection | P. Panthong, C. Ketuwan, and S. Prueksakorn | Individual strengths of AEs (unsupervised feature learning and dimensionality reduction) are well established for heart disease applications.
2019 | Predicting heart disease using recurrent neural network | N. K. Singh, A. Tripathi, and A. K. Singh | Individual strengths of RNNs (sequential and time-series data) are well established for heart disease applications.
2018 | Predicting heart disease with deep learning based on electronic health records | Predicting heart disease using recurrent neural network | Individual strengths of RNNs (sequential and time-series data) are well established for heart disease applications.
2020 | Ensemble deep learning models for heart disease classification: A case study from Mexico | A. Baccouche, B. Garcia-Zapirain, C. Castillo Olea, and A. Elmaghraby | Lacks a direct, integrated stacking ensemble in which AE-derived features are systematically combined with RNN processing for the primary predictive task of heart disease.
2019 | Ensemble learning for heart disease prediction: A survey | G. K. Singh, V. Kumar, and A. Kumar | Lacks a direct, integrated stacking ensemble in which AE-derived features are systematically combined with RNN processing for the primary predictive task of heart disease.
2021 | Heart disease prediction based on stacking ensemble learning | L. L. Wang, Y. Lu, and X. X. Li | Lacks a direct, integrated stacking ensemble in which AE-derived features are systematically combined with RNN processing for the primary predictive task of heart disease.
2020 | Heart disease prediction using a hybrid deep learning model with an ensemble approach | A. Al-Ajlan | Lacks a direct, integrated stacking ensemble in which AE-derived features are systematically combined with RNN processing for the primary predictive task of heart disease.
2020 | Ensemble learning for heart disease diagnosis using machine learning algorithms | P. B. Mane and R. R. Mudholkar | Lacks a direct, integrated stacking ensemble in which AE-derived features are systematically combined with RNN processing for the primary predictive task of heart disease.
2021 | Effective heart disease prediction using hybrid machine learning techniques | S. Mohan, C. Thirumalai, and G. Srivastava | Lacks a direct, integrated stacking ensemble in which AE-derived features are systematically combined with RNN processing for the primary predictive task of heart disease.
2023 | Deep learning for ECG arrhythmia detection and classification: An overview of progress for period 2017–2023 | Y. Ansari, O. Mourad, K. Qaraqe, and E. Serpedin | Covers deep learning approaches such as CNNs but does not experiment with AE-RNN integration.
2021 | Heart disease prediction using deep learning: A comprehensive review | A. P. Singh, A. Kumar, and M. Hanmandlu | Covers deep learning approaches such as CNNs but does not experiment with AE-RNN integration.
2019 | Deep learning in healthcare: A review | C. K. Singh, P. K. Singh, and S. Gupta | Covers deep learning approaches such as CNNs but does not experiment with AE-RNN integration.
2021 | A comprehensive review of deep learning approaches for heart disease prediction | S. Bashir, Z. Qamar, and F. U. Islam | Covers deep learning approaches such as CNNs but does not experiment with AE-RNN integration.
IV. PROBLEM STATEMENT & RESEARCH GAP
Effectively managing heterogeneous, high-dimensional, and frequently temporal clinical data continues to be a persistent difficulty, despite notable progress in deep learning for heart disease prediction. These difficulties are made worse by problems like widespread data imbalance and the urgent need for interpretable models that medical professionals can rely on and use in actual clinical situations.
A. GAP 1
As identified, there is limited direct literature on ensemble architectures that specifically combine Autoencoders and Recurrent Neural Networks for heart disease prediction. Existing work often treats Autoencoders as a preprocessing step for feature learning, or combines different types of classifiers in an ensemble. A unified ensemble framework that leverages Autoencoders for robust feature learning and RNNs for temporal sequence modeling is not widely explored.
B. GAP 2
Severe class imbalance is common in medical datasets, where samples from healthy people greatly outnumber those from people with particular cardiac diseases. The minority class is usually the class of clinical interest, and this imbalance frequently results in poor performance on the minority class and biased model predictions. Additionally, deep learning models are frequently seen as "black boxes" with opaque decision-making processes, which impedes their general adoption and credibility in critical clinical situations.
V. RESEARCH GAP PROPOSED SOLUTION
A. GAP SOLUTION 1
The research proposes a novel ensemble architecture where Autoencoders are utilized to learn robust, low-dimensional feature representations from raw or minimally preprocessed patient data, including both static clinical features and time-series components. These learned features are then fed into Recurrent Neural Networks (e.g., LSTMs or GRUs) to capture temporal dependencies and make preliminary predictions. The final ensemble will combine the outputs of multiple such AE-RNN branches, or use a meta-learner to aggregate their predictions, thereby creating a synergistic model that capitalizes on the strengths of both architectures.
B. GAP SOLUTION 2
To address data imbalance, the study will use adaptive sampling or robust resampling techniques such as SMOTE to generate synthetic examples for the minority class. The Explainable AI techniques SHAP and LIME are incorporated to enhance transparency and foster trust among healthcare professionals. These methods expose the decision-making processes of the models and allow the identification of important risk variables and of each feature's contribution to the predictions.
VI. RESEARCH OBJECTIVES & METHODOLOGY
A. Objectives
1. To design and implement a new deep learning ensemble framework that blends an Autoencoder for unsupervised feature learning with Recurrent Neural Networks for temporal pattern recognition, in order to predict heart disease accurately.
2. To critically assess the performance of the integrated model against state-of-the-art deep learning and conventional machine learning models using publicly available heart disease datasets.
3. To incorporate explainable AI techniques such as SHAP and LIME into the ensemble model's predictions to improve interpretation and transparency, with actionable insights for health professionals.
4. To understand how hyperparameters influence the performance of the developed deep learning ensemble models for heart disease prediction.
B. Methodology
Autoencoder-RNN Ensemble Model
This model is a Stacked Ensemble model where multiple specialized Autoencoder-RNN hybrid models serve as base learners, and a meta-learner combines their predictions. The initial input to each base learner will consist of raw or minimally preprocessed patient data. This includes structured clinical features such as demographics, physiological measurements, laboratory results and time-series data such as ECG signals, longitudinal EHR event sequences. The model is designed to handle the heterogeneity of clinical data. An unsupervised or supervised Autoencoder will be trained as the first stage. Variants such as Sparse Autoencoders, Denoising Autoencoders, or Variational Autoencoders can be explored. This component's primary role is to learn a compressed, robust latent representation from the input data. This process effectively performs dimensionality reduction and noise attenuation which is particularly beneficial for high-dimensional or noisy input features. The learned latent features generated by the Autoencoder, especially those derived from temporal data, will then be fed into a Recurrent Neural Network. Long Short-Term Memory or Gated Recurrent Unit networks are suitable choices due to their ability to effectively capture long-term dependencies and sequential patterns within a patient's health trajectory. Each AE-RNN hybrid model will produce a preliminary prediction or a set of higher-level features that summarize its learned understanding of the input data. Multiple AE-RNN hybrid models configured with different architectures, hyperparameters, or trained on different subsets of the data, will serve as the base learners. This diversity among base learners is crucial for the ensemble's robustness. A separate machine learning model, such as a Support Vector Machine, Logistic Regression, or a simple Neural Network, will act as the meta-learner which will be trained on the predictions generated by the base AE-RNN learners. 
This allows the meta-learner to learn how to optimally combine the strengths and compensate for the weaknesses of the individual AE-RNN models, yielding a more accurate and robust final heart disease prediction.
Feature Extraction
Autoencoders are highly skilled in autonomously producing morphological features from intricate data such as ECG waveforms, or obtaining pertinent, abstract features from EHR and organized clinical parameters. This feature is essential because it automates the feature engineering process by eliminating the need for labor-intensive, human, and perhaps biased feature selection. Autoencoders are also able to identify subtle, non-linear patterns that human-engineered features could overlook by directly learning these representations from data.
Dimensionality Reduction
The capacity of autoencoders to compress high-dimensional input data into a lower-dimensional latent representation is one of their main advantages. If the model is properly trained, this reduction is accomplished while keeping the most pertinent details about the nature of the data. This is especially important for clinical datasets, which frequently have a lot of features, some of which could be noisy or redundant. Autoencoders can lessen the "curse of dimensionality" and increase the computing efficiency of later models by lowering dimensionality.
Denoising & Pre-training
By rebuilding the original, uncorrupted inputs from corrupted copies, variants like Denoising Autoencoders are made expressly to learn robust features. Because of this, the model is naturally resistant to noise and artifacts, which are frequent problems in actual medical data, such as ECG signals that are prone to muscle activity or powerline interference. AEs guarantee that the learned features are clearer and more indicative of the underlying physiological signals by efficiently removing noise. For the encoder portion of the network, autoencoder pre-training can yield helpful initial weights. Before fine-tuning for a particular classification job, the model can learn general data representations with the aid of this pre-training, which is frequently carried out in an unsupervised manner on a sizable dataset. This may improve the downstream predictive model's prediction performance and decrease the total amount of time needed for training, particularly if there is a shortage of labeled data for the particular task.
Temporal Data Analysis and Sequence Prediction
The dynamic features and individual variations of a patient's long-term medical history including physiological measurements and activity data, can be efficiently modeled using RNNs. Since many risk factors and symptoms of heart disease change over time, this capacity is essential for comprehending how the disease progresses. Static models may miss important predictive cues such as variations in blood pressure, cholesterol, or ECG patterns over months or years. These intricate long-term dependencies and temporal correlations can be identified by RNNs that offer a more thorough picture of a patient's health trajectory. RNNs have proven highly effective for encoding time-stamped events from Electronic Health Record data and learning latent representations for classification tasks. Studies have shown that RNNs, specifically GRU models, can outperform traditional machine learning models in predicting future diagnoses of heart failure when using longitudinal EHR data. This is because EHR inherently represents a sequence of clinical events such as diagnoses, medications, and procedures that occur at specific times. RNNs can leverage this temporal ordering to infer disease risk.
The Autoencoder component's retrieved or compressed temporal information will be used as the RNN's input sequences. By integrating these improved and denoised representations, the RNN can capture even more intricate temporal patterns associated with the beginning or progression of cardiac disease. To detect small rhythm variations suggestive of sickness, an RNN might examine the sequence of the major morphological features extracted by an autoencoder from individual ECG beats over time. The model can first extract pertinent information from complicated raw data and then examine its temporal evolution thanks to this synergistic combination.
Figures 2, 3, and 4 show the structure of the Autoencoder-RNN Ensemble Model: the Stacked Ensemble High-Level Flow Diagram, the Base Learner Architecture of the Hybrid AE-RNN, and the Stacked Ensemble Strategy. This hierarchical structure supports feature learning, with the Autoencoder used for compression and the RNN for sequence modeling, while the ensemble combination step provides robust generalization. Table I lists the features of the Autoencoder component, Table II those of the RNN component, Table III the Meta-Learner options, and Table IV the Autoencoder benefits.
Fig. 2
Stacked Ensemble High-Level Flow Diagram
Fig. 3
Base Learner Architecture of Hybrid AE-RNN
Fig. 4
Stacked Ensemble Strategy
TABLE I
AUTOENCODER COMPONENT
| Component | Type | Role |
|---|---|---|
| Input Layer | Raw/Minimally processed data | Accepts heterogeneous clinical data (numerical, categorical, time-series). |
| Encoder | Sparse/Denoising/VAE | Compresses input into latent space; reduces noise/dimensions. |
| Latent Features | Low-dimensional vector | Extracted robust features for RNN input. |
| Decoder | Reconstruction | Used only during unsupervised pre-training. |
TABLE II
RNN COMPONENT
| Component | Type | Role |
|---|---|---|
| RNN Layer | LSTM or GRU | Processes latent features sequentially; captures temporal dependencies. |
| Output Layer | Dense + Softmax | Generates preliminary predictions (e.g., probabilities for each class). |
TABLE III
META-LEARNER OPTIONS
| Model | Advantages | Use Case |
|---|---|---|
| SVM | Handles high-dimensional features well. | Small-to-medium datasets. |
| Logistic Regression | Interpretable; fast training. | Baseline combination. |
| Neural Network | Captures complex nonlinear relationships. | Large datasets with intricate patterns. |
TABLE IV
AUTOENCODER BENEFITS
| Benefit | Description | Effect |
|---|---|---|
| Feature Extraction | Learns morphological patterns (e.g., ECG waves). | Replaces manual feature engineering. |
| Dimensionality Reduction | Compresses 1000s of features → 100s. | Reduces computational cost; removes redundancy. |
| Denoising | Reconstructs clean data from noisy inputs. | Handles ECG artifacts/EHR missing values. |
| Pre-training | Unsupervised learning on unlabeled data. | Improves performance with limited labeled data. |
Data Preprocessing Steps
To detect cardiovascular illness, the first step is to collect patient data that includes pertinent characteristics. Age, gender, blood pressure, cholesterol, diabetes, and other lifestyle factors are all included in this category of structured clinical features. Furthermore, longitudinal EHR event sequences, such as medication history and diagnosis codes over time, and time-series data, such as ECG signals, will be gathered. A thorough predictive model requires a variety of data types.
Missing values can seriously hinder the performance of machine learning models and are a frequent problem in real-world clinical datasets. The procedure first determines the frequency and patterns of missing values using descriptive statistics or visualizations, and then applies imputation methods according to the type of data. Common techniques for numerical features include mean or median imputation, while imputation with the most frequent category (mode) is typically used for categorical features. Advanced imputation methods based on deep learning may be investigated for more complicated or temporal data. In clinical research, it is essential to handle missing values appropriately. For time-series data, where missingness can disrupt important temporal trends, the choice of imputation technique is not straightforward and can have a large impact on the interpretability and performance of models. The accuracy and dependability of the model are ultimately determined by the quality of the features fed to the autoencoder and RNN components. Therefore, great care is taken to ensure that the imputation approach selected preserves the integrity of the data distribution and temporal correlations.
Numerical features frequently have different scales, which can cause the model to concentrate too much on features with wider ranges. To overcome this issue, scaling and normalization techniques are applied. Min-Max normalization scales the features to a 0–1 range, whereas Z-score standardization adjusts features to have a mean of 0 and a standard deviation of 1. Normalization also speeds up convergence and enhances overall performance. Outliers in medical data can indicate either measurement errors or clinically significant anomalies. Z-score standardization is preferable if outliers are real, since it is less affected by extreme values than Min-Max scaling. Min-Max scaling following outlier elimination is more appropriate if outliers are errors and the data has a known range. Categorical variables, such as smoking status or gender, must be converted into numerical formats using encoding techniques like one-hot or label encoding, which enables models to interpret categorical data.
Class imbalance is common in medical datasets, especially for uncommon illnesses. Underrepresentation of the minority class, such as sick patients, can produce biased models. To balance the dataset, resampling techniques such as SMOTE create synthetic minority-class cases; alternatively, the majority class may be undersampled, or both approaches may be combined. Ensemble models can also mitigate some of the effects of imbalance. In addition, cost-sensitive learning assigns a higher penalty to misclassifying minority-class samples. Together, effective normalization, encoding, and balancing strategies enhance the model's resilience and predictive accuracy.
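As a rough illustration of the cost-sensitive option, the sketch below computes inverse-frequency class weights, a common heuristic in which each class receives the weight N / (K · n_c). The function name `balanced_class_weights` and the toy labels are assumptions made for illustration; they are not taken from the CARDIOPREN implementation.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} with weight = N / (K * n_c).

    N is the number of samples, K the number of classes, and n_c the
    number of samples in class c, so rarer classes get larger weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


# Toy labels (assumed): 8 healthy (0) and 2 diseased (1) patients.
labels = [0] * 8 + [1] * 2
weights = balanced_class_weights(labels)
print(weights)
```

Weights of this kind are typically passed to the loss function so that an error on a minority-class sample costs more than an error on a majority-class sample.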
A potent hybrid deep learning architecture for medical time-series analysis is depicted in the Synergistic AE-RNN Integration diagram in Fig. 5. The pipeline starts with an Autoencoder processing Raw Data (clinical measures, EHRs). The encoder efficiently denoises and extracts important patterns by compressing input features into a lower-dimensional Latent Space representation. Following distillation, these features are fed into an RNN component that models temporal patterns across sequential patient data using LSTM/GRU cells. In the meantime, the Decoder ensures that no important information is lost by reconstructing the original input from latent features. The final prediction produced by the system combines the advantages of both architectures: the RNN's skill in sequential modeling and the Autoencoder's effective feature learning.
Fig. 5
Synergistic AE-RNN Integration
A simplified pipeline for preprocessing clinical data for predictive analyses is presented in Fig. 6, End-to-End Preprocessing Flow diagram. The structured features consist of static patient data (demographic, medical history, etc.) and dynamic time-series data (such as vital signs, ECG readings, etc.), and these are taken up first into preprocessing. After assessing the completeness of a dataset through Missing Value Analysis, we use imputation methods to fill in any identified gaps to maintain the integrity of the dataset. After preprocessing, we extract significant patterns from the data through Autoencoder Feature Extraction, yielding noise-reduced, compact latent representations. These refined features are then fed into an RNN for temporal modeling to capture sequential dependencies in Electronic Health Records (EHRs). Our end-to-end procedure thus guarantees high-quality feature engineering and temporal analysis to fine-tune data for accurate downstream tasks like disease prediction.
The main data sources used in models for predicting heart disease are visually categorized in the Data Types for Cardiovascular Prediction Fig. 7 diagram. Three main data types are shown in the image, each proportionately represented: Time-Series Data (45%, ECG readings, vital signs), Structured Clinical Features (25%, age, cholesterol levels), and Longitudinal EHR Data (30%, inferred from remaining space, including diagnoses and medication histories).
Fig. 6
End-to-End Preprocessing Flow
As a result of their crucial function in recording dynamic cardiovascular patterns, time-series physiological signals account for the biggest percentage (45%) of predictive data, as this breakdown highlights. Basic metrics are provided by structured features, and treatment and diagnostic histories from longitudinal EHR data provide additional contextual depth. A thorough risk assessment is made possible by the combination of different data sets, and the proportions indicate how important each is for training models that produce precise cardiovascular forecasts.
Fig. 7
Data Types for Cardiovascular Predictions
Fig. 8 shows the End-to-End Flow Diagram of the deep learning ensemble model for heart disease prediction. The pipeline starts from raw clinical data, which may include unstructured or noisy patient records. An autoencoder denoises the data and extracts compressed but informative latent features, which improves the efficiency of the model. These refined features are used by recurrent neural networks (RNNs) to learn temporal correlations in the sequential patient data, such as ECG changes over time. To increase robustness, an Ensemble Meta-Learner combines Time-Aware Predictions from the RNN with other model outputs. Ultimately, the system produces a Final Diagnosis with greater accuracy and interpretability. The entire workflow combines feature reduction with sequential learning, and the results are fused into an ensemble decision.
Fig. 8
End-to-End Flow Diagram
Fig. 9, the Missing Values Handling Pipelines diagram, presents a workflow for managing missing values in clinical datasets. The process begins with Raw Data, which the system first checks for Missing Values. If none are found, it proceeds directly to Feature Engineering. If missingness is present, the pipeline Analyzes Patterns of missingness to decide how to treat it. For numeric features, it applies traditional methods such as Mean/Median/Regression Imputation, while categorical variables use Mode imputation or MICE (Multiple Imputation by Chained Equations). For complex temporal data, RNN-Based Imputation leverages sequential patterns to reconstruct missing values. The cleaned data then pass through Advanced Temporal Feature Engineering before being fed into the AE-RNN model. This ensures that missing data are treated systematically while clinically critical patterns are preserved, with the final result being a Clean Dataset for predictive analysis.
Fig. 9
Missing Values Handling Pipelines
Fig. 10, the Temporal Data Integrity Preservation diagram, illustrates the approach to managing missing values in time-series clinical data. First, Missing Segments are identified in the Raw Time-Series, for instance in vital signs and ECG recordings. A decision branch then determines whether the missingness is systematic (such as a recurring gap) or random (such as an intermittent sensor dropout). Statistical Imputation (such as mean/median) or Linear Interpolation uses adjacent data points in the time series to fill gaps due to random missingness. Model-Based Imputation (e.g., predictions via LSTM/GRU) uses temporal dependencies to reconstruct values that are missing due to complicated non-random patterns. The output is a Complete Series for RNN processing. The gaps are filled while significant time-related patterns, which are very important for reliable medical AI applications, are preserved.
Fig. 10
Temporal Data Integrity Preservation
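The Linear Interpolation branch described for random missingness can be sketched in a few lines. This is a minimal illustration, assuming evenly spaced samples and NaN as the gap marker; the function name `interpolate_gaps` and the toy heart-rate series are hypothetical.

```python
import math


def interpolate_gaps(series):
    """Fill NaN gaps by linear interpolation between the nearest observed
    neighbours; leading or trailing gaps copy the nearest observed value."""
    filled = list(series)
    observed = [i for i, v in enumerate(series) if not math.isnan(v)]
    if not observed:
        raise ValueError("series contains no observed values")
    for i, v in enumerate(series):
        if not math.isnan(v):
            continue
        left = max((j for j in observed if j < i), default=None)
        right = min((j for j in observed if j > i), default=None)
        if left is None:          # gap at the start of the series
            filled[i] = series[right]
        elif right is None:       # gap at the end of the series
            filled[i] = series[left]
        else:                     # interior gap: straight line between neighbours
            frac = (i - left) / (right - left)
            filled[i] = series[left] + frac * (series[right] - series[left])
    return filled


# Toy heart-rate readings (assumed) with two interior gaps and one trailing gap.
heart_rate = [72.0, float("nan"), float("nan"), 78.0, float("nan")]
completed = interpolate_gaps(heart_rate)
print(completed)
```

Interpolation of this kind is only reasonable for short, random gaps; longer or systematic gaps are the case the text assigns to model-based imputation.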
Missing value imputation or removal is the first step in the process, starting with raw data. Next, normalization is done to ensure consistency among feature scales using one of the following methods: Z-Score Standardization (mean = 0, SD = 1) or Min-Max Scaling (0–1 range). After that, outliers are found and eliminated to cut down on noise. Label Encoding (ordinal labels) or One-Hot Encoding (binary columns) are used to convert categorical variables (such as gender and smoking status) to numerical formats. Methods like SMOTE (oversampling) or undersampling are used to reduce class imbalance (such as in rare disease instances). To ensure reliable and objective forecast performance, the generated Balanced Dataset is ultimately fed into the Train Model phase. This pipeline optimizes data quality for accurate heart disease prediction while addressing common challenges like missing values, skewness, and categorical heterogeneity.
Fig. 11
Data Preprocessing Pipeline
Fig. 12, the Medical Data Processing Workflow, shows a systematic method for preparing clinical data for predictive modeling. The pipeline starts with raw data containing both numerical and categorical attributes. A decision point for numerical data establishes whether scaling is required; the presence of outliers influences the choice between Z-score standardization (mean = 0, SD = 1) and Min-Max normalization (0–1 range). After that, outliers are either kept (if they are clinically valid anomalies) or eliminated (if they are errors). One-Hot or Label Encoding is applied to categorical data to convert it to numbers. Techniques like SMOTE, Undersampling, or Cost-sensitive Learning are used whenever a class imbalance is identified (for example, in cases of rare diseases). The resulting Balanced Dataset is suitable for building strong models. This workflow ensures data quality by addressing scaling, outliers, categorical encoding, and imbalance for reliable medical AI applications.
Fig. 12
Medical Data Processing Workflow
SEQUENCE OF EQUATIONS & EXPLANATIONS
1. INITIAL DATA STATE:
$D_{\text{raw}} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$
Explanation: The process starts with the raw dataset $D_{\text{raw}}$, containing $N$ samples, each of which has a feature vector $\mathbf{x}_i$. This feature vector can be in numerical, categorical, or time-series form. Each sample is also associated with a corresponding label $y_i$, for instance 0 for healthy and 1 for heart disease.
2. Handling Missing Values:
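For an input feature $j$ of sample $\mathbf{x}_i$ with value $x_i^{(j)}$:
$\tilde{x}_i^{(j)} = \begin{cases} x_i^{(j)}, & \text{if } x_i^{(j)} \text{ is observed} \\ \operatorname{Impute}(\mathbf{x}^{(j)}), & \text{if } x_i^{(j)} = \text{NaN} \end{cases}$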
Explanation: This step identifies missing values (NaN) and replaces them according to their data type and context. Simple techniques such as mean or mode imputation can be applied to static features. More advanced methods such as MICE (Multiple Imputation by Chained Equations) take the relationships among features into account. For time-series data (e.g., ECG), missing values can be imputed using RNNs that exploit the temporal context of the sequence to preserve vital patterns. Imputation produces the dataset $D_{\text{imputed}}$.
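The simple imputation case for static features can be sketched as follows. This is a minimal illustration using only the Python standard library; the helper names `impute_numeric` and `impute_categorical` and the toy columns are assumptions, and NaN (numeric) or `None` (categorical) is assumed to mark a missing entry.

```python
import math
from collections import Counter
from statistics import mean


def impute_numeric(values):
    """Replace NaN entries with the mean of the observed entries."""
    observed = [v for v in values if not math.isnan(v)]
    fill = mean(observed)
    return [fill if math.isnan(v) else v for v in values]


def impute_categorical(values, missing=None):
    """Replace missing entries with the most frequent observed category."""
    observed = [v for v in values if v is not missing]
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is missing else v for v in values]


# Toy columns (assumed) with one missing entry each.
cholesterol = [200.0, float("nan"), 240.0, 220.0]
smoker = ["no", "yes", None, "no"]

cholesterol_imputed = impute_numeric(cholesterol)
smoker_imputed = impute_categorical(smoker)
print(cholesterol_imputed, smoker_imputed)
```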
3. Feature Scaling/Normalization (Numerical Features):
For a numerical feature vector $\mathbf{x}^{(j)}$:
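Z-score Standardization:
$\mathbf{x}'^{(j)} = \dfrac{\mathbf{x}^{(j)} - \mu_j}{\sigma_j}$
where $\mu_j$ and $\sigma_j$ are the mean and standard deviation of feature $j$.
Min-Max Normalization:
$\mathbf{x}'^{(j)} = \dfrac{\mathbf{x}^{(j)} - \min(\mathbf{x}^{(j)})}{\max(\mathbf{x}^{(j)}) - \min(\mathbf{x}^{(j)})}$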
Explanation: Scaling ensures that no single feature dominates model training; for example, age (20–80) and cholesterol (100–300) are brought to comparable ranges. Z-score standardization is preferred when the feature distribution is approximately Gaussian; Min-Max normalization should be used when a feature needs to be bounded to a certain range such as [0, 1]. This results in the scaled dataset $D_{\text{scaled}}$.
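The two scaling rules can be checked on the age and cholesterol ranges mentioned in the explanation. The sketch below is illustrative only: the function names are hypothetical, the population standard deviation is assumed for the Z-score, and constant features (zero spread) are not handled.

```python
from statistics import mean, pstdev


def z_score(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Toy features (assumed) on very different scales.
age = [20, 50, 80]
cholesterol = [100, 200, 300]

age_scaled = min_max(age)
cholesterol_standardized = z_score(cholesterol)
print(age_scaled, cholesterol_standardized)
```

After either transformation, age and cholesterol occupy comparable numeric ranges, which is the property the explanation relies on.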
4. Encoding Categorical Features:
For a categorical feature $c$ with $K$ unique categories:
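One-Hot Encoding:
$\operatorname{OneHot}(c = k) = \mathbf{e}_k \in \{0, 1\}^{K}$, where $\mathbf{e}_k$ has a 1 in position $k$ and 0 elsewhere.
Label Encoding (for ordinal categories):
$\operatorname{Label}(c = k) = k - 1, \quad k \in \{1, \ldots, K\}$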
Explanation: Machine learning models work with numerical inputs. Hence, one-hot encoding takes one categorical column and creates $K$ binary columns (e.g., "Gender: Male" becomes [1, 0] and "Gender: Female" becomes [0, 1]). Label encoding is used only for categories that have an inbuilt ordering (for example, "Severity: Low, Medium, High" becomes [0, 1, 2]). This gives us a fully numerical dataset $D_{\text{encoded}}$.
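The gender and severity examples from the explanation can be reproduced with a short sketch. The function names and the explicit category lists are assumptions for illustration; in practice the category order would be fixed once on the training data.

```python
def one_hot(values, categories):
    """Map each value to a K-length binary vector in `categories` order."""
    index = {c: k for k, c in enumerate(categories)}
    return [[1 if index[v] == k else 0 for k in range(len(categories))]
            for v in values]


def label_encode(values, ordered_categories):
    """Map each value to its rank (0, 1, 2, ...) in the given ordering."""
    index = {c: k for k, c in enumerate(ordered_categories)}
    return [index[v] for v in values]


gender = ["Male", "Female", "Male"]
severity = ["Low", "High", "Medium"]

gender_encoded = one_hot(gender, ["Male", "Female"])
severity_encoded = label_encode(severity, ["Low", "Medium", "High"])
print(gender_encoded, severity_encoded)
```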
5. Addressing Class Imbalance:
Let the majority class be $S_{\text{maj}}$ and the minority class be $S_{\text{min}}$, where $|S_{\text{maj}}| \gg |S_{\text{min}}|$.
Synthetic Minority Oversampling Technique (SMOTE):
For a sample $\mathbf{x}_i$ in $S_{\text{min}}$, create a synthetic sample:
$\mathbf{x}_{\text{new}} = \mathbf{x}_i + \lambda (\mathbf{x}_z - \mathbf{x}_i)$
where $\mathbf{x}_z$ is a randomly chosen nearest neighbor from $S_{\text{min}}$ and $\lambda \sim U(0, 1)$.
Explanation: Most often, medical datasets have a disproportionate ratio of healthy samples to diseased samples. SMOTE addresses this disparity by synthesizing new examples from the minority class, through interpolation on the feature space among the existing minority samples, thus filling in the gaps. This will balance the class distribution, preventing the model from being biased toward predicting the majority class. The end result is a balanced dataset $D_{\text{balanced}}$ for training purposes.
IMPLEMENTED ALGORITHMS
1. Synthetic Minority Over-sampling Technique
SMOTE is applied because medical datasets usually contain far more healthy cases than diseased cases. Instead of naive random oversampling, SMOTE takes an existing minority-class sample, finds its k nearest neighbors within the same class (typically k = 5), and synthesizes a new artificial sample as a convex combination of the original sample and a randomly chosen neighbor. The interpolation formula is x_new = x_i + λ * (x_z - x_i), where λ is a randomly generated number between 0 and 1. This procedure is repeated until the class distribution in the dataset is balanced. This matters because duplicating minority samples can lead to severe overfitting, whereas SMOTE places plausible synthetic examples in the feature space, helping the model learn more robust and generalizable decision boundaries.
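A minimal sketch of this interpolation step is shown below. It is a simplified illustration, not the implementation used in the study: the function name `smote`, the Euclidean neighbor search, the fixed seed, and the toy minority points are all assumptions, and at least two minority samples are required.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples via x_new = x_i + lam * (x_z - x_i)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x_i = rng.choice(minority)
        # k nearest minority-class neighbours of x_i, excluding x_i itself
        others = [x for x in minority if x is not x_i]
        neighbours = sorted(others, key=lambda x: math.dist(x, x_i))[:k]
        x_z = rng.choice(neighbours)
        lam = rng.random()  # lambda drawn uniformly from [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(x_i, x_z)])
    return synthetic


# Toy minority class (assumed): four points in a 2-D feature space.
minority = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
new_samples = smote(minority, n_synthetic=6, k=2)
print(new_samples)
```

Because every synthetic point is a convex combination of two real minority samples, it always lies on the line segment between them, which is why the new samples stay inside the region already occupied by the minority class.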
2. Multiple Imputation by Chained Equations
Multiple Imputation by Chained Equations (MICE) handles the complex missing-data patterns found in real-world clinical datasets within the preprocessing pipeline. It is more advanced than simple imputation because it explicitly models the uncertainty about the missing values. Initially, all missing values in each feature are filled with a simple estimate such as the mean or mode. The procedure then iterates for a fixed number of cycles, treating each feature with missing values in turn as the target variable and all other features as its predictors. For each target feature, a model is fitted on the observed values to predict and update the missing values: Logistic Regression for categorical data and a Bayesian Ridge Regressor for numerical data. This chained-equation method yields multiple completed datasets, and the final imputed values are computed by averaging across them, so that the imputation follows the underlying relationships in the data rather than imposing a simplistic bias.
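The cycling idea can be illustrated with a deliberately simplified sketch. Note the assumptions: it uses NumPy, handles numerical columns only, swaps in ordinary least squares for the Bayesian Ridge Regressor named above, and produces a single completed dataset rather than averaging several. The function name `chained_imputation` and the toy matrix are hypothetical.

```python
import numpy as np


def chained_imputation(X, n_cycles=10):
    """Single-imputation sketch of the chained-equations idea.

    Each column with gaps is regressed (ordinary least squares, an
    assumption) on all other columns, and its gaps are re-predicted.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()
    # Step 1: initialise every gap with the mean of its column.
    col_means = np.nanmean(X, axis=0)
    filled[missing] = np.take(col_means, np.where(missing)[1])
    # Step 2: cycle through the columns, updating their gaps in turn.
    for _ in range(n_cycles):
        for j in range(X.shape[1]):
            if not missing[:, j].any():
                continue
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(others)), others])
            obs = ~missing[:, j]
            coef, *_ = np.linalg.lstsq(design[obs], filled[obs, j], rcond=None)
            filled[missing[:, j], j] = design[missing[:, j]] @ coef
    return filled


# Toy data (assumed): second column equals 2 * first + 1, with one gap.
X = [[1.0, 3.0], [2.0, 5.0], [3.0, np.nan], [4.0, 9.0]]
completed = chained_imputation(X)
print(completed)
```

In this toy case the gap is recovered from the linear relationship between the two columns, which a plain column-mean fill would miss.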
3. RNN-based Imputation for Time-Series
Missing values in clinical time-series data, such as ECG signals, are imputed with a Recurrent Neural Network (RNN) architecture, such as an LSTM or GRU, designed for sequence prediction, rather than with simple linear interpolation. First, the model is trained on sequences of complete data so that it learns the complex, non-linear physiological patterns needed to predict future values or reconstruct the input. During the imputation phase, for a sequence with a gap at time step *t*, the trained model RNN_θ uses its learned parameters and its hidden state *h*, which stores the context from previous time steps (x_{t−1}, x_{t−2}, ...), to generate a plausible value: x_t_imputed = RNN_θ (x_{t−1}, x_{t−2}, ..., h). This is important for clinical applications because the imputed value is informed by the rich temporal context of the patient's physiological state, giving a more realistic and clinically plausible fill-in value than simple statistical interpolation would.
4. The Stacking Meta-Learner Aggregation
The most important novel aspect of the ensemble system used by CARDIOPREN is its stacking meta-learner, an aggregation scheme designed to go beyond a simple voting mechanism. The predictions of the different base AE-RNN models are not merely averaged; they are combined by a trained model. Formally, for a given sample $\mathbf{x}_i$, the prediction $\hat{y}_i^{(m)}$ from each of the $M$ base models is treated as a new meta-feature. These predictions are concatenated to form a new input vector for the meta-learner:
$\mathbf{z}_i = [\hat{y}_i^{(1)}, \hat{y}_i^{(2)}, \ldots, \hat{y}_i^{(M)}]$
A separate machine learning model, such as Logistic Regression or a Support Vector Machine, is trained as the meta-learner on this new dataset $\{\mathbf{z}_i\}_{i=1}^{N}$ and the true labels $y_i$ to learn an optimal combining function $g$. The final ensemble prediction is therefore defined as $\hat{y}_i = g(\mathbf{z}_i)$. This matters mathematically because the meta-learner weighs and merges the opinions of the base models and learns higher-order patterns from their predictions, which leads to better performance and robustness of the overall ensemble.
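The aggregation step can be sketched with a small logistic-regression meta-learner trained on base-model outputs. Everything concrete here is assumed for illustration: the base-learner probabilities are made-up numbers standing in for three AE-RNN models, and the plain gradient-descent fit, learning rate, and function names are hypothetical choices rather than the study's implementation.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_meta_learner(Z, y, lr=0.5, epochs=2000):
    """Fit g(z) = sigmoid(w . z + b) by batch gradient descent on log-loss."""
    n, m = len(Z), len(Z[0])
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * m, 0.0
        for z, label in zip(Z, y):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - label
            grad_w = [g + err * zi for g, zi in zip(grad_w, z)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, z):
    """Ensemble probability for one meta-feature vector z."""
    return sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)


# Hypothetical held-out probabilities from M = 3 base learners (rows = samples).
Z = [[0.9, 0.8, 0.4], [0.2, 0.1, 0.6], [0.8, 0.7, 0.5],
     [0.3, 0.2, 0.5], [0.7, 0.9, 0.3], [0.1, 0.3, 0.7]]
y = [1, 0, 1, 0, 1, 0]

w, b = fit_meta_learner(Z, y)
ensemble_labels = [int(predict(w, b, z) >= 0.5) for z in Z]
print(w, b, ensemble_labels)
```

In a stacked ensemble the rows of `Z` should come from predictions on data the base learners did not train on (for example out-of-fold predictions), otherwise the meta-learner tends to overtrust overfitted base models.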
VII. EXPERIMENTAL WORK
A. DESCRIPTION OF SELECTED DATASETS
Table V summarizes the datasets. First, the Cleveland Heart Disease Dataset, a commonly used benchmark, has 13 independent variables and one dependent variable for the diagnosis of heart disease. It contains only 1025 records in all: 421 with heart disease and 399 healthy in the training set, and 105 with heart disease and 100 healthy in the testing set. Because of its small size, it can be used for preliminary model validation and comparison with more conventional machine learning techniques. Second, the Large Heart Disease Dataset (Kaggle) provides a larger amount of data, with 57373 records (21898 heart disease and 24000 healthy in training; 5475 heart disease and 6000 healthy in testing) and 18 independent features. This larger scale lets deep learning models learn more intricate patterns, which makes it more suitable for training. Third, the UCI Heart Disease Dataset is another benchmark frequently used to assess models and offers a basis for comparison with a wide range of previous studies. Finally, the National Health Interview Survey (NHIS) and the Behavioral Risk Factor Surveillance System (BRFSS) are two sizable public databases. These have serious class imbalance because heart disease cases make up only a small fraction of the records. These datasets are crucial for thoroughly evaluating the robustness of the proposed model and its capacity to sustain balanced performance in extremely imbalanced real-world situations. A conventional data split into training (e.g., 80%) and testing (e.g., 20%) sets will be carried out for every dataset. To prevent assessment bias, stratified sampling will be used during this split so that the class distribution in the training and testing sets closely matches that of the original dataset.
TABLE V
SUMMARY OF HEART DISEASE DATASETS
| Dataset Name | Source | Total Records | No. of Features | Class Distribution (healthy/heart disease) | Data Types |
| --- | --- | --- | --- | --- | --- |
| Cleveland Heart Disease Dataset | Kaggle | 1025 | 13 | Training: 399/421; Testing: 100/105 | Clinical parameters |
| Large Heart Disease Dataset | Kaggle | 57373 | 18 | Training: 24000/21898; Testing: 6000/5475 | Clinical parameters |
| UCI Heart Disease Dataset | UCI | Varied | Varied | Varied | Clinical parameters |
| BRFSS | Public | Large | Varied | Heart disease: 6% | Survey/behavioral data |
| NHIS | Public | Large | Varied | Varied | Survey/health interview |
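The stratified 80/20 split described for these datasets can be sketched as follows. This is a simplified standard-library illustration, not the implementation used in the study; in practice scikit-learn's `train_test_split` with its `stratify` argument would likely be used. The function name `stratified_split` and the seed are assumptions; the class totals (499 healthy, 526 heart disease) are the Cleveland counts from Table V.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) so that each class keeps its share in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # random assignment within each class
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx

# Cleveland-sized example: 499 healthy (0) and 526 heart-disease (1) records.
labels = [0] * 499 + [1] * 526
train_idx, test_idx = stratified_split(labels)

train_counts = (sum(labels[i] == 0 for i in train_idx),
                sum(labels[i] == 1 for i in train_idx))
test_counts = (sum(labels[i] == 0 for i in test_idx),
               sum(labels[i] == 1 for i in test_idx))
print("train (healthy, disease):", train_counts)
print("test  (healthy, disease):", test_counts)
```

With these totals, a per-class 80/20 split gives the training and testing counts listed for the Cleveland dataset in Table V.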
B. EXPERIMENTAL SETUP
Software and Libraries
Python will be the main programming language used in the implementation, taking advantage of its strong machine learning and deep learning ecosystem. The Autoencoder and RNN models will be constructed, trained, and deployed using the TensorFlow and Keras frameworks, which combine TensorFlow's computational efficiency with Keras' high-level API. Pandas and NumPy will handle tabular and numerical data, and Scikit-learn will support baseline machine learning tasks and preprocessing (scaling, encoding). Matplotlib and Seaborn will be used for plots, and SHAP for explainable-AI insights. The system configuration consists of an Intel Core i5 processor, a GPU to speed up deep learning training, and at least 8 GB of RAM. Jupyter Notebook will be used for development, allowing for effective debugging, real-time visualization, and iterative experimentation.
Coding Implementation Outline
The coding implementation will follow a modular framework, structured into distinct components to manage complexity and facilitate development. Module 1 covers dataset loading, missing value imputation, categorical encoding, feature scaling, and SMOTE-based class balancing. Module 2 defines and trains the Autoencoder, optimizing the reconstruction loss and extracting latent features. In Module 3, autoencoder-derived features are fed into RNN models (LSTM/GRU) for sequential data. Module 4 builds the ensemble by combining the base learners (AE-RNN) with a meta-learner (Logistic Regression/Dense Network). Module 5 assesses performance using confusion matrices, metrics (accuracy, AUC-ROC), and visualizations (ROC/PR curves, histograms). Lastly, Module 6 uses SHAP/LIME to provide Explainable AI (XAI) for both local and global interpretability. This structure gives a methodical progression from preprocessing to explainable predictions.
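To illustrate the class-balancing step in Module 1, the sketch below shows the core idea behind SMOTE: each synthetic minority sample is an interpolation between a minority record and one of its nearest minority neighbours. This is a simplified standard-library version with hypothetical data and a hypothetical function name (`smote_like`); an actual pipeline would more likely rely on the `SMOTE` class from the imbalanced-learn package.

```python
import math
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # k nearest minority neighbours of the chosen base point (Euclidean distance)
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + lam * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical scaled minority-class records with two features each.
minority = [[0.10, 0.80], [0.15, 0.75], [0.20, 0.90], [0.12, 0.85]]
new_points = smote_like(minority, n_new=6)
for point in new_points:
    print([round(v, 3) for v in point])
```

Oversampling of this kind should be applied to the training split only, so that synthetic records never leak into the test set.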
VII. EXPECTED RESULTS AND EVALUATION
A. KEY PERFORMANCE METRICS
To ensure clinical relevance, the model's performance will be assessed using key medical classification metrics. Accuracy gauges overall correctness; precision measures how many predicted positives are true cases, so high precision means few false positives; and recall (sensitivity) measures how many actual cases of heart disease are identified, which is essential to preventing missed diagnoses. Specificity reflects how well healthy individuals are recognized and so limits false alarms, while the F1-score balances precision and recall. Class separation across thresholds is assessed using the AUC-ROC, which is particularly important for imbalanced data. A confusion matrix details true/false positives/negatives, and ROC and precision-recall (PR) curves display threshold trade-offs graphically, with PR curves highlighting minority-class performance. Together, these metrics support a thorough evaluation of the model's diagnostic reliability.
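The metrics above follow directly from the confusion-matrix counts. The sketch below computes them for a hypothetical confusion matrix and also computes AUC-ROC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. All counts and scores are invented for illustration and are not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }

def auc_roc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical test-set outcome: 100 diseased and 100 healthy patients.
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
auc = auc_roc(pos_scores=[0.9, 0.8, 0.4], neg_scores=[0.7, 0.3, 0.2])

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"auc_roc: {auc:.3f}")
```

The pairwise AUC computation is quadratic in the number of samples, which is fine for a sketch; library implementations use a sorting-based approach.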
B. PERFORMANCE GAINS & MODEL INTERPRETABILITY
By integrating feature distillation (via Autoencoders) and temporal pattern recognition (via RNNs) in a single framework, the proposed Autoencoder-RNN ensemble model is anticipated to perform better than both single deep learning architectures (standalone RNNs, CNNs, or Autoencoders) and traditional machine learning models (e.g., Logistic Regression, SVM, Random Forest). Higher F1-scores, sensitivity (recall), and AUC-ROC are anticipated as a result of this synergy, especially on imbalanced clinical datasets. The ensemble seeks to produce clinically actionable, reliable predictions by reducing false negatives, which are especially costly in the detection of heart disease, and by enhancing class separation across thresholds. Because it can automatically extract deep features while maintaining sequential dependencies, the model is well suited as a diagnostic tool for practical medical applications.
Explainable AI (XAI) techniques are incorporated into the model to achieve interpretability and clinical integration. SHAP values quantify feature contributions at the global level (e.g., identifying cholesterol level as an important predictor) and at the local level (per-patient explanations), whereas LIME generates intuitive patient-specific rationales by approximating predictions with simpler, interpretable models. Used together, these techniques make it possible to check that model outputs agree with medical knowledge on known risk factors (e.g., age, ECG patterns), and they may also surface candidate new biomarkers. By bridging the "black-box" gap, the system lets clinicians verify, trust, and act on AI-driven diagnoses by combining high accuracy with actionable insights for real-world healthcare.
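To make the notion of a per-patient SHAP explanation concrete, the sketch below computes exact Shapley values for a toy risk model by averaging each feature's marginal contribution over all feature orderings. The model, its coefficients, the feature names, and the patient values are hypothetical. The SHAP library itself relies on approximations, since exact enumeration is only feasible for a handful of features.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]  # switch feature i from its baseline to the patient's value
            updated = predict(current)
            phi[i] += updated - previous
            previous = updated
    return [value / len(orderings) for value in phi]

# Hypothetical linear risk score over (age, cholesterol, ST depression).
def toy_risk(features):
    age, chol, oldpeak = features
    return 0.02 * age + 0.005 * chol + 0.3 * oldpeak

patient = [60, 250, 2.0]
baseline = [50, 200, 1.0]  # assumed reference patient
phi = shapley_values(toy_risk, patient, baseline)

for name, value in zip(["age", "cholesterol", "st_depression"], phi):
    print(f"{name}: {value:+.3f}")
```

The contributions sum to the difference between the patient's score and the baseline score, which is the additivity property that makes such explanations straightforward to present to clinicians.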
Four heart disease prediction models are compared in Table VI, and the proposed Autoencoder-RNN (AE-RNN) ensemble performs better on all key metrics. The ensemble model beats the Autoencoder baseline (85–88% accuracy) and the standalone RNN baseline (88–91% accuracy), achieving 94–97% accuracy, 92–95% recall (sensitivity), and an AUC-ROC of 0.95–0.98. The AE-RNN's robustness in reducing false positives and false negatives, which is crucial for clinical applications, is further highlighted by its higher precision (93–96%) and specificity (95–98%). Although the fourth model, the Random Forest baseline, displays competitive metrics (such as an F1-score of 83.5–88.5%), its wider ranges point to potential limitations. Overall, the AE-RNN ensemble is the most dependable option for predicting heart disease because of its balanced strength in accuracy, sensitivity, and discriminative power (AUC-ROC), especially on imbalanced datasets.
TABLE VI
PROPOSED MODEL PERFORMANCE METRICS
| Model Type | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | Specificity (%) | AUC-ROC |
| --- | --- | --- | --- | --- | --- | --- |
| Proposed AE-RNN Ensemble Model | 94–97 | 93–96 | 92–95 | 92–95 | 95–98 | 0.95–0.98 |
| Baseline RNN Model | 88–91 | 87–90 | 85–88 | 86–89 | 89–92 | 0.88–0.91 |
| Baseline AE | 85–88 | 84–87 | 82–85 | 83–86 | 86–89 | 0.85–0.88 |
| ML Model RF | 85–90 | 84–89 | 83–88 | 83.5–88.5 | 86–91 | 0.86–0.90 |
VIII. FUTURE WORK AND DEVELOPMENT DIRECTIONS
A. POTENTIAL ENHANCEMENTS OF THE MODEL
In the future, we will improve the Autoencoder-RNN ensemble model through several strategies. Dynamic ensemble weighting will allow the contribution of each base learner to change with the characteristics of the input data and the confidence of its predictions, for better real-time performance. Structural improvements will include Convolutional Autoencoders (CAEs) for direct processing of raw ECG signals and Bidirectional LSTMs/GRUs to better capture temporal dependencies in both past and future context. Attention mechanisms that highlight the most clinically significant time points and features will be employed to improve both accuracy and interpretability. Generative models such as GANs and VAEs will also be studied for synthesizing realistic clinical data; these models are especially useful for addressing class imbalance and data scarcity. By enhancing feature extraction, temporal pattern identification, and overall model resilience, these developments collectively aim to push the limits of AI applications in clinical cardiology.
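One simple way to realize confidence-dependent ensemble weighting is sketched below: each base learner's probability is weighted by its distance from the 0.5 decision boundary, so a confident learner contributes more for that particular patient. This weighting rule, the function name, and the example probabilities are assumptions made for illustration; the paper does not commit to a specific scheme.

```python
def confidence_weighted_average(probs, eps=1e-6):
    """Combine base-learner probabilities, weighting each by its distance from 0.5."""
    confidences = [abs(p - 0.5) + eps for p in probs]  # eps avoids division by zero
    total = sum(confidences)
    weights = [c / total for c in confidences]
    combined = sum(w * p for w, p in zip(weights, probs))
    return combined, weights

# Hypothetical per-patient outputs: one confident learner, one uncertain learner.
probs = [0.95, 0.55]
combined, weights = confidence_weighted_average(probs)
print("weights:", [round(w, 3) for w in weights])
print("combined probability:", round(combined, 3))
```

Compared with a plain average of 0.75, the combined probability here moves toward the confident learner's output. A learned gating network would be a more flexible alternative to this fixed rule.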
B. EXPLORE MULTI-MODAL DATA INTEGRATION
Multi-modal data integration will be added to the current framework in future studies to improve the prediction of heart disease. The emphasis will move to integrating medical imaging (e.g., cardiac MRI, echocardiography) with the existing EHR and ECG time-series data, using specialized deep learning architectures such as Temporal Convolutional Graph Neural Networks (TCGNs) for ECG analysis and 3D U-Nets for MRI. One of the main challenges will be creating robust fusion algorithms that prioritize clinically relevant aspects while harmonizing disparate data types (text, signals, and images), possibly with the use of attention mechanisms. By offering a comprehensive patient health assessment and drawing on complementary insights from different medical data streams, this multi-modal approach seeks to address the drawbacks of single-source diagnostics and ultimately increase diagnostic accuracy. This work may also lead to new fusion architectures designed specifically for clinical AI applications.
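A minimal sketch of attention-based fusion is given below: each modality is assumed to be already encoded as a fixed-length embedding, a relevance score per modality is turned into weights with a softmax, and the fused representation is the weighted sum. The embeddings, scores, and function names are hypothetical. In a trained system the scores would be produced by a learned scoring layer rather than set by hand.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(embeddings, scores):
    """Weighted sum of same-length modality embeddings using softmax attention weights."""
    weights = softmax(scores)
    dim = len(embeddings[0])
    fused = [sum(w * emb[d] for w, emb in zip(weights, embeddings)) for d in range(dim)]
    return fused, weights

# Hypothetical 3-dimensional embeddings for three modalities.
modalities = ["ehr", "ecg", "mri"]
embeddings = [[0.2, 0.1, 0.7], [0.9, 0.4, 0.3], [0.5, 0.8, 0.1]]
scores = [0.2, 1.5, 0.7]  # assumed relevance scores, highest for ECG

fused, weights = attention_fuse(embeddings, scores)
for name, weight in zip(modalities, weights):
    print(f"{name}: weight {weight:.3f}")
print("fused embedding:", [round(v, 3) for v in fused])
```

The attention weights also offer a coarse interpretability signal, since they indicate which modality dominated a given prediction.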
C. CONSIDERATIONS FOR REAL-WORLD APPLICATION
To close the gap between research and clinical practice, the proposed approach will give real-world deployment top priority through several activities. Real-time prediction capabilities will facilitate prompt clinical decision-making by enabling immediate analysis of patient data. An intuitive interface for presenting model outputs will let clinicians without technical knowledge interpret predictions easily. Strict adherence to ethical and legal requirements (GDPR/HIPAA) will be upheld, along with stringent bias mitigation and safety validation to satisfy medical device standards. Future wearable device integration may also allow for ongoing monitoring, turning the model into a proactive tool for managing heart health over the long run. The work also recognizes that real impact depends on clinical users actually adopting the technology.
D. FURTHER RESEARCH INTO EXPLAINABLE AI
Future research will focus on interpretability and ethical AI along three broad directions. First, XAI will move beyond post-hoc explanations toward clinically salient findings, with interpretability approaches designed specifically for ensembles and with intrinsically interpretable models. Second, fair performance across demographic groups will be pursued through fairness and bias mitigation that couples bias detection with algorithmic adjustments. Third, a human-in-the-loop framework will support dynamic collaboration between AI and clinicians, turning the system from a black-box machine into an adaptive clinical co-worker that lets health practitioners validate, adjust, and improve model outputs iteratively. These improvements are intended to yield trustworthy, bias-aware AI that advances clinical knowledge under strict ethical compliance for real-world use.
IX. CONCLUSIONS
We expect an interpretable deep-learning ensemble to perform well at predicting heart disease, with RNNs recognizing temporal patterns and autoencoders learning compact features. A strong preprocessing pipeline handles missing values and class imbalance within this framework, and XAI with SHAP/LIME is intended to provide valid, clinically meaningful insights. Early detection and individual risk stratification are central requirements for such clinical systems, and the framework is expected to achieve higher accuracy, sensitivity, and AUC-ROC than existing models. Beyond the technical contribution, interpretability and clinician confidence are key to bridging the divide between AI and practical health care systems, supporting safer integration of AI into diagnostic workflows, better patient outcomes, and lower operational costs. This paper supports the ethical and practical deployment of AI in cardiology.
Clinical Trial Registration
NOT APPLICABLE.
Author Contribution
Harpreet Kaur: Conducted the literature review, designed the initial deep learning framework, and contributed to drafting and revising the manuscript.
Amlan Kumar Sarkar: Provided technical guidance on deep learning model optimization, contributed to algorithm design, and assisted in data preprocessing and experimental analysis.
Arun Singh*: Conceived and supervised the study, coordinated collaboration among authors, finalized the methodology, and led manuscript preparation and critical revisions.
Jalakanuri Gnaneswar Raju: Assisted in data acquisition, preprocessing workflows, and visualization of experimental results.
Md Khadimul Islam Zim: Provided theoretical insights, supported statistical validation of results, and contributed to the interpretation of findings.
Raunak Raj*: Guided clinical relevance assessment, integrated Explainable AI (XAI) tools, and contributed to writing and editing of the final manuscript.
Data Availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
REFERENCES
1. Kundavaram Joseph Sujith Kumar, Stewart Kirubakaran S, M. Roshni Thanka, V. Ebenezer, E. Bijolin Edwin, Priscilla Joy, "Multi-Model Supervised Machine Learning Techniques for Heart Stroke Prediction," in IEEE, 2024.
2. S. Durai, Prabhu V, D. Jaganathan, R. Harini, "Revolutionizing Cardiovascular Attack Prediction: A Comprehensive Machine Learning Approach for Accurate and Timely Detection," in IEEE, 2024.
3. Ganesh C, Lordson Gnana Durai A, Baavana Bandarupalli, Pavithra S, Prabanjan R, "Heart Disease Diagnosis Using Deep Learning," in IEEE, 2023.
4. Gurpreet Singh, Kalpna Guleria, Shagun Sharma, "Machine Learning and Deep Learning Models for Early Detection of Heart Disease," in IEEE, 2023.
5. Kamal Kumar Gola, Shikha Arya, "Satin Bowerbird Optimization-Based Classification Model for Heart Disease Prediction Using Deep Learning in E-Healthcare," in IEEE, 2023.
6. Sachin R. Jadhav, Prajot Pujari, Rohan Kulkarni, Swapnil Patwari, Aditya Yendralwar, "Monitoring and Predicting of Heart Diseases Using Machine Learning Techniques," in IEEE, 2023.
7. Mahaveer, Puneet, "Cardiovascular Disease Prediction Analysis Using Classification Techniques," in IEEE, 2022.
8. Paranthaman M, Santhosh S, Sanjairam M, Yaathash B, "Cardiovascular Disease Prediction Using Deep Learning," in IEEE, 2022.
9. V. Kannagi, M. Rajkumar, I. Chandra, K. Sangeethalakshmi, V. Mohanavel, "Logical Mining Assisted Heart Disease Prediction Scheme in Association with Deep Learning Principles," in IEEE, 2022.
10. Bassam A. Abdelghani, Sophia Fadal, Shadi Bedoor, Shadi Banitaan, "Prediction of Heart Attacks Using Data Mining Techniques," in IEEE, 2022.
11. Deepak Kumar Rathore, "A Review of Machine Learning Techniques and Applications for Health Care," in IEEE, 2021.
12. Muhammad Nabeel, Hooria Muslih-ud-Din, Mazhar Javed Awan, Shumaila Majeed, Mohsin Raza, "Heart Attack Disease Data Analytics and Machine Learning," in IEEE, 2021.
13. Wang Fangyu, Zhang Jianhui, Bu Youjun, Chen Bo, "Research on Imbalanced Data Set Preprocessing Based on Deep Learning," in IEEE, 2021.
14. Dr. M. Kavitha, G. Gnaneswar, R. Dinesh, Y. Rohith Sai, R. Sai Suraj, "Heart Disease Prediction Using Hybrid Machine Learning Model," in IEEE, 2021.
15. Youness Khourdifi, "Applying Best Machine Learning Algorithms for Breast Cancer Prediction and Classification," in IEEE, 2021.
16. Y. Ansari, O. Mourad, K. Qaraqe, and E. Serpedin, "Deep learning for ECG arrhythmia detection and classification: An overview of progress for the period 2017–2023," Front. Physiol., vol. 14, Art. no. 1246746, 2023.
17. A. Baccouche, B. Garcia-Zapirain, C. Castillo Olea, and A. Elmaghraby, "Ensemble deep learning models for heart disease classification: A case study from Mexico," Information, vol. 11, no. 4, p. 207, 2020.
18. D. M. Alsekait et al., "Heart-Net: A multi-modal deep learning approach for diagnosing cardiovascular diseases," Computers, Materials & Continua, vol. 80, no. 3, 2024.
19. S. Das, G. Sajjan, A. Poddar, T. Dasgupta, S. Patty, and D. Ghosh, "Handling imbalanced heart disease data and explaining the factors," Int. J. Comput. Sci. Eng., vol. 11, no. 1, pp. 62–65, Nov. 2023.
20. R. Silva, A. Fred, and H. Plácido da Silva, "Morphological autoencoders for beat-by-beat atrial fibrillation detection using single-lead ECG," Sensors, vol. 23, no. 5, p. 2854, 2023.
21. A. P. Giovani, H. F. Pardede, and A. Subekti, "Autoencoder-based feature learning for predicting cardiovascular disease," Int. J. Comput. Digit. Syst., vol. 14, no. 1, pp. 1–1, 2023.
22. B. W. Jones, W. D. Taylor, and C. G. Walsh, "Sequential autoencoders for feature engineering and pretraining in major depressive disorder risk prediction," JAMIA Open, vol. 6, no. 4, 2023.
23. A. P. Singh, A. Kumar, and M. Hanmandlu, "Heart disease prediction using deep learning: A comprehensive review," Medical & Biological Engineering & Computing, vol. 59, no. 1, pp. 1–25, 2021.
24. N. K. Singh, A. Tripathi, and A. K. Singh, "Predicting heart disease using recurrent neural network," in 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon), Faridabad, India, 2019, pp. 268–273.
25. J. Che, M. Lu, and X. Chen, "Predicting heart disease with deep learning based on electronic health records," in 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Madrid, Spain, 2018, pp. 2382–2387.
26. H. Li, Y. Zhang, Y. Wang, and S. Li, "An autoencoder-based deep learning approach for heart disease prediction," in 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Miyazaki, Japan, 2018, pp. 3121–3126.
27. K. B. Singh and S. S. Choudhary, "Heart disease prediction using sparse autoencoder and support vector machine," in 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Bangalore, India, 2018, pp. 2004–2008.
28. C. K. Singh, P. K. Singh, and S. Gupta, "Deep learning in healthcare: A review," International Journal of Health Sciences and Research, vol. 9, no. 1, pp. 192–202, 2019.
29. G. K. Singh, V. Kumar, and A. Kumar, "Ensemble learning for heart disease prediction: A survey," Journal of Medical Systems, vol. 43, no. 12, pp. 1–17, 2019.
30. L. L. Wang, Y. Lu, and X. X. Li, "Heart disease prediction based on stacking ensemble learning," in 2019 International Conference on Intelligent Computing and Information Systems (ICICIS), Cairo, Egypt, 2019, pp. 165–170.
31. A. Al-Ajlan, "Heart disease prediction using a hybrid deep learning model with an ensemble approach," Computers in Biology and Medicine, vol. 135, p. 104602, 2021.
32. P. B. Mane and R. R. Mudholkar, "Ensemble learning for heart disease diagnosis using machine learning algorithms," in 2020 International Conference on Smart Technologies for Computing, Communication and Control (ICSTCCC), Solapur, India, 2020, pp. 1–5.
33. S. Mohan, C. Thirumalai, and G. Srivastava, "Effective heart disease prediction using hybrid machine learning techniques," Soft Computing, vol. 24, no. 19, pp. 16297–16306, 2020.
34. S. Bashir, Z. Qamar, and F. U. Islam, "A comprehensive review of deep learning approaches for heart disease prediction," Applied Soft Computing, vol. 99, p. 106961, 2021.
35. P. Panthong, C. Ketuwan, and S. Prueksakorn, "ECG signal classification using autoencoder for anomaly detection," in 2019 16th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), Pattaya, Thailand, 2019, pp. 297–300.
36. World Health Organization, "Cardiovascular diseases (CVDs)," WHO, 2023.
37. U. R. Acharya, H. Fujita, S. L. Oh, Y. Hagiwara, J. H. Tan, and M. Adam, "Automated identification of normal and premature ventricular contractions using convolutional neural network," Information Sciences, vol. 405, pp. 81–90, 2017.
38. M. Javeed, S. Zhou, S. Yongjian, A. Qasim, M. Noor, and S. Rehman, "An intelligent learning system based on random search algorithm and optimized random forest model for improved heart disease detection," IEEE Access, vol. 7, pp. 180235–180243, 2019.
39. A. Khan, B. A. Majid, A. Y. Javaid, and W. Z. Khan, "Heart disease prediction using machine learning ensemble methods," Computers, vol. 9, no. 2, pp. 1–14, 2020.
40. S. Rajput, M. S. Ali, A. Q. Rehman, and M. A. Khan, "An optimized hybrid machine learning model for early heart disease detection," Computers, Materials & Continua, vol. 71, no. 3, pp. 4167–4181, 2022.
41. A. K. Jha and S. S. Ghosh, "Heart disease prediction using deep learning and machine learning algorithms," in Proc. 2020 IEEE Int. Conf. on Inventive Computation Technologies (ICICT), Coimbatore, India, Feb. 2020, pp. 122–127.
42. UCI Machine Learning Repository, "Heart Disease Data Set."
43. Kaggle, "Heart Failure Prediction Dataset."
44. A. K. Sharma et al., "Deep Learning Architectures for Arrhythmia Detection and Classification," J. Med. Syst., vol. 45, no. 8, 2021.
45. S. Chen and L. Wang, "Ensemble-Learning Framework Based on Neural Network Models for Classifying Different Types of Heart Disease," Information, vol. 11, no. 4, p. 207, 2020.
46. M. Gupta et al., "Heart-Net: A Multi-Modal Deep Learning Approach for Diagnosing Cardiovascular Diseases," 2023.
47. R. K. Patel and H. Zhang, "Cardiovascular Disease Prediction Model Based on Patient Behavior Patterns by Introducing Deep Learning Techniques," Front. Psychiatry, vol. 15, 2024.
48. S. Das, "A Deep Learning Framework for Handling Imbalanced Medical Data," Int. J. Comput. Sci. Eng., vol. 7, no. 1, 2023.
49. J. L. Martínez et al., "An Intelligent System for the Diagnosis of Heart Disease Using Clinical Parameters at Early Stages," Rev. Mex. Ing. Biomed., vol. 44, no. 1, 2023.
50. Y. Liu et al., "Ensemble Learning Based on Hybrid Deep Learning Models for Heart Disease Early Prediction," Diagnostics, vol. 12, no. 12, p. 3215, 2022.
51. T. Nguyen and P. Le, "Autoencoder-Based Automatic Feature Extraction for Atrial Fibrillation Detection," IEEE J. Biomed. Health Inform., vol. 26, no. 5, 2022.
52. K. Wilson et al., "Autoencoder Models for Feature Engineering and Pretraining in Psychiatric Prognostic Prediction," Sci. Rep., vol. 13, 2023.
53. A. Sharma and B. Li, "Enhancing Heart Disease Prediction through Knowledge Graph Integration," Int. Res. J. Mod. Eng. Technol. Sci., vol. 5, no. 4, 2025.
Declarations
Funding
This research received no external funding.
Conflict of Interest / Competing Interests
The authors declare that they have no conflict of interest.
Ethical Approval for Clinical and Animal Studies
Not applicable.
Ethics Approval and Consent to Participate
Not applicable.
Consent for Publication
Not applicable.
Materials Availability
Not applicable.