Introduction
Many investigations have compared standard regression models with super-learning models, but it remains unclear which is better \cite{R6}. Super learning is an ensemble machine learning technique that combines the predictions of multiple algorithms and assigns optimal weights to each, so that the final model performs at least as well as any of its component parts \cite{R9}. This improves predictive ability by allowing researchers to examine multiple machine learning techniques simultaneously rather than relying on a single one.
Typically, super-learning applications also tune the hyperparameters of the underlying machine learning algorithms. Hyperparameters are frequently adjusted before training and can have a substantial impact on an algorithm's behaviour by changing its structure or complexity \cite{R40}. Although many studies use default settings \cite{R5,R9}, the best outcomes in the super-learning framework require careful selection of hyperparameter values.
This is particularly important because these hyperparameters strongly influence how well algorithms such as logistic regression perform. Iterating through multiple configurations and using techniques such as grid search to find the optimal hyperparameters can improve model performance \cite{R34}.
To apply super learning to a high-dimensional dataset, we combined logistic regression with other conventional model-building techniques. Owing to its interpretability, logistic regression is a widely used and appreciated technique for binary classification problems, such as the prediction of liver disorders, heart disease, and breast cancer \cite{R10}. Hyperparameter tuning, by adjusting factors such as regularisation strength and solver settings, can significantly improve the performance of these models \cite{R11}. It remains unclear, however, whether hyperparameter tuning benefits models uniformly across datasets \cite{R40}.
To address this, we use datasets from a variety of domains, including healthcare and digit recognition, to investigate how hyperparameter tuning affects logistic regression models. The goal is to identify the hyperparameter combinations that maximise accuracy and generalisation capacity. The model's hyperparameters, which regulate elements such as regularisation strength and solver strategy, must be adjusted to enhance performance. This study compares the performance of a ``tuned'' and an ``untuned'' super learner, the latter employing default hyperparameters. A standalone logistic regression model is also compared against the tuned super learner. We investigate possible combinations of class weights, solver settings, penalty types, and $C$ values using a systematic, combinatorial approach to hyperparameter optimisation.
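The combinatorial search described above can be sketched as follows; the value ranges are illustrative assumptions rather than the exact grid used in this study.

```python
from itertools import product

# Illustrative hyperparameter ranges (assumed values, not the study's exact grid)
C_values = [0.001, 0.01, 0.1, 1, 10]
penalties = ["l1", "l2"]
solvers = ["liblinear", "saga"]
class_weights = [None, "balanced", {0: 1, 1: 5}, {0: 1, 1: 10}]

# Enumerate every combination of C, penalty, solver, and class weight
grid = [
    {"C": C, "penalty": penalty, "solver": solver, "class_weight": cw}
    for C, penalty, solver, cw in product(C_values, penalties, solvers, class_weights)
]
print(len(grid))  # 5 * 2 * 2 * 4 = 80 candidate configurations
```

Each dictionary can then be passed to a model constructor, so every configuration is evaluated under the same resampling scheme.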
A thorough assessment method that employs multiple train–test split techniques (with validation sizes of 20%, 30%, and 40%) and cross-validation (with 3, 5, and 10 folds) supports this strategy. A more comprehensive analysis of model performance is made possible by the additional assessment metrics provided, which include test accuracy, training accuracy, F1 score, and AUC score. By establishing a methodical approach to logistic regression hyperparameter tuning, this technique can enhance model fit, accuracy, and generalisation in binary classification scenarios. Both researchers and practitioners will gain from our findings, which offer crucial information for improving logistic regression models in real-world scenarios.
Literature Review
To increase model performance and accuracy, numerous researchers have studied optimisation strategies and machine learning methodologies. A number of studies have concentrated on assessing model parameters, choosing suitable algorithms, and implementing tuning techniques for improved outcomes.
Schratz et al.\ \cite{R48} investigated the impact of spatial autocorrelation on hyperparameter tuning and model evaluation. They compared GLM, GAM, SVM, RF, and k-NN under random and geographic cross-validation using ecological disease mapping as a case study. The findings demonstrated that random CV produces unduly optimistic measures, whereas spatial CV yields more accurate performance estimates. They concluded that both tuning and evaluation should make use of spatial partitioning.
Sun et al.\ \cite{R49} used Bayesian hyperparameter optimisation to perform a comparative study on landslide susceptibility mapping. To assess how optimised parameters affected model accuracy, they contrasted logistic regression and random forest models. The results demonstrated that both models’ predictive performance was greatly enhanced by Bayesian optimisation, with random forest attaining slightly superior accuracy. The study concluded that improving model reliability for geospatial risk mapping applications requires hyperparameter optimisation.
Raschka \cite{R50} provided a comprehensive tutorial on model assessment, model selection, and performance metrics in machine learning. He emphasised the importance of nested cross-validation for obtaining unbiased performance estimates and discussed the need to select evaluation metrics based on problem type and class imbalance. The tutorial also advised against relying solely on accuracy—particularly for imbalanced datasets—and advocated the use of ROC-AUC, precision–recall curves, and F1-scores. This work serves as a critical resource for ensuring comprehensive model evaluation in machine learning research.
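Nested cross-validation of the kind Raschka recommends can be sketched as below, assuming scikit-learn; the dataset, parameter grid, and fold counts are placeholder choices, not those of the original tutorial.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Inner loop tunes hyperparameters; outer loop gives the unbiased estimate
inner = GridSearchCV(
    LogisticRegression(solver="liblinear", max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10], "penalty": ["l1", "l2"]},
    cv=3,
)
outer_scores = cross_val_score(inner, X, y, cv=5)
print(round(outer_scores.mean(), 3))  # outer-fold accuracy, untouched by the inner search
```

Because the outer folds never influence the hyperparameter choice, the reported score avoids the optimistic bias of tuning and evaluating on the same resamples.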
Pfob et al.\ \cite{R51} focused on model comparison while addressing data preprocessing, hyperparameter tuning, and the practical implementation of machine learning in medical research. To classify breast masses using mammographic imaging features and patient age, they evaluated logistic regression, XGBoost, SVM, and neural networks. All models yielded similar AUROC values (0.88–0.89), illustrating that simple models can perform as well as complex models when proper preprocessing and hyperparameter tuning are applied.
Ambesange et al.\ \cite{R52} developed a heart disease prediction system that combined logistic regression with ensemble learning techniques and GridSearch-based hyperparameter optimisation. By refining model parameters and preprocessing methods, the study achieved improved classification accuracy on the UCI heart disease dataset. The authors concluded that effective medical diagnosis systems require careful parameter optimisation and robust data preprocessing.
Erden et al.\ \cite{R53} investigated advanced hyperparameter optimisation strategies to enhance machine-learning model performance. Compared with traditional grid search and random search, modern Bayesian and evolutionary algorithms achieved superior performance with fewer iterations. These techniques proved particularly effective for complex models that demand substantial computational resources. The study highlighted the efficiency and flexibility of intelligent optimisation algorithms in producing high-performing machine-learning models.
Results
Cross-Validation
Logistic Regression can also be applied to multi-class classification problems, such as determining the digit represented in the Digits dataset \cite{R29}. In such cases, the model is trained independently for each class, selecting the class with the highest predicted probability. K-fold cross-validation was used to evaluate the model’s performance, with the \texttt{cross\_val\_score()} function determining accuracy and the \texttt{f1\_score()} function calculating the F1 score, which incorporates both precision and recall \cite{R43,R46}.
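A minimal sketch of this multi-class evaluation on the Digits data, assuming scikit-learn; the fold count of 5 and macro averaging for the multi-class F1 score are our assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_predict, cross_val_score

X, y = load_digits(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# Accuracy from k-fold cross-validation (k = 5 here)
acc = cross_val_score(model, X, y, cv=5).mean()

# F1 on out-of-fold predictions; macro averaging is one choice for multi-class
pred = cross_val_predict(model, X, y, cv=5)
f1 = f1_score(y, pred, average="macro")
print(round(acc, 3), round(f1, 3))
```

Out-of-fold predictions from \texttt{cross\_val\_predict} let a single F1 value summarise all folds at once.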
The maximum cross-validation (CV) accuracy achieved across the datasets was 0.961 for Breast Cancer, 0.7014 for Liver Disorder, 0.8407 for Heart Disease, and 0.930 for Digits. The corresponding average maximum F1 score was 0.6911 (Figure 3). These values demonstrate the overall performance of the Logistic Regression models; however, it is important to note that performance can vary depending on factors such as dataset size, data quality, preprocessing techniques, and feature engineering.
The models' mean F1 score was 0.6911, which shows a balanced performance between precision and recall, and their average CV accuracy was 0.7559, indicating an overall accuracy of 75.59% during cross-validation. Higher F1 scores and CV accuracies suggest strong generalisation capacity and accurate class prediction, while lower values may indicate imbalance or underfitting.
Since the F1 score accounts for both precision and recall, its harmonic mean provides a more comprehensive evaluation of model performance, with higher values representing more robust predictive ability.
The number of cross-validation (CV) folds directly influences the F1 score, as increasing the number of folds generally provides a more precise estimate of model performance. As shown in Figure 4, the effect varies across datasets. For instance, a CV value of 10 yields the highest F1 score for the Heart Disease dataset (0.5148), while the Digits dataset also achieves its peak F1 score at CV\,{=}\,10. In contrast, the Breast Cancer dataset maintains consistently high F1 scores across all tested CV values.
In general, higher CV values can improve the reliability of performance estimates on unseen data but at the cost of increased computational complexity. However, the performance gain may not always be substantial, and the optimal CV choice often depends on the dataset and model configuration.
The overall mean F1 score across all datasets and CV values was 0.6584, while the average maximum F1 score reached 0.6911. These results highlight the trade-off between precision and recall: precision reflects the proportion of correctly identified positive predictions, while recall measures the ability to detect all actual positive cases. Together, the harmonic mean of these metrics, represented by the F1 score, offers a balanced evaluation of model performance.
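The harmonic-mean behaviour of the F1 score can be checked directly; the precision and recall values below are illustrative.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Illustrative values: high precision cannot compensate for low recall
print(round(f1(0.9, 0.9), 3))  # 0.9
print(round(f1(0.9, 0.3), 3))  # 0.45 — pulled toward the lower of the two
```

Unlike the arithmetic mean (which would give 0.6 in the second case), the harmonic mean penalises imbalance between precision and recall.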
A higher F1 score reflects stronger model performance, with a value of 1 representing perfect precision and recall, while a score of 0 indicates poor performance. As shown in Figure 5, the average F1 scores varied across datasets and cross-validation (CV) folds. For example, the Breast Cancer dataset achieved consistently high F1 scores ranging from 0.897 to 0.959, whereas the Digits dataset showed greater variability, with scores ranging from 0.143 to 0.929. The overall mean F1 score of 0.6584 suggests moderate accuracy with room for improvement, while the average F1 score of 0.6911 indicates balanced performance across all datasets and CV values.
It is important to interpret these values in the context of the specific classification problem, as the acceptable threshold for F1 may vary depending on application requirements. Notably, the Heart Disease dataset exhibited comparatively lower F1 scores across all CV values, whereas the Liver Disorder dataset demonstrated relatively higher scores under certain CV configurations.
A higher F1 score reflects stronger model performance, while higher cross-validation (CV) values generally indicate better generalisation to unseen data. As shown in Figure 6, the model performed poorly on the Heart Disease and Liver Disorder datasets, with CV accuracy values between 0.645 and 0.651. In contrast, performance was stronger on the Digits and Breast Cancer datasets, with CV accuracies ranging from 0.772 to 0.925. F1 scores ranged from 0.515 to 0.918, with the Breast Cancer dataset achieving the highest value (0.91).
Overall, these results suggest that the model achieved only moderate performance, particularly on the medical datasets. To gain deeper insights, precision and recall should be considered alongside the F1 score, and future improvements may require hyperparameter refinement or the adoption of alternative algorithms.
The mean cross-validation (CV) accuracy of Logistic Regression models across various hyperparameter settings is displayed in Figures 6 and 7. The parameters evaluated included the regularisation penalty (L1 or L2), class weights to handle unbalanced data, the inverse regularisation strength ($C$), and the solver algorithm (\texttt{liblinear} or \texttt{saga}). The findings show that prediction performance is significantly impacted by the choice of hyperparameters. For the Digits dataset, the highest accuracy was achieved with a $C$ value of 0.001, balanced class weights, an L1 penalty, and either the \texttt{liblinear} or \texttt{saga} solver. CV accuracies ranged from 0.287 to 0.918.
When the class weights were set to 10 and the $C$ values were 0.001 or 0.01, the Heart Disease dataset performed at its best, with accuracies between 0.533 and 0.556, using an L1 penalty with either the \texttt{liblinear} or \texttt{saga} solver. Similarly, the Liver Disorder dataset performed best when $C$ was set to 0.01 or 10 and class weights were set to 5 or 10, achieving accuracies ranging from 0.579 to 0.679, again with either the \texttt{liblinear} or \texttt{saga} solver.
The Breast Cancer dataset consistently outperformed the others when trained using the L1 penalty and \texttt{liblinear} solver under 10-fold CV, with an average CV accuracy of 0.9315 and an F1 score of 0.9246. The average F1 score for all datasets combined was 0.6911, while the mean CV accuracy was 0.7559. These results demonstrate that selecting the appropriate penalty and solver is essential for optimising model performance and that hyperparameter tuning is highly dataset-dependent. The specific dataset characteristics and the required degree of regularisation dictate the optimal hyperparameter configuration for Logistic Regression models.
Using more complex models or refining hyperparameters can significantly increase predictive precision. The penalty hyperparameter defines the model’s regularisation technique—L1 for sparsity, L2 for smoothness—while the solver hyperparameter determines the optimisation algorithm used during model training. In this study, both \texttt{liblinear} and \texttt{saga} solvers were evaluated, demonstrating robust performance across different configurations. The data analysis indicates that the Breast Cancer dataset achieved the highest predictive stability under these tuned settings.
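The penalty–solver comparison on the Breast Cancer data can be sketched as follows, assuming scikit-learn; the feature-scaling step and fixed $C$ value are our additions, included mainly to help \texttt{saga} converge, and are not necessarily part of the study's pipeline.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# An L1 penalty requires a solver that supports it: liblinear or saga
scores = {}
for solver in ("liblinear", "saga"):
    model = make_pipeline(
        StandardScaler(),  # saga in particular converges poorly on unscaled data
        LogisticRegression(penalty="l1", solver=solver, C=1.0, max_iter=5000),
    )
    scores[solver] = cross_val_score(model, X, y, cv=10).mean()
    print(solver, round(scores[solver], 4))
```

Running both solvers under identical 10-fold CV isolates the effect of the optimisation algorithm from the rest of the configuration.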
While cross-validation (CV) accuracy reflects the model's average performance across multiple training and testing splits, the F1 score provides a balanced measure of model accuracy by considering both precision and recall. In general, higher values of both metrics indicate better model performance. However, the outcomes exhibit substantial variation, as shown in Figure 8. Certain models perform considerably better than others, as evidenced by the wide range of CV accuracy values from 0.281 to 30.556 and F1 scores from 0.1428 to 0.8998.
Inconsistencies between the two metrics are further revealed upon closer inspection. The model that achieves the highest F1 score of 0.8998 also reports a strong CV accuracy of 0.8954, illustrating close alignment between the two measures. Conversely, the model with the highest CV accuracy (30.556) exhibits a comparatively low F1 score of 0.3571, indicating poor predictive reliability despite high validation accuracy. These discrepancies underscore the importance of evaluating multiple performance metrics rather than relying solely on a single indicator, as a comprehensive assessment ensures more accurate insights into model behaviour and generalisation capability.
Figure 9 presents the minimum and maximum CV accuracy and F1 score values across the models. The coefficient of determination ($R^2$) measures the percentage of variance in the dependent variable that can be explained by the independent variables. In this study, $R^2$ values ranged from 0.833 to 0.959, with higher values indicating a stronger model fit. Similarly, the $p$-value evaluates the statistical significance of associations between variables, with smaller values (typically below 0.05) suggesting more reliable relationships. The observed $p$-values varied widely, from 0.7072 to 30.55, reflecting differences in variable significance.

Collectively, higher $R^2$ values, alongside improvements in cross-validation (CV) accuracy and F1 scores, indicate better model performance, while smaller $p$-values reinforce the validity of the estimated coefficients.
Train--Test Evaluation
To evaluate the model's efficacy, the dataset was divided into training and testing subsets according to a defined test size. The test size represents the portion of data reserved exclusively for testing, while the remaining data is used for model training. This separation ensures that model performance is assessed on unseen data, thereby providing an unbiased measure of generalisation capability.
Specifically, test size proportions of 20%, 30%, and 40% were selected, taking into account dataset size and model complexity (Figure 10). With a 20% test size, 80% of the data is used for training while 20% is reserved for testing. Although this split provides more data for training, it may increase the risk of overfitting and yield overly optimistic results.
In contrast, a 30% test size offers a balanced approach by allocating 70% of the data for training and 30% for testing, which is widely regarded as a standard practice in model evaluation. This configuration produces more robust performance estimates by ensuring there is sufficient data for both training and validation.
When the test size is increased to 40%, the model is trained on 60% of the data and evaluated on the remaining 40%. This approach reduces the likelihood of overfitting by employing a larger test set; however, it may also lead to increased score variability and slightly weaker generalisation due to reduced training data. By combining these different splits, the study provides a comprehensive evaluation of the Logistic Regression model's resilience across datasets with varying feature characteristics.
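The three hold-out proportions can be reproduced with \texttt{train\_test\_split}; this sketch assumes the Breast Cancer data, a fixed random seed, and stratified sampling, which were not necessarily the study's exact settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Evaluate the same model under the three hold-out proportions used in the study
results = {}
for test_size in (0.2, 0.3, 0.4):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=42, stratify=y
    )
    model = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
    results[test_size] = (model.score(X_tr, y_tr), model.score(X_te, y_te))

for test_size, (train_acc, test_acc) in results.items():
    print(f"test_size={test_size}: train={train_acc:.3f}, test={test_acc:.3f}")
```

Comparing train and test accuracy at each split size makes the bias–variance trade-off described above directly visible.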
Regardless of the classification criterion, the Area Under the Curve (AUC) statistic evaluates a model's ability to discriminate between positive and negative classes. As shown in Figure 11, the Breast Cancer dataset achieved the best performance, with an F1 score of 96% and an AUC of 99% across various test sizes. The Digits dataset also performed strongly, recording an AUC of 98% and an F1 score of 88%, though both metrics were slightly lower than those of the Breast Cancer dataset.
In contrast, the Heart Disease dataset demonstrated the weakest results, with an AUC of 62% and an F1 score of only 26%. The Liver Disorder dataset achieved moderate results, with an AUC of 68% and an F1 score of 35%, surpassing Heart Disease but remaining below Breast Cancer and Digits.
These differences can be attributed to dataset characteristics. The Breast Cancer dataset contains a balanced distribution of samples and highly informative features, enabling strong class discrimination. The Digits dataset's high AUC is supported by its large number of observations, simple numerical features, and easily distinguishable categorical targets. Conversely, the Heart Disease dataset's lower discriminatory strength likely stems from overlapping or less distinct features. Although it still performs below Breast Cancer and Digits, the Liver Disorder dataset outperforms Heart Disease due to clearer class separation and fewer, more meaningful attributes that help mitigate overfitting.
The Breast Cancer dataset achieved the highest F1 score of 96% across various test sizes, likely due to its highly relevant features, substantial data variability, and balanced class distribution. With an F1 score of 81% at a 0.2 test size, the Digits dataset outperformed both the Liver Disorder and Heart Disease datasets (Figure 12). This superior performance can be attributed to its numerical features, larger sample size, and categorical target variable, all of which facilitate easier learning and classification.
In contrast, the Heart Disease dataset recorded the lowest F1 score of 24% at a 0.4 test size. This poor performance may result from less informative features, class imbalance, overlapping feature spaces, and limited data variability, all of which restrict generalisation. The Liver Disorder dataset demonstrated intermediate performance, achieving an F1 score of 45% at a 0.2 test size, higher than Heart Disease yet below Breast Cancer and Digits. This relative improvement may be explained by clearer class separation and a more balanced feature distribution.
On average, the F1 score across all datasets was 0.5995, with a median of 0.6151, indicating moderate overall performance of the Logistic Regression model under varying test size configurations.
Figure 13 presents the training and testing accuracies for the four datasets: Breast Cancer, Digits, Heart Disease, and Liver Disorder. The Breast Cancer dataset achieved 97% accuracy on both training and testing, indicating strong generalisation to unseen data. The Digits dataset reached 100% training accuracy, suggesting potential overfitting, although its high test accuracy demonstrates that it can still perform well on new inputs.
For the Heart Disease dataset, both training and testing accuracies were 90%, reflecting consistent performance across seen and unseen data. In contrast, the Liver Disorder dataset achieved 69% training accuracy and 79% testing accuracy, indicating possible underfitting. However, the relatively higher test accuracy suggests modest generalisation capability.
Overall, balanced training and testing accuracies are essential for reliable model performance, ensuring that the model captures meaningful patterns without overfitting or underfitting.
Figure 14 shows that the Breast Cancer and Digits datasets achieved higher AUC, F1, test, and train accuracies compared to the Heart Disease and Liver Disorder datasets when using the penalty (\texttt{l1}), class weight (\texttt{balanced}, \texttt{dict}), and solver (\texttt{liblinear}, \texttt{saga}) parameters. The Breast Cancer dataset achieved accuracies ranging from 90% to 99%, while the Digits dataset ranged from 70% to 98%. In contrast, the Heart Disease and Liver Disorder datasets performed considerably lower, with accuracies between 26%–64% and 35%–67%, respectively. On the Digits dataset, the \texttt{liblinear} solver consistently outperformed \texttt{saga}, whereas \texttt{saga} proved more effective for the Breast Cancer dataset. For Heart Disease and Liver Disorder, performance remained relatively weaker across parameter configurations.
These findings highlight the significant influence of penalties, class weights, and solver choices on model performance. Appropriate hyperparameter tuning, such as applying \texttt{class\_weight = balanced} for imbalanced datasets, is essential for improving generalisation and maintaining a balance between bias and variance. Overall, the optimal combination of penalty and solver parameters plays a crucial role in maximising the Logistic Regression model's predictive stability and interpretability.
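The benefit of \texttt{class\_weight = balanced} can be illustrated on a synthetic imbalanced problem; the roughly 9:1 class ratio and generated data are hypothetical stand-ins for the medical datasets.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced problem (roughly 9:1), standing in for the medical datasets
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

# Compare minority-class recall and F1 with and without class weighting
scores = {}
for cw in (None, "balanced"):
    model = LogisticRegression(class_weight=cw, max_iter=1000).fit(X_tr, y_tr)
    pred = model.predict(X_te)
    scores[cw] = (recall_score(y_te, pred), f1_score(y_te, pred))
    print(cw, scores[cw])
```

With balancing, each class's errors are reweighted inversely to its frequency, which typically raises minority-class recall at some cost in precision.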
With superior AUC, F1, training, and testing accuracies, the Breast Cancer dataset performed best overall, as shown in Figure 15. The Liver Disorder dataset exhibited the lowest performance, followed by the Digits and Heart Disease datasets, both of which demonstrated moderate results. The Breast Cancer dataset showed strong generalisation to unseen data with an exceptional test accuracy of 0.9766, while the Digits dataset achieved a maximum training accuracy of 1.0000, suggesting well-optimised parameter selection.
The model's ability to effectively balance sensitivity and specificity is demonstrated by the highest AUC value of 0.992, observed for the Digits dataset. Similarly, the Breast Cancer dataset achieved the highest F1 score of 0.9815, reflecting an ideal balance between recall and precision. These findings emphasise the critical role of meticulous hyperparameter tuning in enhancing Logistic Regression model performance, ensuring that predictions remain accurate and reliable across diverse datasets.
A fitted power trend line illustrating the general pattern of the data is shown in the scatter plot in Figure 16, which represents the model's performance. With a closely matched test accuracy of 0.94 and a strong training accuracy of 0.985, the model exhibited minimal overfitting and excellent generalisation. Although a small cluster of points between 0.55 and 0.80 indicates moderate performance in certain cases, the majority of observations are concentrated between 0.85 and 1.00, suggesting high predictive reliability.
The sparse distribution of points between 0.25 and 0.50 further highlights specific regions where model optimisation could yield improvements. The statistical indicators reinforce the model's robustness: an $R^2$ value of 0.95 indicates that 95% of the variance in the outcome variable is explained by the predictors, demonstrating an excellent model fit. Furthermore, the low $p$-value of 0.0001 establishes the statistical significance of this relationship, confirming that the observed results are highly unlikely to have occurred by chance.
AUC and F1 are key metrics for evaluating classification models, where the F1 score reflects the balance between precision and recall, and AUC measures the model's ability to distinguish between classes. In regression analysis, $R^2$ quantifies the proportion of variance in the outcome explained by the predictors, while a $p$-value below 0.05 indicates statistical significance. Our analysis shows moderate model performance, with an AUC of 0.81 and an F1 score of 0.59. After applying a polynomial trend line, the $R^2$ value improved to 0.88, suggesting a strong model fit, while the very low $p$-value confirms the statistical significance of the predictor–outcome relationship.
As illustrated in Figure 17, the scatter plot further depicts this relationship: most data points cluster between 0.68 and 1.00, indicating relatively strong performance across these instances, while fewer points fall within the 0.50–0.67 range, reflecting weaker predictive accuracy in certain cases.
References
BUPA Medical Research, Ltd. Liver Disorders Dataset. May 1990, UCI Machine Learning Repository, https://archive.ics.uci.edu/ml/datasets/liver+disorders
Gok, Ceren E. Heart Disease Prediction. March 2022, OpenML, https://www.openml.org/search?type=data&status=active&id=43823
scikit-learn developers. sklearn.datasets.load\_breast\_cancer — scikit-learn documentation. scikit-learn.org, https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load\_breast\_cancer.html
Ambesange, Sateesh and Vijayalaxmi, A. and Sridevi, S. and {Venkateswaran} and Yashoda, B. S. (2020) Multiple {Heart} {Diseases} {Prediction} using {Logistic} {Regression} with {Ensemble} and {Hyper} {Parameter} tuning {Techniques}. IEEE, London, United Kingdom, 827--832, July, 2020 {Fourth} {World} {Conference} on {Smart} {Trends} in {Systems}, {Security} and {Sustainability} ({WorldS4}), 2025-11-03, 10.1109/WorldS450073.2020.9210404, https://ieeexplore.ieee.org/document/9210404/, 9781728168234, https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
Sun, Deliang and Xu, Jiahui and Wen, Haijia and Wang, Danzhou (2021) Assessment of landslide susceptibility mapping based on {Bayesian} hyperparameter optimization: {A} comparison between logistic regression and random forest. Engineering Geology 281: 105972 https://doi.org/10.1016/j.enggeo.2020.105972, February, 2025-11-03, en, https://linkinghub.elsevier.com/retrieve/pii/S001379522031869X, Assessment of landslide susceptibility mapping based on {Bayesian} hyperparameter optimization, 00137952
Christodoulou, Evangelia and Ma, Jie and Collins, Gary S. and Steyerberg, Ewout W. and Verbakel, Jan Y. and Van Calster, Ben (2019) A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. Journal of Clinical Epidemiology 110: 12--22 https://doi.org/10.1016/j.jclinepi.2019.02.004, June, 2025-11-03, en, https://linkinghub.elsevier.com/retrieve/pii/S0895435618310813, 08954356
Pepe, Margaret Sullivan (2004) The statistical evaluation of medical tests for classification and prediction. Oxford University Press, Oxford, Oxford statistical science series 31, eng, 9780198509844 9780191588617
Zhang, Zhongheng (2016) Model building strategy for logistic regression: purposeful selection. Annals of Translational Medicine 4(6): 111--111 https://doi.org/10.21037/atm.2016.02.15, March, 2025-11-03, http://atm.amegroups.com/article/view/9400/10262, Model building strategy for logistic regression, 23055839, 23055847
Wong, Jenna and Manderson, Travis and Abrahamowicz, Michal and Buckeridge, David L and Tamblyn, Robyn (2019) Can {Hyperparameter} {Tuning} {Improve} the {Performance} of a {Super} {Learner}?: {A} {Case} {Study}. Epidemiology 30(4): 521--531 https://doi.org/10.1097/EDE.0000000000001027, July, 2025-11-03, en, https://journals.lww.com/00001648-201907000-00009, Can {Hyperparameter} {Tuning} {Improve} the {Performance} of a {Super} {Learner}?, 1044-3983, http://creativecommons.org/licenses/by-nc-nd/4.0/
Bagley, Steven C and White, Halbert and Golomb, Beatrice A (2001) Logistic regression in the medical literature. Journal of Clinical Epidemiology 54(10): 979--985 https://doi.org/10.1016/S0895-4356(01)00372-9, October, 2025-11-03, en, https://linkinghub.elsevier.com/retrieve/pii/S0895435601003729, Logistic regression in the medical literature, 08954356, https://www.elsevier.com/tdm/userlicense/1.0/
Hutter, F. and Kotthoff, L. and Vanschoren, J. (2019) The {Springer} {Series} on {Challenges} in {Machine} {Learning}. Springer, https://library.oapen.org/bitstream/handle/20.500.12657/23012/1/1007149.pdf\#page=15
Nand Kumar, Et Al. (2023) Enhancing {Robustness} and {Generalization} in {Deep} {Learning} {Models} for {Image} {Processing}. Power System Technology 47(4): 278--293 https://doi.org/10.52783/pst.193, December, 2025-11-03, https://powertechjournal.com/index.php/journal/article/view/193, 1000-3673
D., Yogatama and G., Mann (2014) Efficient {Transfer} {Learning} {Method} for {Automatic} {Hyperparameter} {Tuning}. PMLR, 1077--1085, April, Proceedings of the 31st {International} {Conference} on {Machine} {Learning} ({ICML} 2014), https://proceedings.mlr.press/v33/yogatama14.html
Zhang, Chun-Xia and Xu, Shuang and Zhang, Jiang-She (2019) A novel variational {Bayesian} method for variable selection in logistic regression models. Computational Statistics & Data Analysis 133: 1--19 https://doi.org/10.1016/j.csda.2018.08.025, May, 2025-11-03, en, https://linkinghub.elsevier.com/retrieve/pii/S0167947318302081, 01679473
Bertinetto, Luca and Henriques, João F. and Torr, Philip H. S. and Vedaldi, Andrea. Meta-learning with differentiable closed-form solvers. arXiv:1805.08136. Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Statistics - Machine Learning, 2019, July, arXiv, 2025-11-03, 10.48550/arXiv.1805.08136, http://arxiv.org/abs/1805.08136
L'Heureux, Alexandra and Grolinger, Katarina and Elyamany, Hany F. and Capretz, Miriam A. M. (2017) Machine {Learning} {With} {Big} {Data}: {Challenges} and {Approaches}. IEEE Access 5: 7776--7797 https://doi.org/10.1109/ACCESS.2017.2696365, 2025-11-03, https://ieeexplore.ieee.org/document/7906512/, Machine {Learning} {With} {Big} {Data}, 2169-3536, https://creativecommons.org/licenses/by/3.0/legalcode
A., Singh and N., Thakur and A., Sharma (2016) A {Review} of {Supervised} {Machine} {Learning} {Algorithms}. IEEE, https://ieeexplore.ieee.org/abstract/document/7724478
Shanthi, D. L. and Chethan, N. (2022) Genetic {Algorithm} {Based} {Hyper}-{Parameter} {Tuning} to {Improve} the {Performance} of {Machine} {Learning} {Models}. SN Computer Science 4(2): 119 https://doi.org/10.1007/s42979-022-01537-8, December, 2025-11-03, en, https://link.springer.com/10.1007/s42979-022-01537-8, 2661-8907
Xu, Zhe and Tao, Dacheng and Zhang, Ya and Wu, Junjie and Tsoi, Ah Chung Architectural {Style} {Classification} {Using} {Multinomial} {Latent} {Logistic} {Regression}. In: Fleet, David and Pajdla, Tomas and Schiele, Bernt and Tuytelaars, Tinne (Eds.) Computer {Vision} – {ECCV} 2014, 600--615, 10.1007/978-3-319-10590-1\_39, 2014, Springer International Publishing, 2025-11-03, en, http://link.springer.com/10.1007/978-3-319-10590-1\_39, 9783319105895 9783319105901, http://www.springer.com/tdm, 8689, Cham
Salazar, Jose J. and Garland, Lean and Ochoa, Jesus and Pyrcz, Michael J. (2022) Fair train-test split in machine learning: {Mitigating} spatial autocorrelation for improved prediction accuracy. Journal of Petroleum Science and Engineering 209: 109885 https://doi.org/10.1016/j.petrol.2021.109885, February, 2025-11-03, en, https://linkinghub.elsevier.com/retrieve/pii/S0920410521015023, Fair train-test split in machine learning, 09204105
Huang, Robert J. and Kwon, Nicole Sung-Eun and Tomizawa, Yutaka and Choi, Alyssa Y. and Hernandez-Boussard, Tina and Hwang, Joo Ha (2022) A {Comparison} of {Logistic} {Regression} {Against} {Machine} {Learning} {Algorithms} for {Gastric} {Cancer} {Risk} {Prediction} {Within} {Real}-{World} {Clinical} {Data} {Streams}. JCO Clinical Cancer Informatics (6): e2200039 https://doi.org/10.1200/CCI.22.00039, June, 2025-11-03, en, https://ascopubs.org/doi/10.1200/CCI.22.00039, 2473-4276
Ifrim, Georgiana and Bakir, G ökhan and Weikum, Gerhard (2008) Fast logistic regression for text categorization with variable-length n-grams. ACM, Las Vegas Nevada USA, 354--362, August, Proceedings of the 14th {ACM} {SIGKDD} international conference on {Knowledge} discovery and data mining, 2025-11-03, en, 10.1145/1401890.1401936, https://dl.acm.org/doi/10.1145/1401890.1401936, 9781605581934
Mu, Fanglin and Gu, Yu and Zhang, Jie and Zhang, Lei (2020) Milk {Source} {Identification} and {Milk} {Quality} {Estimation} {Using} an {Electronic} {Nose} and {Machine} {Learning} {Techniques}. Sensors 20(15): 4238 https://doi.org/10.3390/s20154238, July, 2025-11-03, en, https://www.mdpi.com/1424-8220/20/15/4238, 1424-8220
Chao, Cheng-Min and Yu, Ya-Wen and Cheng, Bor-Wen and Kuo, Yao-Lung (2014) Construction the {Model} on the {Breast} {Cancer} {Survival} {Analysis} {Use} {Support} {Vector} {Machine}, {Logistic} {Regression} and {Decision} {Tree}. Journal of Medical Systems 38(10): 106 https://doi.org/10.1007/s10916-014-0106-1, October, 2025-11-03, en, http://link.springer.com/10.1007/s10916-014-0106-1, 0148-5598, 1573-689X
A., Kulkarni and F.A., Batarseh Foundations of {Data} {Imbalance} and {Solutions} for a {Data} {Democracy}. Data {Democracy}: {At} the {Nexus} of {Artificial} {Intelligence}, {Software} {Development} and {Knowledge} {Engineering}, 83--106, 2020, Academic Press, https://www.sciencedirect.com/science/article/pii/B9780128183663000058
Seddik, Ahmed F. and Shawky, Doaa M. (2015) Logistic regression model for breast cancer automatic diagnosis. IEEE, London, United Kingdom, 150--154, November, 2015 {SAI} {Intelligent} {Systems} {Conference} ({IntelliSys}), 2025-11-03, 10.1109/IntelliSys.2015.7361138, https://ieeexplore.ieee.org/document/7361138, 9781467376068
Saw, Montu and Saxena, Tarun and Kaithwas, Sanjana and Yadav, Rahul and Lal, Nidhi (2020) Retraction {Notice}: {Estimation} of {Prediction} for {Getting} {Heart} {Disease} {Using} {Logistic} {Regression} {Model} of {Machine} {Learning}. IEEE, Coimbatore, India, 1--1, January, 2020 {International} {Conference} on {Computer} {Communication} and {Informatics} ({ICCCI}), 2025-11-03, 10.1109/ICCCI48352.2020.10467199, https://ieeexplore.ieee.org/document/10467199/, Retraction {Notice}, 9781728145143, https://doi.org/10.15223/policy-029
Wu, Chieh-Chen and Yeh, Wen-Chun and Hsu, Wen-Ding and Islam, Md. Mohaimenul and Nguyen, Phung Anh (Alex) and Poly, Tahmina Nasrin and Wang, Yao-Chin and Yang, Hsuan-Chia and (Jack) Li, Yu-Chuan (2019) Prediction of fatty liver disease using machine learning algorithms. Computer Methods and Programs in Biomedicine 170: 23--29 https://doi.org/10.1016/j.cmpb.2018.12.032, March, 2025-11-03, en, https://linkinghub.elsevier.com/retrieve/pii/S0169260718315724, 01692607
Farooq, Faizan and Tandon, Siddhant and Parashar, Pankaj and Sengar, Prateek (2016) Vectorized code implementation of {Logistic} {Regression} and {Artificial} {Neural} {Networks} to recognize handwritten digit. IEEE, Delhi, India, 1--5, July, 2016 {IEEE} 1st {International} {Conference} on {Power} {Electronics}, {Intelligent} {Control} and {Energy} {Systems} ({ICPEICES}), 2025-11-03, 10.1109/ICPEICES.2016.7853346, http://ieeexplore.ieee.org/document/7853346/, 9781467385879
Al Nabki, Mhd Wesam and Fidalgo, Eduardo and Alegre, Enrique and De Paz, Ivan (2017) Classifying {Illegal} {Activities} on {Tor} {Network} {Based} on {Web} {Textual} {Contents}. Association for Computational Linguistics, Valencia, Spain, 35--43, Proceedings of the 15th {Conference} of the {European} {Chapter} of the {Association} for {Computational} {Linguistics}: {Volume} 1, {Long} {Papers}, 2025-11-03, en, 10.18653/v1/E17-1004, http://aclweb.org/anthology/E17-1004
Graja, Omar and Azam, Muhammad and Bouguila, Nizar (2018) Breast {Cancer} {Diagnosis} using {Quality} {Control} {Charts} and {Logistic} {Regression}. IEEE, Rabat, Morocco, 215--220, November, 2018 9th {International} {Symposium} on {Signal}, {Image}, {Video} and {Communications} ({ISIVC}), 2025-11-03, 10.1109/ISIVC.2018.8709214, https://ieeexplore.ieee.org/document/8709214/, 9781538681732
Egaji, Oche Alexander and Evans, Gareth and Griffiths, Mark Graham and Islas, Gregory (2021) Real-time machine learning-based approach for pothole detection. Expert Systems with Applications 184: 115562 https://doi.org/10.1016/j.eswa.2021.115562, December, 2025-11-03, en, https://linkinghub.elsevier.com/retrieve/pii/S0957417421009684, 09574174
Jain, Sanyam. {DeepSeaNet}: {Improving} {Underwater} {Object} {Detection} using {EfficientDet}. arXiv:2306.06075. Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, 2024, January, arXiv, 2025-11-03, 10.48550/arXiv.2306.06075, http://arxiv.org/abs/2306.06075, {DeepSeaNet}
Belete, Daniel Mesafint and Huchaiah, Manjaiah D. (2022) Grid search in hyperparameter optimization of machine learning models for prediction of HIV/AIDS test results. International Journal of Computers and Applications 44(9): 875--886. https://doi.org/10.1080/1206212X.2021.1974663
Ahmad, Ghulab Nabi and Shafiullah and Fatima, Hira and Abbas, Mohamed and Rahman, Obaidur and Imdadullah and Alqahtani, Mohammed S. (2022) Mixed Machine Learning Approach for Efficient Prediction of Human Heart Disease by Identifying the Numerical and Categorical Features. Applied Sciences 12(15): 7449. https://doi.org/10.3390/app12157449
Hossain, Md. Imam and Maruf, Mehadi Hasan and Khan, Md. Ashikur Rahman and Prity, Farida Siddiqi and Fatema, Sharmin and Ejaz, Md. Sabbir and Khan, Md. Ahnaf Sad (2023) Heart disease prediction using distinct artificial intelligence techniques: performance analysis and comparison. Iran Journal of Computer Science 6(4): 397--417. https://doi.org/10.1007/s42044-023-00148-7
Battineni, Gopi (2021) Machine Learning and Deep Learning Algorithms in the Diagnosis of Chronic Diseases. In: Bandyopadhyay, Mainak and Rout, Minakhi and Chandra Satapathy, Suresh (Eds.) Machine Learning Approaches for Urban Computing, Springer Singapore, Singapore, 141--164. https://doi.org/10.1007/978-981-16-0935-0_7
Nusinovici, Simon and Tham, Yih Chung and Chak Yan, Marco Yu and Wei Ting, Daniel Shu and Li, Jialiang and Sabanayagam, Charumathi and Wong, Tien Yin and Cheng, Ching-Yu (2020) Logistic regression was as good as machine learning for predicting major chronic diseases. Journal of Clinical Epidemiology 122: 56--69. https://doi.org/10.1016/j.jclinepi.2020.03.002
Liu, Lei (2018) Research on Logistic Regression Algorithm of Breast Cancer Diagnose Data by Machine Learning. In: 2018 International Conference on Robots & Intelligent System (ICRIS), IEEE, Changsha, China, 157--160. https://doi.org/10.1109/ICRIS.2018.00049
Varoquaux, G. and Buitinck, L. and Louppe, G. and Grisel, O. and Pedregosa, F. and Mueller, A. (2015) Scikit-learn: Machine Learning Without Learning the Machinery. GetMobile: Mobile Computing and Communications 19(1): 29--33. https://doi.org/10.1145/2786984.2786995
Brownlee, Jason (2021) Machine Learning Mastery with Python: Understand Your Data, Create Accurate Models and Work Projects End-to-End. Machine Learning Mastery, Australia. Edition v1.20.
Sokolova, Marina and Lapalme, Guy (2009) A systematic analysis of performance measures for classification tasks. Information Processing & Management 45(4): 427--437. https://doi.org/10.1016/j.ipm.2009.03.002
Stepanek, Hannah (2020) Loading and Normalizing Data. In: Thinking in Pandas, Apress, Berkeley, CA, 65--108. https://doi.org/10.1007/978-1-4842-5839-2_4
Bailly, Alexandre and Blanc, Corentin and Francis, Élie and Guillotin, Thierry and Jamal, Fadi and Wakim, Béchara and Roy, Pascal (2022) Effects of dataset size and interactions on the prediction performance of logistic regression and deep learning models. Computer Methods and Programs in Biomedicine 213: 106504. https://doi.org/10.1016/j.cmpb.2021.106504
Roberts, David R. and Bahn, Volker and Ciuti, Simone and Boyce, Mark S. and Elith, Jane and Guillera-Arroita, Gurutzeta and Hauenstein, Severin and Lahoz-Monfort, José J. and Schröder, Boris and Thuiller, Wilfried and Warton, David I. and Wintle, Brendan A. and Hartig, Florian and Dormann, Carsten F. (2017) Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography 40(8): 913--929. https://doi.org/10.1111/ecog.02881
Gambäck, Björn and Sikdar, Utpal Kumar (2017) Using Convolutional Neural Networks to Classify Hate-Speech. In: Proceedings of the First Workshop on Abusive Language Online, Association for Computational Linguistics, Vancouver, BC, Canada, 85--90. https://doi.org/10.18653/v1/W17-3013
Sun, Deliang and Xu, Jiahui and Wen, Haijia and Wang, Danzhou (2021) Assessment of landslide susceptibility mapping based on Bayesian hyperparameter optimization: A comparison between logistic regression and random forest. Engineering Geology 281: 105972. https://doi.org/10.1016/j.enggeo.2020.105972
Pfob, André and Lu, Sheng Chieh and Sidey-Gibbons, Chris (2022) Machine learning in medicine: a practical introduction to techniques for data pre-processing, hyperparameter tuning, and model comparison. BMC Medical Research Methodology 22(1). https://doi.org/10.1186/s12874-022-01758-8
Ambesange, Sateesh and Vijayalaxmi, A. and Sridevi, S. and Venkateswaran and Yashoda, B. S. (2020) Multiple Heart Diseases Prediction using Logistic Regression with Ensemble and Hyper Parameter tuning Techniques. In: 2020 Fourth World Conference on Smart Trends in Systems, Security and Sustainability (WorldS4), 827--832. https://doi.org/10.1109/WorldS450073.2020.9210404
Erden, Caner and Demir, Halil Ibrahim and Kokccam, Abdullah Hulusi (2023) Enhancing Machine Learning Model Performance with Hyper Parameter Optimization: A Comparative Study. arXiv abs/2302.11406. https://api.semanticscholar.org/CorpusID:257079151