Association of residual cholesterol-inflammation index with MAFLD and related mortality risk: a population-based study integrating mediation and machine learning analyses
Zhongqiao Lu 1, Yingxia Hu 2, Deshan Zong 2, Bin Yue 2 *
1 Department of Cardiology, The People’s Hospital of Wenshan Prefecture; Affiliated Wenshan Hospital, Kunming University of Science and Technology, Wenshan, Yunnan, 663000, China
2 Department of Gastroenterology, The People’s Hospital of Wenshan Prefecture; Affiliated Wenshan Hospital, Kunming University of Science and Technology, Wenshan, Yunnan, 663000, China
* Corresponding author: Bin Yue
Email: helloybin@163.com
Abstract
Background
The residual cholesterol-inflammation index (RCII), a composite indicator integrating lipid metabolism and systemic inflammation, may serve as a novel predictor for metabolic dysfunction-associated fatty liver disease (MAFLD) and its related adverse outcomes. This study aimed to investigate the association between RCII and the risks of MAFLD and related mortality, assess its predictive value in clinical settings, and explore the mediating role of fasting plasma glucose (FPG) in these relationships.
Methods
A total of 13,254 participants from the NHANES 1999–2010 cycles were included. RC, CRP, and RCII were evaluated as exposures, with their distributions compared between MAFLD and non-MAFLD populations. Multivariable logistic and Cox regression models were used to assess the associations of RCII with MAFLD prevalence and three types of mortality (all-cause, cardiovascular, and premature). Nonlinear relationships were examined using restricted cubic splines (RCS). Mediation analysis was conducted to quantify the contribution of FPG to RCII-related risks, complemented by Mendelian randomization to infer causal effects of TC, HDL-C, LDL-C, and CRP on MAFLD. Multiple machine learning models were constructed to evaluate the predictive utility of RCII, with SHapley Additive exPlanations (SHAP) used for model interpretation.
Results
Compared to non-MAFLD individuals, participants with MAFLD exhibited pronounced metabolic dysregulation and inflammation, with significantly elevated RCII levels. RCII showed the strongest predictive power for MAFLD (Q4 vs Q1: OR = 17.79, P < 0.001). Higher RCII levels were independently associated with increased risks of MAFLD-related all-cause, cardiovascular, and premature death in both Kaplan–Meier and Cox models, with a clear dose-response pattern. These associations remained consistent across subgroups, with evidence of interaction effects. Mediation analysis revealed that FPG partially mediated the relationship between RCII and adverse outcomes, accounting for 2.02%–8.06% of the total effect. Among all models, the random forest algorithm achieved the highest predictive performance (accuracy = 89.70%, AUC = 0.960), with SHAP analysis confirming RCII as a top-ranking feature.
Conclusions
RCII is independently and positively associated with both MAFLD risk and related mortality outcomes, demonstrating robust predictive capability. Its effects may be partially mediated by FPG. These findings underscore the potential of RCII as a clinically valuable biomarker for early identification and stratified management of individuals with high metabolic-inflammatory burdens.
Keywords
Residual cholesterol-inflammation index (RCII)
Metabolic dysfunction-associated fatty liver disease (MAFLD)
Mendelian randomization (MR)
Fasting plasma glucose (FPG)
NHANES
mediation analysis
mortality risk
machine learning
Introduction
Metabolic dysfunction-associated fatty liver disease (MAFLD) has emerged as the most prevalent chronic liver condition globally, driven by the escalating burden of metabolic dysregulation, chronic inflammation, and insulin resistance [1, 2]. With an estimated prevalence exceeding 30% among adults worldwide, MAFLD is now a leading contributor to cirrhosis, hepatocellular carcinoma, and all-cause mortality, posing a significant global public health challenge [3]. In China, the prevalence of MAFLD is rising at an alarming rate, placing increasing strain on healthcare systems and national resources [4]. Early identification and precise risk stratification of high-risk individuals are crucial for effective prevention and intervention. However, current predictive tools lack robust composite indices that simultaneously capture metabolic overload and chronic inflammatory status, thereby limiting the development of efficient screening and targeted management strategies.
Against this backdrop, remnant cholesterol (RC)—a triglyceride-rich component of atherogenic lipoproteins primarily comprising very-low-density lipoproteins (VLDL), intermediate-density lipoproteins (IDL), and chylomicron remnants—has garnered increasing attention in the field of metabolic disease research [5]. Robust evidence from large-scale cohort studies has established RC as an independent predictor of cardiovascular events, beyond the traditional lipid markers LDL-C and HDL-C [6], and has implicated it as a key contributor to the pathogenesis of atherosclerosis. Recently, attention has turned to the potential mechanistic role of RC in metabolic liver diseases. Studies based on the NHANES population have demonstrated a strong association between serum RC levels and both hepatic steatosis and fibrosis in individuals with nonalcoholic fatty liver disease (NAFLD), with RC outperforming conventional cholesterol measures in predicting liver stiffness, underscoring its potential clinical relevance in hepatic risk stratification [7]. C-reactive protein (CRP), one of the most widely used biomarkers of chronic low-grade inflammation, has long been recognized as a critical factor in the progression of NAFLD/MAFLD [8–10]. Chronic systemic inflammation, moreover, has been shown to exacerbate adipose tissue dysfunction, disrupt insulin signaling pathways, and amplify hepatic steatosis and hepatocyte apoptosis via the adipose-liver axis [11, 12].
Single biomarkers often fall short in capturing the complex interplay between metabolic dysfunction and chronic inflammation in multifactorial diseases. To address this limitation, the residual cholesterol-inflammation index (RCII), a composite metric integrating RC and CRP, has recently been proposed as a surrogate for the coupled metabolic–inflammatory axis. Emerging evidence supports the predictive utility of RCII across a spectrum of chronic conditions. For instance, studies based on the NHANES and CHARLS cohorts identified RCII as an independent predictor of incident stroke, with a graded increase in 7-year stroke risk across RCII quartiles [13]. Similarly, another NHANES-based analysis demonstrated that RCII outperforms RC or CRP alone in predicting all-cause, cardiovascular, and cancer-related mortality, exhibiting robust dose–response associations [14].
Although the RCII has shown preliminary promise in predicting several chronic diseases, its prognostic value, mechanistic relevance, and generalizability across populations in the context of MAFLD and associated mortality remain poorly characterized. In particular, it is unclear whether RCII exhibits a nonlinear association with MAFLD risk, whether it serves as an independent predictor, and whether its effects are mediated through specific metabolic pathways such as fasting plasma glucose. To date, no comprehensive epidemiological evidence has systematically addressed these questions. Furthermore, conventional statistical models are inherently limited in capturing complex feature interactions and high-dimensional data structures, potentially obscuring key risk patterns in chronic disease prediction. In contrast, machine learning approaches have emerged as powerful tools in medical risk stratification, offering enhanced accuracy, robustness, and the ability to uncover latent risk determinants in large-scale, multi-variable datasets.
To address these gaps, we leveraged cross-sectional and longitudinal data from the 1999–2010 National Health and Nutrition Examination Survey (NHANES) to systematically evaluate the association between the RCII and both MAFLD risk and related mortality outcomes. We further investigated potential nonlinear exposure–response relationships and mediating metabolic pathways. In parallel, we employed Boruta-based feature selection and a suite of machine learning algorithms to develop predictive models for MAFLD, aiming to establish RCII as a novel integrative biomarker and to advance intelligent modeling strategies for the early detection of metabolic diseases.
Materials and Methods
Data Sources
This study was based on data from the National Health and Nutrition Examination Survey (NHANES; https://www.cdc.gov/nchs/nhanes), a nationally representative survey conducted in the United States that comprehensively evaluates health status, nutritional intake, and socioeconomic factors among the civilian non-institutionalized population.
The study adhered to the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) guidelines for observational research.
We included data from adult participants enrolled between 1999 and 2010, during which measurements of total cholesterol (TC), high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), and CRP were consistently available to calculate the RCII. Subsequent cycles (2011–2014) were excluded due to missing CRP data, and although high-sensitivity CRP (hsCRP) data were available from 2015–2018, differences in assay methodology and the absence of mortality follow-up data post-2018 precluded their inclusion [13].
Participants were excluded if they met any of the following criteria: (1) age < 18 years; (2) missing data on TC, HDL-C, LDL-C, or CRP; or (3) lack of follow-up information. After applying these criteria, a total of 13,254 NHANES participants were included in the final analysis. A detailed flowchart of the inclusion and exclusion process is presented in Fig. 1.
Definition of MAFLD and RCII
MAFLD was diagnosed according to the latest international consensus, requiring evidence of hepatic steatosis (via imaging or biochemical indicators) in addition to at least one of the following three metabolic conditions: (1) overweight or obesity; (2) diagnosed type 2 diabetes mellitus (T2DM); or (3) evidence of metabolic dysregulation. Metabolic dysregulation was defined as meeting at least two of the following six criteria: (1) central obesity (waist circumference > 102 cm in men or > 88 cm in women); (2) elevated blood pressure (systolic ≥ 130 mmHg or diastolic ≥ 85 mmHg, or current use of antihypertensive medication); (3) hypertriglyceridemia (triglycerides > 1.70 mmol/L or on lipid-lowering therapy); (4) reduced high-density lipoprotein cholesterol (HDL-C < 1.0 mmol/L in men or < 1.3 mmol/L in women); (5) prediabetes, defined as fasting plasma glucose of 5.6–6.9 mmol/L, 2-hour postprandial glucose of 7.8–11.0 mmol/L, or HbA1c of 5.7%–6.4%; (6) elevated high-sensitivity C-reactive protein (hsCRP > 2 mg/L) [15].
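The diagnostic rule above can be sketched in code. This is a minimal illustration, not the authors' implementation: the `Participant` fields are hypothetical names rather than NHANES variable names, the overweight cut-off of BMI ≥ 25 kg/m² is an assumption (the text does not state a threshold), and a missing 2-hour glucose value is simply ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    """Hypothetical record; field names are illustrative only."""
    sex: str                       # "male" or "female"
    steatosis: bool                # e.g. fatty liver index >= 60
    bmi: float = 22.0              # kg/m^2
    t2dm: bool = False             # diagnosed type 2 diabetes
    waist_cm: float = 80.0
    sbp: float = 115.0             # mmHg
    dbp: float = 75.0              # mmHg
    on_bp_meds: bool = False
    tg_mmol: float = 1.0           # mmol/L
    on_lipid_meds: bool = False
    hdl_mmol: float = 1.5          # mmol/L
    fpg_mmol: float = 5.0          # mmol/L
    ogtt_2h_mmol: Optional[float] = None
    hba1c_pct: float = 5.2
    hscrp_mg_l: float = 1.0


def metabolic_risk_count(p: Participant) -> int:
    """Count how many of the six metabolic dysregulation criteria are met."""
    male = p.sex == "male"
    prediabetes = (
        5.6 <= p.fpg_mmol <= 6.9
        or (p.ogtt_2h_mmol is not None and 7.8 <= p.ogtt_2h_mmol <= 11.0)
        or 5.7 <= p.hba1c_pct <= 6.4
    )
    criteria = [
        p.waist_cm > (102 if male else 88),            # central obesity
        p.sbp >= 130 or p.dbp >= 85 or p.on_bp_meds,   # elevated blood pressure
        p.tg_mmol > 1.70 or p.on_lipid_meds,           # hypertriglyceridemia
        p.hdl_mmol < (1.0 if male else 1.3),           # reduced HDL-C
        prediabetes,
        p.hscrp_mg_l > 2.0,                            # elevated hsCRP
    ]
    return sum(criteria)


def has_mafld(p: Participant, overweight_bmi: float = 25.0) -> bool:
    """Steatosis plus overweight/obesity, T2DM, or >= 2 metabolic criteria.

    overweight_bmi is an assumed cut-off, not one stated in the text.
    """
    if not p.steatosis:
        return False
    return p.bmi >= overweight_bmi or p.t2dm or metabolic_risk_count(p) >= 2


lean_case = Participant(sex="female", steatosis=True, sbp=135.0, tg_mmol=1.9)
print(has_mafld(lean_case))
```

The lean example qualifies through the third route only: steatosis with two metabolic criteria (blood pressure and triglycerides) despite a normal BMI.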
In the absence of abdominal imaging and liver biopsy data, this study employed the fatty liver index (FLI) to determine the presence of hepatic steatosis. The FLI was calculated according to the following formula:
FLI = e^y / (1 + e^y) × 100, with y = 0.953 × ln(TG) + 0.139 × BMI + 0.718 × ln(GGT) + 0.053 × waist circumference − 15.745,
where TG is serum triglycerides (mg/dL), BMI is body mass index (kg/m²), GGT is γ-glutamyl transferase (U/L), and waist circumference is measured in centimeters. Participants were classified as having hepatic steatosis if the FLI was ≥ 60, as previously validated in epidemiological studies [16].
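As a worked illustration, the FLI can be computed as below. The sketch assumes the standard published FLI coefficients (0.953, 0.139, 0.718, 0.053, and −15.745); the function names are hypothetical.

```python
import math


def fatty_liver_index(tg_mg_dl: float, bmi: float, ggt_u_l: float, waist_cm: float) -> float:
    """Fatty liver index on a 0-100 scale, using the standard published coefficients."""
    y = (
        0.953 * math.log(tg_mg_dl)
        + 0.139 * bmi
        + 0.718 * math.log(ggt_u_l)
        + 0.053 * waist_cm
        - 15.745
    )
    # Logistic transform, written in the numerically stable form 1 / (1 + e^-y).
    return 100.0 / (1.0 + math.exp(-y))


def has_steatosis(fli: float, cutoff: float = 60.0) -> bool:
    """Apply the FLI >= 60 rule used in this study."""
    return fli >= cutoff


# Inputs set to the MAFLD-group medians reported in Table 1.
fli_example = fatty_liver_index(tg_mg_dl=150.0, bmi=32.39, ggt_u_l=27.0, waist_cm=108.8)
print(round(fli_example, 1), has_steatosis(fli_example))
```

Evaluated at the MAFLD-group medians from Table 1, the function returns a value of about 84, in the same range as the reported group median FLI. Medians of the inputs need not reproduce the median of the index exactly.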
The RCII, serving as the primary exposure variable in this study, is a composite marker integrating metabolic and inflammatory burden. It was calculated as:
RCII = RC × CRP,
where remnant cholesterol (RC) was estimated using the formula RC = TC − (HDL-C + LDL-C), with all values expressed in mg/dL [17].
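The two formulas, together with the quartile grouping used in later analyses, can be sketched as follows. Quartile cut points here come from unweighted sample quantiles via `statistics.quantiles`, which is an assumption: the text does not state how boundaries were computed or whether survey weights were applied.

```python
import statistics


def remnant_cholesterol(tc: float, hdl_c: float, ldl_c: float) -> float:
    """RC = TC - (HDL-C + LDL-C), all in mg/dL."""
    return tc - (hdl_c + ldl_c)


def rcii(tc: float, hdl_c: float, ldl_c: float, crp: float) -> float:
    """RCII = RC x CRP."""
    return remnant_cholesterol(tc, hdl_c, ldl_c) * crp


def quartile_labels(values: list[float]) -> list[str]:
    """Label each value Q1-Q4 using unweighted sample quartiles (an assumption)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    labels = []
    for v in values:
        if v <= q1:
            labels.append("Q1")
        elif v <= q2:
            labels.append("Q2")
        elif v <= q3:
            labels.append("Q3")
        else:
            labels.append("Q4")
    return labels


# Hypothetical participant: TC 195, HDL-C 51, LDL-C 115 mg/dL, CRP 0.22 mg/dL.
example_rcii = rcii(tc=195.0, hdl_c=51.0, ldl_c=115.0, crp=0.22)
print(round(example_rcii, 2))
```

For the hypothetical participant, RC is 29 mg/dL and RCII is 29 × 0.22 = 6.38.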
Outcome Assessment
Mortality outcomes were ascertained through linkage of the NHANES cohort with the National Death Index (NDI), with follow-up through December 31, 2019. The primary endpoints included all-cause mortality (death from any cause), cardiovascular mortality (defined using ICD-10 codes I00–I09, I11, I13, I20–I51, and I60–I69), and premature death (defined as death occurring before the age of 75). Cause-of-death classifications were based on the "ucode_leading" variable provided in the publicly available mortality files curated by the National Center for Health Statistics (NCHS), ensuring standardized and consistent endpoint determination [18].
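The endpoint definitions can be expressed as simple predicates. The sketch below assumes a raw ICD-10 underlying-cause code is available; the public-use mortality files instead provide a recoded leading-cause variable, so these helper functions are hypothetical and only illustrate the stated code ranges and the age cut-off.

```python
def is_cardiovascular(icd10: str) -> bool:
    """True if the code falls in I00-I09, I11, I13, I20-I51, or I60-I69."""
    code = icd10.strip().upper()
    if len(code) < 3 or code[0] != "I" or not code[1:3].isdigit():
        return False
    n = int(code[1:3])
    return 0 <= n <= 9 or n in (11, 13) or 20 <= n <= 51 or 60 <= n <= 69


def is_premature_death(died: bool, age_at_death: float) -> bool:
    """Premature death: death occurring before the age of 75."""
    return died and age_at_death < 75


print(is_cardiovascular("I21.9"), is_cardiovascular("I10"))
```

Note that I10 and I12 fall outside the listed ranges, so a check of the form "any I code" would misclassify them.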
Covariates
In multivariable analyses, a comprehensive set of covariates was included to account for potential confounding factors, encompassing demographic characteristics, lifestyle behaviors, and comorbid conditions. Demographic variables comprised age, sex, educational attainment, marital status, and race/ethnicity. Behavioral factors included smoking status and alcohol consumption. Clinical comorbidities—namely diabetes, hypertension, and dyslipidemia—were identified based on self-reported physician diagnoses or corresponding biochemical criteria, as previously defined [13].
Restricted Cubic Spline Analysis
To characterize the shape of the association between the RCII and the risk of MAFLD and mortality, restricted cubic spline (RCS) functions were applied. RCII was modeled as a continuous variable within logistic regression frameworks for MAFLD prevalence and Cox proportional hazards models for mortality outcomes, allowing for the identification of potential non-linear dose–response relationships. Knot placement was determined based on the empirical distribution of RCII, and all models were adjusted for key covariates [18].
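To make the spline construction concrete, the sketch below builds a restricted cubic spline basis using the truncated-power form with linear tails. It is an illustration only: the knot values are hypothetical, and scaling the nonlinear terms by the squared knot range is one common normalization, not necessarily the one used in the study.

```python
def rcs_basis(x: float, knots: list[float]) -> list[float]:
    """Restricted cubic spline basis for one value x.

    Returns [x, s_1(x), ..., s_{k-2}(x)] for k knots. The nonlinear terms are
    zero below the first knot and linear in x above the last knot.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")

    def cube_plus(u: float) -> float:
        return u ** 3 if u > 0 else 0.0

    last_gap = t[-1] - t[-2]
    scale = (t[-1] - t[0]) ** 2  # keeps the terms on the scale of x
    terms = [float(x)]
    for j in range(k - 2):
        term = (
            cube_plus(x - t[j])
            - cube_plus(x - t[-2]) * (t[-1] - t[j]) / last_gap
            + cube_plus(x - t[-1]) * (t[-2] - t[j]) / last_gap
        )
        terms.append(term / scale)
    return terms


# Hypothetical knots on the RCII scale, for illustration only.
example_knots = [1.0, 5.0, 10.0, 20.0]
print(rcs_basis(3.0, example_knots))
```

The resulting columns would enter a logistic or Cox model as ordinary covariates. A joint test of the nonlinear terms then corresponds to the "nonlinear P" reported for spline analyses.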
Mediation Analysis
To evaluate the mediating role of fasting plasma glucose (FPG) in the associations between the RCII and both MAFLD risk and mortality outcomes, mediation analyses were conducted using the "mediation" package in R. FPG was selected as the mediator due to its central role in glucose metabolism dysregulation and its potential to mechanistically link RCII with metabolic liver disease and adverse outcomes. Two regression models were specified: one to predict the mediator (FPG) and another to model the outcome—logistic regression for MAFLD and an accelerated failure time (AFT) model (via survreg) for mortality endpoints. Inference was based on nonparametric bootstrapping. Estimates included the average causal mediation effect (ACME), total effect, and the proportion mediated. All models were adjusted for key covariates, including age, sex, education level, marital status, BMI, waist circumference, and smoking status [19].
Subgroup Analyses
Subgroup analyses were conducted to assess the robustness of the associations between RCII and study outcomes across strata of key demographic and health-related variables. Participants were stratified by age (< 60 vs. ≥60 years), sex (male vs. female), body mass index (BMI < 30 vs. ≥30 kg/m²), marital status (unmarried/widowed vs. married/cohabiting), educational attainment (high school or above vs. below high school), smoking status (non-smoker vs. smoker), alcohol consumption (no vs. yes), and history of diabetes, coronary heart disease, stroke, angina, and cancer—resulting in 12 predefined subgroups. Within each stratum, three progressively adjusted multivariable models (Model 1–3) were constructed to evaluate the association between RCII and outcome variables, with hazard ratios (HRs) and corresponding 95% confidence intervals reported. Statistical interactions were tested by incorporating multiplicative interaction terms into the fully adjusted models, and P-values for interaction were calculated to assess effect modification [20].
Survival Analysis
Kaplan–Meier survival curves were generated to estimate all-cause mortality, cardiovascular mortality, and premature death across quartiles of RCII. Group differences were assessed using the log-rank test. To further quantify the association between RCII and mortality outcomes, Cox proportional hazards models were constructed with three levels of covariate adjustment. Model 1 was unadjusted. Model 2 adjusted for age, sex, race, and educational attainment. Model 3 was additionally adjusted for body mass index (BMI), smoking status, alcohol consumption, and diabetes status. Results were reported as hazard ratios (HRs) with corresponding 95% confidence intervals (CIs). All analyses were performed using R software, and statistical significance was defined as a two-sided P-value < 0.05 [21].
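The Kaplan–Meier product-limit estimator behind these curves can be sketched in a few lines. This is a minimal unweighted version on toy data, assuming right-censoring only; it ignores the survey weights used in the actual analysis.

```python
def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if the participant died at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all participants who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Toy follow-up times (years) with two censored observations.
km_curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
print(km_curve)
```

In the toy data, survival steps down to 0.8 at year 1, to 0.8 × 2/3 at year 3, and to half of that at year 4. Censored participants leave the risk set without causing a step.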
Mendelian Randomization
A two-sample Mendelian randomization (MR) approach was employed to investigate the potential causal relationships between blood lipid traits (TC, HDL-C, LDL-C), CRP, FPG, and the risk of MAFLD [22]. Genetic instruments for both exposures and outcomes were derived from publicly available genome-wide association study (GWAS) summary statistics, including datasets for MAFLD (finngen_R12_NAFLD), TC (met-d-Total_C), HDL-C (GCST008035), LDL-C (GCST008037), CRP (GCST90029070), and FPG (GCST008032). Single nucleotide polymorphisms (SNPs) were selected as instrumental variables based on genome-wide significance (P < 5 × 10⁻⁸) and pruned for linkage disequilibrium using a threshold of r² < 0.001 within a clumping window of 10,000 kb to ensure independence. If the number of eligible SNPs was insufficient, a relaxed significance threshold (P < 1 × 10⁻⁵) was applied, in accordance with prior studies [23].
Causal effects were estimated using multiple MR methods to ensure robustness and consistency of results, including inverse variance weighted (IVW), MR-Egger regression, weighted median, simple mode, and weighted mode approaches. For each method, effect estimates, 95% confidence intervals, and P-values were reported. Heterogeneity tests and sensitivity analyses were also conducted to evaluate the validity of instrumental variables and the underlying MR assumptions. All analyses were performed using R software.
Statistical Analysis
For the NHANES dataset, all analyses accounted for the complex multistage, stratified, and weighted sampling design by incorporating appropriate survey weights to yield nationally representative estimates. Continuous variables were summarized as means with standard deviations if normally distributed and compared using independent samples t-tests; otherwise, medians with interquartile ranges were reported, and group differences were assessed using the Wilcoxon rank-sum test. Categorical variables were compared using the chi-square test [23, 24].
In the MR analysis, five complementary methods were used to estimate the genetic associations between exposures and outcomes: IVW, weighted median, MR-Egger regression, simple mode, and weighted mode approaches. Heterogeneity across genetic instruments in the IVW model was assessed using Cochran’s Q statistic, while directional horizontal pleiotropy was evaluated via the MR-Egger intercept. To ensure robustness, leave-one-out sensitivity analysis was performed to evaluate the influence of individual single-nucleotide polymorphisms (SNPs) on the overall causal estimates. All hypothesis tests were two-sided, with a significance threshold of P < 0.05 [22].
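The IVW estimate and Cochran's Q can be written directly from per-SNP summary statistics. The sketch below uses the fixed-effect form with first-order weights and hypothetical SNP effects. It shows the estimator itself, not the R workflow used in the study.

```python
import math


def ivw_estimate(beta_x: list[float], beta_y: list[float], se_y: list[float]):
    """Fixed-effect inverse variance weighted MR estimate.

    beta_x: SNP-exposure effects; beta_y: SNP-outcome effects;
    se_y: standard errors of the SNP-outcome effects.
    Returns (beta_ivw, se_ivw, cochran_q).
    """
    # First-order weight for each Wald ratio beta_y / beta_x.
    weights = [bx ** 2 / sy ** 2 for bx, sy in zip(beta_x, se_y)]
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]
    total_w = sum(weights)
    beta_ivw = sum(w * r for w, r in zip(weights, ratios)) / total_w
    se_ivw = math.sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviation of each ratio from the pooled estimate.
    cochran_q = sum(w * (r - beta_ivw) ** 2 for w, r in zip(weights, ratios))
    return beta_ivw, se_ivw, cochran_q


# Hypothetical summary statistics for three independent SNPs.
beta, se, q = ivw_estimate(
    beta_x=[0.10, 0.20, 0.30],
    beta_y=[0.05, 0.10, 0.15],
    se_y=[0.01, 0.01, 0.01],
)
odds_ratio = math.exp(beta)  # for a binary outcome analysed on the log-odds scale
print(round(beta, 3), round(se, 4), round(q, 6))
```

With perfectly concordant ratios, as in this toy input, Q is zero. Under the null of no heterogeneity, Q follows a chi-square distribution with (number of SNPs − 1) degrees of freedom.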
A total of 13,254 participants were randomly split into training and test sets at a 7:3 ratio. Feature selection was performed using the Boruta algorithm to identify variables most predictive of MAFLD. Five classification models were subsequently constructed: random forest (RF), k-nearest neighbors (KNN), naïve Bayes (NB), light gradient boosting machine (LightGBM), and decision tree (rpart). Model performance was evaluated based on accuracy, Kappa statistic, sensitivity, specificity, precision, area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPR) [25].
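The 7:3 split and the threshold-based evaluation metrics can be sketched without any modelling library. The helper names, the random seed, and the toy labels are hypothetical; AUROC and AUPR are omitted because they require predicted probabilities rather than class labels.

```python
import random


def split_indices(n: int, train_frac: float = 0.7, seed: int = 0):
    """Randomly split indices 0..n-1 into training and test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * train_frac))
    return idx[:cut], idx[cut:]


def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict:
    """Accuracy, Cohen's kappa, sensitivity, specificity, and precision.

    Assumes binary 0/1 labels with both classes present in y_true
    and at least one positive prediction.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "accuracy": accuracy,
        "kappa": (accuracy - expected) / (1 - expected),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


train_idx, test_idx = split_indices(13254)
toy_metrics = classification_metrics(
    y_true=[1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    y_pred=[1, 1, 1, 0, 0, 0, 0, 0, 1, 1],
)
print(len(train_idx), len(test_idx), toy_metrics)
```

Kappa is reported alongside accuracy because it discounts the agreement that a classifier would reach by chance given the class balance.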
To enhance model interpretability, SHAP was applied to quantify the relative contribution of each predictor to model outputs. SHAP values provide individualized estimates of feature importance, indicating both the direction and magnitude of each variable’s effect while accounting for feature interactions [26].
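The quantity that SHAP estimates can be illustrated with an exact Shapley computation on a toy model. This sketch enumerates all feature subsets and replaces absent features with a single baseline value. That is a simplification: the `shap` library uses faster algorithms and averages over a background dataset. The model and feature values are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x: list[float], baseline: list[float]) -> list[float]:
    """Exact Shapley values for one prediction, with a single baseline point."""
    n = len(x)

    def value(present: set) -> float:
        # Features outside the coalition are set to their baseline value.
        z = [x[i] if i in present else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def toy_model(z: list[float]) -> float:
    """Hypothetical risk score with an interaction; the third feature is unused."""
    return 0.5 * z[0] + 0.2 * z[1] + 0.1 * z[0] * z[1]


x_example = [10.0, 3.0, 7.0]
base_example = [0.0, 0.0, 0.0]
phi_example = shapley_values(toy_model, x_example, base_example)
print([round(v, 3) for v in phi_example])
```

Two properties are visible in the toy output. The values sum to the difference between the prediction and the baseline prediction, and the interaction term is shared equally between the two interacting features while the unused feature receives zero.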
Machine learning analyses were conducted in Python (v3.10) using core libraries including xgboost, sklearn, shap, pandas, and matplotlib. All other statistical analyses were performed in R (v4.2.1) using packages such as survey, ggplot2, dplyr, mgcv, and rms. All statistical tests were two-sided, with P-values < 0.05 considered statistically significant.
Results
Baseline Characteristics of the Study Population
A total of 13,254 participants from the NHANES cohort were included in the analysis, comprising 5,693 individuals with MAFLD and 7,561 without (Table 1).
Compared to the non-MAFLD group, participants with MAFLD were older (median age: 52.00 vs. 45.00 years) and exhibited significantly higher levels of key metabolic and inflammatory markers, including BMI (32.39 vs. 24.75 kg/m²), waist circumference (108.80 vs. 89.00 cm), FPG (102.70 vs. 95.00 mg/dL), and C-reactive protein (0.35 vs. 0.15 mg/dL). In terms of lipid profiles, the MAFLD group had elevated triglycerides (150.00 vs. 94.00 mg/dL) and LDL-C (119.00 vs. 112.00 mg/dL), along with reduced HDL-C (46.00 vs. 56.00 mg/dL). Notably, both remnant cholesterol (RC; 30.00 vs. 19.00 mg/dL) and the RCII (10.56 vs. 2.85) were markedly higher in the MAFLD group (P < 0.001 for both comparisons). When stratified by RCII quartiles, 40.21% of individuals in the MAFLD group fell within the highest quartile (Q4), compared to only 13.64% in the non-MAFLD group. Conversely, the proportion of individuals in the lowest RCII quartile (Q1) was significantly lower among those with MAFLD (7.22% vs. 38.32%). Comorbidities were also more prevalent in the MAFLD group, including diabetes (15.30% vs. 5.69%), CHD (5.38% vs. 3.15%), stroke (4.39% vs. 2.96%), and angina (4.46% vs. 1.92%). These findings collectively suggest that individuals with MAFLD are characterized by pronounced metabolic dysregulation and systemic inflammation, and that RCII may serve as a robust marker for identifying high-risk individuals.
Table 1
Baseline characteristics of clinical information
Characteristic | Overall (n = 13,254) | Non-MAFLD (n = 7,561) | MAFLD (n = 5,693) | P-value |
|---|---|---|---|---|
Age, years (median, IQR) | 48.00 (31.00) | 45.00 (33.00) | 52.00 (28.00) | < 0.001 |
Gender (n, %) | | | | < 0.001 |
Male | 6305 (47.57) | 3294 (43.57) | 3011 (52.89) | |
Female | 6949 (52.43) | 4267 (56.43) | 2682 (47.11) | |
Race (n, %) | | | | < 0.001 |
Mexican American | 2733 (20.62) | 1396 (18.46) | 1337 (23.48) | |
Non-Hispanic White | 6583 (49.67) | 3888 (51.42) | 2695 (47.34) | |
Non-Hispanic Black | 2482 (18.73) | 1371 (18.13) | 1111 (19.52) | |
Other Hispanic | 913 (6.89) | 526 (6.96) | 387 (6.80) | |
Other Race | 543 (4.10) | 380 (5.03) | 163 (2.86) | |
Education (n, %) | | | | < 0.001 |
Less than High School | 4014 (30.29) | 2109 (27.89) | 1905 (33.46) | |
High School and above | 9240 (69.71) | 5452 (72.11) | 3788 (66.54) | |
Marital status (n, %) | | | | < 0.001 |
Never married/Widowed | 5125 (38.67) | 3065 (40.54) | 2060 (36.18) | |
Married/Living with partner | 8129 (61.33) | 4496 (59.46) | 3633 (63.82) | |
BMI, kg/m² (median, IQR) | 27.57 (7.63) | 24.75 (4.62) | 32.39 (7.03) | < 0.001 |
Waist circumference, cm (median, IQR) | 97.00 (20.00) | 89.00 (13.80) | 108.80 (15.00) | < 0.001 |
SBP, mmHg (median, IQR) | 122.00 (24.00) | 118.00 (24.00) | 126.00 (24.00) | < 0.001 |
DBP, mmHg (median, IQR) | 70.00 (16.00) | 68.00 (14.00) | 72.00 (16.00) | < 0.001 |
Smoking (n, %) | | | | < 0.001 |
No | 7181 (54.18) | 3787 (50.09) | 3394 (59.62) | |
Yes | 6073 (45.82) | 3774 (49.91) | 2299 (40.38) | |
Drinking (n, %) | | | | < 0.001 |
No | 3366 (25.40) | 1759 (23.26) | 1607 (28.23) | |
Yes | 9888 (74.60) | 5802 (76.74) | 4086 (71.77) | |
Diabetes (n, %) | | | | < 0.001 |
No | 11953 (90.18) | 7131 (94.31) | 4822 (84.70) | |
Yes | 1301 (9.82) | 430 (5.69) | 871 (15.30) | |
CHD (n, %) | | | | < 0.001 |
No | 12710 (95.90) | 7323 (96.85) | 5387 (94.62) | |
Yes | 544 (4.10) | 238 (3.15) | 306 (5.38) | |
Stroke (n, %) | | | | < 0.001 |
No | 12780 (96.42) | 7337 (97.04) | 5443 (95.61) | |
Yes | 474 (3.58) | 224 (2.96) | 250 (4.39) | |
Angina (n, %) | | | | < 0.001 |
No | 12855 (96.99) | 7416 (98.08) | 5439 (95.54) | |
Yes | 399 (3.01) | 145 (1.92) | 254 (4.46) | |
Cancer (n, %) | | | | 0.083 |
No | 12066 (91.04) | 6912 (91.42) | 5154 (90.53) | |
Yes | 1188 (8.96) | 649 (8.58) | 539 (9.47) | |
FPG, mg/dL (median, IQR) | 98.00 (16.70) | 95.00 (14.30) | 102.70 (20.00) | < 0.001 |
CRP, mg/dL (median, IQR) | 0.22 (0.42) | 0.15 (0.28) | 0.35 (0.58) | < 0.001 |
TC, mg/dL (median, IQR) | 195.00 (54.00) | 192.00 (53.00) | 199.00 (55.00) | < 0.001 |
HDL-C, mg/dL (median, IQR) | 51.00 (21.00) | 56.00 (22.00) | 46.00 (17.00) | < 0.001 |
LDL-C, mg/dL (median, IQR) | 115.00 (47.00) | 112.00 (46.00) | 119.00 (47.00) | < 0.001 |
TG, mg/dL (median, IQR) | 114.00 (85.00) | 94.00 (60.00) | 150.00 (99.00) | < 0.001 |
GGT, U/L (median, IQR) | 20.00 (17.00) | 17.00 (12.00) | 27.00 (23.00) | < 0.001 |
FLI (median, IQR) | 52.19 (58.26) | 26.20 (31.30) | 84.03 (20.45) | < 0.001 |
RC, mg/dL (median, IQR) | 23.00 (17.00) | 19.00 (12.00) | 30.00 (20.00) | < 0.001 |
RCII (median, IQR) | 5.22 (12.22) | 2.85 (6.36) | 10.56 (18.48) | < 0.001 |
RCII quartile (n, %) | | | | < 0.001 |
Q1 | 3308 (24.96) | 2897 (38.32) | 411 (7.22) | |
Q2 | 3316 (25.02) | 2117 (28.00) | 1199 (21.06) | |
Q3 | 3310 (24.97) | 1516 (20.05) | 1794 (31.51) | |
Q4 | 3320 (25.05) | 1031 (13.64) | 2289 (40.21) | |
Abbreviations: MAFLD, metabolic dysfunction-associated fatty liver disease; IQR, interquartile range; SBP, systolic blood pressure; DBP, diastolic blood pressure; CHD, coronary heart disease; FPG, fasting plasma glucose; CRP, C-reactive protein; TC, total cholesterol; HDL-C, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol; TG, triglyceride; GGT, γ-glutamyl transferase; FLI, fatty liver index; RC, remnant cholesterol; RCII, residual cholesterol-inflammation index.
Association Between RCII and Risk of MAFLD and Related Mortality
In the study cohort, higher levels of RC, CRP, and the RCII were all significantly associated with increased odds of MAFLD, exhibiting robust dose–response relationships (Fig. 2). Compared to the lowest RCII quartile (Q1), adjusted odds ratios (ORs) for MAFLD were progressively elevated across quartiles: 4.16 (95% CI: 3.67–4.72) for Q2, 8.88 (95% CI: 7.77–10.14) for Q3, and 17.79 (95% CI: 15.69–20.17) for Q4 (all P < 0.001). Both RC and CRP alone were also independently associated with MAFLD risk, but their highest-quartile estimates were lower: the adjusted OR was 17.05 (95% CI: 14.46–20.11) for RC and 7.46 (95% CI: 6.54–8.52) for CRP. Collectively, these findings suggest that RCII, by integrating lipid dysregulation and systemic inflammation, may identify individuals at high risk of MAFLD better than either component alone, although the estimate for RC was close to that for RCII.
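For orientation, a crude (unadjusted) odds ratio for Q4 versus Q1 can be computed from the counts in Table 1. The sketch below uses the standard 2 × 2 odds ratio with a Woolf (log-based) confidence interval. It is an illustration that ignores covariates and survey weights, so it is not expected to reproduce the adjusted estimates.

```python
import math


def odds_ratio_ci(exposed_cases: int, exposed_noncases: int,
                  ref_cases: int, ref_noncases: int, z: float = 1.96):
    """Crude odds ratio with a Woolf 95% confidence interval."""
    or_ = (exposed_cases * ref_noncases) / (exposed_noncases * ref_cases)
    se_log = math.sqrt(
        1 / exposed_cases + 1 / exposed_noncases + 1 / ref_cases + 1 / ref_noncases
    )
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Table 1 counts: Q4 has 2289 MAFLD and 1031 non-MAFLD; Q1 has 411 and 2897.
crude_or, crude_lo, crude_hi = odds_ratio_ci(2289, 1031, 411, 2897)
print(round(crude_or, 2), round(crude_lo, 2), round(crude_hi, 2))
```

The crude Q4 versus Q1 odds ratio from these counts is about 15.6, of the same order as the adjusted estimate of 17.79 reported above.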
Kaplan–Meier survival analysis revealed a significant association between RCII quartiles and all three mortality outcomes: all-cause mortality, cardiovascular mortality, and premature death (Fig. 3). Survival probability declined progressively across increasing RCII quartiles (Q1 to Q4), with the lowest survival observed in the highest RCII group (Q4). All differences were statistically significant (P < 0.001), indicating that individuals with elevated RCII levels are at substantially higher risk of mortality.
To further evaluate the relationship between RCII and mortality, we constructed multivariable Cox proportional hazards models using all-cause, cardiovascular, and premature death as outcomes (Fig. 4A–C). In the analysis of all-cause mortality (Fig. 4A), using the lowest RCII quartile (Q1) as the reference, a stepwise increase in mortality risk was observed across higher quartiles. In the fully adjusted model (Model 3), hazard ratios (HRs) for Q2, Q3, and Q4 were 1.14 (95% CI: 0.96–1.34, P = 0.127), 1.28 (95% CI: 1.12–1.48, P < 0.001), and 1.83 (95% CI: 1.60–2.10, P < 0.001), respectively, indicating a clear dose–response trend. A similar pattern was found for cardiovascular mortality (Fig. 4B), with adjusted HRs of 1.17 (95% CI: 0.89–1.54, P = 0.262) for Q2, 1.22 (95% CI: 0.98–1.53, P = 0.071) for Q3, and 1.79 (95% CI: 1.40–2.29, P < 0.001) for Q4, mirroring the all-cause mortality results. For premature death (Fig. 4C), the associations were even more pronounced. Compared to Q1, the adjusted HRs were 1.37 (95% CI: 1.10–1.69, P = 0.004) for Q2, 1.61 (95% CI: 1.33–1.94, P < 0.001) for Q3, and 2.36 (95% CI: 1.97–2.82, P < 0.001) for Q4, again demonstrating a robust dose-dependent relationship. Collectively, elevated RCII levels were independently associated with significantly increased risks of all-cause, cardiovascular, and premature death, with consistent dose–response gradients across quartiles. These findings suggest that RCII may serve as a reliable prognostic biomarker for long-term adverse outcomes.
Nonlinear Associations Between RCII and Risk of MAFLD and Related Mortality
RCS regression analyses characterized the shape of the relationships between the RCII and multiple adverse health outcomes (Fig. 5A–D). As shown in Fig. 5A, RCII exhibited a pronounced nonlinear association with MAFLD risk: the risk increased sharply while RCII was below approximately 5.83, plateaued thereafter, and showed a slight decline at higher levels. In contrast, Fig. 5B demonstrated a significant linear relationship between RCII and all-cause mortality (overall P = 0.006; nonlinear P = 0.822), with increasing RCII levels corresponding to a steadily rising mortality risk. Fig. 5C indicated a borderline overall association between RCII and cardiovascular mortality that did not reach the prespecified significance threshold (overall P = 0.055; nonlinear P = 0.274), with modest risk elevation at higher RCII levels and no evidence of nonlinearity. Fig. 5D showed a strong positive association between RCII and premature death (overall P < 0.001; nonlinear P = 0.277), with risk increasing consistently across the full range of RCII values. Collectively, these findings indicate that elevated RCII is positively associated with MAFLD, all-cause mortality, and premature death, with a weaker, borderline association for cardiovascular mortality, supporting its potential utility as a predictive biomarker for long-term adverse outcomes.
Subgroup Analyses of the Stability and Heterogeneity of RCII in Predicting MAFLD and Related Mortality Risk
To further assess the robustness and subgroup-specific variations in the association between the RCII and the risk of MAFLD and adverse outcomes, stratified analyses were conducted across multiple key covariates, including age, sex, BMI, marital status, education level, smoking and alcohol consumption, and history of chronic diseases (Figures S1–S4). The results demonstrated that elevated RCII was consistently and significantly associated with increased risk of MAFLD (Figure S1), all-cause mortality (Figure S2), cardiovascular mortality (Figure S3), and premature death (Figure S4) across most subpopulations, even after adjusting for potential confounders. Notably, significant interactions were observed between RCII and several subgroup variables (e.g., age, sex, BMI, diabetes, and cancer), suggesting that the predictive strength of RCII may be more pronounced in certain high-risk groups.
Causal Effects of HDL-C and CRP on MAFLD Risk
Using five Mendelian randomization (MR) methods (MR-Egger, weighted median, IVW, simple mode, and weighted mode), we systematically evaluated the causal effects of key lipid and inflammatory biomarkers on the risk of MAFLD. For TC, all MR methods yielded odds ratios (ORs) close to 1.00 with non-significant P values (all P > 0.2), suggesting no evidence of a causal relationship between total cholesterol levels and MAFLD (Fig. 6A). Similarly, for LDL-C, results were highly consistent across methods (OR = 1.00 for all), with narrow 95% confidence intervals and P values > 0.4, indicating no significant causal association with MAFLD (Fig. 6B). In contrast, for HDL-C, the IVW method estimated an OR of 0.78 (95% CI: 0.58–0.98, P = 0.018), the MR-Egger method yielded an OR of 0.75 (P = 0.024), and the weighted mode method also reported an OR of 0.75 (P = 0.018) (Fig. 6C). CRP demonstrated a positive, potentially pathogenic association with MAFLD across all MR approaches except the simple mode. The weighted median method showed a statistically significant effect estimate (OR = 1.39, 95% CI: 1.11–1.74, P = 0.004) (Fig. 6D). To support the robustness of these causal inferences, we provide comprehensive sensitivity analyses for TC, LDL-C, HDL-C, and CRP, including scatter plots, funnel plots, single-SNP effect plots, and leave-one-out analyses, in the supplementary materials (Figures S5–S8). These findings provide no evidence that TC or LDL-C causally affects MAFLD risk. In contrast, elevated HDL-C levels may causally reduce the risk of MAFLD, whereas higher CRP levels are likely to increase it.
Mediation Analysis
Mediation analysis revealed that FPG played a statistically significant mediating role in the relationship between the RCII and MAFLD risk, accounting for 2.02% of the total effect (Average Causal Mediation Effect [ACME]: β = 2.34×10⁻⁵, P < 0.001) (Fig. 7A, Table S1). For survival outcomes, the proportion of mediation by FPG was 6.76% for all-cause mortality (ACME: β = −0.21, P < 0.001), 8.06% for cardiovascular mortality (ACME: β = −1.32, P < 0.001), and 7.33% for premature death (ACME: β = −0.18, P < 0.001) (Fig. 7B–D, Table S1). Because survival analyses were conducted using an accelerated failure time (AFT) model, the estimated effects reflect the influence of RCII on log-transformed survival time via FPG, so a negative ACME corresponds to shorter survival. In other words, RCII may contribute to increased mortality risk in part by elevating FPG levels.
Additionally, a two-step Mendelian randomization mediation framework was applied to assess the mediating role of FPG in the causal pathways linking HDL-C and CRP with MAFLD risk. As shown in Table 2, the total effect of HDL-C on MAFLD was β = −0.132, with a direct effect of β = −0.125 and a mediated effect of β = −0.007 (95% CI: −0.010 to −0.002), indicating a significant mediation proportion of 5.3%. For CRP, the total effect was β = 0.166, with a direct effect of β = 0.158 and a mediated effect of β = 0.008 (95% CI: 0.001 to 0.028), accounting for 4.8% of the total effect and also reaching statistical significance. Collectively, these findings suggest that FPG partially mediates the effects of RCII on MAFLD risk and adverse mortality outcomes, and the effects of HDL-C and CRP on MAFLD risk.
Table 2
Two-step Mendelian randomization mediation analysis results: estimates of the total, direct, and FPG-mediated effects of HDL-C and CRP on MAFLD risk
Exposure | Mediator | Outcome | Total_beta | Direct_beta | Mediation_beta (95% CI) |
|---|
HDL-C | FPG | MAFLD | -0.132 | -0.125 | -0.007 (-0.010 to -0.002) |
CRP | FPG | MAFLD | 0.166 | 0.158 | 0.008 (0.001 to 0.028) |
MAFLD, metabolic dysfunction-associated fatty liver disease; FPG, fasting plasma glucose; HDL-C, high-density lipoprotein cholesterol; CRP, C-reactive protein
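The mediation proportions quoted for Table 2 can be checked with simple arithmetic. The sketch below recomputes them as the mediated effect divided by the total effect; it assumes the direct effect equals the total effect minus the mediated effect, which the tabulated values are consistent with. The helper name is hypothetical.

```python
def proportion_mediated(total_beta, mediated_beta):
    """Share of the total effect carried by the mediator (here FPG)."""
    return mediated_beta / total_beta

# Point estimates copied from Table 2.
table2 = {
    "HDL-C": {"total": -0.132, "direct": -0.125, "mediated": -0.007},
    "CRP": {"total": 0.166, "direct": 0.158, "mediated": 0.008},
}

for exposure, b in table2.items():
    # Consistency check (assumption): direct = total - mediated.
    assert abs((b["total"] - b["mediated"]) - b["direct"]) < 1e-9
    pct = 100 * proportion_mediated(b["total"], b["mediated"])
    print(f"{exposure}: {pct:.1f}% of the total effect mediated by FPG")
```

Running this reproduces the 5.3% (HDL-C) and 4.8% (CRP) figures reported in the text.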
Performance Evaluation of MAFLD Prediction Models and Identification of Key Predictors
To identify the most informative features associated with MAFLD, we applied the Boruta algorithm to a broad set of candidate variables. Waist circumference, BMI, and age emerged as the top-ranking predictors with the highest importance scores. Additional contributors included systolic blood pressure (SBP), sex, diabetes status, and diastolic blood pressure (DBP). These variables, together with RCII, were subsequently incorporated into machine learning–based prediction models for MAFLD identification.
To compare the predictive performance of different modeling strategies, we constructed and evaluated five machine learning classifiers: RF, KNN, NB, LightGBM, and decision tree (rpart). Model performance was assessed separately in both the training and testing datasets (Table 3). In the training cohort, the RF model demonstrated the best performance, achieving an accuracy of 98.0%, an area under the receiver operating characteristic curve (AUC-ROC) of 0.999, and an area under the precision-recall curve (PR-AUC) of 0.988, indicating a near-perfect fit (Fig. 8A–B). LightGBM and KNN also showed strong predictive capacity with accuracies of 94.4% and 93.4%, respectively, and AUCs exceeding 0.98. In contrast, the NB and rpart models underperformed in the training dataset.
Table 3
Performance comparison of different machine learning models for MAFLD classification on training and testing datasets
Models | Train Accuracy | Train Kappa | Train AUC-ROC | Train Sensitivity | Train Specificity | Train Precision | Test Accuracy | Test Kappa | Test AUC-ROC | Test Sensitivity | Test Specificity | Test Precision |
|---|
RF | 0.98 | 0.958 | 0.999 | 0.964 | 0.991 | 0.988 | 0.897 | 0.788 | 0.96 | 0.864 | 0.921 | 0.892 |
KNN | 0.934 | 0.864 | 0.987 | 0.911 | 0.951 | 0.933 | 0.845 | 0.681 | 0.921 | 0.793 | 0.884 | 0.837 |
NB | 0.874 | 0.74 | 0.949 | 0.81 | 0.922 | 0.887 | 0.865 | 0.72 | 0.943 | 0.786 | 0.924 | 0.886 |
LGB | 0.944 | 0.887 | 0.991 | 0.934 | 0.952 | 0.936 | 0.891 | 0.776 | 0.956 | 0.868 | 0.907 | 0.876 |
rpart | 0.865 | 0.728 | 0.868 | 0.884 | 0.851 | 0.817 | 0.859 | 0.715 | 0.861 | 0.875 | 0.847 | 0.811 |
MAFLD, metabolic dysfunction-associated fatty liver disease; RF, random forest; KNN, k-nearest neighbors; NB, naïve Bayes; LGB, light gradient boosting machine; rpart, decision tree; AUC-ROC, area under the receiver operating characteristic curve
In the testing set, the RF model again performed best, achieving an accuracy of 89.7%, an AUC-ROC of 0.960, and a PR-AUC of 0.949 (Fig. 8C–D). LightGBM also performed well on the held-out data (AUC-ROC 0.956), followed by NB (0.943) and KNN (0.921), whereas rpart had the lowest AUC-ROC (0.861). The drop from training to testing performance was largest for RF and KNN, while NB and rpart performed almost identically in both datasets (Table 3).
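As a reminder of how the threshold-based metrics in Table 3 relate to one another, the sketch below derives accuracy, sensitivity, specificity, precision, and Cohen's kappa from a single 2×2 confusion matrix. The counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Metrics for a binary classifier from confusion-matrix counts."""
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # recall for MAFLD cases
    specificity = tn / (tn + fp)   # recall for non-MAFLD cases
    precision = tp / (tp + fp)     # positive predictive value
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) / n) * ((tp + fn) / n) + ((tn + fn) / n) * ((tn + fp) / n)
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
    }

# Hypothetical counts (assumption: illustration only).
metrics = classification_metrics(tp=40, fp=10, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Kappa is lower than accuracy here because half of the agreement in a balanced sample would be expected by chance, which is why Table 3 reports both.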
Model interpretability was examined using SHAP (Fig. 8E) and feature importance ranking (Fig. 8F). Waist circumference was identified as the most influential predictor, followed by RCII. Other notable contributors included age, blood pressure (SBP and DBP), sex, and history of angina.
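To make the SHAP attribution concrete, the sketch below computes exact Shapley values for a three-feature toy model by averaging each feature's marginal contribution over all feature orderings. This is the definition that SHAP approximates or computes efficiently for tree models; it is not the authors' pipeline, and the toy scoring function, feature names, and weights are invented for illustration.

```python
from itertools import permutations

def shapley_values(value_fn, features):
    """Exact Shapley values: mean marginal contribution over all orderings."""
    phi = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for f in order:
            before = value_fn(frozenset(present))
            present.add(f)
            after = value_fn(frozenset(present))
            phi[f] += (after - before) / len(orderings)
    return phi

def toy_value(present):
    """Hypothetical risk score given the set of features that are 'known'."""
    score = 0.0
    if "waist" in present:
        score += 0.30
    if "rcii" in present:
        score += 0.20
    if "age" in present:
        score += 0.10
    # Interaction term: credited half to each interacting feature.
    if "waist" in present and "rcii" in present:
        score += 0.10
    return score

phi = shapley_values(toy_value, ["waist", "rcii", "age"])
for feature, contribution in phi.items():
    print(f"{feature}: {contribution:.2f}")
```

The contributions sum to the difference between the full-model score and the empty-model score, which is the additivity property that lets SHAP values be read as a decomposition of each prediction.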
Taken together, the RF model exhibited the highest predictive accuracy and interpretability across all tested algorithms. Notably, RCII—representing an integrated measure of systemic inflammation and residual cholesterol—demonstrated robust and independent predictive value in identifying individuals at high risk for MAFLD.
Discussion
Leveraging data from the NHANES cohort, this study systematically evaluated the association between the RCII and multiple adverse health outcomes, including MAFLD, all-cause mortality, cardiovascular mortality, and premature death. RCII demonstrated a robust and independent positive association with each endpoint, even after multivariable adjustment. Mendelian randomization and mediation analyses further supported the mechanistic roles of HDL-C, CRP, and FPG in this relationship. Among several machine learning algorithms, the RF model achieved superior predictive performance. SHAP analysis corroborated RCII as a key predictor of MAFLD risk, underscoring its potential utility in precision risk stratification and early identification of high-risk individuals.
In recent years, multiple research groups have employed machine learning techniques to develop predictive models for MAFLD, yielding promising results. However, variations in model performance, feature selection, and target populations have been noted across studies. A study based on NHANES 2017–2020 data developed a model centered on the non-HDL to HDL cholesterol ratio (NHHR), where the XGBoost algorithm achieved an AUC of 0.828. While demonstrating reasonable predictive power, the model relied predominantly on conventional lipid parameters and featured limited variable diversity [27]. In a large-scale investigation involving over five million individuals in Northwestern China, LASSO regression was used for feature selection, and the CatBoost model attained an AUC of 0.862, highlighting the predictive relevance of age, BMI, triglycerides, and fasting glucose; however, the absence of inflammatory markers limited its comprehensiveness [28]. Another study integrated vibration-controlled transient elastography (VCTE) parameters to stratify MAFLD risk, with an RF model achieving an AUC of 0.80 in the validation set, mainly for delineating low-, intermediate-, and high-risk groups [29]. Additionally, leveraging long-term follow-up data from NHANES III, a recent study employed multiple machine learning models to predict all-cause mortality among MAFLD patients, with the Coxnet model reaching an AUC of 0.88 at the 25-year mark—underscoring the clinical potential for long-term prognostic assessment [30]. We constructed and compared five machine learning models based on routine clinical and laboratory parameters. The RF model demonstrated superior performance in the test set, with an AUC of 0.960 and an accuracy of 89.7%, comparable to or exceeding previously reported models. Notably, the inclusion of RCII—a novel composite biomarker reflecting both metabolic dysfunction and systemic inflammation—substantially enhanced model interpretability and adaptability. 
These findings support the utility of RCII-enhanced machine learning models as robust tools for early MAFLD detection and individualized risk stratification.
Previous studies have established that RC is closely associated with hepatic lipid dysregulation and serves as a predictor for MAFLD and its cardiovascular outcomes [31]. Likewise, CRP, including hs-CRP, has been widely used as a convenient marker of systemic inflammation and is strongly linked to increased MAFLD risk [8]. However, reliance on single biomarkers often yields inconsistent predictive performance across different populations, limiting their clinical robustness. In contrast, RCII, by integrating metabolic and inflammatory dimensions, demonstrates superior consistency and stability in predicting both MAFLD and mortality outcomes. Compared with other metabolic or inflammatory indicators, such as the systemic immune-inflammation index (SII) [32], homeostatic model assessment of insulin resistance (HOMA-IR) [33], and low thyroid function status [34], RCII exhibits greater external validity and translational potential in diverse clinical settings.
Multiple large-scale prospective cohort studies have confirmed that MAFLD is independently associated with elevated risks of all-cause and cardiovascular mortality [35–37], particularly among individuals with coexisting diabetes or the "lean MAFLD" phenotype [38, 39]. Extending this evidence, our study further reveals a positive and independent association between elevated RCII levels and premature death. Notably, this relationship persists even after comprehensive adjustment for potential confounders. These findings suggest that RCII may serve not only as a general prognostic marker for mortality, but also as an early-warning indicator for premature death—offering critical value for optimizing the timing of interventions and informing public health resource allocation.
The predictive power of the RCII for MAFLD, all-cause mortality, and cardiovascular mortality likely stems from its integration of two central pathological axes: metabolic dysregulation and chronic inflammation. RCII combines RC, a marker of lipid accumulation, with C-reactive protein (CRP or hs-CRP), a canonical indicator of systemic inflammation; each represents a distinct yet interrelated biological pathway implicated in the pathogenesis and progression of metabolic diseases [40, 41]. Their synergistic interaction may potentiate vascular injury and organ dysfunction across multiple systems. Elevated RC promotes lipid deposition within arterial walls, contributes to endothelial dysfunction, and exacerbates lipid derangements via impaired reverse cholesterol transport mediated by HDL-C [42, 43]. In parallel, CRP not only suppresses pancreatic β-cell function but also stimulates hepatic glucose production, thereby raising FPG levels [44]. In our mediation analysis, FPG emerged as a significant intermediary linking RCII to increased mortality risk. Notably, elevated FPG serves as both a surrogate for insulin resistance and a pathogenic factor in its own right. Through activation of the AGE–RAGE axis, hyperglycemia accelerates vascular stiffening and myocardial remodeling, impairs HDL function, and promotes LDL oxidation, collectively compounding the metabolic and inflammatory disturbances driven by RCII [45, 46]. This tripartite interaction among dyslipidemia, inflammation, and glucose imbalance constitutes a tightly coupled risk network. Thus, RCII may be conceptualized as an integrated biomarker of metabolic–inflammatory stress, with its close association with FPG highlighting the pivotal role of glucose dysregulation in mediating the adverse outcomes linked to RCII.
This study has several limitations. First, the construction of RCII relies on the availability of CRP and lipid measurements, which may restrict its applicability in populations lacking inflammatory biomarker data. Second, although both traditional regression models and multiple machine learning algorithms consistently supported the robustness of RCII in assessing MAFLD risk, the cross-sectional nature of the NHANES dataset precludes definitive causal inference. Potential reverse causality and residual confounding cannot be fully ruled out. While Mendelian randomization provided suggestive evidence for a causal role, its validity is inherently constrained by the choice of instrumental variables and the characteristics of the study population. Therefore, longitudinal cohort studies and interventional trials are needed to further validate the prognostic utility of RCII.
In summary, RCII—a novel composite biomarker integrating metabolic and inflammatory signals—demonstrates strong predictive capacity and cross-model stability for MAFLD risk assessment, underscoring its potential clinical relevance. Future investigations should prioritize validating the accuracy and clinical effectiveness of RCII through animal models, randomized trials, and prospective cohort studies. Moreover, integrating multi-omics approaches, such as transcriptomics and metabolomics, may help elucidate the underlying metabolic–inflammatory pathways captured by RCII and provide mechanistic insight to support its translation into clinical practice.
Conclusion
As a composite biomarker integrating lipid dysregulation and systemic inflammation, the RCII is independently and significantly associated with increased risk of MAFLD and adverse mortality outcomes, demonstrating strong predictive utility. Its underlying mechanism may be partially mediated by elevated FPG levels.
References
1.Sakurai Y, et al. Role of Insulin Resistance in MAFLD. IJMS. 2021;22:4156. https://doi.org/10.3390/ijms22084156.
2.Kaya E, et al. Metabolic-associated Fatty Liver Disease (MAFLD): A Multi-systemic Disease Beyond the Liver. J Clin Transl Hepatol. 2022;10:329–38. https://doi.org/10.14218/JCTH.2021.00178.
3.Huang H, et al. Global burden trends of MAFLD-related liver cancer from 1990 to 2019. Portal Hypertens Cirrhosis. 2023;2:157–64. https://doi.org/10.1002/poh2.63.
4.Zhou F, et al. Unexpected Rapid Increase in the Burden of NAFLD in China From 2008 to 2018: A Systematic Review and Meta-Analysis. Hepatology. 2019;70:1119–33. https://doi.org/10.1002/hep.30702.
5.Stürzebecher PE, et al. What is ‘remnant cholesterol’? Eur Heart J. 2023;44:1446–8. https://doi.org/10.1093/eurheartj/ehac783.
6.Wang J, et al. The association of remnant cholesterol (RC) and interaction between RC and diabetes on the subsequent risk of hypertension. Front Endocrinol. 2022;13:951635. https://doi.org/10.3389/fendo.2022.951635.
7.Wu Z, et al. Longitudinal association of remnant cholesterol with joint arteriosclerosis and atherosclerosis progression beyond LDL cholesterol. BMC Med. 2023;21:42. https://doi.org/10.1186/s12916-023-02733-w.
8.Huang J, et al. Serum high-sensitive C-reactive protein is a simple indicator for all-cause among individuals with MAFLD. Front Physiol. 2022;13:1012887. https://doi.org/10.3389/fphys.2022.1012887.
9.Kumar R, et al. Association of high-sensitivity C-reactive protein (hs-CRP) with non-alcoholic fatty liver disease (NAFLD) in Asian Indians: A cross-sectional study. J Family Med Prim Care. 2020;9:390. https://doi.org/10.4103/jfmpc.jfmpc_887_19.
10.Del Giudice M, et al. Rethinking IL-6 and CRP: Why they are more than inflammatory biomarkers, and why it matters. Brain Behav Immun. 2018;70:61–75. https://doi.org/10.1016/j.bbi.2018.02.013.
11.Tarantino G. Hepatic steatosis, low-grade chronic inflammation and hormone/growth factor/adipokine imbalance. WJG. 2010;16:4773. https://doi.org/10.3748/wjg.v16.i38.4773.
12.Margioris AN, et al. Chronic low-grade inflammation. In: Diet, Immunity and Inflammation. Elsevier; 2013. pp. 105–20. https://doi.org/10.1533/9780857095749.1.105.
13.Chen J, et al. Predictive value of remnant cholesterol inflammatory index for stroke risk: Evidence from the China health and Retirement Longitudinal study. J Adv Res. 2024:S2090123224005927. https://doi.org/10.1016/j.jare.2024.12.015.
14.Wang Y, et al. Remnant cholesterol inflammatory index and its association with all-cause and cause-specific mortality in middle-aged and elderly populations: evidence from US and Chinese national population surveys. Lipids Health Dis. 2025;24:155. https://doi.org/10.1186/s12944-025-02580-z.
15.Eslam M, et al. The Asian Pacific association for the study of the liver clinical practice guidelines for the diagnosis and management of metabolic dysfunction-associated fatty liver disease. Hepatol Int. 2025;19:261–301. https://doi.org/10.1007/s12072-024-10774-3.
16.Bedogni G, et al. The Fatty Liver Index: a simple and accurate predictor of hepatic steatosis in the general population. BMC Gastroenterol. 2006;6:33. https://doi.org/10.1186/1471-230X-6-33.
17.Yu Y, et al. Remnant cholesterol inflammatory index, calculated from residual cholesterol to C-reactive protein ratio, and stroke outcomes: a retrospective study using the National institutes of health stroke scale and modified Rankin scale. Lipids Health Dis. 2025;24:228. https://doi.org/10.1186/s12944-025-02650-2.
18.Zhong H, et al. Associations of composite dietary antioxidant index with premature death and all-cause mortality: a cohort study. BMC Public Health. 2025;25:796. https://doi.org/10.1186/s12889-025-21748-x.
19.Feng Q, et al. Nutrition, Metabolism Cardiovasc Dis. 2021;31:3335–44. https://doi.org/10.1016/j.numecd.2021.08.051.
20.Tan Y, et al. Association between blood metal exposures and hyperuricemia in the U.S. general adult: A subgroup analysis from NHANES. Chemosphere. 2023;318:137873. https://doi.org/10.1016/j.chemosphere.2023.137873.
21.Zhong Y, et al. Association Between the Non-High-Density Lipoprotein Cholesterol-to-High-Density Lipoprotein Cholesterol Ratio (NHHR) and Mortality in Patients with COPD: Evidence From the NHANES 1999–2018. COPD. 2025;20:857–68. https://doi.org/10.2147/COPD.S508481.
22.Yan Z, et al. Association between high-density lipoprotein cholesterol and type 2 diabetes mellitus: dual evidence from NHANES database and Mendelian randomization analysis. Front Endocrinol. 2024;15:1272314. https://doi.org/10.3389/fendo.2024.1272314.
23.Cai J, et al. Assessing the causal association between human blood metabolites and the risk of epilepsy. J Transl Med. 2022;20:437. https://doi.org/10.1186/s12967-022-03648-5.
24.Xu J, et al. Association between serum uric acid, hyperuricemia and periodontitis: a cross-sectional study using NHANES data. BMC Oral Health. 2023;23:610. https://doi.org/10.1186/s12903-023-03320-4.
25.He M, et al. Association between neutrophil-albumin ratio and ultrasound-defined metabolic dysfunction-associated fatty liver disease in U.S. adults: evidence from NHANES 2017–2018. BMC Gastroenterol. 2025;25:20. https://doi.org/10.1186/s12876-025-03612-9.
26.Lundberg SM, Prutkin JM, Nair B, et al. From local explanations to global understanding with explainable AI for trees. Nat Mach Intell. 2020;2:56–67. https://doi.org/10.1038/s42256-019-0138-9.
27.Lin L, et al. Machine learning for predicting metabolic-associated fatty liver disease including NHHR: a cross-sectional NHANES study. PLoS ONE. 2025;20:e0319851. https://doi.org/10.1371/journal.pone.0319851.
28.Deng J, et al. Development and validation of a machine learning-based framework for assessing metabolic-associated fatty liver disease risk. BMC Public Health. 2024;24:2545. https://doi.org/10.1186/s12889-024-19882-z.
29.Huang L, et al. Machine learning-based disease risk stratification and prediction of metabolic dysfunction-associated fatty liver disease using vibration-controlled transient elastography: Result from NHANES 2021–2023. BMC Gastroenterol. 2025;25:255. https://doi.org/10.1186/s12876-025-03850-x.
30.Wang X, et al. Machine learning for predicting all-cause mortality of metabolic dysfunction-associated fatty liver disease: a longitudinal study based on NHANES. BMC Gastroenterol. 2025;25:376. https://doi.org/10.1186/s12876-025-03946-4.
31.Huang H, et al. Remnant Cholesterol Predicts Long-term Mortality of Patients With Metabolic Dysfunction–associated Fatty Liver Disease. J Clin Endocrinol Metabolism. 2022;107:e3295–303. https://doi.org/10.1210/clinem/dgac283.
32.Zeng D, et al. Evaluating body roundness index and systemic immune inflammation index for mortality prediction in MAFLD patients. Sci Rep. 2025;15:330. https://doi.org/10.1038/s41598-024-83324-4.
33.Zhang W, et al. Association of cardiovascular health with MAFLD and mortality in overweight and obese adults and mediation by inflammation and insulin resistance. Sci Rep. 2025;15:18791. https://doi.org/10.1038/s41598-025-03820-z.
34.Chen Y, et al. Impact of Thyroid Function on the Prevalence and Mortality of Metabolic Dysfunction-Associated Fatty Liver Disease. J Clin Endocrinol Metabolism. 2023;108:e434–43. https://doi.org/10.1210/clinem/dgad016.
35.Kim D, et al. Metabolic dysfunction-associated fatty liver disease is associated with increased all-cause mortality in the United States. J Hepatol. 2021;75:1284–91. https://doi.org/10.1016/j.jhep.2021.07.035.
36.Huang Q, et al. NAFLD or MAFLD: Which Has Closer Association With All-Cause and Cause-Specific Mortality?—Results From NHANES III. Front Med. 2021;8:693507. https://doi.org/10.3389/fmed.2021.693507.
37.Chung GE, et al. Lean or diabetic subtypes predict increased all-cause and disease-specific mortality in metabolic-associated fatty liver disease. BMC Med. 2023;21:4. https://doi.org/10.1186/s12916-022-02716-3.
38.Zhang P, et al. Mortality outcomes in diabetic metabolic dysfunction-associated fatty liver disease: non-obese versus obese individuals. Sci Rep. 2024;14:11320. https://doi.org/10.1038/s41598-024-61896-5.
39.Song J, et al. MAFLD as a predictor of adverse cardiovascular events among CHD patients with LDL-C < 1.8 mmol/L. Nutrition, Metabolism Cardiovasc Dis. 2025;35:103798. https://doi.org/10.1016/j.numecd.2024.103798.
40.Sandesara PB, et al. The Forgotten Lipids: Triglycerides, Remnant Cholesterol, and Atherosclerotic Cardiovascular Disease Risk. Endocr Rev. 2019;40:537–57. https://doi.org/10.1210/er.2018-00184.
41.Ansar W, et al. Inflammation and Inflammatory Diseases, Markers, and Mediators: Role of CRP in Some Inflammatory Diseases. In: Ansar W, Ghosh S, editors. Biology of C Reactive Protein in Health and Disease. New Delhi: Springer India; 2016. pp. 67–107. https://doi.org/10.1007/978-81-322-2680-2_4.
42.Kontush A. HDL and Reverse Remnant-Cholesterol Transport (RRT): Relevance to Cardiovascular Disease. Trends Mol Med. 2020;26:1086–100. https://doi.org/10.1016/j.molmed.2020.07.005.
43.Bernelot Moens SJ, et al. Remnant Cholesterol Elicits Arterial Wall Inflammation and a Multilevel Cellular Immune Response in Humans. ATVB. 2017;37:969–75. https://doi.org/10.1161/ATVBAHA.116.308834.
44.Stanimirovic J, et al. Role of C-Reactive Protein in Diabetic Inflammation. Mediat Inflamm. 2022;2022:1–15. https://doi.org/10.1155/2022/3706508.
45.Zhou M, et al. Activation and modulation of the AGEs-RAGE axis: Implications for inflammatory pathologies and therapeutic interventions – A review. Pharmacol Res. 2024;206:107282. https://doi.org/10.1016/j.phrs.2024.107282.
46.McNair E, et al. Atherosclerosis and the Hypercholesterolemic AGE–RAGE Axis. Int J Angiol. 2016;25:110–6. https://doi.org/10.1055/s-0035-1570754.
Abbreviations
RCII
Residual cholesterol-inflammation index
SII
Systemic immune-inflammation index
MAFLD
Metabolic dysfunction-associated fatty liver disease
NAFLD
Non-alcoholic fatty liver disease
NHANES
National Health and Nutrition Examination Survey
HDL-C
High-density lipoprotein cholesterol
LDL-C
Low-density lipoprotein cholesterol
IDL
Intermediate-density lipoproteins
VLDL
Very-low-density lipoproteins
NHHR
Non-HDL to HDL cholesterol ratio
ROC
Receiver operating characteristic
AUPR
Area under the precision-recall curve
HOMA-IR
Homeostatic model assessment of insulin resistance
ICD
International Classification of Diseases
ACME
Average causal mediation effect
GWAS
Genome-wide association study
IVW
Inverse variance weighted
LASSO
Least absolute shrinkage and selection operator
LightGBM
Light gradient boosting machine
SHAP
SHapley Additive exPlanations
SNPs
Single nucleotide polymorphisms
VCTE
Vibration-controlled transient elastography
Acknowledgements
We extend our gratitude to the participants of the NHANES database in the United States for their invaluable contribution to this study.
Author Contribution
Zhongqiao Lu: Writing (original draft), data curation, and formal analysis. Yingxia Hu: Writing (original draft), data curation, and formal analysis. Deshan Zong: Writing (review), data curation, and formal analysis. Bin Yue: Writing (review and editing) and methodology.
Data Availability
All data used in this study are available through the National Health and Nutrition Examination Survey repository, which is publicly accessible at https://wwwn.cdc.gov/nchs/nhanes/default.aspx.
Figure 1. Inclusion and exclusion criteria of study participants
Figure 2. Association between quartile levels of RC, CRP, and RCII and the risk of MAFLD.
Figure 3. Kaplan–Meier survival curves for all-cause mortality, cardiovascular disease mortality, and premature death across RCII quartiles. (A) all-cause mortality, (B) cardiovascular disease mortality, (C) Premature death.
Figure 4. Associations between RCII quartiles and the risks of all-cause mortality, cardiovascular mortality, and premature death. (A) all-cause mortality. (B) cardiovascular disease mortality. (C) premature death.
Figure 5. Dose–response relationships between RCII and the risk of MAFLD and related mortality outcomes, illustrated with restricted cubic spline (RCS) models. (A) MAFLD. (B) All-cause mortality. (C) Cardiovascular disease mortality. (D) Premature death.
Figure 6. Mendelian Randomization Estimates of key lipid and inflammatory biomarkers on MAFLD Risk
Figure 7. Mediation pathways illustrating the role of FPG in the association between RCII and MAFLD or adverse outcomes.
Figure 8. Performance comparison and feature importance analysis of machine learning models for MAFLD prediction. (A-D) Performance metrics of five machine learning classifiers evaluated on the training and testing datasets, including precision–recall (PR) curves and receiver operating characteristic (ROC) curves. (E) SHAP plot illustrating the contribution of individual features to MAFLD prediction in the RF model. (F) Ranked feature importance based on the RF algorithm
Table 1 Baseline characteristics of clinical information
Features | Overall (n = 13254) | non-MAFLD (n = 7561) | MAFLD (n = 5693) | P-Value |
|---|
Age (median, IQR) | 48.00 (31.00) | 45.00 (33.00) | 52.00 (28.00) | < 0.001 |
Gender (n, %) | | | | < 0.001 |
Male | 6305 (47.57) | 3294 (43.57) | 3011 (52.89) | |
Female | 6949 (52.43) | 4267 (56.43) | 2682 (47.11) | |
Race (n, %) | | | | < 0.001 |
Mexican American | 2733 (20.62) | 1396 (18.46) | 1337 (23.48) | |
Non-Hispanic White | 6583 (49.67) | 3888 (51.42) | 2695 (47.34) | |
Non-Hispanic Black | 2482 (18.73) | 1371 (18.13) | 1111 (19.52) | |
Other Hispanic | 913 (6.89) | 526 (6.96) | 387 (6.80) | |
Other Race | 543 (4.10) | 380 (5.03) | 163 (2.86) | |
Education (n, %) | | | | < 0.001 |
Less than High School | 4014 (30.29) | 2109 (27.89) | 1905 (33.46) | |
High School and above | 9240 (69.71) | 5452 (72.11) | 3788 (66.54) | |
Marital_Status (n, %) | | | | < 0.001 |
Never married/Widowed | 5125 (38.67) | 3065 (40.54) | 2060 (36.18) | |
Married/Living with partner | 8129 (61.33) | 4496 (59.46) | 3633 (63.82) | |
BMI (median, IQR) | 27.57 (7.63) | 24.75 (4.62) | 32.39 (7.03) | < 0.001 |
Waist_Circumference (median, IQR) | 97.00 (20.00) | 89.00 (13.80) | 108.80 (15.00) | < 0.001 |
SBP (median, IQR) | 122.00 (24.00) | 118.00 (24.00) | 126.00 (24.00) | < 0.001 |
DBP (median, IQR) | 70.00 (16.00) | 68.00 (14.00) | 72.00 (16.00) | < 0.001 |
Smoking (n, %) | | | | < 0.001 |
No | 7181 (54.18) | 3787 (50.09) | 3394 (59.62) | |
Yes | 6073 (45.82) | 3774 (49.91) | 2299 (40.38) | |
Drinking (n, %) | | | | < 0.001 |
No | 3366 (25.40) | 1759 (23.26) | 1607 (28.23) | |
Yes | 9888 (74.60) | 5802 (76.74) | 4086 (71.77) | |
Diabetes (n, %) | | | | < 0.001 |
No | 11953 (90.18) | 7131 (94.31) | 4822 (84.70) | |
Yes | 1301 (9.82) | 430 (5.69) | 871 (15.30) | |
CHD (n, %) | | | | < 0.001 |
No | 12710 (95.90) | 7323 (96.85) | 5387 (94.62) | |
Yes | 544 (4.10) | 238 (3.15) | 306 (5.38) | |
Stroke (n, %) | | | | < 0.001 |
No | 12780 (96.42) | 7337 (97.04) | 5443 (95.61) | |
Yes | 474 (3.58) | 224 (2.96) | 250 (4.39) | |
Angina (n, %) | | | | < 0.001 |
No | 12855 (96.99) | 7416 (98.08) | 5439 (95.54) | |
Yes | 399 (3.01) | 145 (1.92) | 254 (4.46) | |
Cancer (n, %) | | | | 0.083 |
No | 12066 (91.04) | 6912 (91.42) | 5154 (90.53) | |
Yes | 1188 (8.96) | 649 (8.58) | 539 (9.47) | |
FPG (median, IQR) | 98.00 (16.70) | 95.00 (14.30) | 102.70 (20.00) | < 0.001 |
CRP (median, IQR) | 0.22 (0.42) | 0.15 (0.28) | 0.35 (0.58) | < 0.001 |
TC (median, IQR) | 195.00 (54.00) | 192.00 (53.00) | 199.00 (55.00) | < 0.001 |
HDL-C (median, IQR) | 51.00 (21.00) | 56.00 (22.00) | 46.00 (17.00) | < 0.001 |
LDL-C (median, IQR) | 115.00 (47.00) | 112.00 (46.00) | 119.00 (47.00) | < 0.001 |
TG (median, IQR) | 114.00 (85.00) | 94.00 (60.00) | 150.00 (99.00) | < 0.001 |
GGT (median, IQR) | 20.00 (17.00) | 17.00 (12.00) | 27.00 (23.00) | < 0.001 |
FLI (median, IQR) | 52.19 (58.26) | 26.20 (31.30) | 84.03 (20.45) | < 0.001 |
RC (median, IQR) | 23.00 (17.00) | 19.00 (12.00) | 30.00 (20.00) | < 0.001 |
RCII (median, IQR) | 5.22 (12.22) | 2.85 (6.36) | 10.56 (18.48) | < 0.001 |
RCII_Type (n, %) | | | | < 0.001 |
Q1 | 3308 (24.96) | 2897 (38.32) | 411 (7.22) | |
Q2 | 3316 (25.02) | 2117 (28.00) | 1199 (21.06) | |
Q3 | 3310 (24.97) | 1516 (20.05) | 1794 (31.51) | |
Q4 | 3320 (25.05) | 1031 (13.64) | 2289 (40.21) | |
MAFLD, metabolic dysfunction-associated fatty liver disease; IQR, inter-quartile range; SBP, systolic blood pressure; DBP, diastolic blood pressure; CHD, coronary heart disease; FPG, fasting plasma glucose; CRP, C-reactive protein; TC, total cholesterol; HDL-C, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol; TG, triglyceride; GGT, glutamyl transferase; FLI, fatty liver index; RC, remnant cholesterol; RCII, residual cholesterol-inflammation index