I. Introduction
Natural disasters represent one of the most pressing challenges facing contemporary society, with profound implications for human safety, economic stability, and sustainable development. According to the United Nations Office for Disaster Risk Reduction (UNDRR), over 160 million people worldwide are affected by natural hazards annually, resulting in substantial human casualties and economic losses amounting to billions of dollars [1]. Recent advances in artificial intelligence and machine learning have opened new opportunities [30] for improving disaster prediction accuracy [29] and early warning systems [2,16,31,34,35,36], yet the interpretability of these "black-box" models remains a critical challenge for decision-makers and stakeholders.
Although significant progress has been made in applying machine learning algorithms to natural disaster forecasting [3], existing methods still suffer from limited transparency and explainability. Traditional statistical approaches provide clear interpretations but often sacrifice predictive performance, while deep learning solutions achieve high accuracy at the expense of model interpretability [4,22,26,28,33]. This trade-off between accuracy and explainability has hindered the widespread adoption of AI-based disaster risk assessment systems, particularly in policy-making contexts where stakeholders require transparent justifications for risk predictions and resource allocation decisions [5].
This limitation motivates us to explore explainable artificial intelligence (XAI) techniques, specifically SHAP analysis, which can bridge the gap between model performance and interpretability [6]. The key challenge lies in developing a disaster risk prediction framework that not only achieves high classification accuracy across multiple risk categories but also provides intuitive, feature-level explanations that reveal which factors contribute most significantly to disaster vulnerability. Understanding these driving factors is essential for developing targeted mitigation strategies and optimizing resource allocation in disaster preparedness programs [3].
To address these challenges, we propose an integrated AI framework combining XGBoost ensemble learning with SHAP-based explainability analysis for multi-class disaster risk prediction. Our approach leverages the World Risk Index dataset, which comprehensively quantifies disaster risk through exposure to natural hazards and social vulnerability indicators across 181 countries [14,25]. The framework employs XGBoost classifiers to predict five-level risk categories (Very Low, Low, Medium, High, Very High) for four critical dimensions: overall World Risk Index, Exposure, Vulnerability, and Susceptibility [1]. Furthermore, we integrate SHAP analysis to decode the complex decision-making processes of the trained models, revealing both global feature importance patterns and local prediction explanations for individual risk assessments [12].
The main contributions of this work are as follows:
1. Novel Integration of XGBoost and SHAP for Disaster Risk Assessment: We present the first comprehensive framework combining gradient boosting classification with explainable AI techniques for multi-dimensional disaster risk prediction, achieving superior classification performance while maintaining full interpretability.
2. Multi-Class Risk Categorization Across Critical Dimensions: Our system simultaneously predicts risk levels for four interconnected disaster indicators, providing a holistic assessment that captures exposure, vulnerability, and susceptibility patterns with test accuracies exceeding 85% and AUC scores above 0.92.
3. Interpretable Feature Contribution Analysis: Through SHAP value computation and visualization [6], we identify and quantify the most influential factors driving disaster susceptibility, revealing actionable insights such as feature interaction effects and threshold behaviors that inform targeted risk mitigation strategies.
4. Transparent AI-Driven Decision Support: Our explainable framework generates visual evidence (summary plots, dependence plots, interaction heatmaps) that enables policymakers and disaster management authorities to understand, validate, and trust AI-generated risk predictions, facilitating evidence-based resource allocation and policy formulation.
II. Related Work
A. Machine Learning in Disaster Prediction
Recent advances in machine learning have revolutionized natural disaster prediction and risk assessment [19,20]. Singh et al. (2023) demonstrated that neural networks [17,18] and decision trees could effectively predict disaster likelihood based on location, season, and weather conditions, achieving prediction rates approaching 92% for various natural hazards [11]. Their work highlighted the importance of selecting appropriate algorithms based on data characteristics and forecasting requirements.
Fowdur and Nassir-Ud-Diin (2022) developed a real-time collaborative machine learning system for weather forecasting with multiple predictor locations, emphasizing the role of ensemble methods in improving prediction accuracy [10]. Their framework integrated diverse data sources, demonstrating that multi-location predictions significantly enhance forecast reliability compared to single-point predictions.
In flood prediction research, Janizadeh et al. (2022) and Linh et al. (2022) applied XGBoost algorithms to construct flood susceptibility maps by analyzing relationships between historical flood events and conditioning factors [2,3]. Their studies confirmed XGBoost's superior performance in handling non-linear relationships and diverse hydrological data. Similarly, Ha et al. (2023) highlighted how flood susceptibility modeling has evolved from structural mitigation measures to comprehensive risk assessments incorporating exposure, hazard, and vulnerability factors [4].
Frifra et al. (2024) employed LSTM and XGBoost algorithms for storm prediction in Western France, demonstrating the effectiveness of ensemble methods in capturing temporal patterns in extreme weather events [1]. Their work achieved high accuracy in forecasting storm characteristics, validating the applicability of gradient boosting techniques to multi-hazard risk assessment. Labis (2024) provided a comprehensive review of machine learning for disaster risk reduction, identifying effective solutions across earthquake, flood, and other disaster case studies while addressing challenges related to data quality, model interpretability, and real-time processing limitations [9].
Liu et al. (2024) proposed a novel flood risk management approach based on future climate and land use change scenarios, integrating XGBoost with hydraulic modeling to predict flood risk comprehensively [14]. Their research emphasized the importance of considering multiple factors including exposure, hazard, and vulnerability in disaster risk assessment frameworks.
B. Explainable AI and SHAP Analysis
The interpretability of machine learning models has become increasingly critical in high-stakes applications. Srivalli and Sumanthi (2025) proposed a SHAP-based approach for financial risk assessment, demonstrating how explainable AI techniques can enhance stakeholder confidence and facilitate transparent decision-making [5]. Their work emphasized that SHAP (SHapley Additive exPlanations) provides mathematically rigorous feature attribution, making it particularly suitable for risk quantification tasks.
Pappala et al. (2025) explored explainable AI applications in healthcare risk assessment, highlighting techniques such as SHAP, LIME, and attention mechanisms for clarifying AI behavior at both global and local levels [6,24]. Their research underscored the importance of balancing model accuracy with interpretability, particularly in clinical environments where transparency builds trust among medical professionals and patients.
In geospatial hazard modeling, Daif et al. (2025) investigated SHAP versus LIME for temperature forecasting, demonstrating that SHAP analysis provides more consistent and reliable explanations across different model architectures [7]. Their comparative study confirmed SHAP's superiority in handling complex, non-linear climate data patterns, achieving high predictive accuracy while maintaining interpretability across various lead times.
Chen and Chen (2021) successfully integrated SHAP analysis with XGBoost for urban flood susceptibility assessment in Shenzhen City, revealing how urban morphology influences flooding risks [8]. Their explainable AI framework enabled urban planners to understand spatial patterns of flood vulnerability and optimize infrastructure development accordingly [21], demonstrating the practical value of combining predictive modeling with interpretability techniques.
Zhang et al. (2023) advanced the DS-XGBoost model for financial risk early warning, combining D-S evidence theory with XGBoost algorithm and SHAP analysis [12,23]. Their work demonstrated that explainable AI not only improves prediction accuracy but also provides transparent decision-making processes essential for risk management applications, achieving accuracy rates exceeding 85% while maintaining full interpretability of feature contributions.
Anees et al. (2025) applied machine learning-based vulnerability assessment in forest fire prediction, utilizing XGBoost models to analyze complex environmental patterns [13]. Their research demonstrated the effectiveness of ensemble learning methods in capturing non-linear relationships between topographic, meteorological, and vegetation characteristics in disaster susceptibility modeling.
C. World Risk Index and Comprehensive Risk Assessment
The World Risk Index (WRI), developed by the United Nations University and Bündnis Entwicklung Hilft, provides a comprehensive framework for assessing disaster risk globally [15]. The index combines exposure to natural hazards (earthquakes, cyclones, floods, droughts, sea-level rise) with social vulnerability dimensions including susceptibility, lack of coping capacities, and lack of adaptive capacities. This multi-dimensional approach enables holistic risk assessment that considers both physical hazard exposure and socioeconomic vulnerability factors.
Recent advances in disaster management emphasize the integration of AI with traditional risk assessment methodologies. The UNDRR Global Assessment Report highlights that effective disaster risk reduction requires not only accurate prediction but also transparent communication of risk factors to diverse stakeholders [15]. This necessitates the development of explainable AI systems that can provide actionable insights while maintaining high predictive accuracy across multiple risk dimensions.
III. Dataset Description and Preprocessing
The World Risk Report is an annual technical publication on global disaster risk assessment, published in German and English. The World Risk Index (WRI) identifies disaster risks associated with extreme natural events across 181 countries worldwide.
The World Risk Index employs 27 aggregated public indicators to quantify disaster risk globally. Conceptually, the index comprises two main components: exposure to extreme natural hazards and social vulnerability of nations. Exposure analysis considers earthquakes, cyclones, floods, droughts, and climate-induced sea-level rise. Social vulnerability is decomposed into three dimensions: susceptibility to extreme natural events, lack of coping capacities, and lack of adaptive capacities. All index component values range from 0 to 100, where higher WRI scores indicate greater national disaster risk.
The five-level risk categorization (Very Low, Low, Medium, High, Very High) follows the official World Risk Report classification scheme, which employs quintile-based thresholds validated by domain experts. Specifically, categories are defined using equal-frequency binning: Very Low (0-20th percentile), Low (20-40th), Medium (40-60th), High (60-80th), and Very High (80-100th). This approach ensures balanced class distributions while maintaining consistency with established disaster risk assessment frameworks. We adopted multi-class classification rather than regression to align with policy-relevant discrete risk levels used by disaster management authorities for resource allocation decisions.
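The equal-frequency binning described above can be sketched as follows. This is a minimal illustration only: the scores are synthetic, the names `LEVELS` and `quintile_categories` are ours, and the cut points are computed from the sample itself, which may differ from the thresholds published in the World Risk Report.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles

# Category names follow the five-level scheme described in the text.
LEVELS = ["Very Low", "Low", "Medium", "High", "Very High"]


def quintile_categories(scores):
    """Assign each score to one of five equal-frequency bins."""
    # Four cut points at the 20th, 40th, 60th and 80th percentiles.
    cuts = quantiles(scores, n=5)
    # bisect_right counts how many cut points lie at or below the score,
    # which is exactly the index of the bin the score falls into.
    return [LEVELS[bisect_right(cuts, s)] for s in scores]


# Hypothetical scores (not taken from the dataset).
scores = [float(v) for v in range(1, 101)]
labels = quintile_categories(scores)
counts = Counter(labels)
print(counts)
```

With distinct scores, each category receives the same number of observations, which is the balanced class distribution the text refers to.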
The dataset encompasses 11 years of data across multiple countries with the following features:
Region: Geographic region name
WRI (World Risk Index): Overall disaster risk score for the region
Exposure: Risk exposure to natural hazards (earthquakes, hurricanes, floods, droughts, sea-level rise)
Vulnerability: Vulnerability based on infrastructure, nutrition, housing conditions, and economic framework
Susceptibility: Susceptibility determined by infrastructure, nutrition, housing status, and economic conditions
Lack of Coping Capabilities: Coping capacity related to governance, preparedness, early warning systems, medical care, and social/material security
Lack of Adaptive Capacities: Adaptive capacity concerning upcoming natural events, climate change, and future challenges
Year: Temporal dimension of the data
WRI Category, Exposure Category, Vulnerability Category, Susceptibility Category: Categorical classifications of risk levels (Very Low, Low, Medium, High, Very High)
Figure 1 illustrates the boxplot analysis of continuous risk scores across five categorical levels for each indicator. The World Risk Index (Fig. 1a) demonstrates clear separation between risk categories, with median values progressively increasing from Very Low (approximately 2.5) to Very High (approximately 40). The Kruskal-Wallis test yielded P < 0.001, confirming statistically significant differences among the risk categories. Similar patterns are observed for Exposure (Fig. 1b), Vulnerability (Fig. 1c), and Susceptibility (Fig. 1d). This separation supports the categorical formulation used in our multi-class prediction models: the quintile-based boundaries distinguish distinct risk profiles rather than arbitrary divisions of continuous scores.
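For readers unfamiliar with the Kruskal-Wallis test, the statistic can be sketched in a few lines. This is an illustrative implementation under simplifying assumptions, not the analysis code used for Figure 1: it assumes no tied values (so no tie correction is applied), it uses a closed-form chi-square tail that is valid only for an even number of degrees of freedom (five categories give four), and the sample values are hypothetical.

```python
import math


def kruskal_wallis(groups):
    """Return (H, p) for a list of samples, assuming no tied values."""
    # Pool all observations and rank them from 1 to n.
    pooled = sorted((value, g) for g, sample in enumerate(groups) for value in sample)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    for rank, (_, g) in enumerate(pooled, start=1):
        rank_sums[g] += rank
    # H = 12 / (n (n + 1)) * sum(R_g^2 / n_g) - 3 (n + 1)
    h = 12.0 / (n * (n + 1)) * sum(
        r * r / len(sample) for r, sample in zip(rank_sums, groups)
    ) - 3.0 * (n + 1)
    df = len(groups) - 1
    if df % 2 != 0:
        raise ValueError("closed-form p-value below requires even degrees of freedom")
    # Chi-square survival function for even df = 2k:
    # exp(-h/2) * sum_{i<k} (h/2)^i / i!
    half = h / 2.0
    p = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    return h, p


# Hypothetical scores for five fully separated risk categories.
groups = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
    [17.0, 18.0, 19.0, 20.0],
]
h_stat, p_value = kruskal_wallis(groups)
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
```

In practice a library routine with tie correction (for example, the Kruskal-Wallis function in SciPy) would be preferable to this hand-rolled version.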
IV. Experiments
The 11-year temporal span of our dataset enables examination of risk evolution. Preliminary temporal analysis revealed that global risk distributions remained relatively stable during 2009–2019, with correlation coefficients between consecutive years exceeding 0.85. Our independent sampling approach is justified for this stability period, though we acknowledge that incorporating temporal features (e.g., year-over-year changes) could capture emerging trends. This represents a valuable direction for future work, particularly for dynamic early-warning systems.
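The consecutive-year stability check described above can be sketched as a Pearson correlation between each pair of adjacent years. The data layout (`wri_by_year`), the country codes, and the scores below are hypothetical placeholders; only the idea of correlating year t with year t+1 comes from the text.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def consecutive_year_correlations(scores_by_year):
    """Correlate each year's scores with the next year's, over shared countries."""
    years = sorted(scores_by_year)
    result = {}
    for prev, curr in zip(years, years[1:]):
        shared = sorted(set(scores_by_year[prev]) & set(scores_by_year[curr]))
        result[(prev, curr)] = pearson(
            [scores_by_year[prev][c] for c in shared],
            [scores_by_year[curr][c] for c in shared],
        )
    return result


# Hypothetical WRI scores: {year: {country: score}}.
wri_by_year = {
    2017: {"A": 5.1, "B": 12.4, "C": 27.9, "D": 8.3},
    2018: {"A": 5.4, "B": 11.8, "C": 28.6, "D": 8.0},
    2019: {"A": 5.0, "B": 12.9, "C": 29.3, "D": 8.8},
}
correlations = consecutive_year_correlations(wri_by_year)
for pair, r in correlations.items():
    print(pair, round(r, 3))
```

The same per-country structure could also supply year-over-year differences as additional features, which is the extension the text leaves for future work.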
A. Multi-Class ROC Curve Analysis
We developed XGBoost classification models for four disaster risk indicators to predict five-level risk categories. For each classification task, we employed stratified 5-fold cross-validation to obtain robust performance estimates across different data partitions. The reported test results are from an independent 20% holdout set not seen during cross-validation. Cross-validation yielded mean accuracies of 0.86 ± 0.02 (WRI), 0.85 ± 0.03 (Exposure), 0.84 ± 0.02 (Vulnerability), and 0.87 ± 0.02 (Susceptibility), with the low standard deviations indicating model stability. The final models, trained on 80% of the data and evaluated on the holdout set, achieved accuracies exceeding 0.85, consistent with the cross-validation results and supporting the models' generalizability across diverse risk profiles. Figure 2 presents the Receiver Operating Characteristic (ROC) curves for all four multi-class classification tasks on the test set.
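The evaluation protocol above (a stratified 80/20 holdout split, with stratified 5-fold cross-validation on the training portion) can be sketched as follows. This is an illustration under stated assumptions, not our training code: the data are synthetic, the hyperparameters are arbitrary, and scikit-learn's `GradientBoostingClassifier` stands in for the XGBoost classifier so that the sketch depends on a single library.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic five-class data standing in for the risk indicators (assumption).
X, y = make_classification(
    n_samples=600,
    n_features=6,
    n_informative=4,
    n_redundant=0,
    n_classes=5,
    n_clusters_per_class=1,
    random_state=0,
)

# Hold out 20% of the data, preserving class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Stand-in gradient boosting model; an XGBoost classifier would slot in here.
model = GradientBoostingClassifier(n_estimators=30, random_state=0)

# Stratified 5-fold cross-validation on the training portion only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")
print(f"CV accuracy: {cv_scores.mean():.2f} +/- {cv_scores.std():.2f}")

# Final fit on the full training portion, scored once on the untouched holdout.
model.fit(X_train, y_train)
holdout_accuracy = model.score(X_test, y_test)
print(f"Holdout accuracy: {holdout_accuracy:.2f}")
```

Keeping the holdout set out of the cross-validation loop is what allows the two accuracy figures to be compared as independent estimates.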
The ROC curve analysis demonstrates strong discriminative performance across all risk categories. For Exposure classification (Fig. 2a), the model achieved individual class AUC values ranging from 0.926 to 0.984, with the "Very High" category achieving the highest discriminative power (AUC = 0.984, 95% CI: 0.972–0.991). The stepped curve pattern reflects the multi-class nature of the predictions, where one-vs-rest binary classifications are performed for each risk level.
WRI classification (Fig. 2b) exhibited similarly strong performance, with a macro-averaged AUC of 0.954. The model demonstrated particularly robust performance in distinguishing extreme risk categories (Very Low and Very High), while maintaining reliable classification for intermediate risk levels. Confidence intervals remained narrow across all categories, indicating stable and consistent model predictions.
Vulnerability and Susceptibility classifications (Figs. 2c and 2d) achieved comparable performance levels, with macro-averaged AUC scores of 0.947 and 0.941, respectively. The consistent performance across all four indicators supports the robustness of the XGBoost framework and demonstrates its capability to capture complex, non-linear relationships between disaster risk factors and categorical outcomes.
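The one-vs-rest and macro-averaged AUC values reported above follow a simple recipe, sketched below. The function names and the three-class toy probabilities are hypothetical (three classes keep the example short; the same code applies to five). The AUC is computed through its pairwise-ranking interpretation, which is equivalent to the area under the ROC curve.

```python
def binary_auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def one_vs_rest_auc(y_true, proba, classes):
    """Per-class one-vs-rest AUC plus their unweighted (macro) average."""
    per_class = {}
    for k, name in enumerate(classes):
        # Treat class `name` as positive and every other class as negative.
        labels = [1 if y == name else 0 for y in y_true]
        scores = [row[k] for row in proba]
        per_class[name] = binary_auc(labels, scores)
    macro = sum(per_class.values()) / len(per_class)
    return per_class, macro


# Hypothetical predictions: columns are P(Low), P(Medium), P(High).
classes = ["Low", "Medium", "High"]
y_true = ["Low", "Low", "Medium", "Medium", "High", "High"]
proba = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.3, 0.6, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
    [0.1, 0.2, 0.7],
]
per_class_auc, macro_auc = one_vs_rest_auc(y_true, proba, classes)
print(per_class_auc, round(macro_auc, 4))
```

Macro-averaging weights every risk category equally, which suits the balanced quintile-based classes used here.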
Both cross-validation and holdout test results demonstrated consistent performance: test accuracies exceeded 0.85, F1-scores ranged from 0.82 to 0.88, recall rates were above 0.80, and AUC scores consistently surpassed 0.92. The narrow confidence intervals (95% CI width less than 0.03) and low cross-validation variance indicate model robustness across different geographic regions and temporal periods. These comprehensive metrics support the reliability and practical applicability of the AI-driven classification system for real-world disaster risk assessment.
To contextualize XGBoost performance, we conducted baseline comparisons with Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM) on the Susceptibility classification task. XGBoost achieved a test accuracy of 0.87 (AUC = 0.941), outperforming RF (0.83, AUC = 0.916), SVM (0.79, AUC = 0.893), and LR (0.74, AUC = 0.871). We attribute the superiority of XGBoost to its gradient boosting framework, which effectively captures non-linear feature interactions and handles imbalanced classes through weighted learning. Training efficiency also favored XGBoost (12.7 seconds) over SVM (45.6 seconds), while remaining comparable to Random Forest (18.4 seconds). These results indicate that ensemble methods, particularly XGBoost, are better suited than traditional classifiers to the complex, multi-dimensional nature of disaster risk data.
B. SHAP-Based Interpretability Analysis
To enhance the transparency and trustworthiness of our AI-powered disaster prediction system, we employed SHAP (SHapley Additive exPlanations) analysis to decode the decision-making processes of the trained XGBoost models. Taking Susceptibility Category as a representative example, we conducted comprehensive interpretability analysis revealing which features most significantly influence disaster susceptibility predictions.
Figure 3 presents multifaceted SHAP visualizations unveiling the complex feature interactions driving susceptibility predictions. The interaction heatmap (Fig. 3a) quantifies pairwise feature interaction effects, revealing that Lack of Coping Capabilities exhibits the strongest interaction with multiple other vulnerability dimensions. This finding indicates that coping capacity deficits compound the effects of other risk factors, amplifying overall susceptibility.
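To make the attribution principle behind these plots concrete, the sketch below computes exact Shapley values by brute-force enumeration of feature coalitions. It is not the tree-based algorithm used for the trained XGBoost models, and everything in it is hypothetical: the scoring function `toy_score`, its coefficients and thresholds, the feature values, and the baseline. The toy score includes an interaction term between coping and adaptive capacity so that the way an interaction is shared between features becomes visible.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                members = set(coalition)
                without_i = [x[j] if j in members else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_score(features):
    """Hypothetical susceptibility score with a threshold and an interaction."""
    coping, adaptive, exposure = features
    score = 0.01 * coping + 0.02 * exposure
    if adaptive > 60:
        score += 0.5  # threshold effect
    if coping > 70 and adaptive > 60:
        score += 0.3  # interaction between the two capacity deficits
    return score


x = [80.0, 70.0, 30.0]         # lack of coping, lack of adaptive capacity, exposure
baseline = [50.0, 50.0, 20.0]  # hypothetical reference point
phi = shapley_values(toy_score, x, baseline)
print([round(v, 3) for v in phi])
```

The attributions sum to the difference between the score at `x` and at the baseline, and the interaction bonus is split equally between the two features that jointly trigger it. Brute-force enumeration grows exponentially with the number of features, which is why tree-specific algorithms are used in practice.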
The SHAP dependence plots (Figs. 3d-f) illustrate feature value relationships with model predictions, showing threshold effects and non-linear patterns. For instance, Lack of Adaptive Capacities demonstrates a clear positive correlation with susceptibility predictions (Fig. 3f), where values exceeding 60 trigger substantial increases in predicted risk levels. Such insights enable policymakers to identify critical intervention points where targeted capacity-building efforts would yield maximum risk reduction benefits.
Figure 4 presents SHAP summary plots for Susceptibility across three representative risk categories: Very Low (Fig. 4, top), Medium (Fig. 4, middle), and Very High (Fig. 4, bottom). These visualizations reveal how feature importance patterns shift across different risk levels.
For Very Low susceptibility regions, WRI Category and Exposure Category emerge as primary protective factors (negative SHAP values), indicating that favorable overall risk indices and limited hazard exposure effectively buffer against susceptibility. Conversely, for Very High susceptibility regions, Lack of Coping Capabilities and Lack of Adaptive Capacities dominate positive SHAP contributions, confirming that social vulnerability dimensions are the critical determinants of extreme susceptibility.
The Medium risk category exhibits mixed feature contributions, with both exposure-related and capacity-related factors playing balanced roles. This pattern suggests that Medium-risk regions represent transitional states where interventions targeting either hazard exposure reduction or capacity enhancement could effectively mitigate susceptibility.
The AI-driven SHAP analysis framework successfully identified interpretable patterns explaining why certain regions experience high disaster susceptibility while others remain resilient. These explainable insights transform "black-box" machine learning predictions into actionable intelligence for disaster risk management authorities.
The integration of XGBoost's predictive power with SHAP's interpretability capabilities addresses the critical challenge of building trustworthy AI systems for high-stakes disaster management applications. By revealing the "why" behind predictions, our framework enables evidence-based policy formulation, targeted resource allocation, and transparent communication of risk assessments to diverse stakeholders.