Landslide susceptibility assessment in the Central Yunnan Plateau
by assembling optimized statistical and machine learning models
Jieying Chen1, Qin Li2, Cheng Huang2, Xie Hu1, Zehao Shen1, 3*
1 State Key Laboratory of Vegetation Structure, Function and Construction, College of Urban and Environmental Science, Peking University, Beijing, 100871, China
2 Key Laboratory of Geohazard Forecast and Geoecological Restoration in Plateau Mountainous Area, Yunnan Institute of Geo-Environment Monitoring, Kunming, 650216, China
3 The Southwest United Graduate School, Kunming, 650092, China
*Email: shzh@urban.pku.edu.cn
Abstract
Background
Landslides are a significant geohazard in mountainous regions worldwide, with increasing occurrences due to the changing climate and intensified land-use activities. The Central Yunnan Plateau (CYP) in Southwest China is particularly prone to landslides due to its geological and climatic conditions.
Methods
This study employs a data-driven approach that optimizes rainfall, geological, and land-use parameters using point-biserial correlation and decision tree methods. To verify the effectiveness of this optimization, we compared three models: random forest (RF), support vector machine (SVM), and logistic regression (LR). The optimized RF model was then used to rank the importance of the influencing factors. Finally, a landslide susceptibility map was generated by averaging the outputs of the three models.
Results
Key findings indicate that 24-hour and 30-day cumulative rainfall are the primary climatic predictors of landslide susceptibility. Landslide susceptibility increases sharply with 24-hour rainfall up to 40 mm, whereas the effect of 30-day cumulative rainfall shows a slight initial decrease followed by a gradual increase. Decision tree analysis further reveals that landslide susceptibility is lower in forest and grassland than in cultivated and built-up areas, and is highest in unconsolidated rocks and in carbonate rocks with 30–70% carbonate content. Areas within 1,000 m of roads also show high landslide susceptibility. Model predictions suggest that the eastern and southern parts of the CYP, which have high rainfall and intense human activity, are more susceptible to landslides.
Conclusion
This study underscores the importance of improved rainfall monitoring, targeted infrastructure maintenance, and strategic land-use planning to mitigate landslide hazards and enhance safety for residents in vulnerable regions of the CYP.
Keywords:
Central Yunnan Plateau
Landslide
Model optimization
Machine learning
Influencing factors
Introduction
Landslides are pervasive geological hazards that threaten lives, property, and infrastructure worldwide, especially in mountainous regions (Chae et al. 2017). As an intrinsic process in the Critical Zone, landslides are a primary concern in disciplines such as geology, ecology, and environmental sciences (Lee et al. 2023; Silhán et al. 2021). The rising frequency and intensity of landslides worldwide in recent decades are largely attributed to the compounding effects of anthropogenic climate change and intensifying human activities (Emberson et al. 2021; Lacroix et al. 2020b; Froude & Petley 2018).
A landslide refers to the downslope movement of rock, soil, debris, or a combination of these materials along a slope under the influence of gravity (Cruden 1991; Varnes 1958). Landslides result from the interplay of various geological and environmental factors (Lacroix et al. 2020b). Among these, geological conditions, such as the presence of weak rock layers or faults, play a crucial role, as such layers may gradually become unstable under persistent stress (Schulz et al. 2018; Mainsant et al. 2012). Climatic factors, such as prolonged rainfall or heavy rainstorms, increase pore-water pressure in the soil and thereby reduce slope stability (Handwerger et al. 2019; Schulz et al. 2009). Tectonic activity, including earthquakes, also contributes to landslide susceptibility by destabilizing rock and soil masses (Keefer 1984; Bontemps et al. 2020). Human activities, particularly land-use change (e.g., road construction and agriculture), can reshape surface morphology and vegetation cover (Petley et al. 2007; Lacroix et al. 2020a). Seismic activity also has significant synergistic effects with other factors on landslide occurrence (Frigerio et al. 2021). Consequently, landslide susceptibility varies across natural and societal conditions (Zhi et al., 2023; Goetz et al., 2015), which interact in complex ways and produce regional variations in landslide triggers (Pacheco Quevedo et al., 2023).
Landslide studies primarily adopt mechanism-driven or data-driven approaches to causative analysis, susceptibility assessment, and risk evaluation (Dai et al., 2002; Crozier & Glade, 2005; Sim et al. 2022). Mechanism-driven methods, such as limit equilibrium analysis and finite element modeling, simulate slope stability through physical laws (Duncan, 1996; Griffiths & Lane, 1999). These methods offer explicit physical interpretability but require precise parameter calibration, which limits their application at large scales (Van Westen et al., 2008; Zhang et al. 2018). In contrast, data-driven methodologies span statistical models such as logistic regression (LR) for spatial susceptibility assessment (Guzzetti et al., 2005), machine learning algorithms including support vector machine (SVM) and random forest (RF) for nonlinear pattern recognition (Tehrani et al., 2022; Merghadi et al. 2020), and deep learning architectures such as convolutional neural networks (CNNs) for interpreting landslide imagery (Sameen et al., 2020). While these techniques excel at handling complex relationships, their dependence on data quality and their inherent "black-box" nature limit interpretability. Recent advances integrate statistical frameworks with machine learning, for example CNN and RF models that incorporate physical constraints (Liu et al., 2024), leveraging both domain knowledge and data adaptability to enhance model robustness and generalization in landslide susceptibility assessment.
Landslide studies at different scales yield different interpretations of triggering mechanisms. Slope-scale studies generally focus on the landslide process and local triggers, such as topography and lithology, variations in pore-water pressure, rock mass fracturing, and localized rainfall (Liu et al. 2021; Perrone et al. 2008; Tan et al. 2023; Feng et al. 2024). In contrast, regional and global studies also consider how climate change, geological processes (e.g., earthquakes), and human activities affect the frequency and intensity distributions of landslides. The increase in extreme weather events associated with global warming, such as intense rainstorms and rapid snowmelt, significantly elevates landslide occurrence (Iverson 2000; Huggel et al. 2012; Gariano & Guzzetti 2016). Regional-scale studies often leverage climate models and extensive geological surveys to explore strategies for mitigating heightened landslide hazards under future climate scenarios (Hürlimann et al. 2022; Madhu et al. 2024). At the global level, research also highlights how human interventions, such as urban expansion and agricultural irrigation, alter regional hydrological characteristics and vegetation cover, thereby affecting landslide occurrence (Lacroix et al. 2020b; Froude and Petley 2018; Fidan et al. 2024).
China, a mountainous country with 64.89% of its land area consisting of mountains and plateaus (Deng et al. 2015), is among the countries most severely affected by landslides (Gómez et al. 2023). Southwest China is especially exposed to frequent and intense landslides (Zhang et al. 2018), which rank as the most catastrophic natural disaster in the region (Fan et al. 2019; Yang et al. 2020b; Chen et al. 2012). Neotectonic movements in this region have altered its geomorphology, forming numerous faults and folds and producing extensive rock fractures and unstable slopes (Zhang et al. 2007). The interaction between rugged topography and the monsoon climate further produces intense regional rainfall (Huang 2009). Over the past decades, rapid growth in mining, road construction, and urbanization has further destabilized slopes, leading to more landslide events (Yin et al. 2010; Li et al. 2012; Ma et al. 2018; Fan et al. 2017; Cui et al. 2022). In Yunnan Province, a representative hotspot in Southwest China, the coupling of active tectonic uplift (Xu et al., 2003), rainfall seasonality (Yang et al., 2020a), and intensive anthropogenic disturbance (Wang et al., 2022) has created a favorable environment for landslide development. A recent study showed that an RF model fed with geomorphological variables and real-time rainfall records can provide early warning of rainfall-driven landslides in Yunnan (Kang et al., 2024).
The Central Yunnan Plateau (CYP), located in the north-central part of Yunnan Province, is surrounded by large mountains and rivers. It has long served as the socioeconomic center of Yunnan and has played a key role in the rapid development of Southwest China over the past decades (Li et al., 2024). The CYP is also the region of Yunnan most vulnerable to geohazards, owing jointly to its high landslide frequency, the combined impacts of natural and anthropogenic triggers, and the inherent fragility of a densely populated region. However, landslide research in Southwest China has predominantly focused on Sichuan Province and northwestern Yunnan, with few comprehensive studies of the CYP (Wu et al. 2022; Wang et al. 2024). Because the CYP contains the Central Yunnan Urban Cluster and plays a pivotal role in Yunnan’s socioeconomic development, its landslide susceptibility and influencing factors warrant closer investigation.
This study quantifies landslide susceptibility by combining statistical and machine learning models with mechanism-based interpretation. First, we used point-biserial correlation and decision tree analysis to optimize the temporal resolution of rainfall records and the categorical schemes of geological and land-use parameters according to their physical relevance to landslide events. Next, we compared RF, SVM, and LR models to evaluate the fitting performance of different parameter combinations. We then used the optimized RF model to rank the importance of each factor for landslide occurrence and interpreted the results in terms of physical mechanisms. Finally, using the predictions of the RF, SVM, and LR models, we generated a comprehensive landslide susceptibility map for the study area and proposed mitigation strategies for landslide hotspots. Through these approaches, the study addresses the following key questions:
1) What is the spatial pattern of landslide susceptibility in the CYP? Which influencing factors are most significant?
2) How can the interpretability of landslide occurrence be enhanced by taking into account the nonlinear effects and interactions of the predictor variables?
3) What management strategies are needed to mitigate landslide risks in the study area and similar regions?
By addressing these questions, the study contributes to a robust assessment of landslide susceptibility and to improved early-warning systems for geohazards in mountainous regions.
Methods
Existing landslide studies have identified precipitation, stratigraphy, and land use as key factors influencing landslide occurrence. However, these factors show different patterns and critical thresholds across regions (Caine 1980; Aleotti 2004; Guzzetti et al. 2008). We therefore examined how these factors relate to landslide occurrence in the CYP using several complementary analytical methods.
We adopted a data-driven workflow of “parameter optimization – multi-model comparison – result integration” to capture the multifactor nature of landslide susceptibility in the CYP. We first used point-biserial correlation to compare rainfall data of different temporal resolutions, and then applied a decision tree algorithm to refine the categorical groups of lithology and land use, yielding variables optimized in both temporal resolution and category grouping. We fed these predictors to three models (RF, SVM, and LR) and evaluated their performance with balanced accuracy and AUC to verify the effectiveness of the optimization. The best-performing RF model was then used to rank the relative importance of each factor. The arithmetic mean of the susceptibility outputs from the three models was used to generate a landslide susceptibility map, integrating results from statistical and machine learning approaches. This framework preserves the interpretability of statistical models while exploiting the capacity of machine learning to capture non-linear relationships, providing a replicable route for landslide susceptibility assessment in complex mountainous terrain.
2.1 Study area
The CYP spans approximately 85,000 km², covering 22% of the province’s total area. It extends from the Yunling-Cangshan mountain range in the west to Qujing City in the east, is bordered by the Jinsha River to the north, and reaches south along the Ailao Mountains to Yuxi City, where it connects with the karst mountains of southeastern Yunnan. The plateau has an average elevation of around 2,000 m a.s.l., with altitude gradually decreasing from north to south, forming a northwest–southeast sloping topography (see Fig. 1b).
Geologically, the CYP belongs to the Yangtze stratigraphic region, one of Yunnan’s four primary stratigraphic zones, and contains north-south-oriented fault zones. The region’s lithological types include metamorphic, intrusive, volcanic, unconsolidated, clastic, and carbonate formations. Clastic rocks dominate in the western part of the area, while carbonate rocks prevail in the east (see Fig. 1c). The plateau has a subtropical monsoon climate, with summer rainfall influenced by warm, moist airflows from the southeast and southwest and by cold northern air. The local mountainous terrain further uplifts these airflows, producing intense orographic rainfall that lacks a clear zonal pattern. Short-duration, high-intensity precipitation is particularly notable (Yu et al. 2013).
Fig. 1
(a) The geographical location, (b) digital elevation model (DEM), (c) lithological types of the Central Yunnan Plateau (CYP).
2.2 Multisource geospatial database
2.2.1 Historical landslide inventory
The landslide data used in this study were obtained from the Yunnan Province Geological Hazard Information System (https://wild.ynge.net:58688/dmgeo-yndzhj-portal/), a continuously updated platform. Data are collected by local counties and districts and updated in real time through the system. A total of 108 landslide records from the CYP were obtained for the period 2021–2023 (see Fig. 2a) and classified into two types: landslide disasters and landslide hazards.
Landslide disasters are events that caused damage and losses, including casualties and property destruction (see Fig. 2b). Landslide hazards, in contrast, are locations where signs of a potential landslide, due to either natural or human factors, have been observed but no landslide has yet occurred (see Fig. 2c). To maintain the validity of the control data, hazard points were excluded both from subsequent analysis and from the selection of control points.
During the selection of pseudo-absence control points, all water bodies and 5 km buffer zones surrounding documented landslide disasters and hazard sites within the CYP were first excluded. From the remaining area, 108 control points were then randomly generated, with a minimum spacing of 1 km between any two points.
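The control-point rule above (exclude water and 5 km buffers, then draw points at least 1 km apart) can be sketched as rejection sampling. This is a minimal sketch, assuming projected planar coordinates in kilometres, a rectangular bounding box standing in for the CYP boundary, and a hypothetical `is_water` predicate in place of the water-body mask; the function name `sample_pseudo_absences` and the fixed seed are illustrative and not the authors' GIS procedure.

```python
import math
import random


def sample_pseudo_absences(excluded_sites, n_points, bbox, buffer_km=5.0,
                           min_spacing_km=1.0, is_water=None, seed=42,
                           max_tries=100_000):
    """Rejection-sample control points in projected coordinates (km).

    excluded_sites: (x, y) of landslide-disaster and hazard sites to buffer.
    bbox: (xmin, ymin, xmax, ymax), a rectangular stand-in for the study area.
    is_water: hypothetical predicate returning True inside water bodies.
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bbox
    chosen = []
    for _ in range(max_tries):
        if len(chosen) == n_points:
            break
        cand = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        if is_water is not None and is_water(*cand):
            continue  # water bodies are excluded
        if any(math.dist(cand, s) < buffer_km for s in excluded_sites):
            continue  # inside the buffer around a recorded site
        if any(math.dist(cand, q) < min_spacing_km for q in chosen):
            continue  # too close to an already accepted control point
        chosen.append(cand)
    if len(chosen) < n_points:
        raise RuntimeError("constraints too strict for the requested count")
    return chosen
```

For example, `sample_pseudo_absences([(10.0, 10.0), (50.0, 40.0)], 108, (0.0, 0.0, 100.0, 100.0))` returns 108 points that respect both distance rules. Rejection sampling is simple and unbiased within the allowed area; it only becomes slow when the constraints leave little free space.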
Fig. 2
(a) Geographical locations of landslide disaster, landslide hazard, and pseudo-absence points. (b) Photograph of a landslide disaster. (c) Photograph of a landslide hazard.
2.2.2 Landslide influencing factors
Based on relevant studies (Dehnavi et al. 2015; Zhao et al. 2022; Jia et al. 2019), we selected climate, geological, topographical, soil, vegetation, and human activity factors as the influencing factors for landslide occurrence. For the climate factors, we selected cumulative rainfall data for the 24 hours, 7 days, 15 days, 21 days, and 30 days prior to landslide events. Monthly rainfall data, used to calculate the landslide susceptibility, were obtained from the National Tibetan Plateau Data Center (https://data.tpdc.ac.cn/product), and the cumulative rainfall data were extracted from the China Meteorological Data Service Centre (https://data.cma.cn/); for each sampling location, the record from the meteorological station closest to that point was taken as the representative value.
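The nearest-station rule for assigning rainfall to each sampling location can be sketched with great-circle distances. This sketch assumes station coordinates in decimal degrees and uses the haversine formula with a mean Earth radius of 6371.0088 km; the helper names, station IDs, coordinates, and rainfall values are hypothetical, and the study's actual matching may have been performed in projected GIS coordinates.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (km) between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_station_value(site, stations):
    """Return (station_id, distance_km, value) for the station closest to `site`.

    site: (lat, lon); stations: iterable of (station_id, lat, lon, value).
    """
    lat, lon = site
    sid, s_lat, s_lon, value = min(
        stations, key=lambda s: haversine_km(lat, lon, s[1], s[2]))
    return sid, haversine_km(lat, lon, s_lat, s_lon), value
```

Returning the distance alongside the value makes it easy to flag sampling points that lie unusually far from any gauge, where the nearest-station value is least representative.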
Geological factors, including lithological types and distance to faults, were derived from the Yunnan Province geological map (scale 1:500,000), obtained via the Geographic Data Sharing Infrastructure of the College of Urban and Environmental Science, Peking University (http://geodata.pku.edu.cn). A DEM at 90 m spatial resolution was obtained from the Shuttle Radar Topography Mission (SRTM). The river network was extracted in ArcGIS using the ArcSWAT extension, and the discharge threshold was verified through visual inspection of the corresponding channels in Google Earth. Euclidean distances from each landslide-disaster point and control point to the nearest river were calculated in ArcGIS. Terrain slope was derived from the DEM using the Slope function in the Surface toolbox. Soil factors comprised soil clay content and organic matter content, obtained from the Harmonized World Soil Database (https://www.fao.org/soils-portal/soil-survey/soil-maps-and-databases/harmonized-world-soil-database-v12/en/). Vegetation was represented by the Normalized Difference Vegetation Index (NDVI), acquired from the Level-1 and Atmosphere Archive & Distribution System Distributed Active Archive Center (https://ladsweb.modaps.eosdis.nasa.gov/search/order). Anthropogenic variables comprised land-use type and proximity to transportation infrastructure. Land cover was derived from the 30 m Global Land Cover dataset developed by the University of Maryland (www.globallandcover.com). Roads, including national and provincial highways, county roads, and minor rural tracks, were digitized from high-resolution Google Earth imagery (https://earth.google.com). The straight-line distance from each landslide and control point to the nearest road was calculated in ArcGIS. All raster data were resampled to a common grid with a spatial resolution of 1 km. Detailed data information is provided in Table 1.
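The straight-line distance to the nearest road (or river) reduces to a point-to-polyline distance. This is a minimal sketch, assuming projected coordinates in metres and polylines given as lists of vertices; the function names are hypothetical, and it stands in for, rather than reproduces, the ArcGIS distance tools used in the study.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest Euclidean distance from point p to segment a-b (projected coords)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:  # degenerate segment: a == b
        return math.hypot(px - ax, py - ay)
    # Project p onto the infinite line through a and b, then clamp to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_nearest_line(p, polylines):
    """Minimum distance from p to any segment of any polyline (list of vertices)."""
    return min(
        point_segment_distance(p, line[i], line[i + 1])
        for line in polylines
        for i in range(len(line) - 1)
    )
```

Clamping the projection parameter `t` to [0, 1] is what makes the distance fall back to the nearest vertex when the perpendicular foot lies outside a segment.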
Table 1
Data Sources of Influencing Factors
Factor | Variable | Unit | Data acquisition method
Climate | 24-hour cumulative rainfall | mm | In situ data collection
Climate | 7-day cumulative rainfall | mm | In situ data collection
Climate | 15-day cumulative rainfall | mm | In situ data collection
Climate | 21-day cumulative rainfall | mm | In situ data collection
Climate | 30-day cumulative rainfall | mm | In situ data collection
Climate | Monthly cumulative rainfall, 2003–2022 | mm | Averaged from monthly rainfall data, 2003–2022
Geology | Lithology | / | Field survey
Geology | Distance to fault | m | Calculated in ArcGIS
Topography | Elevation | m | SRTM DEM, resampled to 1000 m
Topography | Slope | ° | Calculated from SRTM DEM
Topography | Distance to rivers | m | Calculated in ArcGIS
Soil | Soil clay content | % | In situ data collection
Soil | Soil organic matter content | % | In situ data collection
Vegetation | Normalized Difference Vegetation Index (NDVI) | / | Terra MODIS Vegetation Indices product (MOD13Q1; Level 3, 16-day composites at 250 m spatial resolution), resampled to 1000 m
Human activity | Land use | / | 30 m original resolution, resampled to 1000 m
Human activity | Distance to roads | m | Calculated in ArcGIS
2.3 Factor optimization
2.3.1 Optimization of rainfall data at different temporal resolutions
To evaluate the effect of rainfall data at different resolutions, we employed the Pearson correlation coefficient, a statistic that measures the strength and direction of the linear relationship between two continuous variables (Pearson 1895). The formula for calculating the Pearson Correlation Coefficient is as follows:
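$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}} \qquad (1)$$
where $X_i$ and $Y_i$ are the $i$-th observations of variables $X$ and $Y$, respectively, $\bar{X}$ and $\bar{Y}$ are the mean values of $X$ and $Y$, and $n$ is the number of observations.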
To investigate the relationship between rainfall (a continuous variable) and landslide occurrence (a binary variable where 0 indicates no occurrence and 1 indicates occurrence), we used the point-biserial correlation coefficient, which ranges from −1 to 1 (Tate 1954; Duan et al. 2023).
$$r_{pb} = \frac{M_1 - M_0}{s_n}\sqrt{\frac{n_1 n_0}{n^2}} \qquad (2)$$
where $r_{pb}$ is the point-biserial correlation coefficient, $M_1$ is the mean of the continuous variable for the group where the binary variable equals 1, $M_0$ is the mean of the continuous variable for the group where the binary variable equals 0, $s_n$ is the standard deviation of the continuous variable (computed with $n$ in the denominator), $n_1$ and $n_0$ are the numbers of observations in each binary group, and $n$ is the total number of observations.
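Eq. (2) is algebraically the Pearson coefficient of Eq. (1) computed with the 0/1 occurrence label as one of the two variables, provided the standard deviation uses $n$ in the denominator. The sketch below implements both with the Python standard library so the equivalence can be checked; the function names and rainfall values are hypothetical. In practice, `scipy.stats.pointbiserialr` returns the same coefficient.

```python
import math
from statistics import fmean, pstdev


def pearson_r(x, y):
    """Pearson correlation coefficient, Eq. (1)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def point_biserial_r(values, occurred):
    """Point-biserial correlation, Eq. (2), for a continuous variable and 0/1 labels.

    Uses the population standard deviation (denominator n).
    """
    g1 = [v for v, o in zip(values, occurred) if o == 1]
    g0 = [v for v, o in zip(values, occurred) if o == 0]
    n1, n0 = len(g1), len(g0)
    n = n1 + n0
    s_n = pstdev(values)
    return (fmean(g1) - fmean(g0)) / s_n * math.sqrt(n1 * n0 / n ** 2)
```

Because the two formulas coincide, ranking rainfall windows by point-biserial correlation is the same as ranking them by their Pearson correlation with the occurrence label.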
2.3.2 Optimization of discrete variable values
We treated lithological and land-use types as unordered categorical features. Based on the original classifications of lithological and land-use types, we applied a decision tree using landslide occurrence (presence/absence) as the dependent variable and lithology and land-use categories as independent variables. This procedure reclassified lithological and land-use types into two categories, thereby aligning their classification more closely with landslide occurrence patterns.
A decision tree is a widely used supervised learning algorithm for classification tasks (Chen et al. 2024; Zhang et al. 2024). It partitions a dataset into subsets (leaf nodes) through a series of decision rules (decision nodes), where each leaf node corresponds to a classification or prediction outcome (Song and Ying 2015). The basic construction process of a decision tree includes selecting the optimal feature, splitting the dataset, recursively building the tree, and generating leaf nodes.
In this study, we implemented the C5.0 algorithm from the “C50” package in R version 4.3.3, which is based on Quinlan’s C5.0 algorithm (Kuhn & Quinlan, 2023). This approach enabled us to explore and determine the most effective classification scheme for lithological and land-use types, which were then utilized in the subsequent modeling.
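The two-class regrouping of category levels can be illustrated by searching for the partition that maximizes information gain. This is a simplified sketch, not the C5.0 algorithm: C5.0 uses the gain ratio and its own subset-grouping heuristics, whereas this sketch uses plain entropy-based information gain and relies on the classical CART result that, for a binary target, the best two-group partition lies among the cuts of the levels sorted by their positive (landslide) rate. The function names, level names, and counts are hypothetical.

```python
import math
from collections import defaultdict


def entropy(pos, n):
    """Binary entropy (bits) of a node with `pos` positives out of `n` samples."""
    if n == 0 or pos in (0, n):
        return 0.0
    p = pos / n
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def best_binary_grouping(categories, labels):
    """Split category levels into two groups that maximize information gain.

    Levels are sorted by landslide rate; only the k-1 cuts along that order
    are evaluated, which suffices for a 0/1 target.
    """
    counts = defaultdict(lambda: [0, 0])  # level -> [positives, total]
    for c, y in zip(categories, labels):
        counts[c][0] += y
        counts[c][1] += 1
    levels = sorted(counts, key=lambda c: counts[c][0] / counts[c][1])
    n, pos = len(labels), sum(labels)
    parent = entropy(pos, n)
    best_gain, best_cut = -1.0, 1
    left_pos = left_n = 0
    for k in range(1, len(levels)):
        left_pos += counts[levels[k - 1]][0]
        left_n += counts[levels[k - 1]][1]
        right_pos, right_n = pos - left_pos, n - left_n
        child = (left_n * entropy(left_pos, left_n)
                 + right_n * entropy(right_pos, right_n)) / n
        if parent - child > best_gain:
            best_gain, best_cut = parent - child, k
    return set(levels[:best_cut]), set(levels[best_cut:]), best_gain
```

Sorting by positive rate reduces the search from all $2^{k-1}-1$ groupings to $k-1$ candidate cuts, which is why the regrouping stays cheap even for many lithological or land-use classes.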
2.4 Landslide prediction model development
We developed landslide prediction models using LR, SVM, and RF algorithms, with landslide influencing factors as predictors and landslide occurrence as the response variable. To ensure model validity, we first conducted correlation and collinearity checks on all predictors, confirming that the Pearson correlation coefficients were less than 0.8 and the Variance Inflation Factor (VIF) values were below 0.3. Additionally, all predictors were log-transformed in the LR to improve their distributional normality. The dataset was then randomly partitioned into a training set (80%) and a validation set (20%).
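The collinearity screen can be sketched with the identity that the VIF of predictor $j$ equals the $j$-th diagonal element of the inverse of the predictors' correlation matrix, i.e. $1/(1-R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on all the others. By construction VIF is at least 1, with 1 meaning no linear dependence on the other predictors. This pure-Python sketch is meant for a handful of predictors; the function names and example columns are hypothetical, and in R a function such as `car::vif` is the usual choice.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of equally long numeric columns."""
    def unit_scaled(col):
        m = sum(col) / len(col)
        ss = math.sqrt(sum((v - m) ** 2 for v in col))
        return [(v - m) / ss for v in col]
    z = [unit_scaled(c) for c in columns]
    return [[sum(a * b for a, b in zip(zi, zj)) for zj in z] for zi in z]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (fine for a few predictors)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfectly collinear predictors")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2) = j-th diagonal element of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]
```

For two predictors with correlation 0.8, for instance, both VIFs equal $1/(1-0.8^2) \approx 2.78$, so the pairwise-correlation check and the VIF check are related but not interchangeable once three or more predictors are involved.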
LR is a widely used statistical model for binary classification problems (Menard 2002). The model constructs a linear combination of predictors and applies the logistic (sigmoid) function to map the output to the interval between 0 and 1, yielding the probability of landslide occurrence (1) versus non-occurrence (0).
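$$P(Y=1 \mid X) = \frac{1}{1 + \exp\left[-\left(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k\right)\right]} \qquad (3)$$
where $P(Y=1 \mid X)$ represents the landslide susceptibility given the predictors $X$, $\beta_0, \beta_1, \ldots, \beta_k$ are the model parameters, and $x_1, x_2, \ldots, x_k$ correspond to the climate, geological, topographical, soil, vegetation, and human activity factors.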
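Eq. (3) maps a linear predictor to a susceptibility between 0 and 1. A minimal sketch: the function name and the coefficient values are hypothetical and do not come from the fitted model; the log transform of the rainfall predictors simply mirrors the preprocessing described above.

```python
import math


def landslide_probability(x, beta0, betas):
    """P(Y=1 | X) as in Eq. (3): logistic transform of beta0 + sum(beta_i * x_i)."""
    eta = beta0 + sum(b * xi for b, xi in zip(betas, x))
    if eta >= 0:  # split the cases so math.exp never overflows
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


# Hypothetical coefficients for log(24-h rainfall) and log(30-day rainfall).
example = landslide_probability([math.log(40.0), math.log(150.0)], -6.0, [1.2, 0.5])
```

The two-branch form is numerically equivalent to the textbook sigmoid but avoids overflow in `math.exp` for strongly negative or positive linear predictors.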
SVM is a supervised learning model used for classification and regression analysis. It locates an optimal separating hyperplane in a high-dimensional space to classify data points, aiming to maximize the margin between classes (Hearst et al. 1998). For non-linearly separable data, SVM employs kernel functions to implicitly project the inputs into a higher-dimensional Hilbert space, where a maximum-margin hyperplane can be learned (Cortes & Vapnik, 1995). This model performs well with high-dimensional data and complex non-linear classification tasks (Zhou et al. 2025; Dhamercherla et al. 2025). In this study, we implemented the SVM model using the e1071 package in R version 4.3.3 and applied a linear kernel.
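A linear-kernel SVM learns a hyperplane $w^\top x + b = 0$ by minimizing a regularized hinge loss. The sketch below solves that primal problem with plain full-batch subgradient descent to show the mechanics; it is an illustrative assumption, not what e1071 does (e1071 wraps LIBSVM, which solves the dual problem), and the function names, regularization strength, learning rate, and toy data are hypothetical.

```python
def decision(x, w, b):
    """Score w.x + b; its sign gives the predicted class (+1 landslide, -1 none)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))).

    X: list of feature vectors; y: labels in {-1, +1}.
    Uses full-batch subgradient descent with a constant step size.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision(xi, w, b) < 1.0:  # hinge term is active
                grad_w = [g - yi * xj / n for g, xj in zip(grad_w, xi)]
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b
```

Only points with margin below 1 (the support-vector candidates) contribute to the update, which is the property that makes the final hyperplane depend on the samples closest to the class boundary.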
RF is an ensemble supervised learning model used primarily for classification and regression tasks. It builds multiple decision trees and combines their predictions by voting (for classification) or averaging (for regression) to improve accuracy and stability. Each tree is trained on a bootstrap sample of the dataset, and at each node a random subset of features is considered for splitting. This randomness increases model diversity and reduces the risk of overfitting (Breiman 2001). In this study, we constructed the model using the “randomForest” package in R version 4.3.3 and set a fixed random seed for reproducibility.
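The two sources of randomness described above (bootstrap samples and random feature subsets) plus majority voting can be shown with a deliberately tiny forest of depth-1 trees (decision stumps). This is a toy illustration under stated assumptions, not the randomForest package: randomForest grows full CART trees and draws a new feature subset at every node, whereas a stump has only one node. The default subset size here, the floor of $\sqrt{p}$, matches the randomForest default for classification; the function names, data, and tree count are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def fit_stump(X, y, features):
    """Best single threshold split (lowest weighted Gini) over `features`."""
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr, Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no usable split: predict the majority class everywhere
        major = Counter(y).most_common(1)[0][0]
        return features[0], float("inf"), major, major
    return best[1:]


def fit_forest(X, y, n_trees=50, mtry=None, seed=1):
    """Grow stumps on bootstrap samples, each restricted to mtry random features."""
    rng = random.Random(seed)
    p = len(X[0])
    mtry = mtry or max(1, int(p ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        feats = rng.sample(range(p), mtry)                     # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, x):
    """Majority vote over the stumps."""
    votes = Counter(left if x[f] <= thr else right for f, thr, left, right in forest)
    return votes.most_common(1)[0][0]
```

Fixing the seed, as the text does with randomForest, makes both the bootstrap draws and the feature subsets reproducible across runs.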
2.5 Landslide prediction model evaluation
To validate the effectiveness of the optimization strategies, we fed both the pre- and post-optimization data to the LR, SVM, and RF models and compared their fitting performance before and after optimization. Specifically, we incorporated the reclassified lithology and land-use types and rainfall data of different temporal resolutions into each model, and compared balanced accuracy and Area Under the Curve (AUC) values to evaluate the improvement in model performance.
Balanced Accuracy is an important metric for evaluating the performance of classification models. It represents the average of the accuracy for the positive and negative classes (De Diego et al. 2022).
$$\text{Balanced accuracy} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right) \qquad (4)$$
where $TP$ (true positives) is the number of samples correctly predicted as positive, $TN$ (true negatives) is the number of samples correctly predicted as negative, $FP$ (false positives) is the number of samples incorrectly predicted as positive, and $FN$ (false negatives) is the number of samples incorrectly predicted as negative.
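A minimal sketch of Eq. (4); the function names are hypothetical, and the labels and predictions in the usage are invented for illustration (1 = landslide, 0 = control point).

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for 0/1 labels, 1 = landslide."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity TP/(TP+FN) and specificity TN/(TN+FP), Eq. (4)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2
```

For 20 landslide and 20 control points with 17 true positives and 16 true negatives, the result is $(17/20 + 16/20)/2 = 0.825$.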
AUC is a crucial metric for assessing the performance of classification models, particularly in binary classification tasks (Bradley 1997). The AUC represents the area under the ROC (Receiver Operating Characteristic) curve, reflecting the model's ability to distinguish between positive and negative samples. The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR), calculated as follows:
$$TPR = \frac{TP}{TP + FN} \qquad (5)$$
$$FPR = \frac{FP}{FP + TN} \qquad (6)$$
where $TP$ is the number of samples correctly predicted as positive, $TN$ is the number of samples correctly predicted as negative, $FP$ is the number of samples incorrectly predicted as positive, and $FN$ is the number of samples incorrectly predicted as negative.
The AUC value is computed by numerical integration; in R version 4.3.3, we used the auc function from the “pROC” package. AUC ranges from 0 to 1: values closer to 1 indicate better discrimination, a value of 0.5 corresponds to random guessing, and values below 0.5 indicate performance worse than random.
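The area under the empirical ROC curve built from Eqs. (5) and (6) equals the probability that a randomly chosen landslide point receives a higher score than a randomly chosen control point (the Mann-Whitney statistic, with ties counted as one half). The sketch computes AUC both ways on hypothetical scores so the equivalence can be checked; the function names are illustrative, and pROC's `auc()` should report the same trapezoidal quantity for an empirical ROC curve.

```python
def auc_rank(scores, labels):
    """P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def roc_points(scores, labels):
    """(FPR, TPR) at every distinct score threshold, Eqs. (5)-(6)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, t in zip(scores, labels) if s >= thr and t == 1)
        fp = sum(1 for s, t in zip(scores, labels) if s >= thr and t == 0)
        points.append((fp / n_neg, tp / n_pos))  # (FP/(FP+TN), TP/(TP+FN))
    return points


def auc_trapezoid(points):
    """Trapezoidal area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

The rank form makes the 0.5 baseline intuitive: a model whose scores carry no information wins and loses pairwise comparisons equally often.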
2.6 Influencing factor analysis
In this study, we used the RF model to rank the importance of the influencing factors and analyzed their quantitative relationships with landslide susceptibility using partial dependence plots (PDPs). Common methods for evaluating variable importance include node-purity-based and error-increase-based measures (Gregorutti et al. 2017; Hapfelmeier et al. 2014). We calculated the Mean Decrease Gini and Mean Decrease Accuracy and averaged the two measures to determine the overall importance of each variable.
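Mean Decrease Accuracy rests on permutation: shuffling one predictor breaks its link with the response, and the resulting loss in accuracy measures how much the model relied on it. The sketch computes a simplified, dataset-level version for any fitted classifier; randomForest computes the measure per tree on out-of-bag samples, so its values will differ. The function name, model, and data in the usage are hypothetical.

```python
import random


def permutation_importance(predict, X, y, feature, n_repeats=10, seed=0):
    """Mean drop in accuracy after randomly shuffling one feature column.

    predict: fitted classifier, mapping a feature vector (list) to a 0/1 label.
    """
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    baseline = accuracy(X)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)  # break the link between this feature and y
        permuted = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, column)]
        drops.append(baseline - accuracy(permuted))
    return sum(drops) / n_repeats
```

A feature the model never uses scores exactly zero, while an informative feature produces a positive mean drop; averaging over several shuffles reduces the noise of a single permutation.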
To reduce overfitting and simplify the model, we performed 500 cross-validation iterations on the established model to identify the optimal number of input variables. Based on the importance rankings, we screened the input variables to build a more efficient and accurate RF model, which was then used for the PDP analysis of the quantitative relationship between each influencing factor and landslide susceptibility. A PDP shows the marginal effect of one feature on the prediction: for each value on a grid of the target feature, that value is assigned to every observation while the other features keep their observed values, the model predictions are averaged, and the averages are plotted against the grid values. The factor analysis was implemented using the “randomForest” package in R version 4.3.3.
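The PDP procedure reduces to a short loop over grid values. A minimal sketch: `toy_susceptibility` is a hypothetical model whose response to 24-hour rainfall saturates at 40 mm, chosen only to make the curve easy to read, and is not the fitted RF model; in R, `randomForest::partialPlot` provides the equivalent for a fitted forest.

```python
def partial_dependence(predict, X, feature, grid):
    """Average model output with `feature` fixed at each grid value for all rows."""
    curve = []
    for v in grid:
        preds = [predict(row[:feature] + [v] + row[feature + 1:]) for row in X]
        curve.append((v, sum(preds) / len(preds)))
    return curve


def toy_susceptibility(row):
    """Hypothetical model: row = [24-h rainfall (mm), another predictor]."""
    return 0.8 * min(row[0], 40.0) / 40.0 + 0.1 * row[1]
```

Because every observation keeps its own values for the other predictors, the curve averages over their observed distribution; strong interactions can be hidden by this averaging, which is a known limitation of PDPs.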
2.7 Average landslide susceptibility map
Using the optimized lithology and land-use classes and the optimized rainfall parameters, we applied the LR, SVM, and RF models to quantify landslide susceptibility in the CYP, with the influencing-factor raster layers as independent variables, to create a landslide susceptibility map for the region.
To represent typical rainfall conditions, we used monthly and daily mean rainfall raster data for the past 20 years (2003–2022) to generate the average landslide susceptibility map. The monthly mean rainfall was calculated as a weighted average based on the proportion of landslide occurrences in each month, while the daily mean rainfall was derived by averaging the monthly values.
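The landslide-weighted monthly mean can be written as $\bar{R} = \sum_m w_m R_m$, where $w_m$ is the share of recorded landslides that fell in month $m$ and $R_m$ is the multi-year mean rainfall for that month. A minimal per-cell sketch; the function name, monthly means, and landslide counts are hypothetical, and applying it to a raster simply repeats the calculation for every cell.

```python
def landslide_weighted_rainfall(monthly_mean_mm, landslide_counts):
    """Weighted mean of 12 multi-year monthly means; weights = landslide share per month."""
    if len(monthly_mean_mm) != 12 or len(landslide_counts) != 12:
        raise ValueError("expected 12 monthly values")
    total = sum(landslide_counts)
    return sum(r * c / total for r, c in zip(monthly_mean_mm, landslide_counts))


# Hypothetical multi-year monthly means (mm) for one cell and landslide counts per month.
monthly = [15, 20, 25, 30, 90, 160, 210, 200, 120, 80, 30, 12]
counts = [0, 0, 0, 0, 2, 6, 10, 8, 2, 0, 0, 0]
weighted = landslide_weighted_rainfall(monthly, counts)
```

Weighting by landslide share emphasizes the wet-season months in which most recorded landslides occurred, rather than treating all months equally.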
We finalized the landslide susceptibility map using the prediction from LR, SVM, and RF models by calculating a weighted average based on the AUC values of each model.
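The AUC-weighted combination can be written as $S = \sum_m \mathrm{AUC}_m S_m / \sum_m \mathrm{AUC}_m$ over the three models. A minimal per-cell sketch; the function name, susceptibility values, and AUCs are hypothetical. When the three AUCs are equal, the weights are equal and the result reduces to the arithmetic mean of the model outputs.

```python
def auc_weighted_ensemble(predictions, aucs):
    """Per-cell weighted mean of model outputs with weights proportional to AUC.

    predictions: {model: [susceptibility per cell]}; aucs: {model: AUC}.
    """
    models = list(predictions)
    total = sum(aucs[m] for m in models)
    n_cells = len(predictions[models[0]])
    return [sum(aucs[m] * predictions[m][i] for m in models) / total
            for i in range(n_cells)]
```

Since the AUCs of competitive models are usually close, AUC weighting tends to shift the ensemble only slightly away from a simple average.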
Results
3.1 Correlation of influencing factors
The Pearson correlation coefficients between the 24-hour cumulative rainfall prior to landslide events and the 7-day, 15-day, 21-day, and 30-day cumulative rainfall are 0.35, 0.36, 0.34, and 0.46, respectively. In contrast, the correlation coefficients among the cumulative rainfall series of the other temporal resolutions range from 0.61 to 0.94, considerably higher than those between the 24-hour cumulative rainfall and the longer periods (see Fig. 3a).
Point-biserial correlation analysis reveals that the correlation coefficients between landslide occurrence and the cumulative rainfall for 24 hours, 7 days, 15 days, 21 days, and 30 days are 0.546, 0.292, 0.226, 0.268, and 0.303, respectively (see Fig. 3b). Notably, the 24-hour and 30-day cumulative rainfall exhibit the highest correlation coefficients with landslide occurrence.
Based on these results, the 24-hour and 30-day cumulative rainfall were selected as the optimized rainfall data parameters for further analysis.
Fig. 3
(a) Correlation coefficient between rainfall data of different temporal resolutions. (b) Correlation coefficient between rainfall data of different temporal resolutions and landslide occurrence.
3.2 Identification of influencing factor categories
The decision tree grouped metamorphic, volcanic, intrusive, and clastic rocks, together with carbonate rocks containing more than 70% carbonate, into Class 1, which is less prone to landslides. In contrast, unconsolidated rocks and carbonate rocks with 30–70% carbonate content were grouped into Class 2, which is more susceptible to landslides (see Fig. 4a).
For land use, the decision tree placed cropland and artificial surfaces in Class 1, which is more prone to landslides, and forest, grassland, and shrubland in Class 2, which is less prone to landslides (see Fig. 4b). The optimized lithology and land-use schemes were used in the subsequent analysis.
Fig. 4
Decision tree classifications of lithology and land use. The left side of the vertical axis indicates whether a landslide occurred (0 = no occurrence, 1 = occurrence), and the right side represents the proportion of different categories at each node. (a) Decision tree classification based on lithological types (META: metamorphic rock; VOL: volcanic rock; INS: intrusive rock; CLAS: clastic rock; CARB(>70%): carbonate rock (>70%); UNCON: unconsolidated rock; CARB(30–70%): carbonate rock (30–70%)). (b) Decision tree classification based on land-use types (1: cropland; 2: forest; 3: grassland; 4: shrubland; 8: artificial surfaces).
3.3 Model evaluation
3.3.1 Comparison of rainfall data optimization
Model performance varied when rainfall data with different accumulation periods were used as inputs to the LR, SVM, and RF models (Table 3). Among the single-period inputs, the 24-hour cumulative rainfall gave the best predictive performance in all three models, with balanced accuracy values of 0.81, 0.85, and 0.81 and AUC values of 0.876, 0.85, and 0.81 for LR, SVM, and RF, respectively. The SVM model achieved the highest balanced accuracy, whereas the LR model achieved the highest AUC. As the accumulation period lengthened, balanced accuracy and AUC generally decreased before rising again, and the 15-day cumulative rainfall yielded the lowest AUC in all three models.
When the 24-hour and 30-day cumulative rainfall were used together, performance improved modestly relative to the 24-hour rainfall alone, most clearly in the RF model (balanced accuracy and AUC rose from 0.81 to 0.83); the LR AUC rose slightly (from 0.876 to 0.880), and the SVM values were unchanged.
Table 3
Comparison of model performance for rainfall data at different temporal resolutions and their combinations (CR: cumulative rainfall)
| Rainfall input | LR balanced accuracy | LR AUC | SVM balanced accuracy | SVM AUC | RF balanced accuracy | RF AUC |
|---|---|---|---|---|---|---|
| 24-hour CR | 0.81 | 0.876 | 0.85 | 0.85 | 0.81 | 0.81 |
| 7-day CR | 0.75 | 0.720 | 0.68 | 0.68 | 0.61 | 0.61 |
| 15-day CR | 0.70 | 0.700 | 0.65 | 0.65 | 0.46 | 0.46 |
| 21-day CR | 0.65 | 0.704 | 0.75 | 0.75 | 0.54 | 0.54 |
| 30-day CR | 0.70 | 0.748 | 0.68 | 0.68 | 0.55 | 0.55 |
| 24-h & 7-d CR | 0.83 | 0.872 | 0.85 | 0.85 | 0.80 | 0.80 |
| 24-h & 15-d CR | 0.81 | 0.876 | 0.85 | 0.85 | 0.83 | 0.83 |
| 24-h & 21-d CR | 0.81 | 0.872 | 0.85 | 0.85 | 0.85 | 0.85 |
| 24-h & 30-d CR | 0.81 | 0.880 | 0.85 | 0.85 | 0.83 | 0.83 |
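Balanced accuracy (the mean of sensitivity and specificity) and AUC are the two metrics reported in Tables 3 and 4. The dependency-free sketch below computes both for hypothetical labels and predicted susceptibility scores. The 0.5 classification threshold is an assumption, and the source does not say how its AUC was computed. One property is worth keeping in mind when reading the tables: if AUC is computed from hard 0/1 predictions rather than continuous scores, it reduces algebraically to balanced accuracy. Identical values in the two columns, as in the SVM and RF columns, are therefore consistent with that choice, though they do not prove it.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (landslide recall) and specificity (non-landslide recall)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2

def roc_auc(y_true, scores):
    """AUC as the probability that a random landslide sample scores higher than
    a random non-landslide sample, with ties counting one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = landslide) and predicted susceptibility scores
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.6, 0.65, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

auc_scores = roc_auc(y_true, scores)  # AUC from continuous scores
auc_labels = roc_auc(y_true, y_pred)  # AUC from hard 0/1 predictions
ba = balanced_accuracy(y_true, y_pred)
print(auc_scores, auc_labels, ba)
```

Here the score-based AUC (0.875) differs from balanced accuracy, while the label-based AUC equals it exactly.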
3.3.2 Comparison of decision tree classification optimization
Model performance improved after the lithological and land-use types were reclassified using the decision tree results. The RF model showed the largest improvements in balanced accuracy and AUC.
Optimizing the lithological types alone increased the RF model's balanced accuracy and AUC by 0.05 (from 0.78 to 0.83), and optimizing the land-use types alone increased both by 0.02 (to 0.80). Optimizing both simultaneously also raised both metrics by 0.05, to 0.83, matching the lithology-only scheme. In contrast, the balanced accuracy of the SVM and LR models remained relatively stable, although the LR AUC increased from 0.848 to 0.880 (Table 4).
Table 4
Comparison of model performance before and after decision tree optimization
| Processing scheme | LR balanced accuracy | LR AUC | SVM balanced accuracy | SVM AUC | RF balanced accuracy | RF AUC |
|---|---|---|---|---|---|---|
| a. Not optimized | 0.81 | 0.848 | 0.85 | 0.85 | 0.78 | 0.78 |
| b. Optimized lithology | 0.81 | 0.876 | 0.83 | 0.83 | 0.83 | 0.83 |
| c. Optimized land use | 0.79 | 0.852 | 0.85 | 0.85 | 0.80 | 0.80 |
| d. Optimized lithology and land use | 0.81 | 0.880 | 0.85 | 0.85 | 0.83 | 0.83 |
Taken together, the model comparisons show that reclassifying the lithological and land-use types improved the RF model most (balanced accuracy and AUC +0.05) and also raised the AUC of the LR model (from 0.848 to 0.880). Combining the 24-hour and 30-day cumulative rainfall prior to landslide events likewise improved the RF model (from 0.81 to 0.83), supporting the effectiveness of these optimization strategies.
Consequently, the final landslide prediction model was established using the reclassified lithological and land-use types, along with the combined 24-hour and 30-day cumulative rainfall.
3.4 Attribution of landslide susceptibility
Feature selection was first performed on the RF model. Cross-validation showed that the top 8 factors ranked by Mean Decrease Accuracy contributed positively to prediction accuracy (Fig. 5), whereas the bottom 4 factors (distance to rivers, elevation, slope, and NDVI) had minimal and potentially negative effects on accuracy (Fig. S1). Therefore, only the top 8 factors were used to construct partial dependence plots (PDPs) illustrating the quantitative relationship between each factor and landslide occurrence (Fig. 5). Weighting the Mean Decrease Accuracy and Mean Decrease Gini values equally yielded a combined importance ranking of the landslide influencing factors in the RF model (Fig. 5). From most to least important, the factors are: 24-hour cumulative rainfall, distance to roads, soil organic matter content, 30-day cumulative rainfall, soil clay content, land-use type, distance to faults, and lithological type.
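The text does not specify how Mean Decrease Accuracy (MDA) and Mean Decrease Gini (MDG) were put on a common scale before being weighted equally. One plausible reading, sketched below as an assumption, is to min-max scale each measure to [0, 1] and average the two with weight 0.5. The factor names and importance values are hypothetical.

```python
def minmax(values):
    """Rescale a {factor: importance} mapping to the range [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    return {k: (v - lo) / (hi - lo) if hi > lo else 0.0 for k, v in values.items()}

def combined_ranking(mda, mdg, weight=0.5):
    """Rank factors by a weighted mean of min-max scaled MDA and MDG.

    weight=0.5 gives the two measures equal weight. The min-max scaling is an
    assumption; the source does not say how the measures were combined.
    """
    a, g = minmax(mda), minmax(mdg)
    score = {k: weight * a[k] + (1 - weight) * g[k] for k in mda}
    return sorted(score, key=score.get, reverse=True), score

# Hypothetical importance values (not the study's results)
mda = {"factor_a": 30.0, "factor_b": 22.0, "factor_c": 25.0, "factor_d": 10.0}
mdg = {"factor_a": 40.0, "factor_b": 35.0, "factor_c": 15.0, "factor_d": 5.0}
ranking, score = combined_ranking(mda, mdg)
print(ranking)
```

In this invented example, MDA alone would place factor_c above factor_b and MDG would do the reverse. The equal-weight score settles the order, which is the purpose of combining the two measures.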
Fig. S1
Spatial patterns of the environmental factors across the CYP region (LITH: lithology; CLAY: soil clay content; DTF: distance to faults; DTRI: distance to rivers; DTRO: distance to roads; Ele: elevation; LAND: land use; NDVI: normalized difference vegetation index; OC: soil organic matter content; R1: 24-hour cumulative rainfall; R30: 30-day cumulative rainfall; SLO: slope).
According to the PDPs, landslide susceptibility rises sharply as 24-hour cumulative rainfall increases from 0 to 40 mm. In contrast, the effect of 30-day cumulative rainfall shows a slight initial decrease followed by a gradual increase. Landslide susceptibility decreases with distance to roads up to 1,000 m and levels off beyond that distance; it also decreases with distance to faults within 0–4,000 m. Soil organic matter content and soil clay content both show a positive relationship with landslide susceptibility. The reclassified land-use and lithological types behave as expected: the groups categorized as less prone to landslides have lower partial dependence values, and vice versa.
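Partial dependence, as plotted in Fig. 5, can be computed in three steps. Fix one factor at each value of a grid for every sample, average the model's predicted susceptibility, and plot that average against the grid. The minimal plain-Python sketch below follows these steps. `toy_model` is a hypothetical logistic function standing in for the fitted RF model, the sample rows are invented, and the keys R1 and DTRO borrow the Fig. 5 abbreviations. With a tree ensemble the same loop applies; only `predict` changes.

```python
import math

def partial_dependence(predict, rows, feature, grid):
    """Average prediction with `feature` fixed at each grid value for every row,
    leaving the other features at their observed values."""
    curve = []
    for value in grid:
        modified = [dict(row, **{feature: value}) for row in rows]  # copies; inputs untouched
        curve.append(sum(predict(r) for r in modified) / len(modified))
    return curve

# Hypothetical stand-in for a fitted classifier (NOT the study's RF model):
# a logistic function of 24-hour rainfall (R1, mm) and distance to roads (DTRO, m).
def toy_model(row):
    z = -3.0 + 0.08 * row["R1"] - 0.002 * row["DTRO"]
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical sample of factor values
rows = [{"R1": 10.0, "DTRO": 200.0}, {"R1": 35.0, "DTRO": 1500.0}, {"R1": 60.0, "DTRO": 400.0}]
grid = [0, 10, 20, 30, 40, 50]
pd_r1 = partial_dependence(toy_model, rows, "R1", grid)
print([round(v, 3) for v in pd_r1])
```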
Fig. 5
Importance ranking of landslide influencing factors and partial dependence plot in RF model (CLAY: soil clay content, DTF: distance to fault, DTRO: distance to road, LAND: land-use, OC: soil organic matter content, R1: 24-hour cumulative rainfall, R30: 30-day cumulative rainfall, LITH: lithology).
3.5 Average landslide susceptibility
After developing the LR, SVM, and RF landslide prediction models, we applied each model to the raster layers of the influencing factors and averaged the three predictions to produce a landslide susceptibility map for the CYP (Fig. 6). The average susceptibility ranges from 0 to 0.8. Areas with an average susceptibility greater than 0.6 are located mainly in the eastern and southern parts of the plateau, whereas the central region generally has an average susceptibility below 0.4.
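The averaging step can be sketched as a cell-by-cell mean of the three models' predicted probabilities, following the abstract's description of averaging the results from the three models with equal weights. The sketch assumes each model outputs a landslide probability per cell on a common grid. Nested lists stand in for rasters (a real workflow would read and write georeferenced raster files), and the values are hypothetical. The 0.6 threshold mirrors the one quoted in the text.

```python
def average_susceptibility(*model_rasters):
    """Cell-by-cell mean of equally weighted susceptibility grids (lists of rows)."""
    n_models = len(model_rasters)
    n_rows, n_cols = len(model_rasters[0]), len(model_rasters[0][0])
    return [
        [sum(r[i][j] for r in model_rasters) / n_models for j in range(n_cols)]
        for i in range(n_rows)
    ]

def share_above(raster, threshold):
    """Fraction of cells whose value exceeds `threshold`."""
    cells = [v for row in raster for v in row]
    return sum(v > threshold for v in cells) / len(cells)

# Hypothetical 2 x 3 probability grids from three fitted models on a common grid
lr = [[0.2, 0.5, 0.7], [0.1, 0.4, 0.9]]
svm = [[0.3, 0.4, 0.8], [0.2, 0.3, 0.6]]
rf = [[0.1, 0.6, 0.6], [0.0, 0.5, 0.9]]

avg = average_susceptibility(lr, svm, rf)
print([[round(v, 2) for v in row] for row in avg])
print("share of cells > 0.6:", share_above(avg, 0.6))
```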
Fig. 6
Average landslide susceptibility map in the CYP.
Discussion
This study optimized lithological data, land-use classifications, and combinations of rainfall data at different temporal resolutions to create an average landslide susceptibility map for the CYP region using LR, SVM, and RF models. The results indicate generally low landslide susceptibility in the central plateau, consistent with Wang et al. (2014) and Liu et al. (2023), who found significantly lower landslide occurrence in this region than in surrounding areas. This suggests that the CYP's stable geological environment may reduce landslide hazard, benefiting regional economic and social development.
Areas with higher landslide susceptibility are located mainly along the eastern and southern margins of the plateau, where rainfall is significantly higher than in the central and northwestern regions. Although the terrain in the eastern region is relatively gentle, its high road density and extensive agricultural land use indicate considerable human disturbance. In addition, the landslide-prone eastern areas consist primarily of carbonate rocks with 30–70% carbonate content, which are susceptible to landslides under certain conditions because of erosion, instability induced by their mixed composition, and heterogeneous physical properties (Ehrenberg & Nadeau, 2005). The region is also characterized by generally low vegetation cover (Rao et al., 2023). Landslides in the eastern region therefore appear to result from the combined effects of climate, human activity, and geological factors.
The southern region, characterized by river valleys with steep slopes and high soil organic matter content, receives the highest rainfall on the plateau. Although organic matter can enhance soil aggregation and water retention (Esmaeilzadeh & Ahangar, 2014), excessive organic matter can substantially increase the soil's water absorption capacity; during extreme rainfall, this can lead to soil saturation and reduced shear strength, raising landslide risk (Xuan et al., 2023). Metamorphic rocks typically exhibit good stability (Eberhardt et al., 2004), but complex fractures and faults within them may become unstable under external stress or increased water pressure from rainfall, triggering landslides (Ehteshami-Moinabadi, 2022). The southern region is therefore particularly susceptible to landslides following extreme rainfall.
Machine-learning studies of landslides differ in their input factors (Dehnavi et al., 2015; Zhao et al., 2022; Jia et al., 2019; Song et al., 2025). Rainfall remains a common predictor, and researchers have proposed rainfall thresholds based on duration and intensity (Caine, 1980). Although some argue that landslides are caused by water infiltration rather than rainfall volume (Terlien, 1998), limited data availability often restricts models to factors such as annual precipitation, monthly averages, or cumulative event rainfall (Janizadeh et al., 2023; Smith et al., 2023). This study proposes combining 24-hour and 30-day cumulative rainfall for landslide modeling. This combination gave the highest LR AUC and improved the RF model relative to 24-hour rainfall alone (Table 3). The 24-hour rainfall reflects the impact of sudden events, while the 30-day rainfall represents cumulative effects, and both contribute independently to landslide prediction (Smith et al., 2023).
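The combined 24-hour and 30-day predictor pair can be derived from a daily rainfall series. The sketch below treats the 24-hour value as the rainfall on the landslide day and each window as ending on, and including, that day. It also counts days missing from the record as zero. The source does not state how the windows were aligned or how gaps were handled, so all three choices are assumptions, and the record itself is hypothetical. The keys R1 and R30 follow the abbreviations in Fig. 5.

```python
from datetime import date, timedelta

def cumulative_rainfall(daily_mm, event_day, window_days):
    """Total rainfall (mm) over the `window_days` days ending on `event_day`.

    Assumptions (not stated in the source): the window includes the event day,
    and days missing from the record count as zero rainfall.
    """
    return sum(daily_mm.get(event_day - timedelta(days=k), 0.0) for k in range(window_days))

# Hypothetical daily rainfall record (mm) at one landslide site
start = date(2020, 6, 1)
daily = {start + timedelta(days=i): mm for i, mm in enumerate([0, 5, 12, 0, 3, 30, 48])}
event = date(2020, 6, 7)

# R1 and R30 form the combined predictor pair; the other windows are shown for comparison
features = {f"R{w}": cumulative_rainfall(daily, event, w) for w in (1, 7, 15, 21, 30)}
print(features)
```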
Regarding lithology, classification standards vary with academic viewpoints and data accessibility and often lack geological relevance for landslide susceptibility assessment (Dehnavi et al., 2015; Chen & Zhang, 2021). To address this, a decision tree was used to reclassify the lithological types, yielding practical and reliable results. In the CYP, metamorphic, volcanic, intrusive, and clastic rocks and carbonate rocks with more than 70% carbonate content are less prone to landslides, while unconsolidated rocks and carbonate rocks with 30–70% carbonate content are more prone. Metamorphic rocks typically exhibit high strength and low porosity, providing stability and reducing landslide risk (Eberhardt et al., 2004). Volcanic and intrusive rocks are generally dense and hard, with strong resistance to weathering and erosion, making landslides less likely (Ladygin et al., 2023). Clastic rocks have tightly packed particles and low porosity and permeability, which contribute to stability and reduce the risk of large-scale landslides (Smith et al., 2009). In contrast, unconsolidated rocks have high porosity and permeability and loose internal structures, making them susceptible to landslides under water flow. Whereas carbonate rocks with more than 70% carbonate content typically have high permeability, good drainage, and high strength, rocks with 30–70% carbonate content are prone to landslides because of erosion, instability induced by their mixed composition, and heterogeneous physical properties (Ehrenberg & Nadeau, 2005).
Based on these findings, the study proposes the following recommendations for landslide prevention and control in the CYP region:
1. Strengthen monitoring of 24-hour and 30-day cumulative rainfall to enable timely warnings and disaster-prevention guidance for residents.
2. Enhance infrastructure maintenance, particularly for roads, as road construction strongly affects landslide occurrence. Supervision and maintenance of infrastructure in high-risk areas should be strengthened to prevent secondary disasters caused by infrastructure damage.
3. Implement appropriate land-use planning to avoid large-scale development activities in high-risk areas that could disrupt the natural environment.
While this study contributes to data optimization for landslide susceptibility quantification, some limitations remain. Due to data availability constraints, the landslide events include various types (e.g., slides, flows, and failures) with differing trigger mechanisms, which warrant further investigation (Hungr et al., 2014). Additionally, landslide records may be influenced by subjective factors and local economic conditions, introducing potential biases. Future research could benefit from integrating remote sensing, geological survey, and socioeconomic data to enhance data accuracy and reliability through data fusion and correction methods.
Conclusion
This study analyzed the spatiotemporal patterns of landslides in the CYP and the key factors influencing them. By optimizing lithological and land-use classifications and integrating rainfall data at different temporal resolutions, we improved the performance of landslide susceptibility quantification. The results highlight 24-hour cumulative rainfall, 30-day cumulative rainfall, human activities (such as road construction), and soil organic matter content as critical factors affecting landslide occurrence. Landslide-prone areas are concentrated mainly in the eastern and southern regions of the plateau, where climatic conditions, human activities, and geological features are more conducive to landslides.
Based on these findings, this study recommends strengthening rainfall monitoring, with a focus on both daily and monthly rainfall dynamics. Additionally, enhancing the maintenance of roads and other infrastructure can help mitigate the impact of human activities in high-risk areas. Finally, it is also essential to prevent overdevelopment in vulnerable regions.
Abbreviations
CYP: Central Yunnan Plateau
LR: logistic regression
SVM: support vector machine
RF: random forest
CNN: convolutional neural network
AUC: area under the (receiver operating characteristic) curve
PDP: partial dependence plot
CR: cumulative rainfall
Declarations
Data Availability
Some of the data presented and analysed in this study were collected from previously published papers and can be accessed through the sources cited in the text. The additional new data presented in this paper are available from the corresponding author on reasonable request.
Competing interests
The authors declare that they have no competing interests.
Funding
The study is sponsored by the Yunnan Fundamental Research Projects (202302d4040076).
Author Contribution
All authors contributed significantly to this manuscript, from the initial draft to its final form. J.C. wrote the original draft and played a significant role in collecting the data, developing the methodology, and conducting the data analysis. Q.L. and C.H. were responsible for collecting the landslide and 24-hour cumulative rainfall data. X.H. was involved in reviewing and editing the manuscript. Z.S. was responsible for conceptualization, supervision, funding acquisition, and review of the manuscript.
Acknowledgement
The field work is supported by the Observatory on Biodiversity and Critical Zone in the Central Yunnan Plateau, the Ministry of Natural Resources (OBCYP-MNR).
References
Aleotti P (2004) A warning system for rainfall-induced shallow failures. Eng Geol 73(3–4):247–265. https://doi.org/10.1016/j.enggeo.2004.01.007
Bontemps N, Lacroix P, Larose E, Jara J, Taipe E (2020) Rain and small earthquakes maintain a slow-moving landslide in a persistent critical state. Nat Commun 11(1):780. https://doi.org/10.1038/s41467-020-14445-3
Bradley AP (1997) The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn 30(7):1145–1159. https://doi.org/10.1016/S0031-3203(96)00142-2
Breiman L (2001) Random forests. Mach Learn 45(1):5–32. https://doi.org/10.1023/A:1010933404324
Caine N (1980) The rainfall intensity-duration control of shallow landslides and debris flows. Geogr Annaler: Ser Phys Geogr 62(1–2):23–27. https://doi.org/10.1080/04353676.1980.11879996
Chae BG, Park HJ, Catani F, Simoni A, Berti M (2017) Landslide prediction, monitoring and early warning: a concise review of state-of-the-art. Geosci J 21:1033–1070. https://doi.org/10.1007/s12303-017-0034-4
Chen W, Zhang S (2021) GIS-based comparative study of Bayes network, Hoeffding tree and logistic model tree for landslide susceptibility modeling. CATENA 203:105344. https://doi.org/10.1016/j.catena.2021.105344
Chen Z, Tang J, Song D (2024) Modeling landslide susceptibility using alternating decision tree and support vector. Terr Atmos Ocean Sci 35(1):12. https://doi.org/10.1007/s44195-024-00074-6
Chen XL, Zhou Q, Ran H, Dong R (2012) Earthquake-triggered landslides in southwest China. Nat Hazards Earth Syst Sci 12(2):351–363. https://doi.org/10.5194/nhess-12-351-2012
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297. https://doi.org/10.1007/BF00994018
Crozier MJ, Glade T (2005) Landslide hazard and risk: issues, concepts and approach. Landslide Hazard Risk 1–40. https://doi.org/10.1002/9780470012659
Cruden D (1991) A simple definition of a landslide. Bull Eng Geol Environ 43(1). https://doi.org/10.1007/BF02590167
Cui F, Li B, Xiong C, Yang Z, Peng J, Li J, Li H (2022) Dynamic triggering mechanism of the Pusa mining-induced landslide in Nayong County, Guizhou Province, China. Geomat Nat Hazards Risk 13(1):123–147. https://doi.org/10.1080/19475705.2021.2017020
Dai FC, Lee CF, Ngai YY (2002) Landslide risk assessment and management: an overview. Eng Geol 64(1):65–87. https://doi.org/10.1016/S0013-7952(01)00093-X
Das S, Sarkar S, Kanungo DP (2023) A critical review on landslide susceptibility zonation: recent trends, techniques, and practices in Indian Himalaya. Nat Hazards 115(1):23–72. https://doi.org/10.1007/s11069-022-05554-x
De Diego IM, Redondo AR, Fernández RR, Navarro J, Moguerza JM (2022) General performance score for classification problems. Appl Intell 52(10):12049–12063. https://doi.org/10.1007/s10489-021-03041-7
Dehnavi A, Aghdam IN, Pradhan B, Varzandeh MHM (2015) A new hybrid model using step-wise weight assessment ratio analysis (SWARA) technique and adaptive neuro-fuzzy inference system (ANFIS) for regional landslide hazard assessment in Iran. CATENA 135:122–148. https://doi.org/10.1016/j.catena.2015.07.020
Deng W, Li AN, Nan X (2015) Digital Mountain Map of China. SinoMaps, Beijing
Duan Y, Luo J, Pei X et al (2023) Co-Seismic Landslides Triggered by the 2014 Mw 6.2 Ludian Earthquake, Yunnan, China: Spatial Distribution, Directional Effect, and Controlling Factors. Remote Sens 15(18):4444. https://doi.org/10.3390/rs15184444
Dhamercherla S, Reddy Edla D, Dara S (2025) Cancer classification in high dimensional microarray gene expressions by feature selection using eagle prey optimization. Front Genet 16:1528810. https://doi.org/10.3389/fgene.2025.1528810
Duncan JM (1996) State of the art: limit equilibrium and finite-element analysis of slopes. J Geotech Eng 122(7):577–596. https://doi.org/10.1061/(ASCE)0733-9410(1996)122:7(577)
Eberhardt E, Stead D, Coggan JS (2004) Numerical analysis of initiation and progressive failure in natural rock slopes—the 1991 Randa rockslide. Int J Rock Mech Min Sci 41(1):69–87. https://doi.org/10.1016/S1365-1609(03)00076-5
Ehrenberg SN, Nadeau PH (2005) Sandstone vs. carbonate petroleum reservoirs: A global perspective on porosity-depth and porosity-permeability relationships. AAPG Bull 89(4):435–445. https://doi.org/10.1306/11230404071
Ehteshami-Moinabadi M (2022) Properties of fault zones and their influences on rainfall-induced landslides, examples from Alborz and Zagros ranges. Environ Earth Sci 81(5):168. https://doi.org/10.1007/s12665-022-10283-2
Emberson R, Kirschbaum D, Stanley T (2021) Global connections between El Nino and landslide impacts. Nat Commun 12(1). https://doi.org/10.1038/s41467-021-22398-4
Esmaeilzadeh J, Ahangar AG (2014) Influence of soil organic matter content on soil physical, chemical and biological properties. Int J Plant Anim Environ Sci 4(4):244–252. https://api.semanticscholar.org/CorpusID:46540054
Fan X, Xu Q, Scaringi G, Zheng G, Huang R, Dai L, Ju Y (2019) The long runout rock avalanche in Pusa, China, on August 28, 2017: a preliminary report. Landslides 16:139–154. https://doi.org/10.1007/s10346-018-1084-z
Feng H, Jiang G, He Z et al (2024) Dynamic response and failure characteristics of a slope with bedrock subjected to earthquakes and rainfall in shaking table tests. Bull Eng Geol Environ 83(7):265. https://doi.org/10.1007/s10064-024-03748-0
Fidan S, Tanyaş H, Akbaş A, Lombardo L, Petley DN, Görüm T (2024) Understanding fatal landslides at global scales: a summary of topographic, climatic, and anthropogenic perspectives. Nat Hazards 1–19. https://doi.org/10.1007/s11069-024-06487-3
Frigerio Porta G, Bebbington M, Xiao X, Jones G (2021) A statistical model for earthquake and/or rainfall triggered landslides. Front Earth Sci 8:605003. https://doi.org/10.3389/feart.2020.605003
Froude MJ, Petley DN (2018) Global fatal landslide occurrence from 2004 to 2016. Nat Hazards Earth Syst Sci 18(8):2161–2181. https://doi.org/10.5194/nhess-18-2161-2018
Gariano SL, Guzzetti F (2016) Landslides in a changing climate. Earth Sci Rev 162:227–252. https://doi.org/10.1016/j.earscirev.2016.08.011
Goetz JN, Brenning A, Petschko H, Leopold P (2015) Evaluating machine learning and statistical prediction techniques for landslide susceptibility modeling. Comput Geosci 81:1–11. https://doi.org/10.1016/j.cageo.2015.04.007
Gómez D, García EF, Aristizábal E (2023) Spatial and temporal landslide distributions using global and open landslide databases. Nat Hazards 117(1):25–55. https://doi.org/10.1007/s11069-023-05848-8
Gregorutti B, Michel B, Saint-Pierre P (2017) Correlation and variable importance in random forests. Stat Comput 27:659–678. https://doi.org/10.1007/s11222-016-9646-1
Griffiths DV, Lane PA (1999) Slope stability analysis by finite elements. Geotechnique 49(3):387–403. https://doi.org/10.1680/geot.1999.49.3.387
Guzzetti F, Peruccacci S, Rossi M, Stark CP (2008) The rainfall intensity–duration control of shallow landslides and debris flows: an update. Landslides 5:3–17. https://doi.org/10.1007/s10346-007-0112-1
Handwerger AL, Huang MH, Fielding EJ, Booth AM, Bürgmann R (2019) A shift from drought to extreme rainfall drives a stable landslide to catastrophic failure. Sci Rep 9(1):1569. https://doi.org/10.1038/s41598-018-38300-0
Hapfelmeier A, Hothorn T, Ulm K, Strobl C (2014) A new variable importance measure for random forests with missing data. Stat Comput 24:21–34. https://doi.org/10.1007/s11222-012-9349-1
Hearst MA, Dumais ST, Osuna E, Platt J, Scholkopf B (1998) Support vector machines. IEEE Intell Syst Their Appl 13(4):18–28. https://doi.org/10.1109/5254.708428
Huang RQ (2009) Some catastrophic landslides since the twentieth century in the southwest of China. Landslides 6(1):69–81. https://doi.org/10.1007/s10346-009-0142-y
Huggel C, Clague JJ, Korup O (2012) Is climate change responsible for changing landslide activity in high mountains? Earth Surf Proc Land 37(1):77–91. https://doi.org/10.1002/esp.2223
Hürlimann M, Guo Z, Puig-Polo C, Medina V (2022) Landslides 1–20. https://doi.org/10.1007/s10346-021-01775-6
Hungr O, Leroueil S, Picarelli L (2014) The Varnes classification of landslide types, an update. Landslides 11:167–194. https://doi.org/10.1007/s10346-013-0436-y
Iverson RM (2000) Landslide triggering by rain infiltration. Water Resour Res 36(7):1897–1910. https://doi.org/10.1029/2000WR900090
Janizadeh S, Bateni SM, Jun C et al (2023) Potential impacts of future climate on the spatio-temporal variability of landslide susceptibility in Iran using machine learning algorithms and CMIP6 climate-change scenarios. Gondwana Res 124:1–17. https://doi.org/10.1016/j.gr.2023.05.003
Jia XL, Dai QM, Yang HZ (2019) Susceptibility zoning of karst geological hazards using machine learning and cloud model. Cluster Comput 22:S8051–S8058. https://doi.org/10.1007/s10586-017-1590-0
Kang J, Wan B, Gao Z et al (2024) Research on machine learning forecasting and early warning model for rainfall-induced landslides in Yunnan province. Sci Rep 14:14049. https://doi.org/10.1038/s41598-024-64679-0
Keefer DK (1984) Landslides caused by earthquakes. Geol Soc Am Bull 95(4):406–421
Kuhn M, Quinlan R (2023) C5.0 decision trees and rule-based models. R package version 0.1.8
Lacroix P, Dehecq A, Taipe E (2020a) Irrigation-triggered landslides in a Peruvian desert caused by modern intensive farming. Nat Geosci 13(1):56–60. https://doi.org/10.1038/s41561-019-0500-x
Lacroix P, Handwerger AL, Bièvre G (2020b) Life and death of slow-moving landslides. Nat Rev Earth Environ 1(8):404–419. https://doi.org/10.1038/s43017-020-0072-8
Ladygin VM, Girina OA, Frolova YV (2023) The Petrophysical Properties and Strength of Extrusive Rocks Discharged by Bezymianny Volcano, Kamchatka. J Volcanol Seismolog 17(3):159–174. https://doi.org/10.1134/S0742046323700197
Lee RM, Shoshitaishvili B, Wood RL et al (2023) The meanings of the Critical Zone. Anthropocene 42:100377. https://doi.org/10.1016/j.ancene.2023.100377
Li Q, Li L, Zhang J, He X (2024) Spatio-Temporal Characteristics and Driving Mechanisms of Urban Expansion in the Central Yunnan Urban Agglomeration. Land 13(9):1496. https://doi.org/10.3390/land13091496
Li XB, Dong LJ, Zhao GY et al (2012) Stability analysis and comprehensive treatment methods of landslides under complex mining environment—a case study of Dahu landslide from Linbao Henan in China. Saf Sci 50(4):695–704. https://doi.org/10.1016/j.ssci.2011.08.049
Liu M, Xu B, Li Z et al (2023) Landslide susceptibility zoning in Yunnan Province based on SBAS-InSAR technology and a random forest model. Remote Sens 15(11):2864. https://doi.org/10.3390/rs15112864
Liu S, Wang L, Zhang W et al (2024) Physics-informed optimization for a data-driven approach in landslide susceptibility evaluation. J Rock Mech Geotech Eng 16(8):3192–3205. https://doi.org/10.1016/j.jrmge.2023.11.039
Liu Y, Deng Z, Wang X (2021) The effects of rainfall, soil type and slope on the processes and mechanisms of rainfall-induced shallow landslides. Appl Sci 11(24):11652. https://doi.org/10.3390/app112411652
Madhu D, Nithya GK, Sreekala S, Ramesh MV (2024) Regional-scale landslide modeling using machine learning and GIS: a case study for Idukki district, Kerala, India. Nat Hazards 1–22. https://doi.org/10.1007/s11069-024-06592-3
Mainsant G, Larose E, Brönnimann C, Jongmans D, Michoud C, Jaboyedoff M (2012) Ambient seismic noise monitoring of a clay landslide: Toward failure prediction. J Geophys Research: Earth Surf 117(F1). https://doi.org/10.1029/2011JF002159
Ma GT, Hu XW, Yin YP, Pan YX (2018) Failure mechanisms and development of catastrophic rockslides triggered by precipitation and open-pit mining in Emei, Sichuan, China. Landslides 15(7):1401–1414. https://doi.org/10.1007/s10346-018-0981-5
Menard S (2001) Applied logistic regression analysis. Sage
Merghadi A, Yunus AP, Dou J et al (2020) Machine learning methods for landslide susceptibility studies: A comparative overview of algorithm performance. Earth-Sci Rev 207:103225. https://doi.org/10.1016/j.earscirev.2020.103225
Pacheco Quevedo R, Velastegui-Montoya A, Montalván-Burbano N, Morante-Carballo F, Korup O, Daleles Rennó C (2023) Land use and land cover as a conditioning factor in landslide susceptibility: a literature review. Landslides 20(5):967–982. https://doi.org/10.1007/s10346-022-02020-4
Pearson K (1895) VII. Note on regression and inheritance in the case of two parents. Proc R Soc Lond 58(347–352):240–242. https://doi.org/10.1098/rspl.1895.0041
Perrone A, Vassallo R, Lapenna V, Di Maio C (2008) Pore water pressures and slope stability: a joint geophysical and geotechnical analysis. J Geophys Eng 5(3):323–337. https://doi.org/10.1088/1742-2132/5/3/008
Petley DN, Hearn GJ, Hart A, Rosser NJ, Dunning SA, Oven K, Mitchell WA (2007) Trends in landslide occurrence in Nepal. Nat Hazards 43:23–44. https://doi.org/10.1007/s11069-006-9100-3
Rao WG, Shen ZH, Duan XW (2023) Spatiotemporal patterns and drivers of soil erosion in Yunnan, Southwest China: RUSLE assessments for recent 30 years and future predictions based on CMIP6. CATENA 220(Part B):106703. https://doi.org/10.1016/j.catena.2022.106703
Sameen MI, Pradhan B, Lee S (2020) Application of convolutional neural networks featuring Bayesian optimization for landslide susceptibility assessment. CATENA 186:104249. https://doi.org/10.1016/j.catena.2019.104249
Schulz WH, Smith JB, Wang G, Jiang Y, Roering JJ (2018) Clayey landslide initiation and acceleration strongly modulated by soil swelling. Geophys Res Lett 45(4):1888–1896. https://doi.org/10.1002/2017GL076807
Schulz WH, McKenna JP, Kibler JD, Biavati G (2009) Relations between hydrology and velocity of a continuously moving landslide—evidence of pore-pressure feedback regulating landslide motion? Landslides 6:181–190. https://doi.org/10.1007/s10346-009-0157-4
Silhán K (2021) Dendrogeomorphology of Different Landslide Types: A Review. Forests 12(3):261. https://doi.org/10.3390/f12030261
Sim KB, Lee ML, Wong SY (2022) A review of landslide acceptable risk and tolerable risk. Geoenvironmental Disasters 9(1):3. https://doi.org/10.1186/s40677-022-00205-6
Smith HG, Neverman AJ, Betts H, Spiekermann R (2023) The influence of spatial patterns in rainfall on shallow landslides. Geomorphology 437:108795. https://doi.org/10.1016/j.geomorph.2023.108795
Smith TM, Sayers CM, Sondergeld CH (2009) Rock properties in low-porosity/low-permeability sandstones. Lead Edge 28(1):48–59. https://doi.org/10.1190/1.3064146
Song Y, Hu X, Shi X, Cui Y, Zhou C, Xu Y (2025) Hydrological proxy derived from InSAR coherence in landslide characterization. Remote Sens Environ 322:114712. https://doi.org/10.1016/j.rse.2025.114712
Song YY, Lu Y (2015) Decision tree methods: applications for classification and prediction. Shanghai Arch Psychiatry 27(2):130. https://doi.org/10.11919/j.issn.1002-0829.215044
Tan Y, Cao J, Xiang W, Xu WZ, Tian JW, Gou Y (2023) Slope stability analysis of saturated–unsaturated based on the GEO-studio: a case study of Xinchang slope in Lanping County, Yunnan Province, China. Environ Earth Sci 82(13):322. https://doi.org/10.1007/s12665-023-11006-x
Tate RF (1954) Correlation between a discrete and a continuous variable. Point-biserial correlation. Ann Math Stat 25(3):603–607
Tehrani FS, Calvello M, Liu Z, Zhang L, Lacasse S (2022) Machine learning and landslide studies: recent advances and applications. Nat Hazards 114(2):1197–1245. https://doi.org/10.1007/s11069-022-05423-7
Terlien MTJ (1998) The determination of statistical and deterministic hydrological landslide-triggering thresholds. Environ Geol 35(2):124–130. https://doi.org/10.1007/s002540050299
Van Westen CJ, Castellanos E, Kuriakose SL (2008) Spatial data for landslide susceptibility, hazard, and vulnerability assessment: An overview. Eng Geol 102(3–4):112–131. https://doi.org/10.1016/j.enggeo.2008.03.010
Varnes DJ (1958) Landslide types and processes. Landslides Eng Pract 24:20–47
Wang J, Jaboyedoff M, Chen G et al (2024) Landslide susceptibility prediction and mapping using the LD-Bi LSTM model in seismically active mountainous regions. Landslides 21:17–34. https://doi.org/10.1007/s10346-023-02141-4
Wang X, Zhang L, Wang S, Lari S (2014) Regional landslide susceptibility zoning with considering the aggregation of landslide points and the weights of factors. Landslides 11:399–409. https://doi.org/10.1007/s10346-013-0392-6
Wang Y, Xie Y, Liu X et al (2022) Climate and human induced 2000-year vegetation diversity change in Yunnan, southwestern China. Holocene 32(11):1327–1339. https://doi.org/10.1177/09596836211041730
Wu W, Zhang Q, Singh VP, Wang G, Zhao J, Shen Z, Sun S (2022) A data-driven model on Google Earth Engine for landslide susceptibility assessment in the Hengduan Mountains, the Qinghai–Tibetan Plateau. Remote Sens 14(18):4662. https://doi.org/10.3390/rs14184662
Xu X, Wen X, Zheng R et al (2003) Pattern of latest tectonic motion and its dynamics for active blocks in Sichuan-Yunnan region, China. Sci China Ser D: Earth Sci 46(Suppl 2):210–226. https://doi.org/10.1360/03dz0017
Xuan K, Li X, Zhang J, Jiang Y, Ma B, Liu J (2023) Effects of organic amendments on soil pore structure under waterlogging stress. Agronomy 13(2):289. https://doi.org/10.3390/agronomy13020289
Yang H, Yang T, Zhang S et al (2020a) Rainfall-induced landslides and debris flows in Mengdong Town, Yunnan Province, China. Landslides 17(4):931–941
Yang H, Wei F, Ma Z, Guo H, Su P, Zhang S (2020b) Rainfall threshold for landslide activity in Dazhou, southwest China. Landslides 17:61–77. https://doi.org/10.1007/s10346-019-01270-z
Yin YP, Sun P, Zhang M, Li B (2011) Mechanism on apparent dip sliding of oblique inclined bedding rockslide at Jiweishan, Chongqing, China. Landslides 8(1):49–65. https://doi.org/10.1007/s10346-010-0237-5
Yu XL, Ma XY, Gu SX, Li J (2013) Spatial and temporal changes of precipitation in central Yunnan plateau for the last half century. Resour Environ Yangtze Basin 22(S1):96–102
Yunnan Provincial Local Chronicles Compilation Committee (1998) Local Chronicles of Yunnan Province. Yunnan People's Publishing House
Zhang D, Jindal D, Roy N et al (2024) Enhancing landslide susceptibility mapping using a positive-unlabeled machine learning approach: a case study in Chamoli, India. Geoenvironmental Disasters 11(1):21. https://doi.org/10.1186/s40677-024-00281-w
Zhang F, Huang X (2018) Trend and spatiotemporal distribution of fatal landslides triggered by non-seismic effects in China. Landslides 15(8):1663–1674. https://doi.org/10.1007/s10346-018-1007-z
Zhang S, Zhao L, Delgado-Tellez R et al (2018) A physics-based probabilistic forecasting model for rainfall-induced shallow landslides at regional scale. Nat Hazards Earth Syst Sci 18(3):969–982. https://doi.org/10.5194/nhess-18-969-2018
Zhang Y, Zhang J, Lei WZ, Shi JS, Wang XL, Xiong TY (2007) Discussion on environmental geological problems in the areas from Southwest China to Southeast Asia. Earth Sci Front 14(6):24–30. https://doi.org/10.1016/S1872-5791(08)60002-0
Zhao JQ, Zhang Q, Wang D, Wu W, Yuan R (2022) Machine learning-based evaluation of susceptibility to geological hazards in the Hengduan Mountains region, China. Int J Disaster Risk Sci 13(2):305–316. https://doi.org/10.1007/s13753-022-00401-w
Zhi ZM, Liu FG, Zhou Q, Xia XS, Chen Q (2021) Evaluation of geological hazards susceptibility based on watershed units: A case study of the Changdu City, Tibet. Chin J Geol Hazard Control 34(1):139–150. https://doi.org/10.16031/j.cnki.issn.1003-8035.202111026
Zhou W, Zhou Y, Liang S et al (2025) A new framework for landslide susceptibility mapping in contiguous impoverished areas using machine learning and catastrophe theory. Sci Rep 15(1):10620. https://doi.org/10.1038/s41598-025-88070-9
Supplementary
Fig. S2
Comprehensive ranking of landslide influencing factors based on cross-validation of the RF model (LITH: Lithology, CLAY: Soil clay content, DTF: Distance to fault, DTRI: Distance to river, DTRO: Distance to road, ELE: Elevation, LAND: Land use, NDVI: Normalized difference vegetation index, OC: Soil organic matter content, R1: 24-hour cumulative rainfall, R30: 30-day cumulative rainfall, SLO: Slope)