Quantitative Study on Flow-Driving Factors in the Lower Yellow River Bankfull Discharge Based on MIC-RF
Jiabei Li1, Dangwei Wang2,4, Ke Ni2,4,*, Jian Chen1, Jianguo Chen2,4, Yuhang Dong2,4 and Zijing Zhou2,4
1 North China University of Water Resources and Electric Power, Zhengzhou 450046, China;
Ljb08241010@163.com (Jiabei Li); chenjian@ncwu.edu.cn (Jian Chen)
2 China Institute of Water Resources and Hydropower Research, Beijing 100048, China; wangdw17@126.com (Dangwei Wang); chenjg@iwhr.com (Jianguo Chen); 98347513@qq.com (Yuhang Dong); 370794215@qq.com (Zijing Zhou)
3 Water Security and Key Laboratory of Sediment Science and Northern River Training, the Ministry of Water Resources, Beijing 100048, China
* Correspondence: 15515340923@163.com (Ke Ni)
Abstract
This study applies a two-stage Maximum Information Coefficient-Random Forest (MIC-RF) model to discharge, sediment concentration, sediment transport rate, and sediment grain size data from four hydrological stations in the lower reaches of the Yellow River for 1960 to 2022. The model quantifies the contribution of each factor to the interannual variation in bankfull discharge. Results indicate that peak flood discharge (importance 0.300) and annual runoff volume (0.272) are the dominant factors; maximum sediment transport rate (0.135), median grain size of suspended sediment (0.126), and sediment inflow coefficient (0.122) exert an indirect influence by regulating the sediment transport-deposition balance; and bed sediment median grain size (0.045) has a negligible effect. These findings provide a quantitative basis for maintaining main channel flow capacity and optimizing water-sediment management.
Keywords:
Bankfull discharge variation
Maximum information coefficient
Random forest
Lower reaches of the Yellow River
1 Introduction
The floodplains in the lower reaches of the Yellow River cover over 80% of the river channel area but carry only about 20% of the flow. The remaining 80% of the flow relies on the main channel for conveyance. The bankfull discharge therefore serves as a crucial indicator of cross-sectional flow capacity. It is the flow rate at which the water level in the main channel is level with the floodplain lip and the flow is about to overtop the banks. It directly measures the main channel's flow capacity and serves as a key indicator for determining whether the flood discharge and sediment transport functions are operating normally [1, 2].
However, the response of the main channel's flow capacity to variations in water and sediment is still described primarily through empirical statistical frameworks. Previous studies have largely assumed that the riverbed is in a state of equilibrium, establishing empirical relationships between the bankfull discharge and channel geometry of the lower Yellow River and the concurrent inflow of water and sediment [3–5]. The bankfull discharge increases significantly with rising flood-season water volume and exhibits a significant negative correlation with the sediment inflow coefficient [6–8]. The effect of the rising river-mouth base level (reference plane) is suppressed by upstream erosion [9–12]. By incorporating both annual water volume and flood-season peak discharge into a regression model, one study proposed a moving-average power-function empirical formula based on observed data from 1950 to 2003, which provides a basis for rapid estimation in reservoir sediment-flushing plans [13]. These studies collectively indicate that bankfull discharge is not determined by a single factor, but by the combined influence of multiple factors that undergo both long-term trends and fluctuations. However, the interactions among these factors and their relative importance have not yet been systematically quantified.
The bankfull discharge is essentially a function of main channel morphology, and the geometric characteristics of the main channel (such as width-to-depth ratio and cross-sectional area) adjust to historical sediment transport conditions with a lag. Earlier hydrodynamic processes therefore indirectly govern present-day bankfull discharge by shaping riverbed boundaries (e.g., sediment deposition, gradient, roughness coefficient). Consequently, its calculation requires methods that integrate time-lag effects with water-sediment interactions, such as moving average methods and flood frequency analysis [14]. Traditional studies have predominantly used these methods to remove the interference of cumulative effects. However, the choice of sliding window length is subjective, and such methods make it difficult to isolate multi-factor coupling effects. To address this, this paper replaces absolute values with “interannual variation quantities.” It pairs the annual variation sequence of bankfull discharge with those of concurrent water-sediment factors, using the differenced form to mitigate cumulative lag interference. The Maximum Information Coefficient (MIC) is then employed to screen factors significantly associated with the variation quantities, while Random Forest quantifies their relative importance. This enables the dominant mechanisms governing year-to-year bankfull discharge fluctuations to be identified within a single combined framework.
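The differencing step described above can be illustrated with a minimal sketch. It assumes that each series is simply differenced year over year and that the factor series is differenced in the same way as the bankfull discharge series; the function name and all numbers are hypothetical and are not measured data.

```python
def interannual_change(series):
    """Return year-over-year differences: value[t] - value[t-1]."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


# Hypothetical values for illustration only (not measured data).
bankfull_q = [5000.0, 4600.0, 4100.0, 4300.0]    # bankfull discharge, m^3/s
peak_flood_q = [6200.0, 4800.0, 3900.0, 5100.0]  # peak flood discharge, m^3/s

d_bankfull = interannual_change(bankfull_q)
d_peak = interannual_change(peak_flood_q)

# Pair each factor change with the bankfull discharge change of the same year.
pairs = list(zip(d_peak, d_bankfull))
```

A series of N annual values yields N - 1 paired changes, so the differenced sample is one observation shorter than the original record.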
2 Research Area Overview
The study reach is located in the lower Yellow River, extending from the outlet of the Mengjin gorge to the estuary at Lijin. The main stem is approximately 750 km long with a total elevation drop of 94 m, covering a catchment area of 2.3×10⁴ km² [15]. Detailed and abundant measured hydrological and sediment data exist for the lower reaches of the Yellow River. The main hydrological stations along the main stem from Xiaolangdi to Lijin are Xiaolangdi, Huayuankou, Jiahetan, Gaocun, Sunkou, Aishan, Luokou, and Lijin. The reaches from Xiaolangdi to Huayuankou and from Huayuankou to Gaocun are typical wandering (braided) reaches, the Gaocun to Aishan reach is transitional, and the Aishan to Lijin reach is meandering. Four hydrological stations, Huayuankou (HYK), Gaocun (GC), Aishan (AS), and Lijin (LJ), serve as boundary points delineating the different river pattern segments. This study selected these four national basic hydrological stations as representative cross-sections. They control 98% of the total downstream watershed area and have continuous records since 1960 of water level, discharge, sediment concentration, sediment transport rate, and bed and suspended sediment particle size. This data set meets the requirements for analyzing the long-term evolution of bankfull discharge and its driving factors.
Fig. 1
Sketch of the Lower Yellow River
To quantitatively characterize the response of the main channel's flow capacity to changes in water and sediment conditions, a long-term, comparable hydrological-sediment and geometric database must first be established. Water and sediment conditions during the flood season are the dominant factors determining bed erosion and deposition in the lower reaches of the Yellow River, and existing studies have predominantly used flood-season metrics to characterize annual water and sediment conditions [16–18]. During non-flood seasons, low flow rates and weak sediment transport render their contributions negligible. Therefore, this study adopts annual hydrological and sedimentary conditions as the analysis unit. Because the factors differ greatly in magnitude and dimension, direct modeling on absolute values may obscure the effects of minor factors. In contrast, interannual variation expresses all indicators uniformly as an “amplitude of change” and partially incorporates lagged response information, making it more suitable for weighting and quantification. Accordingly, this study compiled daily discharge, sediment concentration, and sediment transport rate for 1960–2022 from the Yellow River Hydrological Yearbook [19] published by the Yellow River Conservancy Commission, and calculated the sediment inflow coefficient from these data. The median particle size D₅₀ of bed sediment and suspended sediment was derived from quarterly sampling data at major cross-sections during the same period. Suspended sediment D₅₀ was measured using a laser particle size analyzer, while bed sediment D₅₀ was obtained via sieve analysis per GB/T 50123-2019. The missing data rate was below 2%; gaps were filled by linear interpolation, followed by magnitude-frequency consistency testing.
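As a rough illustration of two of the data preparation steps mentioned above, the sketch below fills interior gaps by linear interpolation and computes a sediment inflow coefficient. It assumes the coefficient is the ratio of mean sediment concentration (kg/m³) to mean discharge (m³/s), which gives the unit kg·s/m⁶; the averaging period, the function names, and the sample values are assumptions made for illustration only.

```python
def fill_gaps_linear(values):
    """Linearly interpolate interior None entries; gaps at either end stay None."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = filled[left] + weight * (filled[right] - filled[left])
    return filled


def sediment_inflow_coefficient(mean_concentration, mean_discharge):
    """Ratio of mean sediment concentration (kg/m^3) to mean discharge (m^3/s)."""
    if mean_discharge <= 0:
        raise ValueError("mean discharge must be positive")
    return mean_concentration / mean_discharge


# Hypothetical numbers for illustration only.
series = fill_gaps_linear([1.0, None, 3.0])
xi = sediment_inflow_coefficient(30.0, 1500.0)  # kg*s/m^6
```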
Fig. 2
Actual Water-Sediment Data Chart
2 Analysis Method of Factors Affecting Bankfull Discharge Based on the MIC-Random Forest Model
To capture the full range of nonlinear relationships among variables while keeping the prediction task accurate, this paper adopts a two-stage framework combining the Maximum Information Coefficient (MIC) and Random Forest (RF). The first stage measures nonlinear dependencies using MIC, while the second stage evaluates factor importance based on the mean decrease in Gini impurity (MDG) obtained from the RF model.
2.1 Maximum Information Coefficient Prescreening
To effectively capture nonlinear and non-functional dependencies among variables, this study employs MIC [20]. MIC is an information-theoretic metric that measures the strength of dependence between variables across a broad range of relationship types, including nonlinear relationships and complex associations formed by the superposition of multiple functions. Unlike traditional metrics such as Pearson's correlation coefficient, which capture only linear relationships, MIC can identify a broader range of variable relationships. MIC is built on mutual information: its core idea is to find the grid partition that maximizes the normalized mutual information computed under that partition. For two variables x and y, the mutual information is defined as:

$$ I(x; y) = \iint p(x, y) \log_2 \frac{p(x, y)}{p(x)\, p(y)} \, \mathrm{d}x \, \mathrm{d}y \qquad (1) $$

In Eq. (1), x represents any feature in the feature set (e.g., peak flood discharge, annual runoff volume, maximum sediment transport rate, annual average median grain size of suspended sediment, sediment inflow coefficient, annual average bed sediment median grain size), y denotes the change in bankfull discharge, p(x, y) is the joint probability density function of x and y, and p(x) and p(y) are the marginal density functions of x and y, respectively. Let D be the set of n paired observations of the two variables, where n is the sample size. The range of x is divided into r intervals and the range of y into s intervals, forming an r × s grid G. Mutual information is calculated under each such grid, and the maximum over all grids with the same numbers of intervals is taken as the mutual information for that resolution, i.e.,

$$ I^{*}(D, r, s) = \max_{G} I(D|_{G}), \qquad M(D)_{r,s} = \frac{I^{*}(D, r, s)}{\log_2 \min(r, s)}, \qquad \mathrm{MIC}(D) = \max_{r s < B(n)} M(D)_{r,s} \qquad (2) $$

In Eq. (2), I*(D, r, s) represents the maximum mutual information of D over the partitions G, MIC(D) denotes the maximum normalized mutual information obtained across the different partitions, M(D) is the characteristic matrix of normalized maximum mutual information values, and B(n) is the upper bound on the grid size.
The optimal feature set is obtained by calculating the MIC value between each feature and the target variable. Features that do not satisfy Eq. (3) are removed.

$$ \mathrm{MIC}(x_i, y) > \sigma \qquad (3) $$

In Eq. (3), x_i is the i-th candidate feature and σ is the feature selection threshold, which determines which variables are included in subsequent modeling.
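The grid search behind MIC can be illustrated with a deliberately simplified sketch. It is not the MINE algorithm: it only tries equal-frequency (rank-based) grids rather than optimizing the cut positions, so it generally gives a lower value than the true MIC and handles tied values arbitrarily. The function names and the grid-size bound n^0.6 are assumptions made for this illustration.

```python
import math
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Assign each value to one of n_bins rank-based bins of near-equal size."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def mutual_information(bx, by):
    """Mutual information (in bits) between two sequences of bin labels."""
    n = len(bx)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    return sum((c / n) * math.log2(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def approx_mic(x, y, alpha=0.6):
    """Largest normalized MI over equal-frequency r x s grids with r*s <= n**alpha."""
    n = len(x)
    bound = max(n ** alpha, 4)
    best = 0.0
    for r in range(2, int(bound // 2) + 1):
        for s in range(2, int(bound // r) + 1):
            mi = mutual_information(equal_frequency_bins(x, r),
                                    equal_frequency_bins(y, s))
            best = max(best, mi / math.log2(min(r, s)))
    return best


# A strictly monotonic relationship should score close to 1.
x_demo = [float(v) for v in range(64)]
y_demo = [2.0 * v + 1.0 for v in x_demo]
score = approx_mic(x_demo, y_demo)
keep_feature = score > 0.2  # screening rule of Eq. (3) with sigma = 0.2
```

For real analyses a dedicated MINE implementation would be preferable, since the optimized grid search is what gives MIC its generality.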
2.2 Random Forest Importance Evaluation
The Random Forest algorithm [21], proposed by Breiman in 2001, is built on classification trees. Its core idea is to reduce the errors of individual learners by combining many of them. The model uses decision trees as base learners: each tree classifies the input samples, and the forest aggregates the results and determines the final output through a voting mechanism. Training a decision tree fundamentally involves reducing data “impurity” (measured here by Gini impurity) through feature splitting. Within a single tree, a feature's importance is directly tied to the reduction in impurity achieved by splitting on it. At the root node, the disorder of the initial dataset is represented by its Gini impurity (higher values indicate more “disordered” data and greater category dispersion). Since a random forest consists of k trees, and each tree may split on feature X at different values, the results from the individual trees must be aggregated. The random forest algorithm delivers accurate predictions and is widely applied in classification, regression, survival analysis, and other fields. It can also be used to assess the importance of variables. The Gini impurity is given by Eq. (4).
$$ \mathrm{Gini}(D) = 1 - \sum_{i=1}^{C} p_i^{2} \qquad (4) $$

In Eq. (4), p_i represents the proportion of the i-th category and C is the number of categories. In short, this formula quantifies the “balance of category distribution” by calculating 1 minus the sum of the squares of the probabilities of all categories: the more balanced the distribution (e.g., when all category probabilities are equal), the larger the Gini impurity; the more concentrated the distribution (e.g., when a single category accounts for an extremely high proportion), the smaller the Gini impurity, which reaches 0 when only one category is present.
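A short sketch may help make the Gini impurity and the impurity reduction of a single split concrete. The labels "rise" and "fall" are hypothetical category names for the direction of bankfull discharge change, and the helper names are illustrative only.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared category proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two child nodes."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted


# Hypothetical labels for illustration only.
balanced = ["rise", "rise", "fall", "fall"]
pure = ["fall", "fall", "fall", "fall"]

g_balanced = gini_impurity(balanced)  # two equally likely categories
g_pure = gini_impurity(pure)          # a single category
gain = impurity_decrease(balanced, ["rise", "rise"], ["fall", "fall"])
```

A split that separates the two categories perfectly removes all of the parent impurity. Averaging such decreases for one feature over all nodes and trees gives the mean decrease in Gini used as the importance score.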
2.3 Cooperative Analysis of the MIC-RF Two-Stage Model
This study employs a two-stage MIC-RF model to achieve a comprehensive and reliable quantification of factors influencing bankfull discharge. In the first stage, MIC performs a global scan of the original high-dimensional features by measuring nonlinear dependencies among variables, thereby preliminarily screening candidate factors potentially associated with the interannual variation sequence (annual difference) of bankfull discharge. This process reduces dimensionality and removes noise from the data, thereby mitigating the risk of overfitting in subsequent models. In the second stage, the RF model quantifies the actual contribution of each factor to the specific task of predicting changes in bankfull discharge by calculating the Mean Decrease in Gini (MDG) for the candidate factors that passed MIC screening. The ensemble learning mechanism of the RF model supports robust results and interpretable factor rankings.
The MIC-RF collaborative framework offers distinct advantages: it comprehensively captures nonlinear and non-functional dependencies among variables while precisely evaluating each factor's contribution to specific prediction tasks. This ensures robust and scientifically sound feature ranking, effectively addressing the limitations of traditional methods in quantifying complex coupling relationships.
Fig. 3
Flowchart of the Maximum Information Coefficient-Random Forest Factor Influence Analysis Model
3 Quantitative Calculation of Factors Affecting Bankfull Discharge
3.1 Model Establishment
As shown in Table 1, MIC takes values in the interval [0, 1]: MIC > 0.8 indicates an extremely strong association, 0.6 < MIC ≤ 0.8 a strong association, 0.4 < MIC ≤ 0.6 a moderate association, 0.2 < MIC ≤ 0.4 a weak association, and MIC ≤ 0.2 an extremely weak association or none [22, 23]. This classification applies not only to linear relationships but also to arbitrary nonlinear and non-functional dependencies, which makes it more robust for screening high-dimensional water-sediment characteristics. By traversing the r × s grid partitions over the entire sample and selecting the maximum normalized mutual information, the resulting estimates reflect the strength of association between variables and provide a reliable set of candidate inputs for subsequent random forest modeling. Because MIC ≤ 0.2 corresponds to the “extremely weak or absent” range, and this threshold has been adopted in previous hydrological and brain network studies to eliminate environmental noise while preserving variables that reflect underlying mechanisms, this study sets σ = 0.2 as the prescreening threshold.
To identify influencing factors with significant explanatory power for changes in bankfull discharge, it is necessary to calculate the MIC between each candidate variable and the bankfull discharge change sequence. The specific calculation steps are as follows:
(1) Data preparation: Ensure that the input variables x and y are both numerical. If categorical data are present, encode them first. Missing values can be handled by linear interpolation, by deleting samples that contain missing values, or by imputation with the mean or median. Since MIC is sensitive to sample size, provide at least 50 paired samples (xi, yi). The larger the sample size, the more reliable the results.
(2) Initialize the MINE model: MIC calculations are based on the MINE algorithm framework. During initialization, two key parameters must be specified: alpha (typically set between 0.6 and 0.8), which sets the upper bound on the number of grid cells as a power of the sample size (higher alpha values allow finer subdivisions and increase computational cost); and c (typically set between 10 and 20), which controls how many candidate cut points are examined relative to the number of grid columns, balancing computational efficiency and accuracy.
(3) Calculate mutual information statistics: For each pair (x, y), compute the joint distribution characteristics of the two variables. This amounts to estimating the joint and marginal probability densities of x and y through grid-based binning. Divide the range of x into r intervals and the range of y into s intervals (where r and s are bounded through alpha and c), count the number of samples within each grid cell, and calculate the joint probability.
(4) Extract MIC values: Based on the joint and marginal probabilities, calculate the mutual information for each grid partitioning scheme. MIC is the normalized maximum mutual information across all admissible grid partitions, where normalization ensures that results fall within the range [0, 1].
(5) Validate and apply the results: If sample size permits, multiple random samples may be used to calculate the mean and standard deviation of MIC values, thereby assessing result stability. By ranking MIC values across multiple features, the feature most strongly associated with y can be identified.
Following the steps outlined above, this paper calculated the MIC between each candidate variable and the bankfull discharge variation sequence, with the results presented in Table 2.
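The stability check in step (5) can be sketched as repeated random subsampling. This is only an illustration: the 80% subsample fraction, the number of repeats, and the function names are assumptions, and an absolute Pearson correlation is used as a stand-in statistic so that the example stays self-contained. Any function taking (x, y) and returning a score, such as an MIC estimator, could be passed instead.

```python
import math
import random
import statistics


def abs_pearson(x, y):
    """Absolute Pearson correlation, used here only as a stand-in statistic."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy)


def subsample_stability(x, y, statistic, n_repeats=50, fraction=0.8, seed=0):
    """Mean and standard deviation of a statistic over random subsamples."""
    rng = random.Random(seed)
    n = len(x)
    k = max(2, int(n * fraction))
    scores = []
    for _ in range(n_repeats):
        idx = rng.sample(range(n), k)  # draw k distinct indices
        scores.append(statistic([x[i] for i in idx], [y[i] for i in idx]))
    return statistics.mean(scores), statistics.stdev(scores)


# Synthetic, exactly linear data for illustration only.
x_demo = [float(v) for v in range(30)]
y_demo = [3.0 * v + 2.0 for v in x_demo]
mean_score, std_score = subsample_stability(x_demo, y_demo, abs_pearson)
```

A small standard deviation relative to the mean suggests that the ranking of a feature is not driven by a handful of years.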
Table 1
The correlation degree between two variables under different MIC values
No.   MIC          Relevance
1     (0.8, 1]     Extremely strong correlation
2     (0.6, 0.8]   Strong correlation
3     (0.4, 0.6]   Moderate correlation
4     (0.2, 0.4]   Weak correlation
5     (0, 0.2]     Extremely weak correlation
6     0            Completely unrelated
Table 2
The MIC values between each candidate variable and the bankfull discharge change sequence
Feature Name                                              MIC value
Peak flood discharge                                      0.523
Annual Runoff Volume                                      0.507
Sediment Inflow Coefficient                               0.445
Median grain size of annual average suspended sediment    0.395
Bed Sediment Median Grain Size                            0.384
Maximum Sediment Concentration                            0.352
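As a small worked illustration, the sketch below applies the Table 1 classes and the σ = 0.2 screening rule to the MIC values listed in Table 2. The function and variable names are illustrative only.

```python
def mic_relevance(mic):
    """Map an MIC value to the relevance classes of Table 1."""
    if not 0.0 <= mic <= 1.0:
        raise ValueError("MIC must lie in [0, 1]")
    if mic == 0.0:
        return "Completely unrelated"
    if mic <= 0.2:
        return "Extremely weak correlation"
    if mic <= 0.4:
        return "Weak correlation"
    if mic <= 0.6:
        return "Moderate correlation"
    if mic <= 0.8:
        return "Strong correlation"
    return "Extremely strong correlation"


# MIC values as listed in Table 2.
table2 = {
    "Peak flood discharge": 0.523,
    "Annual Runoff Volume": 0.507,
    "Sediment Inflow Coefficient": 0.445,
    "Median grain size of annual average suspended sediment": 0.395,
    "Bed Sediment Median Grain Size": 0.384,
    "Maximum Sediment Concentration": 0.352,
}

SIGMA = 0.2  # prescreening threshold used in this study
selected = [name for name, mic in table2.items() if mic > SIGMA]
classes = {name: mic_relevance(mic) for name, mic in table2.items()}
```

With these values, all six candidate variables exceed the threshold and pass to the random forest stage; three fall in the moderate class and three in the weak class.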
Subsequently, a random forest model was constructed on the filtered feature subset to further quantify the actual contribution of each feature to changes in bankfull discharge. The parameter settings for the random forest model are as follows. The number of trees is set to 500; a large number of trees reduces the model's variance and yields more stable and reliable results. The number of features considered at each split is set to the square root of the total number of features. This configuration introduces appropriate randomness during the construction of each decision tree, which helps prevent overfitting while preserving the model's generalization capability, and reduces reliance on noisy features. These parameters were tuned and validated during model training to ensure good performance on the current dataset. With this configuration, the random forest model performs well on the complex water-sediment factor data and supports the quantitative analysis of bankfull discharge influencing factors. The importance scores obtained from the random forest model were normalized, and the sorted results are shown in Table 3.
Table 3
Importance of Each Influencing Factor
Feature Name                                              Normalized Importance
Peak flood discharge                                      0.323
Annual Runoff Volume                                      0.313
Median grain size of annual average suspended sediment    0.153
Sediment Inflow Coefficient                               0.102
Maximum Sediment Concentration                            0.055
Bed Sediment Median Grain Size                            0.049
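The normalization and the square-root rule for the number of features per split can be sketched as follows. The raw scores are hypothetical placeholders rather than the study's values, and the function names are illustrative only.

```python
import math


def normalize_importance(raw_scores):
    """Scale raw importance scores to sum to 1 and sort them in descending order."""
    total = sum(raw_scores.values())
    if total <= 0:
        raise ValueError("importance scores must have a positive sum")
    ranked = sorted(raw_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, score / total) for name, score in ranked]


def features_per_split(n_features):
    """Integer square-root rule for the number of features tried at each split."""
    return max(1, math.isqrt(n_features))


# Hypothetical raw scores for illustration only (not the study's values).
raw = {"factor_a": 4.0, "factor_b": 1.0, "factor_c": 3.0}
ranking = normalize_importance(raw)
m_try = features_per_split(6)  # six candidate features in this study
```

With six candidate features, the square-root rule gives two features per split.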
Based on the MIC-RF two-stage model, the MIC values (Table 2) and RF importance scores (Table 3) were calculated for each candidate variable relative to the interannual variation sequence of bankfull discharge. The MIC value primarily measures the maximum dependency strength between variables, while RF importance reflects a factor's actual contribution to the prediction task, so their rankings may differ. The MIC results indicate that all factors exhibit some degree of nonlinear association with changes in bankfull discharge, while the RF importance ranking more directly reveals the relative contribution of each factor within the bankfull discharge prediction model. This study integrates the results from both stages, relying primarily on the RF ranking with MIC as a supplementary reference, to elucidate the underlying mechanisms.
3.2 Analysis of Computational Results
Table 3 gives the relative importance of the influencing factors for bankfull discharge variation. The random forest importance ranking indicates that peak flood discharge (32.3%) is the primary controlling factor for bankfull discharge, with its magnitude directly determining the instantaneous scouring rate in narrow, deep river channels. The median grain size of suspended sediment (15.3%), the sediment inflow coefficient (10.2%), and the bed sediment median grain size (4.9%) rank lower. By regulating the sediment transport-deposition balance and the bed's resistance to erosion, they amplify or attenuate the geometric effects of hydrodynamic conditions. The evolution of bankfull discharge is jointly driven by instantaneous hydrodynamic forces and long-term cumulative effects.
3.3 Model Validation and Mechanism Interpretation Based on Scatter Plot Pattern Diagnosis
To check whether the factor weights identified by the MIC-RF model genuinely reflect the sediment-water-bed response mechanism in the lower Yellow River, the results are cross-checked along three dimensions: dynamic symmetry, sediment threshold effects, and boundary memory signals. The aim is to show that the importance rankings from the random forest are not only statistically supported but also consistent with established theories such as the asymmetry of scour and fill and sediment-water coordination.
Figure 4 reveals a strong positive correlation between peak flood discharge and bankfull discharge variation. The horizontal axis represents peak flood magnitude, with data points distributed along a distinct ascending band, indicating higher scour efficiency in the main channel during larger floods. Smaller floods exhibit greater vertical dispersion of points, reflecting irregular declines in bankfull discharge. This difference in pattern visually confirms the asymmetry of scour and fill processes. Most changes in bankfull discharge fall within the range of -1000 to 0 m³/s. This indicates that in most years the flow energy was insufficient to scour the main channel effectively, potentially leading to channel shrinkage and reduced bankfull discharge due to sediment accumulation.
Fig. 4
Relationship between Peak Flood Discharge and Bankfull Discharge Change
Figure 5 shows that annual water volume and bankfull discharge change exhibit an overall positive correlation, but the scatter is heteroscedastic. When water volume falls below 60 billion m³, the scatter plot exhibits lateral compression and vertical clustering. Because scouring efficiency is stable within this flow range, the marginal impact of flow variations themselves is reduced, and the response is instead shaped by fluctuations in the sediment inflow coefficient, delayed bed responses, spatial heterogeneity, and dynamic disturbances from human activities. The significant variation in bankfull discharge under identical annual water volumes arises because bankfull discharge responds to a 4–5 year moving average of water volume rather than to a single year's value. This reflects the cumulative nature of riverbed adjustment.
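The multi-year averaging mentioned above can be illustrated with a trailing moving average. The window length of 4 follows the 4–5 year range stated in the text; equal weighting of the years, the function name, and the sample values are assumptions made for illustration.

```python
def trailing_moving_average(series, window):
    """Mean of the current value and the preceding (window - 1) values."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and the series length")
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]


# Hypothetical annual water volumes (10^8 m^3) for illustration only.
annual_volume = [10.0, 20.0, 30.0, 40.0, 50.0]
smoothed = trailing_moving_average(annual_volume, 4)
```

The first smoothed value corresponds to the fourth year of the record, so the averaged series is shorter than the original by window - 1 entries.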
Fig. 5
Relationship between Annual Water Volume and Bankfull Discharge Change
Figure 6 demonstrates the threshold regulation of bankfull discharge variation by the sediment inflow coefficient. When the sediment inflow coefficient ξ < 0.02 kg·s/m⁶, the variation in bankfull discharge is predominantly positive. This indicates strong sediment-carrying capacity of the flow, with channel scouring dominating and bankfull discharge increasing. When ξ exceeds 0.04 kg·s/m⁶, all data points fall within the negative range, indicating that bankfull discharge declined in every such year of the record. This threshold is broadly comparable to those proposed in existing studies but differs slightly. The discrepancy stems from the use of interannual variation rather than absolute values in this study, which attenuates long-term cumulative effects and shifts the threshold to a higher range. The random forest model assigns this factor a weight exceeding 10%, essentially reflecting the net effect concentrated in years of water-sediment imbalance rather than a linear correlation throughout the entire record. This nonlinear characteristic is precisely what traditional regression models struggle to capture.
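The two thresholds can be written as a simple classification rule, sketched below. The regime names are illustrative labels; in particular, calling the band between 0.02 and 0.04 kg·s/m⁶ "transitional" is an assumption, since the text does not characterize that range.

```python
def sediment_regime(xi, low=0.02, high=0.04):
    """Classify a year by its sediment inflow coefficient xi (kg*s/m^6)."""
    if xi < 0:
        raise ValueError("sediment inflow coefficient cannot be negative")
    if xi < low:
        return "scour-dominated"       # bankfull discharge change mostly positive
    if xi > high:
        return "deposition-dominated"  # bankfull discharge change negative
    return "transitional"              # assumed label for the intermediate band


# Hypothetical coefficients for illustration only.
regimes = [sediment_regime(v) for v in (0.01, 0.03, 0.05)]
```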
Fig. 6
Relationship between Sediment Inflow Coefficient and Bankfull Discharge Change
Figure 7 shows a negative correlation between the median grain size of suspended sediment and changes in bankfull discharge. Coarser grains settle faster, so they are more likely to deposit under equivalent flow conditions, reducing channel conveyance capacity. This negative effect is particularly pronounced in wandering river segments, where flow shear stress is already near critical levels, so even slight increases in grain size can disrupt the sediment transport-deposition equilibrium. In meandering river sections, where channel constraints are stronger, the influence of grain size variation is relatively diminished. The maximum information coefficient captures this association (MIC = 0.395, Table 2), while the random forest ranks its importance below that of the hydrodynamic factors. This indicates that in interannual predictions, the regulatory role of grain size manifests primarily as a fine-tuning of hydrodynamic conditions, with its independent contribution being weaker than that of hydrodynamic factors.
Fig. 7
Relationship between Annual Median Suspended-Sediment Grain Size and Bankfull Discharge Change
In the high sediment concentration range of Fig. 8 (Maximum Sediment Concentration exceeding 100 kg/m³), most data points are negative, indicating a decreasing trend in bankfull discharge. Short-term fluctuations in Maximum Sediment Concentration primarily reflect instantaneous changes in sediment-carrying capacity during flood peaks, which are controlled by short-term hydraulic factors such as flow velocity and water depth. The corresponding signal in bankfull discharge variation is often obscured by these transient disturbances. This is particularly evident in the low sediment concentration range (0–100 kg/m³), where sediment transport is predominantly unsaturated or near saturation. Here, bankfull discharge changes fluctuate around the baseline, so the signal is hard to discern and contributes little to short-term bankfull discharge forecasting.
Fig. 8
Relationship between Annual Maximum sediment transport rate and Bankfull Discharge Change
Figure 9 shows an essentially random distribution for Bed Sediment Median Grain Size, consistent with the weight of less than 5% assigned by the random forest model. On an interannual scale, bed sediment grain size varies little, staying within the range of measurement error and contributing little to short-term predictions. However, the maximum information coefficient (0.384, Table 2) still indicates a non-negligible association, which captures the accumulated signal of coarsening or fining trends over multiple years. Specifically, although bed sediment changes are not apparent in years of intense scouring, the cumulative adjustments over the subsequent three to five years show a weak correlation with changes in bankfull discharge. This discrepancy highlights the advantage of the two-stage model: it preserves information about long-term mechanisms without compromising the accuracy of short-term predictions.
Fig. 9
Relationship between Annual Median Bed-Material Grain Size and Bankfull Discharge Change
The correlation analysis across the six scatter plots (Figs. 4 to 9) supports the MIC-RF results: peak discharge and water volume form the dual hydrodynamic core, the sediment inflow coefficient and suspended sediment grain size create a negative sediment feedback, bed sediment grain size represents long-term boundary memory, and maximum sediment transport rate reflects water-sediment matching efficiency. However, this study still faces four types of uncertainty that warrant careful consideration. First, the interannual difference method only partially eliminates lag effects, and residual lags may slightly overestimate the weights of factors such as flood peaks. Second, the combined analysis of data from four stations smooths out response differences across distinct river patterns, so the resulting weights are a spatial compromise. Third, the variation in bed sediment grain size approaches the measurement error, potentially introducing noise into the correlation analysis. Fourth, the scarcity of samples from years with exceptionally high sediment inflow limits the applicability of the identified threshold effects under more extreme conditions. Future research may consider segment-specific modeling and incorporate memory functions to characterize long-term lag effects, thereby improving the precision and spatial resolution of the mechanism explanations.
4 Conclusion
Based on measured hydrological and sediment data from four hydrological stations in the lower reaches of the Yellow River between 1960 and 2022, this study employs a two-stage model combining maximum information coefficient and random forest analysis. This approach enables nonlinear screening and quantitative assessment of the contribution weights of factors influencing interannual variations in bankfull discharge. The following conclusions are drawn:
(1) Core drivers: Peak flood discharge and annual runoff volume are the core governing factors. The former acts through asymmetric scouring, while the latter sets the boundary for morphological stability. Together, they underpin the resilience of the main channel's flow capacity. Peak flood discharge directly shapes the channel morphology through instantaneous scouring, whereas annual runoff volume maintains water-sediment conditions during non-flood seasons and consolidates the scouring outcomes. Their synergy determines the overall flow capacity of the main channel.
(2) Key regulating factors: The maximum sediment transport rate, median grain size of suspended sediment, and sediment inflow coefficient exert significant regulatory effects on the formation of bankfull discharge by altering the direction and intensity of scouring and deposition. In river sections with poor water-sediment coordination, the weight of these sediment characteristics becomes even more pronounced.
(3) Secondary influencing factor: Because its evolution lags behind the flow and sediment regime, bed sediment median grain size has a relatively minor effect on interannual changes in bankfull discharge and serves primarily as a long-term boundary constraint.
(4) This study systematically quantified the key factors influencing bankfull discharge in the lower Yellow River and their relative contributions, and derived a formula for calculating changes in bankfull discharge, providing a scientific basis for understanding the mechanisms behind its dynamic variation. The findings suggest that regulation strategies should prioritize three objectives: first, maintaining adequate scouring potential of flood peaks; second, ensuring a suitable annual runoff volume; and third, dynamically adjusting water-sediment management plans according to sediment characteristics. Such an approach can help preserve the main channel's flow capacity and provides a decision-making reference for ecological conservation and sustainable development in the lower reaches of the Yellow River.
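The grouping used in conclusions (1) to (3) can be tied back to the importance values reported in the abstract. The short sketch below sums those values by group; the dictionary keys and the helper name `group_totals` are introduced here for illustration only.

```python
# Random forest importance values as reported in the abstract.
importance = {
    "peak flood discharge": 0.300,
    "annual runoff volume": 0.272,
    "maximum sediment transport rate": 0.135,
    "suspended sediment median grain size": 0.126,
    "sediment inflow coefficient": 0.122,
    "bed sediment median grain size": 0.045,
}

# Grouping that follows conclusions (1) to (3).
groups = {
    "core drivers": ["peak flood discharge", "annual runoff volume"],
    "regulating factors": [
        "maximum sediment transport rate",
        "suspended sediment median grain size",
        "sediment inflow coefficient",
    ],
    "secondary factor": ["bed sediment median grain size"],
}


def group_totals(importance, groups):
    """Sum the importance values of the factors in each group."""
    return {
        name: round(sum(importance[factor] for factor in factors), 3)
        for name, factors in groups.items()
    }


totals = group_totals(importance, groups)
for name, value in totals.items():
    print(f"{name}: {value:.3f}")
```

The two core drivers together account for 0.572 of the total importance, the three regulating factors for 0.383, and bed sediment median grain size for 0.045, which sums to 1.000.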
Author Contribution
Conceptualization, Jiabei Li; Data curation, Jiabei Li, Yuhang Dong and Zijing Zhou; Formal analysis, Dangwei Wang, Ke Ni and Jiabei Li; Project administration, Jianguo Chen and Jian Chen; Writing – original draft, Jiabei Li; Writing – review and editing, Jiabei Li, Yuhang Dong and Zijing Zhou. All authors have read and agreed to the published version of the manuscript.
Funding:
National Key Research and Development Program of China (2023YFC3208602).
Data Availability
The original data presented in this study are included in the main text of the manuscript. For further inquiries, please contact the corresponding author.
Conflicts of Interest:
The authors declare no conflicts of interest.
References:
1.
Xia, J. Q., Wu, B. S., Wang, Y. P. & Li, W. W. Estimating the bankfull discharge in the Lower Yellow River and analysis of its variation processes [J]. J. Sediment. Res. (02), 6–14. https://doi.org/10.16239/j.cnki.0468-155x.2010.02.003 (2010).
2.
Hou, Z. J., Li, Y. & Pang, L. X. Response of bankfull discharge in tail reaches of Yellow River estuary to runoff and sediment load [J]. Adv. Sci. Technol. Water Resour. (2), 23–27 (2011).
3.
Wu, B. S., Shen, Y., Ma, Z. P. & Zheng, S. Simulation methods of bankfull discharge in the Lower Yellow River [J]. J. Hydraul. Eng. 55 (07), 791–801. https://doi.org/10.13243/j.cnki.slxb.20230631 (2024).
4.
Chen, L., Hu, C. H. & Chen, X. J. Relationship between bank-full discharges and processes of flow-sediment in lower Yellow River [J]. J. Sediment. Res. 43 (04), 1–7. https://doi.org/10.16239/j.cnki.0468-155x.2018.04.001 (2018).
5.
Song, X. L., Zhong, D. Y. & Wang, G. Q. Simulation on the stochastic evolution of hydraulic geometry relationships with the stochastic changing bankfull discharges in the Lower Yellow River [J]. J. Geog. Sci. 30 (5), 843–864. https://doi.org/10.1007/s11442-020-1758-z (2020).
6.
Chen, M. et al. Analysis of the evolution trends and influential factors of bankfull discharge in the Lower Yellow River. Sci. Rep. 12, 19981. https://doi.org/10.1038/s41598-022-24310-6 (2022).
7.
Han, X. J. et al. Variation of bankfull discharge and its relationship with flow and sediment conditions during flood season in the Lower Yellow River. J. Sediment. Res. 49 (03), 69–73. https://doi.org/10.16239/j.cnki.0468-155x.2024.03.010 (2024).
8.
Zhang, S. et al. Causes and countermeasures of sediment deposition of small and medium-sized rivers in the floodplain: A case study of the old course of the Fuhe River [J]. J. Lake Sci. 37 (5), 1835–1845. https://doi.org/10.18307/2025.0553 (2025).
9.
Wu, B. S. & Zheng, S. Delayed Response Theory and Applications for Fluvial Processes [M] (China Water & Power Press, 2015).
10.
Li, L. Y. & Wu, B. S. Modification of delayed response model for bankfull discharges [J]. J. Sediment. Res. (02), 21–26. https://doi.org/10.16239/j.cnki.0468-155x.2011.02.006 (2011).
11.
Yao, W. et al. Analysis of the contribution of multiple factors to the recent decrease in discharge and sediment yield in the Yellow River Basin, China. J. Geogr. Sci. 26, 1289–1304. https://doi.org/10.1007/s11442-016-1227-7 (2016).
12.
Williams, G. P. Bankfull discharge of rivers [J]. Water Resour. Res. 14 (6), 1141–1154. https://doi.org/10.1029/WR014i006p01141 (1978).
13.
Chen, J. G. et al. Change of bankfull and bed forming discharges in the Lower Yellow River [J]. J. Sediment. Res. (05), 10–16. https://doi.org/10.16239/j.cnki.0468-155x. (2006).
14.
Li, Z. Analysis of Flood Carrying Capacity Change of Small and Medium-sized Rivers in the Upper Reaches of the Zhanghe River in Southern Shanxi [J]. E3S Web of Conferences 359, 01018. https://doi.org/10.1051/e3sconf/202339301021 (2023).
15.
Jiang, E. H. et al. Research on River Regime Evolution Law and Mechanism of Wandering Reach in the Lower Yellow River [M] (China Water & Power Press, 2006).
16.
Li, D., Wang, G., Qin, C. & Wu, B. River Extraction under Bankfull Discharge Conditions Based on Sentinel-2 Imagery and DEM Data. Remote Sens. 13 (14), 2650. https://doi.org/10.3390/rs13142650 (2021).
17.
Li He. Estimation of Bankfull Discharge in the Lower Yellow River. Water Resour. 46, 160–171. https://doi.org/10.1134/S009780781902009X (2019).
18.
Wu, B. S., Zheng, S., Shen, Y. & Ma, Z. P. Influencing Factors and Complex Response of Bankfull Discharge of the Lower Yellow River [J]. J. Basic. Sci. Eng. 33 (1), 145–157. https://doi.org/10.16058/j.issn.1005-0930.2025.01.013 (2025).
19.
Yellow River Conservancy Commission. Yellow River Hydrological Yearbook: Downstream Stations [S], 1960–2022 (China Water & Power Press).
20.
Lili, Z., Lu, M. & Ishwaran, H. Variable priority for unsupervised variable selection. Pattern Recogn. 172, 112727. https://doi.org/10.1016/j.patcog.2025.112727 (2026).
21.
Breiman, L. Random forests. Mach. Learn. 45, 5–32. https://doi.org/10.1023/a:1010933404324 (2001).
22.
Wen, T., Dong, D., Chen, Q., Chen, L. & Roberts, C. Maximal information coefficient-based two-stage feature selection method for railway condition monitoring. IEEE Trans. Intell. Transp. Syst. 20 (7), 2681–2690. https://doi.org/10.1109/TITS.2018.2881284 (2019).
23.
Luo, J. et al. Maximal information coefficient and geodetector coupled quantification model: a new data-driven approach to coalbed methane reservoir potential evaluation. J. Petrol. Explor. Prod. Technol. 14, 2937–2951. https://doi.org/10.1007/s13202-024-01880-x (2024).