1. Introduction
Lithium-ion batteries are the core component of electric vehicles and portable electronic devices, and their performance state directly determines the reliability and safety of the entire system[1]. Accurate prediction of the remaining useful life (RUL) of batteries is essential for implementing predictive health management and avoiding potential risks, and it is also a key step in optimizing the life-cycle cost of equipment[2]. However, battery aging is a complex nonlinear process involving multi-physics coupling and multi-time-scale evolution. The degradation trajectory is jointly influenced by internal electrochemical side reactions and external operating conditions, exhibiting strong time-varying characteristics and uncertainty[3]. This inherent complexity poses a major challenge in developing RUL prediction models that achieve high accuracy, strong generalization, and physical interpretability.
Existing RUL prediction methods can be broadly categorized into two types: model-based approaches and data-driven approaches. Model-based methods rely on explicit physical equations and internal battery mechanisms, offering clear theoretical foundations and strong interpretability. Depending on the desired modeling accuracy and complexity, physical models can be further divided into pseudo–two-dimensional (P2D) electrochemical models, single-particle models (SPM), and equivalent circuit models (ECM)[4]. Among them, the P2D model is widely used in mechanism-oriented research due to its high fidelity; however, the computational burden of solving coupled partial differential equations limits its suitability for real-time applications[5]. The ECM, on the other hand, approximates the battery’s electrical characteristics through lumped parameters, significantly reducing computational cost while preserving essential physical meaning, and is therefore widely adopted in engineering practice[6]. For example, Guha et al.[7] proposed an RUL prediction method based on a fractional-order equivalent circuit model, which demonstrated advantages in both accuracy and real-time performance. Nevertheless, purely physics-based approaches often struggle to characterize nonlinear degradation behavior under complex real-world conditions. Their predictive performance depends heavily on accurate identification of model parameters and on prior knowledge of degradation mechanisms, which limits their flexibility and adaptability under unknown or variable operating conditions[8].
In recent years, data-driven approaches, represented by deep learning, have demonstrated strong potential in RUL prediction. Unlike physics-based methods, these approaches do not rely on explicit physical equations; instead, they use deep neural networks to learn degradation features and mapping relationships directly from historical operational data, and thus offer strong nonlinear fitting capability[9]. In terms of time-series modeling, recurrent neural networks (RNNs) and their variants such as long short-term memory (LSTM) networks and gated recurrent units (GRU) have been widely employed due to their effectiveness in capturing temporal dependencies[10]. For instance, Lei et al.[11] proposed an Auto-CNN-LSTM method based on an improved CNN-LSTM architecture. By integrating autoencoders and adaptive filters, this approach enhanced feature dimensionality and output stability, thereby improving the accuracy of battery RUL prediction. Similarly, Ning et al.[12] developed a hybrid model combining improved variational mode decomposition (VMD), Gaussian process regression (GPR), and GRU, optimized via a grey wolf optimizer (GWO). This method achieved high-accuracy RUL prediction while quantifying prediction uncertainty. However, the recursive structure of RNN-based models makes the training process difficult to parallelize, and these models tend to suffer from vanishing or exploding gradients when handling long sequences. These limitations hinder their ability to model long-term dependencies effectively[13].
To overcome the aforementioned limitations, the Transformer model, based on the self-attention mechanism, has been introduced into the field of time-series prediction. This model adopts an encoder–decoder architecture and uses a global attention mechanism to directly capture dependencies between any two points in a sequence while supporting efficient parallel training[14]. In the field of battery health management, Transformer models and their variants have demonstrated clear advantages. For example, Saleem et al.[15] proposed TransRUL, which was evaluated on the CALCE dataset and compared with CNN, LSTM, and TCN models. The results showed that the Transformer achieved significantly lower prediction errors in RUL estimation. Similarly, Zhu et al.[16] proposed a modal decomposition-based hybrid RUL prediction method that integrates an enhanced Informer-LSTM model. In their approach, the CEEMDAN algorithm was employed to decompose the capacity sequence into high- and low-frequency components. The Informer model was used to further decompose and predict the high-frequency components, while the LSTM network handled the low-frequency ones. Experimental results on battery datasets demonstrated that the proposed method achieved an average fitting accuracy of approximately 99%. However, the standard Transformer typically processes exogenous variables (such as temperature, current, and other auxiliary information) using simple feature concatenation, which limits its ability to model the complex interactions between endogenous sequences and exogenous factors[17].
To address this issue, the TimeXer model[18] separates the embedding pathways of endogenous and exogenous sequences and employs a dedicated cross-attention mechanism to fuse their information, thereby enabling fine-grained modeling of multivariate time series data. This architecture provides a new pathway for integrating physical information with data-driven models. However, existing studies have primarily focused on conventional time-series domains such as finance and meteorology[19]. Lithium-ion battery RUL prediction is a task characterized by strong nonlinearity, high noise, and inherent physical constraints, yet research that applies TimeXer specifically to this task remains relatively scarce, and its technical potential in this context has yet to be fully explored.
It is worth noting that the predictive performance of deep learning models largely depends on the proper configuration of hyperparameters. Traditional optimization methods such as grid search and random search are computationally expensive and often fail to locate global optima in high-dimensional parameter spaces[20]. To overcome these challenges, meta-heuristic optimization algorithms have been introduced into the field of hyperparameter tuning due to their global search capability. For instance, Durmus et al.[21] employed a genetic algorithm to optimize CNN hyperparameters, improving model convergence speed; Fu et al.[22] applied a particle swarm optimization (PSO) algorithm for automated LSTM architecture search. The whale optimization algorithm (WOA)[23], as an emerging bio-inspired optimization technique, has attracted attention for its simplicity, few control parameters, and fast convergence speed. However, the standard WOA still suffers from limitations such as premature convergence and insufficient accuracy when dealing with complex optimization problems[24], necessitating further improvements for effective deep learning hyperparameter optimization.
In response to these limitations, this study proposes a TimeXer-based prediction framework that integrates a first-order RC equivalent circuit model (ECM) with an improved Whale Optimization Algorithm (IWOA).
The main contributions of this work are summarized as follows:
1) Physics-based degradation feature extraction. A first-order RC ECM is employed for online parameter identification to obtain key electrochemical parameters reflecting the internal state of the battery. Based on these parameters, a set of health indicators (HIs) is constructed through temporal statistical analysis to quantify battery degradation from multiple perspectives, providing physically interpretable features for subsequent data-driven prediction.
2) Improved Whale Optimization Algorithm (IWOA) for automatic hyperparameter tuning. To address the shortcomings of standard WOA, the IWOA introduces a nonlinear convergence factor and an adaptive weight perturbation mechanism, effectively balancing global exploration and local exploitation. The algorithm is applied to optimize critical hyperparameters of the TimeXer model, enhancing both training efficiency and prediction stability.
3) Deep integration of physical mechanisms and data-driven modeling. The proposed framework embeds the extracted health indicators as exogenous variables, which are fused within the dedicated attention mechanism of the TimeXer architecture. This strategy not only leverages the strong sequence modeling capability of deep learning but also incorporates physical priors and mechanistic constraints, forming an end-to-end hybrid predictive framework that achieves high accuracy while significantly improving model generalization and interpretability.
2. Proposed Method
2.1 Problem Definition
Lithium-ion batteries undergo irreversible performance degradation during long-term cycling operation. The remaining useful life (RUL) is defined as the number of remaining cycles from the current time until the actual battery capacity decays to a predefined threshold. Let $C_t$ denote the actual capacity at the $t$-th cycle, and $C_0$ represent the initial capacity. When $C_t$ decreases to the threshold $\eta C_0$ (where $\eta$ is typically set to 0.8), the battery is considered to have reached its end of life. Therefore, the primary objective of RUL prediction is to accurately estimate the remaining number of cycles between the current cycle $t$ and the end-of-life cycle $t_{EOL}$ based on historical monitoring data[25].
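The definition above can be illustrated with a minimal Python sketch. The function names and the capacity values are hypothetical and chosen only for illustration; the 0.8 threshold ratio follows the text.

```python
from typing import Optional, Sequence


def end_of_life_cycle(capacity: Sequence[float], eta: float = 0.8) -> Optional[int]:
    """Return the first cycle index whose capacity is at or below eta * initial capacity."""
    threshold = eta * capacity[0]
    for cycle, value in enumerate(capacity):
        if value <= threshold:
            return cycle
    return None  # the threshold is not reached within the recorded data


def remaining_useful_life(capacity: Sequence[float], t: int, eta: float = 0.8) -> Optional[int]:
    """Number of cycles left between cycle t and the end-of-life cycle."""
    eol = end_of_life_cycle(capacity, eta)
    if eol is None:
        return None
    return max(eol - t, 0)


# Made-up capacity fade curve (normalized), for illustration only.
capacity = [1.00, 0.97, 0.93, 0.88, 0.84, 0.79, 0.75]
print(end_of_life_cycle(capacity))           # 5
print(remaining_useful_life(capacity, t=2))  # 3
```

In practice the end-of-life cycle is unknown at prediction time, so a model has to forecast the capacity trajectory (or the RUL directly) before this rule can be applied.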
2.2 First-Order Equivalent Circuit Model and Parameter Identification
To characterize the relationship between the external electrical behavior of the battery and its internal aging state, a first-order RC equivalent circuit model is adopted as the physical modeling framework in this study. As illustrated in Fig. 1, the model consists of an open-circuit voltage source $U_{oc}$, an ohmic resistance $R_0$, and an RC parallel network, formed by the polarization resistance $R_1$ and the polarization capacitance $C_1$, connected in series[26]. Among them, $U_{oc}$ represents the steady-state open-circuit voltage of the battery and is closely related to its state of charge and state of health; $R_0$ denotes the ohmic internal resistance of the battery; and the RC network is employed to describe the dynamic voltage response induced by polarization effects. The model takes the load current $I$ as the input and outputs the terminal voltage $U_t$, which can be directly measured.
Fig. 1 First-Order RC Equivalent Circuit Model
The value of the proposed model lies in the fact that its key parameters evolve in a regular manner in response to irreversible aging reactions occurring inside the battery. To accurately extract these parameters from measured data, this study employs the recursive least squares (RLS) algorithm[27] for online parameter identification. The system is discretized and formulated as a linear regression model:
$$y(k) = \varphi^{T}(k)\,\theta(k) + e(k)$$
where $y(k)$ denotes the system output, $\varphi(k)$ is the regression vector constructed from historical voltage and current measurements, $\theta(k)$ represents the parameter vector to be identified, and $e(k)$ is the modeling error. By recursively updating the Kalman gain and the covariance matrix, the RLS algorithm enables optimal estimation of the parameter vector $\theta$ in the least-squares sense.
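The RLS algorithm performs recursive parameter estimation by minimizing a weighted least-squares cost function. The core update equations are given as follows:
$$K(k) = \frac{P(k-1)\,\varphi(k)}{\lambda + \varphi^{T}(k)\,P(k-1)\,\varphi(k)}$$
$$\hat{\theta}(k) = \hat{\theta}(k-1) + K(k)\left[y(k) - \varphi^{T}(k)\,\hat{\theta}(k-1)\right]$$
$$P(k) = \frac{1}{\lambda}\left[I - K(k)\,\varphi^{T}(k)\right]P(k-1)$$
where $\hat{\theta}(k)$ is the current parameter estimate, $I$ is the identity matrix, $K(k)$ is the Kalman gain matrix, $P(k)$ denotes the estimation error covariance matrix, and $\lambda$ is the forgetting factor, which provides a trade-off between tracking capability and estimation stability. When $\lambda$ is chosen close to unity, the algorithm places greater emphasis on historical data, which is beneficial for noise suppression; conversely, smaller values of $\lambda$ enhance the algorithm’s ability to track time-varying parameters.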
Provided that the regression vector satisfies the persistent excitation condition and that the forgetting factor is properly selected, the RLS algorithm guarantees boundedness and asymptotic convergence of the parameter estimates, thereby enabling stable online identification of the equivalent circuit model parameters. Through this procedure, a parameter sequence $\{R_0, R_1, C_1\}$ can be obtained from each charge-discharge cycle. These parameters possess clear physical interpretations and directly reflect the time-varying internal impedance and dynamic characteristics of the battery. Based on the identified parameter trajectories, statistical analysis is subsequently performed to extract health indicators that characterize the battery aging trend. These physically meaningful features are then used as inputs to the subsequent RUL prediction model, thereby integrating physical mechanisms with data-driven learning.
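The recursive identification described in this subsection can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the implementation used in this study: the regression vector layout [y(k-1), u(k), u(k-1)], the synthetic coefficients, the noise level, the initial covariance, and the function names `rls_identify` and `simulate` are all hypothetical.

```python
import random


def rls_identify(phis, ys, lam=0.98, p0=1e4):
    """Recursive least squares with forgetting factor lam.

    phis: list of regression vectors, ys: list of scalar outputs.
    Returns the final parameter estimate as a list.
    """
    n = len(phis[0])
    theta = [0.0] * n
    # Large initial covariance expresses low confidence in the initial estimate.
    P = [[p0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for phi, y in zip(phis, ys):
        p_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
        denom = lam + sum(phi[i] * p_phi[i] for i in range(n))
        gain = [v / denom for v in p_phi]                      # gain vector K(k)
        err = y - sum(phi[i] * theta[i] for i in range(n))     # prediction error
        theta = [theta[i] + gain[i] * err for i in range(n)]
        # P is symmetric, so phi^T P equals (P phi)^T.
        P = [[(P[i][j] - gain[i] * p_phi[j]) / lam for j in range(n)] for i in range(n)]
    return theta


def simulate(theta_true, n_steps=500, noise=1e-5, seed=0):
    """Synthetic first-order system driven by a random input (assumed for the demo)."""
    rng = random.Random(seed)
    a1, b0, b1 = theta_true
    phis, ys = [], []
    y_prev, u_prev = 0.0, 0.0
    for _ in range(n_steps):
        u = rng.uniform(-1.0, 1.0)
        phis.append([y_prev, u, u_prev])
        y = a1 * y_prev + b0 * u + b1 * u_prev + rng.gauss(0.0, noise)
        ys.append(y)
        y_prev, u_prev = y, u
    return phis, ys


theta_true = (0.95, 0.04, -0.03)
phis, ys = simulate(theta_true)
theta_hat = rls_identify(phis, ys)
print([round(v, 4) for v in theta_hat])
```

The random input keeps the regressors persistently excited, which is the condition under which the estimates are expected to converge. Mapping the identified regression coefficients back to the circuit parameters depends on the chosen discretization and is not shown here.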
2.3 IWOA
The predictive performance of deep learning models largely depends on the configuration of their hyperparameters. Traditional manual tuning or grid search methods are not only inefficient but also struggle to locate global optima in complex, high-dimensional parameter spaces. This limitation significantly constrains both the model’s performance potential and deployment efficiency.
To address this challenge, this study introduces an Improved Whale Optimization Algorithm (IWOA) to achieve automated global optimization of key hyperparameters in the TimeXer model. The application of this algorithm is a crucial step in ensuring that the proposed physics-embedded framework can consistently achieve optimal performance.
2.3.1 Whale Optimization Algorithm
The Whale Optimization Algorithm (WOA)[28] is a metaheuristic optimization method inspired by the cooperative hunting behavior of humpback whales in nature. By mimicking the processes of encircling prey, spiral bubble-net attacking, and random searching, WOA iteratively searches for the global optimum. The mathematical formulation of WOA mainly consists of three stages, as described below.
(1) Encircling prey
Assume that the population size of whales is $N$, and the position vector of each whale at iteration $t$ is denoted as $X_i(t)$. Since the exact position of the prey is unknown, the algorithm assumes that the whale with the best fitness value in the current population represents the approximate location of the prey, denoted as $X^{*}(t)$. Other whales update their positions by moving toward this best individual, thereby gradually encircling the prey. The encircling behavior is mathematically modeled as:
$$D = \left|C \cdot X^{*}(t) - X(t)\right|$$
$$X(t+1) = X^{*}(t) - A \cdot D$$
where $D$ denotes the distance vector between the current whale and the best solution, $t$ is the iteration index, $X$ is the position vector of a whale, and $X^{*}$ represents the best position vector obtained so far. The coefficient vectors $A$ and $C$ are defined as:
$$A = 2a \cdot r_1 - a$$
$$C = 2 r_2$$
$$a = 2 - \frac{2t}{T_{\max}}$$
where $r_1$ and $r_2$ are random numbers uniformly distributed in [0,1], $a$ is the convergence factor that linearly decreases from 2 to 0 as the iteration progresses, and $T_{\max}$ denotes the maximum number of iterations.
(2) Spiral bubble-net attacking mechanism
In real hunting scenarios, humpback whales approach their prey along a spiral-shaped trajectory. This behavior is modeled in WOA using a logarithmic spiral equation, and the position update rule is expressed as:
$$X(t+1) = D' \cdot e^{bl}\cos(2\pi l) + X^{*}(t)$$
where $D' = \left|X^{*}(t) - X(t)\right|$ represents the distance between the whale and the current best position, $b$ is a constant defining the shape of the spiral, and $l$ is a random number in $[-1, 1]$.
Since whales alternately perform shrinking encircling and spiral attacking behaviors during the hunting process, WOA employs a probabilistic mechanism to model these two strategies simultaneously. Let $P$ denote the probability of performing the encircling prey behavior; accordingly, the probability of executing the spiral bubble-net attacking behavior is $1 - P$. The unified position update rule can thus be written as:
$$X(t+1) = \begin{cases} X^{*}(t) - A \cdot D, & p < P \\ D' \cdot e^{bl}\cos(2\pi l) + X^{*}(t), & p \ge P \end{cases}$$
where $p$ is a random number in $[0, 1]$.
As the iteration proceeds, the convergence factor $a$ gradually decreases, which progressively narrows the range $[-a, a]$ of the coefficient vector $A$. When $|A| < 1$, whale individuals update their positions toward the best solution, indicating that the algorithm enters the exploitation phase and intensifies the search around the prey.
(3) Search for prey
When $|A| \ge 1$, the algorithm assumes that the current best solution may correspond to a local optimum. In this case, whales switch to a random search strategy to explore the solution space. Specifically, a random whale position $X_{rand}$ is selected, and other whales update their positions relative to this randomly chosen individual. The mathematical model is given by:
$$D = \left|C \cdot X_{rand} - X(t)\right|$$
$$X(t+1) = X_{rand} - A \cdot D$$
This mechanism enhances the global exploration capability of the algorithm, enabling it to escape from local optima and improving the overall optimization performance.
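The three stages of the standard WOA can be sketched in plain Python as shown below. This is a minimal illustration of the standard algorithm only, not the improved variant proposed in this study. The sphere test function, the search bounds, the 0.5 switching probability, the spiral constant b = 1, and all function names are assumptions made for the example.

```python
import math
import random


def woa(objective, dim, bounds, n_whales=30, max_iter=300, b=1.0, seed=0):
    """Minimal standard WOA for minimization (illustrative sketch)."""
    rng = random.Random(seed)
    lo, hi = bounds
    whales = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_whales)]
    best = min(whales, key=objective)[:]
    best_fit = objective(best)
    for t in range(max_iter):
        a = 2.0 - 2.0 * t / max_iter            # linear decrease from 2 toward 0
        for i, x in enumerate(whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: encircle the best whale; otherwise follow a random whale.
                ref = best if abs(A) < 1.0 else whales[rng.randrange(n_whales)]
                new = [ref[j] - A * abs(C * ref[j] - x[j]) for j in range(dim)]
            else:
                # Spiral bubble-net move around the best whale.
                ell = rng.uniform(-1.0, 1.0)
                spiral = math.exp(b * ell) * math.cos(2.0 * math.pi * ell)
                new = [abs(best[j] - x[j]) * spiral + best[j] for j in range(dim)]
            whales[i] = [min(max(v, lo), hi) for v in new]   # keep inside the bounds
        for x in whales:
            fit = objective(x)
            if fit < best_fit:
                best, best_fit = x[:], fit
    return best, best_fit


def sphere(x):
    """Simple unimodal test function with its minimum at the origin."""
    return sum(v * v for v in x)


best, best_fit = woa(sphere, dim=5, bounds=(-10.0, 10.0))
print(best_fit)
```

For hyperparameter tuning, `objective` would be replaced by a function that trains a model with the candidate hyperparameters and returns a validation error, which makes each evaluation far more expensive than in this toy example.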
2.3.2 Improvement Strategies
Although the standard whale optimization algorithm (WOA) is characterized by a simple structure and a small number of control parameters, it still suffers from several limitations when dealing with complex optimization problems, such as insufficient convergence accuracy and a tendency to become trapped in local optima. To overcome these drawbacks, two core improvement strategies are proposed in this study.
(1) Nonlinear convergence factor: The convergence factor $a$, which plays a critical role in balancing exploration and exploitation in the WOA, is modified from the conventional linear decreasing scheme to a nonlinear adaptive strategy based on a sinusoidal function:
The sinusoidal function exhibits smooth, continuous, and nonlinear variation, enabling the algorithm to maintain strong global exploration capability in the early iterations while accelerating convergence toward the optimal solution in the later stages. This mechanism effectively enhances the balance between exploration and exploitation throughout the optimization process.
(2) Adaptive weight and random perturbation: An iteration-dependent adaptive weight factor $w$ is introduced into the position update equation, and a random perturbation term is incorporated during the exploitation phase. The weight factor is bounded by a maximum value $w_{\max}$ and a minimum value $w_{\min}$ and varies with the iteration count according to a decay coefficient, while the perturbation term is governed by an amplitude parameter and a uniformly distributed random number.
Accordingly, the improved position update rule can be expressed as:
where $X^{*}$ denotes the current best solution, $X_{rand}$ represents a randomly selected whale position, and the remaining parameters follow the definitions in the standard WOA framework.
The adaptive weight factor $w$ contributes to accelerating convergence, while the random perturbation term helps the population escape local optima, thereby enhancing the robustness of the algorithm. Through these improvements, the proposed improved WOA (IWOA) preserves the advantages of the standard WOA while enhancing its search capability and convergence stability in complex high-dimensional spaces. As a result, it provides reliable technical support for hyperparameter optimization of the TimeXer model[29].
2.3.3 Performance Evaluation of IWOA
To verify the effectiveness of the proposed improved whale optimization algorithm (IWOA), comparative experiments are conducted in this section using the CEC2017 benchmark test suite. Four representative benchmark functions, namely F3, F5, F7, and F9, are selected, covering both unimodal and multimodal optimization scenarios[30], as illustrated in Fig. 2. These benchmark functions are employed to comprehensively evaluate the optimization performance of IWOA in comparison with the standard whale optimization algorithm (WOA) and the sparrow search algorithm (SSA) under varying levels of problem complexity.
To ensure fairness and objectivity in the performance evaluation, all algorithms are tested under identical experimental conditions. Specifically, the dimensionality of the benchmark functions is set to 10, the population size is fixed at 30, and the maximum number of iterations is set to 500. In addition, each algorithm is independently executed 30 times, and the average results are reported as the final performance metrics to mitigate the influence of stochastic randomness.
Fig. 2 Three-dimensional fitness landscapes of benchmark functions (F3, F5, F7, and F9) and convergence curves of IWOA, WOA, and SSA on different test functions, including both unimodal and multimodal cases.
Table 1
Comparison of the performance of three algorithms on the benchmark functions
| Function name | Type | Mathematical Expression | Algorithm | Mean Objective Function Value |
| --- | --- | --- | --- | --- |
| F3 | Unimodal | | IWOA | 4.39e+03 |
| | | | WOA | 1.56e+04 |
| | | | SSA | 6.26e+03 |
| F5 | Multimodal | | IWOA | 5.64e+02 |
| | | | WOA | 5.85e+02 |
| | | | SSA | 5.78e+02 |
| F7 | Multimodal | | IWOA | 7.92e+02 |
| | | | WOA | 8.03e+02 |
| | | | SSA | 8.14e+02 |
| F9 | Multimodal | | IWOA | 1.38e+03 |
| | | | WOA | 1.66e+03 |
| | | | SSA | 1.77e+03 |
Here $Z = M \cdot (x - o)$, where $M$ is a $D \times D$ orthogonal rotation matrix and $o$ is a $D$-dimensional shift vector.
As shown in Fig. 2 and the results summarized in Table 1, the proposed IWOA achieves the lowest mean objective value and the fastest convergence on all four benchmark functions. For the unimodal function F3, the convergence curve of IWOA remains below those of WOA and SSA throughout the entire iteration process. It decreases rapidly in the early stage and continues to improve steadily in the later stage, achieving a final average objective function value of only $4.39 \times 10^{3}$, which is significantly better than that of WOA ($1.56 \times 10^{4}$) and SSA ($6.26 \times 10^{3}$). This indicates that IWOA is more effective in both global exploration and fine-grained exploitation. For the multimodal functions F5, F7, and F9, IWOA similarly demonstrates a stronger ability to escape local optima. Its convergence curves exhibit smoother and more stable descending trends, and the final mean objective values reach $5.64 \times 10^{2}$, $7.92 \times 10^{2}$, and $1.38 \times 10^{3}$, respectively, all outperforming the comparison algorithms. These results indicate that the proposed improvement strategies enhance the global search capability and robustness of the algorithm in complex multimodal optimization landscapes.
2.4 TimeXer
To effectively leverage exogenous variables and enhance the prediction accuracy of the target sequence, this study employs the TimeXer model. Built upon the classical Transformer architecture, the key innovation of TimeXer lies in its ability to process endogenous and exogenous variables using distinct embedding strategies, while capturing dependencies along both the temporal and variable dimensions through self-attention and cross-attention mechanisms, without modifying the original Transformer components.
The TimeXer architecture, illustrated in Fig. 3, comprises the following key components:
(1) Endogenous Sequence Embedding: The endogenous sequence is divided into segments, generating segment-level time tokens and a sequence-level global token.
(2) Exogenous Sequence Embedding: Each exogenous sequence is embedded as a variable-level token.
(3) Endogenous Self-Attention Layer: Self-attention is computed among all endogenous tokens (including time tokens and the global token) to capture fine-grained temporal dependencies.
(4) Exogenous-Endogenous Cross-Attention Layer: The endogenous global token serves as the query, while the exogenous variable tokens act as keys and values. Cross-attention integrates exogenous information into the representation of the endogenous sequence, enhancing the model’s ability to account for auxiliary inputs.
Fig. 3 Architecture of the TimeXer Model
In forecasting tasks involving exogenous variables, let the univariate target sequence (endogenous sequence) be denoted by $x_{1:T} = \{x_1, \dots, x_T\}$, and the exogenous variables by $z_{1:T_{ex}} = \{z^{(1)}_{1:T_{ex}}, \dots, z^{(C)}_{1:T_{ex}}\}$. Here, $T$ and $T_{ex}$ represent the look-back window lengths of the endogenous and exogenous sequences, respectively, and may differ to accommodate heterogeneous sampling frequencies. The prediction model aims to generate $S$-step-ahead forecasts based on $x_{1:T}$ and $z_{1:T_{ex}}$:
$$\hat{x}_{T+1:T+S} = F_{\theta}\left(x_{1:T},\, z_{1:T_{ex}}\right)$$
2.4.1 Endogenous Sequence Embedding
To reduce sequence length while retaining local temporal structures, TimeXer divides $x_{1:T}$ into non-overlapping segments $\{s_1, \dots, s_N\}$, where $N = \lfloor T/P \rfloor$ and each segment has length $P$. Each segment is then linearly projected to obtain the segment-level temporal tokens:
$$P_{en} = \mathrm{PatchEmbed}(s_1, \dots, s_N)$$
Considering the potential mismatch in granularity between endogenous and exogenous information, TimeXer introduces a learnable global token $G_{en}$ to provide a holistic representation of the sequence and to serve as the interface for exogenous-endogenous interaction:
$$G_{en} = \mathrm{Learnable}(x_{1:T})$$
Thus, the endogenous embedding fed into the encoder consists of $N$ segment tokens $P_{en}$ and one global token $G_{en}$.
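The endogenous embedding step can be sketched in plain Python as follows. This is a simplified illustration rather than the TimeXer reference implementation: positional embeddings are omitted, the projection weights and the global token are random placeholders instead of learned parameters, and the function names are hypothetical.

```python
import random


def patchify(x, patch_len):
    """Split a sequence into floor(T / P) non-overlapping segments of length P."""
    n = len(x) // patch_len
    return [x[i * patch_len:(i + 1) * patch_len] for i in range(n)]


def linear(v, weight, bias):
    """Project vector v with a weight matrix of shape (len(v), d_model)."""
    return [sum(v[i] * weight[i][k] for i in range(len(v))) + bias[k]
            for k in range(len(bias))]


def embed_endogenous(x, patch_len, weight, bias, global_token):
    """Return the segment tokens followed by one global token."""
    segment_tokens = [linear(seg, weight, bias) for seg in patchify(x, patch_len)]
    return segment_tokens + [global_token]


rng = random.Random(0)
T, P, D = 12, 4, 3
x = [rng.random() for _ in range(T)]                            # toy endogenous sequence
weight = [[rng.gauss(0.0, 0.1) for _ in range(D)] for _ in range(P)]
bias = [0.0] * D
global_token = [rng.gauss(0.0, 0.1) for _ in range(D)]          # placeholder for a learned token

tokens = embed_endogenous(x, P, weight, bias, global_token)
print(len(tokens), len(tokens[0]))  # 4 3
```

With T = 12 and P = 4 the sequence yields three segment tokens plus one global token, so the self-attention layer operates on four tokens instead of twelve time steps.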
2.4.2 Exogenous Sequence Embedding
For exogenous variables, each sequence $z^{(i)}$ is embedded on a per-variable basis. The VariateEmbed(·) module maps each sequence into a $D$-dimensional variable token:
$$V_{ex,i} = \mathrm{VariateEmbed}\left(z^{(i)}\right), \quad i = 1, \dots, C$$
This design naturally handles differing sequence lengths, misaligned timestamps, and heterogeneous sampling rates, while enabling the model to capture inter-variable dependencies at the variable level.
2.4.3 Endogenous Self-Attention
The segment tokens and the global token jointly enter the self-attention layer to model intra-sequence dependencies. Let the input of the $l$-th layer be $[P_{en}^{l}, G_{en}^{l}]$. The self-attention update is given by:
The output is then normalized and passed through a feed-forward network (FFN):
During this process, the global token aggregates global contextual information while providing feedback to the segment tokens, enabling hierarchical temporal modeling.
2.4.4 Exogenous-Endogenous Cross-Attention
To incorporate exogenous information, the updated global token $\hat{G}_{en}^{l}$ is used as the query $Q$, while the exogenous tokens $V_{ex}$ serve as keys $K$ and values $V$ in a cross-attention operation:
The resulting fusion representation is passed through an FFN to produce the updated global token:
The enhanced global token subsequently influences the segment tokens in the next self-attention layer, enabling the exogenous information to propagate through the endogenous sequence representation in a hierarchical manner.
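The query/key/value roles described in this subsection can be illustrated with a single-head cross-attention sketch in plain Python. It is deliberately simplified and is not the TimeXer implementation: the learned query, key, and value projections are replaced by the identity, layer normalization and the FFN are omitted, and the token values are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    attended = [sum(w * v[j] for w, v in zip(weights, values))
                for j in range(len(values[0]))]
    return attended, weights


def fuse_global_token(global_token, exo_tokens):
    """Global token queries the exogenous tokens; a residual connection keeps its own content."""
    attended, weights = cross_attention(global_token, exo_tokens, exo_tokens)
    fused = [g + a for g, a in zip(global_token, attended)]
    return fused, weights


global_token = [1.0, 0.0]                           # toy endogenous global token
exo_tokens = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # toy variable-level tokens
fused, weights = fuse_global_token(global_token, exo_tokens)
print([round(w, 3) for w in weights])
```

Because only one query token attends to the variable tokens, the cost of this step grows linearly with the number of exogenous variables, regardless of their sequence lengths.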
2.4.5 Output Layer and Loss Function
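Finally, the segment tokens and global token from the last layer $L$ are flattened and projected to generate the $S$-step predictions:
$$\hat{x} = \mathrm{Projection}\left(\left[P_{en}^{L}, G_{en}^{L}\right]\right)$$
The model is trained using the mean squared error (MSE) loss:
$$\mathcal{L}_{MSE} = \frac{1}{S}\sum_{i=1}^{S}\left(x_i - \hat{x}_i\right)^2$$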
2.5 Prediction Process
The proposed IWOA-ECM-TimeXer hybrid prediction framework predicts the remaining useful life of lithium-ion batteries through a structured end-to-end process. The core workflow consists of three main stages: physical feature extraction, model hyperparameter optimization, and time series prediction, as illustrated in Fig. 4.
Fig. 4 IWOA-ECM-TimeXer framework flowchart
The framework begins with physical feature extraction: based on the first-order equivalent circuit model (ECM) and the recursive least squares (RLS) method, the key electrochemical parameters (R₀, R₁, C₁) are identified online from battery operation data. These parameters serve as the basis for constructing a set of health indicators that reflect battery degradation characteristics.
Subsequently, the process enters the model optimization and prediction stage: the extracted health indicators are used as exogenous input features for the TimeXer model. The improved Whale Optimization Algorithm (IWOA) is then employed to perform global optimization of the model’s key hyperparameters. The optimized TimeXer model, leveraging its dual-path attention mechanism, models the health indicator sequences and predicts the remaining useful life (RUL).
3. Experimental Design
3.1 Dataset
As summarized in Table 2, this study employs the lithium-ion battery cycling datasets released by the University of Oxford.
Table 2
Description of the Oxford Battery Datasets
| Datasets | Oxford |
| --- | --- |
| Battery capacity | 740 mAh |
| Material of battery cell | LiMO2/Graphite |
| Nominal voltage | 4.2 V |
| Test temperature | 40 ℃ |
| Charge/discharge cut-off voltage | 4.2/2.7 V |
| Charge/discharge rate | 0.74 A (1C) |
The Oxford battery degradation dataset consists of aging data from eight Kokam lithium-ion pouch cells, labeled Cell 1 through Cell 8, each with a nominal capacity of 740 mAh. The negative electrode material of the batteries is graphite, while the positive electrode material is LiMO2. During the aging experiments, all batteries were subjected to charge–discharge cycling under dynamic operating conditions to simulate realistic load variations encountered in electric vehicle applications. A cycling current rate of 2C was applied to accelerate battery degradation. To accurately evaluate the capacity fading behavior, the battery capacity was measured every 100 aging cycles using a complete charge–discharge process at a 1C current rate. Figure 5 illustrates the capacity degradation trajectories of the eight batteries throughout the aging process[31].
Fig. 5 Capacity degradation curves of Oxford datasets
3.2 Parameter Identification Results
To systematically investigate the evolution patterns of internal state parameters during the aging process of lithium-ion batteries, this study conducts parameter identification based on a first-order equivalent circuit model across multiple battery datasets. Since all datasets exhibit similar aging trends, Cell 1 from the Oxford battery dataset is selected as a representative example. The identified key parameters (R0, R1, and C1) over the entire aging cycle are presented in Fig. 6.
Fig. 6 (a) Evolution curve of ohmic resistance R0, (b) Evolution curve of polarization resistance R1, (c) Evolution curve of polarization capacitance C1, (d) Capacity degradation curve
Figure 6(a) illustrates the evolution trajectory of the ohmic resistance R0. The identification results show that R0 increases from an initial value of 0.0209 Ω to 0.0406 Ω, corresponding to a 94.3% rise. This monotonically increasing trend is physically associated with irreversible aging mechanisms such as the continuous growth of the solid electrolyte interphase (SEI) layer and the loss of active lithium during cycling.
Figure 6(b) presents the dynamic response characteristics of the polarization resistance R1, which rises from 0.02671 Ω to 0.0869 Ω. This nonlinear evolution pattern may result from structural phase transitions within the electrode materials and the cumulative effect of charge transfer resistance.
Figure 6(c) depicts the degradation trajectory of the polarization capacitance C1, which decreases from an initial value of 561 F to 118 F, a reduction of 78.9%. This decline directly reflects the reduction of the effective electrode surface area and the deterioration of ionic diffusion capability, both of which are closely related to microstructural changes in the electrode materials.
Figure 6(d) presents the capacity fading trajectory of Cell 1. The capacity fades gradually from an initial value of about 0.74 Ah to about 0.53 Ah, decreasing continuously as the cycle number increases.
Overall, these parameter identification results not only validate the capability of the first-order equivalent circuit model to characterize the aging state of lithium-ion batteries but also provide important quantitative insights into the underlying degradation mechanisms. The pronounced variations observed in each parameter’s evolution trajectory demonstrate the effectiveness and sensitivity of equivalent circuit model parameters as indicators of battery health status.
3.2.1 Construction of High-Order Health Indicators
To further extract the aging information embedded in the equivalent circuit model parameters, this study constructs six high-order health indicators based on the three identified fundamental parameters (R0, R1, C1). These indicators possess clear physical interpretations and characterize the internal state variations of the battery from multiple perspectives[32][33]. The mathematical definitions and physical meanings of the high-order health indicators are summarized in Table 3.
Table 3
Definitions of high-order health indicators.
| Health Indicator | Mathematical Expression | Physical Interpretation |
| H1: Total Impedance Factor | | Represents the overall impedance state of the battery, reflecting the combined effects of ohmic and electrochemical polarization. |
| H2: Polarization Time Constant | | Describes the dynamic response characteristics of the polarization process and is related to the charge transfer rate at the electrode–electrolyte interface. |
| H3: Internal Resistance Ratio | | Reflects the relative variation between different types of impedance, indicating the dominant aging mode (ohmic or electrochemical polarization). |
| H4: Impedance Growth Factor | | Quantifies the relative increase in ohmic resistance with respect to the initial value R00; directly related to the growth of the SEI layer thickness. |
| H5: Capacitance Degradation Factor | | Represents the relative decline in polarization capacitance with respect to the initial value C10; reflects the loss of active surface area of the electrodes. |
| H6: Comprehensive Aging Index | | A composite indicator integrating impedance growth and capacitance degradation, used to comprehensively assess the overall aging state of the battery. |
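As an illustrative sketch only, the indicators described in Table 3 could be computed from the identified parameters as follows. The algebraic forms used here (sum, product, and simple ratios) are assumptions inferred from the physical interpretations, not the expressions of the original table; only the time constant R1·C1 follows directly from the definition of a first-order RC branch. H6 is omitted because its combination rule cannot be inferred from the description. The names `EcmParams` and `health_indicators` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EcmParams:
    r0: float  # ohmic resistance, ohm
    r1: float  # polarization resistance, ohm
    c1: float  # polarization capacitance, farad


def health_indicators(p: EcmParams, initial: EcmParams) -> dict:
    """Assumed forms of H1-H5; these are NOT taken from Table 3."""
    return {
        "H1": p.r0 + p.r1,        # total impedance (assumed: plain sum)
        "H2": p.r1 * p.c1,        # RC time constant tau = R1 * C1, seconds
        "H3": p.r1 / p.r0,        # resistance ratio (assumed orientation)
        "H4": p.r0 / initial.r0,  # ohmic growth vs. initial (assumed ratio)
        "H5": p.c1 / initial.c1,  # capacitance retention vs. initial (assumed ratio)
    }


# Endpoint values reported for Cell 1 in the text above.
fresh = EcmParams(r0=0.0209, r1=0.02671, c1=561.0)
aged = EcmParams(r0=0.0406, r1=0.0869, c1=118.0)

h_fresh = health_indicators(fresh, fresh)
h_aged = health_indicators(aged, fresh)
```

With these endpoint values, the assumed H1, H3, and H4 increase with aging while H2 (from about 15.0 s to about 10.3 s) and H5 decrease, which matches the signs of the correlations reported in Table 4.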
Based on the feature space constructed from the above health indicators, further evaluation is required to assess the degree of association between each indicator and the battery’s capacity degradation, in order to identify the most predictive subset of features for subsequent modeling.
3.2.2 Correlation Analysis and Feature Selection
To quantitatively evaluate the relationship between the constructed high-order health indicators and battery capacity degradation, this study adopts the Spearman rank correlation coefficient for correlation analysis[34][35]. This method measures the monotonic relationship between variables based on rank differences and is well suited for handling nonlinear and non-normally distributed data, making it particularly applicable to the complex nonlinear characteristics encountered in battery aging processes.
Let the feature variable be denoted as $X = \{x_i\}$ and the target capacity as $Y = \{y_i\}$. The Spearman correlation coefficient is defined as:
$$\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^{2}}{n\left(n^{2}-1\right)}, \qquad d_i = R(x_i) - R(y_i)$$
where $R(x_i)$ and $R(y_i)$ are the ranks of $x_i$ and $y_i$, respectively, $d_i$ represents the difference between the corresponding ranks, and $n$ denotes the sample size. The correlation results are illustrated in Fig. 7 and summarized in Table 4.
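The rank-difference definition can be sketched in a few lines of plain Python. This is a minimal illustration rather than the implementation used in this study; the function names and the sample values are hypothetical, and the rank-difference formula is exact only when the data contain no ties.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient via 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Made-up example: resistance rises monotonically while capacity falls.
resistance = [0.021, 0.024, 0.029, 0.033, 0.041]
capacity = [0.74, 0.70, 0.66, 0.60, 0.53]
rho = spearman_rho(resistance, capacity)  # -1.0 (perfectly opposed ranks)
```

Because only ranks enter the formula, any strictly monotonic relationship yields |ρ| = 1 regardless of its curvature, which is why the coefficient suits nonlinear degradation trends.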
Fig. 7 Heatmap and bar chart of Spearman correlation coefficients
Table 4
Correlation analysis results between health indicators and battery capacity
| Dataset | Cell | H1 | H2 | H3 | H4 | H5 | H6 |
| Oxford Battery Dataset | Cell 1 | -0.9953 | 0.9083 | -0.6858 | -0.9989 | 0.9773 | -0.9933 |
| | Cell 2 | -0.9923 | 0.9307 | -0.5520 | -0.9961 | 0.9791 | -0.9924 |
| | Cell 3 | -0.9961 | 0.8724 | -0.8035 | -0.9987 | 0.9801 | -0.9929 |
| | Cell 4 | -0.9978 | 0.9370 | -0.9284 | -0.9944 | 0.9956 | -0.9979 |
| | Cell 5 | -0.9989 | 0.9206 | -0.8480 | -0.9985 | 0.9978 | -0.9984 |
| | Cell 6 | -0.9972 | 0.8849 | -0.6985 | -0.9962 | 0.9961 | -0.9979 |
| | Cell 7 | -0.9976 | 0.9268 | -0.7547 | -0.9989 | 0.9888 | -0.9951 |
| | Cell 8 | -0.9958 | 0.9096 | -0.8158 | -0.9989 | 0.9835 | -0.9929 |
As shown in Fig. 7 and Table 4, the correlations between the health indicators and capacity exhibit consistent trends across the eight cells: (1) H1, H4, and H6 show a strong negative correlation (ρ < −0.99), indicating that the increase in battery impedance is highly synchronized with capacity degradation. (2) H2 and H5 exhibit a strong positive correlation (ρ > 0.87 for H2 and ρ > 0.97 for H5), indicating that the polarization time constant and the polarization capacitance both decline as capacity fades. (3) H3 presents a weaker and less consistent negative correlation (−0.93 < ρ < −0.55), implying that while it helps distinguish between aging modes (ohmic-dominated or polarization-dominated), it is not a primary determinant of capacity variation.
Based on the above analysis, the constructed health indicators demonstrate complementary characteristics in describing battery degradation behavior. Among them, H1, H4, and H6 are the most sensitive to capacity changes, serving as key indicators for characterizing the aging progression. Meanwhile, H2 and H5 reveal the dynamic evolution of polarization from a kinetic perspective, supporting real-time monitoring of battery health states. Therefore, by comprehensively considering correlation strength, physical interpretability, and inter-feature independence, this study ultimately selects five health indicators (H1, H2, H4, H5, and H6) to construct a multi-scale health feature set. This feature set provides a holistic representation of battery degradation, from impedance growth and polarization dynamics to overall deterioration, and serves as the input to the subsequent capacity prediction model.
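The correlation-strength part of this selection can be illustrated with the values of Table 4. The sketch below keeps an indicator only if its weakest absolute correlation over the eight cells reaches a cut-off; the cut-off of 0.85 and the worst-case rule are assumptions chosen for illustration, and the sketch does not model the interpretability or independence criteria mentioned above.

```python
# Spearman coefficients from Table 4, ordered Cell 1 to Cell 8.
RHO = {
    "H1": [-0.9953, -0.9923, -0.9961, -0.9978, -0.9989, -0.9972, -0.9976, -0.9958],
    "H2": [0.9083, 0.9307, 0.8724, 0.9370, 0.9206, 0.8849, 0.9268, 0.9096],
    "H3": [-0.6858, -0.5520, -0.8035, -0.9284, -0.8480, -0.6985, -0.7547, -0.8158],
    "H4": [-0.9989, -0.9961, -0.9987, -0.9944, -0.9985, -0.9962, -0.9989, -0.9989],
    "H5": [0.9773, 0.9791, 0.9801, 0.9956, 0.9978, 0.9961, 0.9888, 0.9835],
    "H6": [-0.9933, -0.9924, -0.9929, -0.9979, -0.9984, -0.9979, -0.9951, -0.9929],
}


def select_indicators(rho_by_indicator, min_abs_rho=0.85):
    """Keep indicators whose worst-case |rho| over all cells meets the cut-off."""
    return [
        name
        for name, values in rho_by_indicator.items()
        if min(abs(v) for v in values) >= min_abs_rho
    ]


selected = select_indicators(RHO)  # ['H1', 'H2', 'H4', 'H5', 'H6']
```

Under this assumed rule, H3 is the only indicator dropped, since its correlation falls to −0.5520 on Cell 2.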
4. Experimental Results and Analysis
4.1 Evaluation Metrics
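To quantitatively evaluate the predictive performance of the proposed model, three commonly used regression performance metrics are adopted in this study, namely the root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination ($R^2$)[36]. The calculation formulas of these evaluation metrics are given as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^{2}}$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$$
$$R^{2} = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^{2}}$$
where $N$ denotes the total number of lithium-ion battery capacity data samples, $y_i$ represents the true capacity value of the lithium-ion battery, $\hat{y}_i$ denotes the corresponding predicted capacity value, and $\bar{y}$ is the mean value of the measured capacity data.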
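For reference, the three metrics can be computed with the short routine below. It is a minimal sketch using the standard definitions of RMSE, MAE, and the coefficient of determination; the function name and the sample values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for two equally long sequences."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for constant true values")
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    r2 = 1 - ss_res / ss_tot
    return rmse, mae, r2


# Made-up example: every prediction is off by exactly one unit.
rmse, mae, r2 = regression_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

In this example RMSE and MAE both equal 1, while the coefficient of determination is 1 − 4/5 = 0.2, showing that a constant offset can leave the fit quality low even when the trend is reproduced exactly.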
4.2 Hyperparameter Settings
In terms of hyperparameter configuration, reasonable search ranges are specified for several key hyperparameters of the TimeXer model, and an IWOA is employed to automatically search for the optimal combination within these ranges. Specifically, the learning rate is searched within the range of 0.0001 to 0.005 to balance training stability and convergence speed. The hidden feature dimension (d_model) is set between 32 and 128 to achieve a trade-off between model representation capability and computational efficiency. The number of attention heads (n_heads) is limited to 1–4 to avoid parameter redundancy caused by an excessive number of heads. The dropout rate is constrained to the range of 0.05–0.30 to mitigate the risk of overfitting. The numbers of encoder layers (e_layers) and decoder layers (d_layers) are searched within the ranges of 1–3 and 1–2, respectively.
The above hyperparameter ranges are determined by referring to commonly adopted configurations in existing studies on battery RUL prediction and time-series Transformer-based models[37][38], and are further adjusted according to the scale of the Oxford battery dataset and the overall model complexity. For the IWOA, the population size and the maximum number of iterations are set to 30 and 15, respectively, with the inertia weight nonlinearly decaying from 0.9 to 0.4. During the optimization process, the MSE on the validation set is used as the fitness function. After completing the hyperparameter search on the initially selected training–validation split, the obtained optimal hyperparameter combination is fixed and applied to all subsequent leave-one-battery-out cross-validation experiments. This strategy keeps the hyperparameters identical across all folds, so that performance differences between batteries are not caused by fold-specific tuning.
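One way to connect a population-based optimizer to these ranges is to let each candidate be a point in the unit hypercube and decode it into concrete hyperparameters. The sketch below shows such a decoding step under stated assumptions: the linear scaling, the rounding of integer-valued settings, and the names `SEARCH_SPACE` and `decode` are hypothetical, since the encoding used by the IWOA in this study is not specified.

```python
# Search ranges from Section 4.2: (lower bound, upper bound, value type).
SEARCH_SPACE = {
    "learning_rate": (1e-4, 5e-3, float),
    "d_model": (32, 128, int),
    "n_heads": (1, 4, int),
    "dropout": (0.05, 0.30, float),
    "e_layers": (1, 3, int),
    "d_layers": (1, 2, int),
}


def decode(position):
    """Map a candidate in [0, 1]^6 to hyperparameters (assumed linear scaling)."""
    if len(position) != len(SEARCH_SPACE):
        raise ValueError("position must have one entry per hyperparameter")
    params = {}
    for u, (name, (low, high, kind)) in zip(position, SEARCH_SPACE.items()):
        u = min(max(u, 0.0), 1.0)  # clip candidates that left the search box
        value = low + u * (high - low)
        params[name] = int(round(value)) if kind is int else value
    return params


lower = decode([0.0] * 6)  # every hyperparameter at its lower bound
upper = decode([1.0] * 6)  # every hyperparameter at its upper bound
```

The fitness of a decoded candidate would then be the validation MSE obtained after training the model with these settings, as described above.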
4.3 RUL Prediction Comparison
This study is conducted based on the publicly available Oxford battery degradation dataset provided by the University of Oxford, which consists of eight lithium-ion batteries aged under identical experimental conditions. To systematically evaluate the cross-battery generalization capability of the proposed model, a rigorous Leave-One-Battery-Out (LOBO) cross-validation strategy is adopted.
Specifically, in each validation round, one battery is selected as the test set, while the remaining seven batteries are further divided into a training set and a validation set. Among these seven batteries, one battery is used for validation and the remaining six batteries are used for model training. Consequently, for each test battery, seven distinct “training–validation–test” configurations are generated. In total, the experimental framework involves 56 (8 × 7) independent model training and evaluation processes. This cross-validation design exhaustively covers all possible cross-battery combinations, enabling a comprehensive assessment of the model’s generalization ability and robustness across different batteries, while effectively preventing data leakage.
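The enumeration of these configurations can be written down directly. The following sketch generates every test/validation/training assignment described above; the function name and the cell labels are hypothetical.

```python
def lobo_configurations(cells):
    """List every (test, validation, training) assignment over the cells."""
    configs = []
    for test in cells:
        remaining = [c for c in cells if c != test]
        for val in remaining:
            # The six cells that are neither test nor validation form the training set.
            train = [c for c in remaining if c != val]
            configs.append({"test": test, "val": val, "train": train})
    return configs


cells = [f"Cell {i}" for i in range(1, 9)]
configs = lobo_configurations(cells)  # 8 test cells x 7 validation cells = 56
```

Each configuration keeps the test cell disjoint from both the training and validation cells, which is the property that prevents leakage across batteries.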
To ensure the scientific rigor and reliability of the experimental results, a complete set of comparative and ablation experiments is designed. In the comparative experiments, the proposed IWOA-TimeXer model is benchmarked against several baseline models, including WOA-TimeXer, TimeXer, CNN + LSTM, and the standard Transformer model, to verify the effectiveness of the proposed improvements. The ablation experiments are conducted by progressively removing key components of the model, thereby analyzing the contribution of each module to the overall performance. In addition, each experimental configuration is repeated multiple times, and statistical analysis is performed to mitigate the influence of random factors, ensuring the reproducibility of the results and the reliability of the conclusions. In each panel, the red dashed line indicates the point where the battery capacity has decreased to 80% of its initial value. The prediction results are shown in Fig. 8.
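The 80% threshold translates a capacity trajectory into an end-of-life point and a remaining useful life. The sketch below illustrates this conversion under stated assumptions: end of life is taken as the first sample at or below the threshold, RUL is counted in sample indices rather than physical cycles, and the function names and capacity values are hypothetical.

```python
def end_of_life_index(capacity, threshold_ratio=0.8):
    """Index of the first sample at or below threshold_ratio * initial capacity."""
    limit = threshold_ratio * capacity[0]
    for index, value in enumerate(capacity):
        if value <= limit:
            return index
    return None  # the threshold is never reached in this trajectory


def remaining_useful_life(capacity, current_index, threshold_ratio=0.8):
    """Samples left until end of life, or None if end of life is not observed."""
    eol = end_of_life_index(capacity, threshold_ratio)
    if eol is None:
        return None
    return max(eol - current_index, 0)


# Made-up trajectory in Ah; the 80% limit is 0.8 * 0.74 = 0.592 Ah.
capacity = [0.74, 0.72, 0.69, 0.65, 0.61, 0.59, 0.56]
eol = end_of_life_index(capacity)          # 5
rul = remaining_useful_life(capacity, 2)   # 3
```

Applying the same routine to a predicted trajectory and to the measured one gives the predicted and true end-of-life points whose gap the figure visualizes.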
Fig. 8 RUL prediction results on the Oxford battery dataset: (a) Cell 1, (b) Cell 2, (c) Cell 3, (d) Cell 4, (e) Cell 5, (f) Cell 6, (g) Cell 7, (h) Cell 8
Table 5 Error metrics on the Oxford battery dataset
First, as shown in the MAE radar chart, the IWOA-TimeXer model exhibits the most compact overall contour and remains closest to the center for almost all batteries, indicating the lowest MAE and the smallest overall prediction error. In particular, for Cell 1, Cell 3, Cell 6, and Cell 7, the MAE values of IWOA-TimeXer are only 0.0014, 0.0015, 0.0015, and 0.0008, respectively, which are markedly lower than those obtained by TimeXer, CNN + LSTM, and the Transformer model. Moreover, for Cell 2, which exhibits a more complex degradation trajectory, IWOA-TimeXer still achieves the lowest MAE (0.0057), demonstrating its robust predictive capability under non-stationary degradation behavior. It is also observed that for Cell 2 and Cell 5, prediction errors increase noticeably toward the end of the battery life. This phenomenon can be attributed to the sudden acceleration of capacity degradation in the late stage of these batteries, where strong nonlinearity and non-stationary characteristics substantially increase the difficulty of RUL prediction.
Second, the RMSE bar chart provides a direct comparison of error magnitude variations across different batteries. While all models exhibit noticeable performance fluctuations depending on the battery, the RMSE of IWOA-TimeXer consistently remains at the lowest level. Particularly for Cell 4, Cell 6, and Cell 7, which present relatively smooth degradation trends with minor local fluctuations, the RMSE values of IWOA-TimeXer are 0.0023, 0.0030, and 0.0011, respectively, which are markedly lower than those of WOA-TimeXer and the original TimeXer model. These results indicate that the IWOA-based optimization effectively enhances TimeXer’s ability to capture subtle capacity variations, enabling more accurate fitting during dynamically evolving degradation stages.
Finally, the R² scatter plot further highlights the performance differences from the perspective of goodness of fit. As shown in the figure, the R² values of IWOA-TimeXer remain within a high range of 0.96–0.999 for most batteries. In particular, near-perfect fitting performance is achieved for Cell 7 (0.9992), Cell 1 (0.9979), and Cell 3 (0.9978). In contrast, the other models exhibit substantial degradation in fitting performance for Cell 2 and Cell 5; for example, the CNN + LSTM model attains an R² of only 0.8122 for Cell 2, while the TimeXer model drops sharply to 0.6273 for Cell 5. These results reveal the limitations of the baseline models in handling highly nonlinear degradation phases. By comparison, the scatter points of IWOA-TimeXer are more densely clustered and closer to the upper bound of 1.0, indicating superior cross-battery fitting capability and more stable generalization performance.
Overall, the comparative visualizations of the three evaluation metrics confirm that the proposed IWOA-TimeXer model achieves the best performance across all dimensions, including error magnitude (MAE and RMSE) and goodness of fit (R²). The model not only consistently outperforms the unoptimized TimeXer on all batteries, but also demonstrates clear advantages over WOA-TimeXer, CNN + LSTM, and the standard Transformer model. The combined trends observed in the three figures validate that IWOA-TimeXer offers lower prediction errors, stronger robustness, and higher cross-battery consistency, making it a highly suitable and reliable model for battery health prediction tasks.
5. Conclusion
This study addresses the limitations of traditional data-driven models in lithium-ion battery RUL prediction, including the lack of physical mechanism support, insufficient prediction accuracy, and limited interpretability. To this end, a TimeXer-based prediction framework incorporating an equivalent circuit model (ECM) and an IWOA is proposed. By integrating physical modeling with deep learning, the proposed approach enables accurate characterization and efficient prediction of battery degradation behavior. The main conclusions are summarized as follows:
(1) Effectiveness of physical features: The electrochemical parameters identified through ECM modeling can effectively reflect the internal degradation mechanisms of lithium-ion batteries. When these physically meaningful features are introduced as exogenous inputs into the TimeXer model, the model becomes more sensitive to capacity degradation trends, and the prediction results exhibit improved physical consistency. As a result, the interpretability of the prediction model is significantly enhanced.
(2) Advantages of the IWOA optimization strategy: The Improved Whale Optimization Algorithm demonstrates excellent stability and convergence efficiency in global hyperparameter search. Compared with conventional heuristic optimization algorithms, IWOA exhibits stronger global search capability and requires fewer control parameters, enabling it to effectively avoid local optima. The incorporation of IWOA leads to a more robust training process of the TimeXer model across different battery samples, thereby further improving prediction accuracy.
(3) Prediction performance and generalization capability: Experimental results on the Oxford battery dataset indicate that the proposed ECM-IWOA-TimeXer model outperforms WOA-TimeXer, TimeXer, CNN + LSTM, and the standard Transformer model in terms of MAE and RMSE. These results demonstrate the clear advantages of the proposed model in cross-battery generalization and complex degradation modeling, and verify the effectiveness of the physics-informed deep learning fusion strategy in enhancing battery RUL prediction accuracy.
Future work will focus on: (1) incorporating higher-order physical models (e.g., second- or third-order RC models) to enrich the feature space and improve physical consistency; (2) integrating multi-source signals (voltage, current, temperature, etc.) for multi-modal modeling, enhancing adaptability to complex operating conditions; (3) embedding physical constraint equations within model architectures to build intelligent prediction systems that maintain both mechanistic consistency and generalization capability.
Authors’ contribution Pei Tang: Conceptualization and research design. Lihui Liu: Data collection, analysis, and initial manuscript drafting. Zhongran Yao: Provision of experimental materials, equipment, and technical support. Xiaoyong Gu and Zetao Qiu: Assistance in data collection and analysis. Changcheng Sun and Wenbo Lei: Contribution to writing and revising the manuscript.