Application of a TimeXer Model Incorporating ECM Based Features in Battery Remaining Useful Life Prediction


Pei Tang¹, Lihui Liu¹,*, Wenbo Lei¹, Zetao Qiu¹, Zhongran Yao², Xiaoyong Gu³ and Changcheng Sun¹

¹School of Automotive Engineering, Yancheng Institute of Technology, Yancheng 224051, China.

²School of Automobile and Traffic Engineering, Wuxi Institute of Technology, Wuxi 214121, China.

³Engineering Research Center of New Energy Vehicle Energy Saving and Battery Safety, Wuxi Institute of Technology, Wuxi 214121, China.

Pei Tang. 25595224@qq.com

Lihui Liu. llh3226471576@163.com

Wenbo Lei. leiwenbo2024@163.com

Zetao Qiu. qzt1520691084@163.com

Zhongran Yao. yaozr05130928@163.com

Xiaoyong Gu. guxy@wxit.edu.cn

Changcheng Sun. sunchangcheng@ycit.edu.cn

Abstract

Accurate prediction of the remaining useful life (RUL) of lithium-ion batteries is critical for ensuring the safe and reliable operation of energy storage systems. Although existing data-driven methods demonstrate strong nonlinear modeling capabilities, they often suffer from a “black-box” nature and lack physical mechanism constraints, resulting in limited interpretability and generalization performance. To address these issues, this study proposes a TimeXer-based deep learning framework that integrates an equivalent circuit model (ECM) with an improved whale optimization algorithm (IWOA). First, a first-order RC equivalent circuit model is established, and recursive least squares (RLS) is employed to achieve online parameter identification. Based on the identified physical parameters, six high-order health indicators with clear physical significance are constructed. Through correlation analysis, five key indicators are selected as exogenous inputs to the TimeXer model. Second, the dual-path attention mechanism of TimeXer is utilized to model battery degradation. Endogenous self-attention captures the temporal dependencies of the capacity degradation sequence, while exogenous–endogenous cross-attention fuses physical health information, enabling accurate characterization of degradation trajectories. Third, the IWOA is introduced to globally optimize the hyperparameters of the TimeXer model. By incorporating an improved nonlinear convergence factor and an adaptive perturbation strategy, IWOA mitigates the tendency of the standard whale optimization algorithm (WOA) to become trapped in local optima. Finally, experiments conducted on the Oxford battery dataset demonstrate that the proposed IWOA-TimeXer model achieves consistently superior prediction performance, with an average mean absolute error (MAE) of 0.00268. Compared with the WOA-TimeXer, TimeXer, CNN + LSTM, and Transformer models, the MAE is reduced by 26.96%, 54.47%, 59.24%, and 58.28%, respectively. Overall, this study establishes a physics–data-driven hybrid framework that achieves both high prediction accuracy and strong interpretability, providing an effective solution for lithium-ion battery health management and offering valuable insights into the application of physics-informed neural networks in energy systems.

Keywords:

Lithium-ion battery

Remaining useful life

Equivalent circuit model

Whale optimization algorithm

TimeXer

1. Introduction

As the core component of electric vehicles and portable electronic devices, the performance state of lithium-ion batteries directly determines the reliability and safety of the entire system[1]. Accurate prediction of the remaining useful life (RUL) of batteries is not only essential for implementing predictive health management and avoiding potential risks but also serves as a key step in optimizing the life-cycle cost of equipment, holding significant engineering and scientific importance[2]. However, battery aging is a complex nonlinear process involving multi-physics coupling and multi-time-scale evolution. The degradation trajectory is jointly influenced by internal electrochemical side reactions and external operating conditions, exhibiting strong time-varying characteristics and uncertainty[3]. This inherent complexity poses a major challenge in developing RUL prediction models that achieve high accuracy, strong generalization, and robust physical interpretability.

Existing RUL prediction methods can be broadly categorized into two types: model-based approaches and data-driven approaches. Model-based methods rely on explicit physical equations and internal battery mechanisms, offering clear theoretical foundations and strong interpretability. Depending on the desired modeling accuracy and complexity, physical models can be further divided into pseudo-two-dimensional (P2D) electrochemical models, single-particle models (SPM), and equivalent circuit models (ECM)[4]. Among them, the P2D model is widely used in mechanism-oriented research due to its high fidelity; however, the computational burden associated with solving coupled partial differential equations limits its suitability for real-time applications[5]. The ECM, on the other hand, approximates the battery’s electrical characteristics through lumped parameters, significantly reducing computational cost while preserving essential physical meaning, which has led to its wide adoption in engineering practice[6]. For example, Guha et al.[7] proposed an RUL prediction method based on a fractional-order equivalent circuit model, which demonstrated advantages in both accuracy and real-time performance. Nevertheless, purely physics-based approaches often struggle to accurately characterize nonlinear degradation behavior under complex real-world conditions. Their predictive performance heavily depends on the accurate identification of model parameters and prior knowledge of degradation mechanisms, which limits their flexibility and adaptability when faced with unknown or variable operating conditions[8].

In recent years, data-driven approaches, represented by deep learning, have demonstrated strong potential in RUL prediction. Unlike physics-based methods, these approaches do not rely on explicit physical equations; instead, they leverage deep neural networks to automatically learn degradation features and mapping relationships from historical operational data, thus exhibiting excellent nonlinear fitting capabilities[9]. In terms of time-series modeling, recurrent neural networks (RNNs) and their variants such as long short-term memory (LSTM) networks and gated recurrent units (GRU) have been widely employed due to their effectiveness in capturing temporal dependencies[10]. For instance, Lei et al.[11] proposed an Auto-CNN-LSTM method based on an improved CNN-LSTM architecture. By integrating autoencoders and adaptive filters, this approach enhanced feature dimensionality and output stability, thereby improving the accuracy of battery RUL prediction. Similarly, Ning et al.[12] developed a hybrid model combining improved variational mode decomposition (VMD), Gaussian process regression (GPR), and GRU, optimized via a grey wolf optimizer (GWO). This method achieved high-accuracy RUL prediction while also quantifying prediction uncertainty. However, the inherent recursive structure of RNN-based models makes the training process difficult to parallelize, and they tend to suffer from vanishing or exploding gradients when handling long sequences. These limitations hinder their ability to model long-term dependencies effectively[13].

To overcome the aforementioned limitations, the Transformer model, based on the self-attention mechanism, has been introduced into the field of time-series prediction. This model adopts an encoder–decoder architecture and leverages a global attention mechanism to directly capture dependencies between any two points in a sequence while supporting efficient parallel training[14]. In the field of battery health management, Transformer models and their variants have demonstrated clear advantages. For example, Saleem et al.[15] proposed TransRUL, which was evaluated on the CALCE dataset and compared with CNN, LSTM and TCN models. The results showed that the Transformer achieved significantly lower prediction errors in RUL estimation. Similarly, Zhu et al.[16] proposed a modal decomposition-based hybrid RUL prediction method that integrates an enhanced Informer-LSTM model. In their approach, the CEEMDAN algorithm was employed to decompose the capacity sequence into high- and low-frequency components. The Informer model was used to further decompose and predict the high-frequency components, while the LSTM network handled the low-frequency ones. Experimental results on battery datasets demonstrated that the proposed method achieved an average fitting accuracy of approximately 99%. However, the standard Transformer typically processes exogenous variables (such as temperature, current, and other auxiliary information) using simple feature concatenation, which limits its ability to effectively model the complex interactions between endogenous sequences and exogenous factors[17].

To address this issue, the TimeXer model[18] introduces an architectural design that separates the embedding pathways of endogenous and exogenous sequences and employs a dedicated cross-attention mechanism to fuse their information, thereby enabling fine-grained modeling of multivariate time series data. This architecture provides a new pathway for integrating physical information with data-driven models. However, existing studies have primarily focused on conventional time-series domains such as finance and meteorology[19]. Research that applies TimeXer specifically to lithium-ion battery RUL prediction, a task characterized by strong nonlinearity, high noise, and inherent physical constraints, remains relatively scarce, and its technical potential in this complex context has yet to be fully explored.

It is worth noting that the predictive performance of deep learning models largely depends on the proper configuration of hyperparameters. Traditional optimization methods such as grid search and random search are computationally expensive and often fail to locate global optima in high-dimensional parameter spaces[20]. To overcome these challenges, meta-heuristic optimization algorithms have been introduced into the field of hyperparameter tuning due to their superior global search capability. For instance, Durmus et al.[21] employed a genetic algorithm to optimize CNN hyperparameters, effectively improving model convergence speed; Fu et al.[22] applied a particle swarm optimization (PSO) algorithm for automated LSTM architecture search. The whale optimization algorithm (WOA)[23], as an emerging bio-inspired optimization technique, has attracted attention for its simplicity, few control parameters, and fast convergence speed. However, the standard WOA still suffers from limitations such as premature convergence and insufficient accuracy when dealing with complex optimization problems[24], necessitating further improvements for effective deep learning hyperparameter optimization.

In response to these limitations, this study proposes a TimeXer-based prediction framework that integrates a first-order RC equivalent circuit model (ECM) with an improved Whale Optimization Algorithm (IWOA).

The main contributions of this work are summarized as follows:

1) Physics-based degradation feature extraction. A first-order RC ECM is employed for online parameter identification to obtain key electrochemical parameters reflecting the internal state of the battery. Based on these parameters, a set of health indicators (HIs) is constructed through temporal statistical analysis to quantify battery degradation from multiple perspectives, providing physically interpretable features for subsequent data-driven prediction.

2) Improved Whale Optimization Algorithm (IWOA) for automatic hyperparameter tuning. To address the shortcomings of standard WOA, the IWOA introduces a nonlinear convergence factor and an adaptive weight perturbation mechanism, effectively balancing global exploration and local exploitation. The algorithm is applied to optimize critical hyperparameters of the TimeXer model, enhancing both training efficiency and prediction stability.

3) Deep integration of physical mechanisms and data-driven modeling. The proposed framework embeds the extracted health indicators as exogenous variables, which are fused within the dedicated attention mechanism of the TimeXer architecture. This strategy not only leverages the strong sequence modeling capability of deep learning but also incorporates physical priors and mechanistic constraints, forming an end-to-end hybrid predictive framework that achieves high accuracy while significantly improving model generalization and interpretability.

2. Proposed Method

2.1 Problem Definition

Lithium-ion batteries undergo irreversible performance degradation during long-term cycling operation. The remaining useful life (RUL) is defined as the number of remaining cycles from the current time until the actual battery capacity decays to a predefined threshold. Let $Q_t$ denote the actual capacity at the $t$-th cycle, and $Q_0$ represent the initial capacity. When $Q_t$ decreases to the threshold $Q_t=\alpha Q_0$ (where $\alpha$ is typically set to 0.8), the battery is considered to have reached its end of life. Therefore, the primary objective of RUL prediction is to accurately estimate the remaining number of cycles between the current cycle $t$ and the end-of-life cycle $t_{\text{end}}$ based on historical monitoring data[25].
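The definition above can be illustrated with a minimal sketch. It assumes a per-cycle capacity list is already available; the function names `end_of_life_cycle` and `remaining_useful_life` and the sample capacities are hypothetical and only serve to show how the threshold $\alpha Q_0$ determines $t_{\text{end}}$.

```python
from typing import Optional, Sequence


def end_of_life_cycle(capacity: Sequence[float], alpha: float = 0.8) -> Optional[int]:
    """Return the first cycle index t with Q_t <= alpha * Q_0, or None if never reached."""
    threshold = alpha * capacity[0]
    for t, q in enumerate(capacity):
        if q <= threshold:
            return t
    return None


def remaining_useful_life(capacity: Sequence[float], t: int, alpha: float = 0.8) -> Optional[int]:
    """RUL at cycle t: number of cycles left until the end-of-life cycle t_end."""
    t_end = end_of_life_cycle(capacity, alpha)
    if t_end is None:
        return None  # threshold not reached within the recorded data
    return max(t_end - t, 0)


# Hypothetical capacities in Ah; Q_0 = 2.0, so the threshold is 1.6 Ah.
capacities = [2.0, 1.9, 1.7, 1.5]
rul_at_cycle_1 = remaining_useful_life(capacities, t=1)
```

In a forecasting setting the capacities beyond the current cycle are not measured but predicted, so the same threshold test would be applied to the predicted trajectory.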

2.2 First-Order Equivalent Circuit Model and Parameter Identification

To characterize the relationship between the external electrical behavior of the battery and its internal aging state, a first-order RC equivalent circuit model is adopted as the physical modeling framework in this study. As illustrated in Fig. 1, the model consists of an open-circuit voltage source $U_{\text{OC}}$, an ohmic resistance $R_0$, and an RC parallel network connected in series[26]. Among them, $U_{\text{OC}}$ represents the steady-state open-circuit voltage of the battery and is closely related to its state of charge and state of health; $R_0$ denotes the ohmic internal resistance of the battery; and the RC network is employed to describe the dynamic voltage response induced by polarization effects. The model takes the load current $I$ as the input and outputs the terminal voltage $V_0$, which can be directly measured.
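As a rough illustration of how such a model maps current to terminal voltage, the sketch below simulates a first-order RC circuit in discrete time. It relies on assumptions not stated in the text: discharge current is taken as positive, $U_{\text{OC}}$ is held constant over the simulated window, the RC branch uses the exact zero-order-hold discretization, and the function name and parameter values are hypothetical.

```python
import math
from typing import List, Sequence


def simulate_first_order_rc(current: Sequence[float], dt: float, u_oc: float,
                            r0: float, r1: float, c1: float) -> List[float]:
    """Terminal voltage of a first-order RC model for a sampled current profile.

    Assumed convention: positive current discharges the cell, so
    V_0 = U_OC - I * R_0 - U_1, where U_1 is the polarization voltage
    across the parallel R_1-C_1 branch.
    """
    tau = r1 * c1                      # time constant of the RC branch
    decay = math.exp(-dt / tau)
    u1 = 0.0                           # polarization voltage, initially relaxed
    voltages = []
    for i in current:
        voltages.append(u_oc - i * r0 - u1)
        # Zero-order-hold update of the polarization voltage.
        u1 = decay * u1 + r1 * (1.0 - decay) * i
    return voltages


# Hypothetical parameters: 1 A constant discharge sampled at 1 s.
v = simulate_first_order_rc([1.0] * 1000, dt=1.0, u_oc=4.0, r0=0.05, r1=0.02, c1=1000.0)
```

The first sample shows only the ohmic drop $I R_0$, while the later samples settle toward $U_{\text{OC}} - I (R_0 + R_1)$ as the polarization voltage builds up.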


Fig. 1 First-Order RC Equivalent Circuit Model

The value of the proposed model lies in the fact that its key parameters evolve in a regular manner in response to irreversible aging reactions occurring inside the battery. To accurately extract these parameters from measured data, this study employs the recursive least squares (RLS) algorithm[27] for online parameter identification. The system is discretized and formulated as a linear regression model:

$y(k)=\varphi^{T}(k)\,\theta(k)+e(k) \qquad (1)$

where $y(k)$ denotes the system output, $\varphi(k)$ is the regression vector constructed from historical voltage and current measurements, $\theta(k)=[R_0,R_1,C_1]^{T}$ represents the parameter vector to be identified, and $e(k)$ is the modeling error. By recursively updating the Kalman gain and the covariance matrix, the RLS algorithm enables optimal estimation of the parameter vector $\theta(k)$. It performs recursive parameter estimation by minimizing a weighted least-squares cost function. The core update equations are given as follows:

$K(k)=\dfrac{P(k-1)\,\varphi(k)}{\lambda+\varphi^{T}(k)\,P(k-1)\,\varphi(k)} \qquad (2)$

$\theta(k)=\theta(k-1)+K(k)\left[y(k)-\varphi^{T}(k)\,\theta(k-1)\right] \qquad (3)$

$P(k)=\dfrac{1}{\lambda}P(k-1)-\dfrac{1}{\lambda}K(k)\,\varphi^{T}(k)\,P(k-1) \qquad (4)$

where $K(k)$ is the Kalman gain matrix, $P(k)$ denotes the estimation error covariance matrix, and $\lambda\in(0,1]$ is the forgetting factor, which provides a trade-off between tracking capability and estimation stability. When $\lambda$ is chosen close to unity, the algorithm places greater emphasis on historical data, which is beneficial for noise suppression; conversely, smaller values of $\lambda$ enhance the algorithm’s ability to track time-varying parameters.

Provided that the regression vector satisfies the persistent excitation condition and that the forgetting factor is properly selected, the RLS algorithm guarantees boundedness and asymptotic convergence of the parameter estimates, thereby enabling stable online identification of the equivalent circuit model parameters. Through this procedure, a parameter sequence $\{\theta(1),\theta(2),\dots\}$ can be obtained from each charge-discharge cycle. These parameters possess clear physical interpretations and directly reflect the time-varying internal impedance and dynamic characteristics of the battery. Based on the identified parameter trajectories, statistical analysis is subsequently performed to extract health indicators that characterize the battery aging trend. These physically meaningful features are then used as inputs to the subsequent RUL prediction model, thereby achieving effective integration of physical mechanisms with data-driven learning approaches.
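The recursion in Eqs. (2)–(4) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the identification pipeline used in the study: the regressors are synthetic random vectors, the "true" parameter values are hypothetical, and the construction of $\varphi(k)$ from the discretized circuit equations is not reproduced here because the text does not spell it out.

```python
import numpy as np


def rls_step(theta, P, phi, y, lam=0.98):
    """One recursive least-squares update with forgetting factor lam."""
    phi = phi.reshape(-1, 1)
    K = P @ phi / (lam + (phi.T @ P @ phi).item())     # gain, Eq. (2)
    err = y - (phi.T @ theta).item()                   # one-step prediction error
    theta = theta + K * err                            # parameter update, Eq. (3)
    P = P / lam - (K @ phi.T @ P) / lam                # covariance update, Eq. (4)
    return theta, P


rng = np.random.default_rng(0)
true_theta = np.array([[0.05], [0.02], [1.5]])         # hypothetical parameters
theta = np.zeros((3, 1))                               # initial estimate
P = 1e4 * np.eye(3)                                    # large initial covariance

for _ in range(500):
    phi = rng.normal(size=3)                           # synthetic, persistently exciting
    y = float(phi @ true_theta[:, 0]) + 1e-4 * rng.normal()
    theta, P = rls_step(theta, P, phi, y)
```

With `lam` close to 1 the estimate averages over a long history; lowering it shortens the effective memory, which is the tracking-versus-stability trade-off described above.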

2.3 IWOA

The predictive performance of deep learning models largely depends on the configuration of their hyperparameters. Traditional manual tuning or grid search methods are not only inefficient but also struggle to locate global optima in complex, high-dimensional parameter spaces. This limitation significantly constrains both the model’s performance potential and deployment efficiency.

To address this challenge, this study introduces an Improved Whale Optimization Algorithm (IWOA) to achieve automated global optimization of key hyperparameters in the TimeXer model. The application of this algorithm is a crucial step in ensuring that the proposed physics-embedded framework can consistently achieve optimal performance.

2.3.1 Whale Optimization Algorithm

The Whale Optimization Algorithm (WOA)[28] is a metaheuristic optimization method inspired by the cooperative hunting behavior of humpback whales in nature. By mimicking the processes of encircling prey, spiral bubble-net attacking, and random searching, WOA iteratively searches for the global optimum. The mathematical formulation of WOA mainly consists of three stages, as described below.

(1) Encircling prey

Assume that the population size of whales is $N$, and the position vector of each whale at iteration $t$ is denoted as $X(t)$. Since the exact position of the prey is unknown, the algorithm assumes that the whale with the best fitness value in the current population represents the approximate location of the prey, denoted as $X^{*}(t)$. Other whales update their positions by moving toward this best individual, thereby gradually encircling the prey. The encircling behavior is mathematically modeled as:

$D=\left|C\,X^{*}(t)-X(t)\right| \qquad (5)$

$X(t+1)=X^{*}(t)-A\,D \qquad (6)$

where $D$ denotes the distance vector between the current whale and the best solution, $t$ is the iteration index, $X(t)$ is the position vector of a whale, and $X^{*}(t)$ represents the best position vector obtained so far. The coefficient vectors $A$ and $C$ are defined as:

$A=2a\,r_1-a \qquad (7)$

$C=2r_2 \qquad (8)$

$a=2-\dfrac{2t}{T_{\max}} \qquad (9)$

where $r_1$ and $r_2$ are random numbers uniformly distributed in $[0,1]$, $a$ is the convergence factor that linearly decreases from 2 to 0 as the iteration progresses, and $T_{\max}$ denotes the maximum number of iterations.

(2) Spiral bubble-net attacking mechanism

In real hunting scenarios, humpback whales approach their prey along a spiral-shaped trajectory. This behavior is modeled in WOA using a logarithmic spiral equation, and the position update rule is expressed as:

$X(t+1)=X^{*}(t)+D_p\,e^{bl}\cos(2\pi l) \qquad (10)$

where $D_p=|X^{*}(t)-X(t)|$ represents the distance between the whale and the current best position, $b$ is a constant defining the shape of the spiral, and $l\in[-1,1]$ is a random number. Since whales alternately perform shrinking encircling and spiral attacking behaviors during the hunting process, WOA employs a probabilistic mechanism to choose between these two strategies. Let $P_i$ denote the probability of performing the encircling prey behavior; accordingly, the probability of executing the spiral bubble-net attacking behavior is $1-P_i$. The unified position update rule can thus be written as:

$X(t+1)=\begin{cases}X^{*}(t)-A\,D, & p<P_i\\ X^{*}(t)+D_p\,e^{bl}\cos(2\pi l), & p\ge P_i\end{cases} \qquad (11)$

where $p\in[0,1]$ is a random number. As the iteration proceeds, the convergence factor $a$ gradually decreases, which narrows the range of the coefficient vector $A$, since $A\in[-a,a]$ by Eq. (7). When $|A|<1$, whale individuals update their positions toward the best solution, indicating that the algorithm enters the exploitation phase and intensifies the search around the prey.

(3) Search for prey

When $|A|\ge 1$, the algorithm assumes that the current best solution may correspond to a local optimum. In this case, whales switch to a random search strategy to explore the solution space. Specifically, a random whale position $X_{\text{rand}}$ is selected, and other whales update their positions relative to this randomly chosen individual. The mathematical model is given by:

$D=\left|C\,X_{\text{rand}}-X(t)\right| \qquad (12)$

$X(t+1)=X_{\text{rand}}-A\,D \qquad (13)$

This mechanism enhances the global exploration capability of the algorithm, enabling it to escape from local optima and improving the overall optimization performance.
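The three stages can be combined into a compact reference loop. The sketch below is one possible reading of Eqs. (5)–(13), not the authors' code: it assumes $P_i=0.5$ and $b=1$, draws $A$ and $C$ as scalars per whale, nests the $|A|\ge 1$ random-search branch inside the $p<P_i$ case as in the commonly used WOA formulation, clips positions to the search bounds, and updates the best solution as soon as a better one is found. The function name `woa_minimize` and the sphere test function are illustrative.

```python
import numpy as np


def woa_minimize(f, dim, bounds, n_whales=30, t_max=200, p_i=0.5, b=1.0, seed=0):
    """Minimal standard WOA for minimizing f over a box-constrained space."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(n_whales, dim))
    fitness = np.array([f(x) for x in X])
    best = X[fitness.argmin()].copy()
    best_val = float(fitness.min())

    for t in range(t_max):
        a = 2.0 - 2.0 * t / t_max                       # linear decrease, Eq. (9)
        for i in range(n_whales):
            A = 2.0 * a * rng.random() - a              # Eq. (7)
            C = 2.0 * rng.random()                      # Eq. (8)
            if rng.random() < p_i:
                if abs(A) < 1:                          # encircle the best whale
                    D = np.abs(C * best - X[i])         # Eq. (5)
                    X[i] = best - A * D                 # Eq. (6)
                else:                                   # explore around a random whale
                    x_rand = X[rng.integers(n_whales)].copy()
                    D = np.abs(C * x_rand - X[i])       # Eq. (12)
                    X[i] = x_rand - A * D               # Eq. (13)
            else:                                       # spiral bubble-net move
                l = rng.uniform(-1.0, 1.0)
                D_p = np.abs(best - X[i])
                X[i] = best + D_p * np.exp(b * l) * np.cos(2 * np.pi * l)  # Eq. (10)
            X[i] = np.clip(X[i], lo, hi)
            val = float(f(X[i]))
            if val < best_val:
                best, best_val = X[i].copy(), val
    return best, best_val


best_x, best_f = woa_minimize(lambda x: float(np.sum(x ** 2)), dim=5, bounds=(-10.0, 10.0))
```

For hyperparameter tuning, `f` would wrap a model training and validation run, so each fitness evaluation is far more expensive than in this toy example.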

2.3.2 Improvement Strategies

Although the standard whale optimization algorithm (WOA) is characterized by a simple structure and a small number of control parameters, it still suffers from several limitations when dealing with complex optimization problems, such as insufficient convergence accuracy and a tendency to become trapped in local optima. To overcome these drawbacks, two core improvement strategies are proposed in this study.

(1) Nonlinear convergence factor: The convergence factor $a$, which plays a critical role in balancing exploration and exploitation in the WOA, is modified from the conventional linear decreasing scheme to a nonlinear adaptive strategy based on a sinusoidal function:

$a=2-2\left(\dfrac{t}{T_{\max}}\right)^{\frac{1}{2}}\sin\left(\pi\dfrac{t}{T_{\max}}\right) \qquad (14)$

The sinusoidal function exhibits smooth, continuous, and nonlinear variation, enabling the algorithm to maintain strong global exploration capability in the early iterations while accelerating convergence toward the optimal solution in the later stages. This mechanism effectively enhances the balance between exploration and exploitation throughout the optimization process.
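To make the two schedules concrete, the following sketch evaluates the linear factor of Eq. (9) and the sinusoidal factor of Eq. (14) exactly as printed. The function names are illustrative. One property worth checking when implementing Eq. (14) in this form is its end-point value: because $\sin(\pi)=0$, the printed expression gives $a=2$ again at $t=T_{\max}$, whereas the linear schedule reaches 0 there.

```python
import math


def a_linear(t: int, t_max: int) -> float:
    """Standard WOA convergence factor, Eq. (9)."""
    return 2.0 - 2.0 * t / t_max


def a_nonlinear(t: int, t_max: int) -> float:
    """Sinusoidal convergence factor, Eq. (14), implemented as printed."""
    x = t / t_max
    return 2.0 - 2.0 * math.sqrt(x) * math.sin(math.pi * x)


t_max = 100
schedule = [(t, a_linear(t, t_max), a_nonlinear(t, t_max)) for t in (0, 25, 50, 75, 100)]
for t, lin, non in schedule:
    print(f"t={t:3d}  linear a={lin:.3f}  nonlinear a={non:.3f}")
```

Printing a few values like this is a cheap sanity check before running the optimizer, since the schedule of $a$ directly controls how often $|A|<1$ holds.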

(2) Adaptive weight and random perturbation: An iteration-dependent adaptive weight factor $\omega$ is introduced into the position update equation, and a random perturbation term $\delta$ is incorporated during the exploitation phase:

$\omega=\omega_{\min}+\left(\omega_{\max}-\omega_{\min}\right)e^{-\frac{\alpha t}{T_{\max}}} \qquad (15)$

$\delta=\beta\left(2\,\text{rand}-1\right)e^{-\frac{t}{T_{\max}}} \qquad (16)$

where $\omega_{\max}$ and $\omega_{\min}$ denote the maximum and minimum values of the weight factor, respectively, $\alpha$ is the decay coefficient, $\beta$ controls the perturbation amplitude, and $\text{rand}\in[0,1]$ is a uniformly distributed random number.

Accordingly, the improved position update rule can be expressed as:

$X(t+1)=\begin{cases}\omega X^{*}(t)-A\,D+\delta, & p<P_i,\\ \omega X^{*}(t)+D_p\,e^{bl}\cos(2\pi l)+\delta, & p\ge P_i,\\ X_{\text{rand}}-A\,D, & |A|\ge 1\end{cases} \qquad (17)$

where $X^{*}(t)$ denotes the current best solution, $X_{\text{rand}}$ represents a randomly selected whale position, and the remaining parameters follow the definitions in the standard WOA framework.

The adaptive weight factor $\omega$ contributes to accelerating convergence, while the random perturbation term $\delta$ assists the population in escaping local optima, thereby enhancing the robustness of the algorithm. Through these improvements, the proposed improved WOA (IWOA) preserves the advantages of the standard WOA while enhancing its search capability and convergence stability in complex high-dimensional spaces. As a result, it provides reliable technical support for hyperparameter optimization of the TimeXer model[29].
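A minimal sketch of the two exploitation-phase modifications is given below. The text does not report numerical settings, so the defaults `w_min=0.4`, `w_max=0.9`, `alpha=2.0` and `beta=0.1` are hypothetical placeholders, and drawing the perturbation independently for each dimension is likewise an assumption. The function names are illustrative.

```python
import numpy as np


def adaptive_weight(t, t_max, w_min=0.4, w_max=0.9, alpha=2.0):
    """Iteration-dependent weight, Eq. (15): decays from w_max toward w_min."""
    return w_min + (w_max - w_min) * np.exp(-alpha * t / t_max)


def perturbation(t, t_max, dim, rng, beta=0.1):
    """Random perturbation, Eq. (16): uniform in [-beta, beta], shrinking with t."""
    return beta * (2.0 * rng.random(dim) - 1.0) * np.exp(-t / t_max)


def iwoa_exploit_step(x, best, A, C, t, t_max, rng, p_i=0.5, b=1.0):
    """First two branches of Eq. (17): weighted encircling or weighted spiral move."""
    w = adaptive_weight(t, t_max)
    delta = perturbation(t, t_max, x.size, rng)
    if rng.random() < p_i:
        D = np.abs(C * best - x)
        return w * best - A * D + delta
    l = rng.uniform(-1.0, 1.0)
    D_p = np.abs(best - x)
    return w * best + D_p * np.exp(b * l) * np.cos(2 * np.pi * l) + delta


rng = np.random.default_rng(0)
x_new = iwoa_exploit_step(x=np.ones(4), best=np.zeros(4), A=0.3, C=1.2,
                          t=10, t_max=100, rng=rng)
```

Both terms decay with $t$, so their influence is largest early in the run and fades as the population converges.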

2.3.3 Performance Evaluation of IWOA

To verify the effectiveness of the proposed improved whale optimization algorithm (IWOA), comparative experiments are conducted in this section using the CEC2017 benchmark test suite. Four representative benchmark functions, namely F3, F5, F7, and F9, are selected, covering both unimodal and multimodal optimization scenarios[30], as illustrated in Fig. 2. These benchmark functions are employed to comprehensively evaluate the optimization performance of IWOA in comparison with the standard whale optimization algorithm (WOA) and the sparrow search algorithm (SSA) under varying levels of problem complexity.

To ensure fairness and objectivity in the performance evaluation, all algorithms are tested under identical experimental conditions. Specifically, the dimensionality of the benchmark functions is set to 10, the population size is fixed at 30, and the maximum number of iterations is set to 500. In addition, each algorithm is independently executed 30 times, and the average results are reported as the final performance metrics to mitigate the influence of stochastic randomness.




Fig. 2 Three-dimensional fitness landscapes of benchmark functions (F3, F5, F7, and F9) and convergence curves of IWOA, WOA, and SSA on different test functions, including both unimodal and multimodal cases.

Table 1

Comparison of the performance of three algorithms on the benchmark functions

Function	Type	Mathematical expression	Algorithm	Mean objective function value
F3	Unimodal	$F_3(x)=Z_1^2+10^6\sum_{i=2}^{D}Z_i^2+300$	IWOA	4.39e+03
			WOA	1.56e+04
			SSA	6.26e+03
F5	Multimodal	$F_5(x)=10D+\sum_{i=1}^{D}\left[Z_i^2-10\cos(2\pi Z_i)\right]+500$	IWOA	5.64e+02
			WOA	5.85e+02
			SSA	5.78e+02
F7	Multimodal	$F_7(x)=\min\left\{\sum(Z_i-\mu_0)^2,\ dD+S\sum(Z_i-\mu_1)^2\right\}+10\left[D-\sum\cos\left(2\pi(Z_i-\mu_0)\right)\right]+700$	IWOA	7.92e+02
			WOA	8.03e+02
			SSA	8.14e+02
F9	Multimodal	$F_9(x)=10D+\sum_{i=1}^{D}\left[Z_i^2-10\cos(2\pi Z_i)\right]+900$	IWOA	1.38e+03
			WOA	1.66e+03
			SSA	1.77e+03

where $Z=M\,(x-o)$, $M$ is a $D\times D$ orthogonal rotation matrix, $o$ is a $D$-dimensional shift vector, and $D=10$.
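For readers who want to reproduce the structure of these test functions, the sketch below builds a shifted and rotated Rastrigin-type function in the form listed for F5 in Table 1. It is not the official CEC2017 implementation: the rotation matrix and shift vector here are generated randomly (the official suite ships fixed data files), and the factory name `make_shifted_rotated_rastrigin` and the shift range are hypothetical.

```python
import numpy as np


def make_shifted_rotated_rastrigin(dim=10, bias=500.0, seed=0):
    """Return f(x) = 10*D + sum(z_i^2 - 10*cos(2*pi*z_i)) + bias with z = M (x - o)."""
    rng = np.random.default_rng(seed)
    # QR decomposition of a random Gaussian matrix yields an orthogonal matrix M.
    M, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
    o = rng.uniform(-80.0, 80.0, size=dim)          # random shift vector (assumed range)

    def f(x):
        z = M @ (np.asarray(x, dtype=float) - o)
        return float(10.0 * dim + np.sum(z ** 2 - 10.0 * np.cos(2.0 * np.pi * z)) + bias)

    return f, o


f5_like, shift = make_shifted_rotated_rastrigin()
value_at_optimum = f5_like(shift)                   # global minimum equals the bias term
```

Because the minimum value equals the bias, the distance of a reported mean objective value from that bias indicates how far an optimizer remains from the global optimum.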

As shown in Fig. 2 and the results summarized in Table 1, the proposed IWOA consistently exhibits superior optimization performance and faster convergence speed across the four representative benchmark functions. For the unimodal function F3, the convergence curve of IWOA remains below those of WOA and SSA throughout the entire iteration process. It decreases rapidly in the early stage and continues to improve steadily in the later stage, achieving a final average objective function value of only $4.39\times10^{3}$, which is significantly lower than that of WOA ($1.56\times10^{4}$) and SSA ($6.26\times10^{3}$). This indicates that IWOA is more effective in both global exploration and fine-grained exploitation. For the multimodal functions F5, F7, and F9, IWOA similarly demonstrates a stronger ability to escape local optima. Its convergence curves exhibit smoother and more stable descending trends, and the final mean objective values reach $5.64\times10^{2}$, $7.92\times10^{2}$, and $1.38\times10^{3}$, respectively, all outperforming the comparison algorithms. These results confirm that the proposed improvement strategies enhance the global search capability and robustness of the algorithm in complex multimodal optimization landscapes.

2.4 TimeXer

To effectively leverage exogenous variables and enhance the prediction accuracy of the target sequence, this study employs the TimeXer model. Built upon the classical Transformer architecture, the key innovation of TimeXer lies in its ability to process endogenous and exogenous variables using distinct embedding strategies, while capturing dependencies along both the temporal and variable dimensions through self-attention and cross-attention mechanisms, without modifying the original Transformer components.

The TimeXer architecture, illustrated in Fig. 3, comprises the following key components: (1) Endogenous Sequence Embedding: The endogenous sequence is divided into segments, generating segment-level time tokens and a sequence-level global token. (2) Exogenous Sequence Embedding: Each exogenous sequence is embedded as a variable-level token. (3) Endogenous Self-Attention Layer: Self-attention is computed among all endogenous tokens (including time tokens and the global token) to capture fine-grained temporal dependencies. (4) Exogenous-Endogenous Cross-Attention Layer: The endogenous global token serves as the query, while the exogenous variable tokens act as keys and values. Cross-attention integrates exogenous information into the representation of the endogenous sequence, enhancing the model’s ability to account for auxiliary inputs.
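The data flow through these four components can be traced with a shape-level NumPy sketch. It is a deliberately simplified illustration under several assumptions: a single attention head, random untrained weights, no separate query/key/value projections, and no layer normalization or feed-forward blocks. All dimensions and variable names are hypothetical, apart from using five exogenous series to mirror the five selected health indicators.

```python
import numpy as np


def softmax(s):
    s = s - s.max(axis=-1, keepdims=True)
    e = np.exp(s)
    return e / e.sum(axis=-1, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention (single head, no learned projections)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


rng = np.random.default_rng(0)
T, T_ex, C, patch_len, d = 32, 32, 5, 8, 16

x = rng.normal(size=T)                       # endogenous series (e.g. capacity)
z = rng.normal(size=(T_ex, C))               # exogenous series (e.g. health indicators)

W_patch = rng.normal(size=(patch_len, d)) * 0.1
W_var = rng.normal(size=(T_ex, d)) * 0.1
global_token = rng.normal(size=(1, d)) * 0.1

# (1) Endogenous embedding: non-overlapping segments -> time tokens, plus a global token.
patch_tokens = x.reshape(-1, patch_len) @ W_patch        # (T / patch_len, d)
endo = np.vstack([patch_tokens, global_token])           # (N + 1, d)

# (2) Exogenous embedding: each whole series becomes one variable-level token.
var_tokens = z.T @ W_var                                 # (C, d)

# (3) Self-attention among all endogenous tokens (residual connection).
endo = endo + attention(endo, endo, endo)

# (4) Cross-attention: the global token queries the exogenous variable tokens.
g = endo[-1:]
endo[-1:] = g + attention(g, var_tokens, var_tokens)
```

Only the global token attends to the exogenous tokens, so the cost of the cross-attention step grows with the number of exogenous variables rather than with their sequence length.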


Fig. 3 Architecture of the TimeXer Model

In forecasting tasks involving exogenous variables, let the univariate target sequence (endogenous sequence) be denoted by $\mathrm{x}_{1:T}=\left\{x_{1},x_{2},\cdots,x_{T}\right\}\in\mathbb{R}^{T\times 1}$, and the exogenous variables by $\mathrm{z}_{1:T_{ex}}=\left\{\mathrm{z}_{1:T_{ex}}^{(1)},\mathrm{z}_{1:T_{ex}}^{(2)},\cdots,\mathrm{z}_{1:T_{ex}}^{(C)}\right\}\in\mathbb{R}^{T_{ex}\times C}$. Here, $T$ and $T_{ex}$ represent the look-back window lengths of the endogenous and exogenous sequences, respectively, and may differ to accommodate heterogeneous sampling frequencies. The prediction model $\mathcal{F}_{\theta}$ aims to generate $S$-step-ahead forecasts based on $(\mathrm{x}_{1:T},\mathrm{z}_{1:T_{ex}})$:

$$\widehat{\mathrm{x}}_{T+1:T+S}=\mathcal{F}_{\theta}\left(\mathrm{x}_{1:T},\mathrm{z}_{1:T_{ex}}\right)\tag{18}$$

2.4.1 Endogenous Sequence Embedding

To reduce sequence length while retaining local temporal structures, TimeXer divides $\mathrm{x}_{1:T}$ into $N$ non-overlapping segments, where $N=\lfloor T/P\rfloor$ and each segment has length $P$. Each segment is then linearly projected to obtain the segment-level temporal tokens:

$$\left\{s_{1},s_{2},\cdots,s_{N}\right\}=\mathrm{Patchify}\left(\mathrm{x}_{1:T}\right)\tag{19}$$

$$P_{en}=\mathrm{PatchEmbed}\left(s_{1},s_{2},\cdots,s_{N}\right)\tag{20}$$

Considering the potential mismatch in granularity between endogenous and exogenous information, TimeXer introduces a learnable global token $G_{en}$ to provide a holistic representation of the sequence and to serve as the interface for exogenous-endogenous interaction:

$$G_{en}=\mathrm{Learnable}\left(\mathrm{x}\right)\tag{21}$$

Thus, the endogenous embedding fed into the encoder consists of $N$ segment tokens $P_{en}$ and one global token $G_{en}$.
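The segmentation and embedding step of Eqs. (19)–(21) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `patchify` and `linear`, the toy dimensions, and the random weights are hypothetical, the number of segments is taken as the floor of T/P, and trailing points that do not fill a whole segment are dropped. In the trained model, the projection weights and the global token are learned parameters.

```python
import random

def patchify(x, patch_len):
    """Split a 1-D series into floor(T / P) non-overlapping segments of length P."""
    n = len(x) // patch_len
    return [x[i * patch_len:(i + 1) * patch_len] for i in range(n)]

def linear(vec, weight):
    """Project a length-P segment with a D x P weight matrix (bias omitted)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]

random.seed(0)
T, P, D = 12, 4, 3                       # toy look-back length, segment length, token size
x = [1.0 - 0.01 * t for t in range(T)]   # toy capacity sequence (hypothetical values)

# Stand-in for the trained PatchEmbed weights.
W = [[random.gauss(0.0, 0.1) for _ in range(P)] for _ in range(D)]

segments = patchify(x, P)                          # Eq. (19)
patch_tokens = [linear(s, W) for s in segments]    # Eq. (20)
global_token = [random.gauss(0.0, 0.1) for _ in range(D)]  # Eq. (21), learnable in practice

# The encoder input is the N segment tokens followed by one global token.
endogenous_tokens = patch_tokens + [global_token]
print(len(segments), len(endogenous_tokens))
```

With T = 12 and P = 4, the sketch yields three segment tokens plus the global token, so the encoder attends over four tokens instead of twelve time steps.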

2.4.2 Exogenous Sequence Embedding

For exogenous variables, each sequence $\mathrm{z}^{(i)}$ is embedded on a per-variable basis. The VariateEmbed(·) module maps each sequence into a $D$-dimensional variable token:

$$V_{ex,i}=\mathrm{VariateEmbed}\left(\mathrm{z}^{(i)}\right),\quad i\in\left\{1,2,\cdots,C\right\}\tag{22}$$

This design naturally handles differing sequence lengths, misaligned timestamps, and heterogeneous sampling rates, while enabling the model to capture inter-variable dependencies at the variable level.

2.4.3 Endogenous Self-Attention

The segment tokens and the global token jointly enter the self-attention layer to model intra-sequence dependencies. Let the input of the $l$-th layer be $[P_{en}^{l},G_{en}^{l}]$. The self-attention update is given by:

$$\left[P_{en}^{l,attn},G_{en}^{l,attn}\right]=\mathrm{Self\text{-}Attention}\left(\left[P_{en}^{l},G_{en}^{l}\right]\right)\tag{23}$$

The output is then normalized and passed through a feed-forward network (FFN):

$$\left[P_{en}^{l+1},G_{en}^{l+1}\right]=\mathrm{FFN}\left(\mathrm{LayerNorm}\left(\left[P_{en}^{l},G_{en}^{l}\right]+\left[P_{en}^{l,attn},G_{en}^{l,attn}\right]\right)\right)\tag{24}$$

During this process, the global token aggregates global contextual information while providing feedback to the segment tokens, enabling hierarchical temporal modeling.

2.4.4 Exogenous-Endogenous Cross-Attention

To incorporate exogenous information, the updated global token $G_{en}^{l+1}$ is used as the query $Q$, while the exogenous tokens $V_{ex}$ serve as keys $K$ and values $V$ in a cross-attention operation:

$$G_{en}^{l+1,cross}=\mathrm{Cross\text{-}Attention}\left(Q=G_{en}^{l+1},K=V_{ex},V=V_{ex}\right)\tag{25}$$

The resulting fusion representation is passed through an FFN to produce the updated global token:

$$G_{en}^{l+1}=\mathrm{FFN}\left(\mathrm{LayerNorm}\left(G_{en}^{l+1}+G_{en}^{l+1,cross}\right)\right)\tag{26}$$

The enhanced global token subsequently influences the segment tokens in the next self-attention layer, enabling the exogenous information to propagate through the endogenous sequence representation in a hierarchical manner.
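To make the role of the global token as query concrete, the following Python sketch implements a single-head scaled dot-product cross-attention in the spirit of Eq. (25). It is a simplified illustration, not the model's implementation: the learned query, key, and value projections, the multi-head split, LayerNorm, and the FFN of Eq. (26) are all omitted, and the token values are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(query, keys, values):
    """One query token attends over a set of key/value tokens."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    fused = [sum(w * v[j] for w, v in zip(weights, values))
             for j in range(len(values[0]))]
    return fused, weights

# Hypothetical 3-dimensional tokens: one endogenous global token and
# three exogenous variable tokens (e.g. three health indicators).
global_token = [0.2, -0.1, 0.4]
variate_tokens = [[0.5, 0.0, 0.1], [-0.3, 0.2, 0.9], [0.1, 0.1, 0.1]]

fused, weights = cross_attention(global_token, variate_tokens, variate_tokens)

# Residual connection only; Eq. (26) additionally applies LayerNorm and an FFN.
updated = [g + f for g, f in zip(global_token, fused)]
print(weights, updated)
```

The attention weights sum to one, so the fused vector is a convex combination of the exogenous variable tokens, weighted by their similarity to the global token.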

2.4.5 Output Layer and Loss Function

Finally, the segment tokens and global token from the last layer $L$ are flattened and projected to generate the $S$-step predictions:

$$\widehat{\mathrm{x}}_{T+1:T+S}=\mathrm{Projection}\left(\left[P_{en}^{L},G_{en}^{L}\right]\right)\tag{27}$$

The model is trained using the mean squared error (MSE) loss $\mathcal{L}$:

$$\mathcal{L}=\frac{1}{S}\sum_{i=1}^{S}\left\|x_{T+i}-\widehat{x}_{T+i}\right\|_{2}^{2}\tag{28}$$

2.5 Prediction Process

The proposed IWOA-ECM-TimeXer hybrid prediction framework achieves accurate prediction of the remaining useful life of lithium-ion batteries through a structured end-to-end process. The core workflow consists of three main stages: physical feature extraction, model hyperparameter optimization, and time series prediction, as illustrated in Fig. 4.


Fig. 4 IWOA-ECM-TimeXer framework flowchart

The framework begins with physical feature extraction: based on the first-order equivalent circuit model (ECM) and the recursive least squares (RLS) method, the key circuit parameters (R₀, R₁, C₁) are identified online from battery operation data. These parameters serve as the fundamental inputs for constructing a set of health indicators that reflect battery degradation characteristics.

Subsequently, the process enters the model optimization and prediction stage: the extracted health indicators are used as exogenous input features for the TimeXer model. The improved Whale Optimization Algorithm (IWOA) is then employed to globally optimize the model’s key hyperparameters and select the best-performing configuration within the search ranges. The optimized TimeXer model, leveraging its dual-path attention mechanism, models the health feature sequences and thereby predicts the future remaining useful life (RUL).

3. Experimental Design

3.1 Dataset

As summarized in Table 2, this study employs the lithium-ion battery cycling datasets released by the University of Oxford.

Table 2

Description of the Oxford Battery Datasets

Dataset	Oxford
Battery capacity	740 mAh
Material of battery cell	LiMO₂/Graphite
Nominal voltage	4.2 V
Test temperature	40 °C
Charge/discharge cut-off voltage	4.2/2.7 V
Charge/discharge rate	0.74 A (1C)

The Oxford battery degradation dataset consists of aging data from eight Kokam lithium-ion pouch cells, labeled Cell 1 through Cell 8, each with a nominal capacity of 740 mAh. The negative electrode material of the batteries is graphite, while the positive electrode material is LiMO₂. During the aging experiments, all batteries were subjected to charge–discharge cycling under dynamic operating conditions to simulate realistic load variations encountered in electric vehicle applications. A cycling current rate of 2C was applied to accelerate battery degradation. To accurately evaluate the capacity fading behavior, the battery capacity was measured every 100 aging cycles using a complete charge–discharge process at a 1C current rate. Figure 5 illustrates the capacity degradation trajectories of the eight batteries throughout the aging process[31].


Fig. 5 Capacity degradation curves of the Oxford dataset

3.2 Parameter Identification Results

To systematically investigate the evolution patterns of internal state parameters during the aging process of lithium-ion batteries, this study conducts comprehensive parameter identification based on a first-order equivalent circuit model across multiple battery datasets. Since all datasets exhibit similar aging trends, Cell 1 from the Oxford battery dataset is selected as a representative example. The identified key parameters (R₀, R₁, and C₁) over the entire aging cycle are presented in Fig. 6.

Fig. 6 (a) Evolution curve of ohmic resistance R₀, (b) Evolution curve of polarization resistance R₁, (c) Evolution curve of polarization capacitance C₁, (d) Capacity degradation curve

Figure 6(a) illustrates the evolution trajectory of the ohmic resistance R₀. The identification results show that R₀ increases from an initial value of 0.0209 Ω to 0.0406 Ω, corresponding to a 94.3% rise. This monotonically increasing trend is physically associated with irreversible aging mechanisms such as the continuous growth of the solid electrolyte interphase (SEI) layer and the loss of active lithium during cycling.

Figure 6(b) presents the dynamic response characteristics of the polarization resistance R₁, which rises from 0.02671 Ω to 0.0869 Ω. This nonlinear evolution pattern may result from structural phase transitions within the electrode materials and the cumulative effect of charge transfer resistance.

Figure 6(c) depicts the degradation trajectory of the polarization capacitance C₁, which decreases from an initial value of 561 F to 118 F, a reduction of 78.9%. This decline directly reflects the reduction of the effective electrode surface area and the deterioration of ionic diffusion capability, both of which are closely related to microstructural changes in the electrode materials.

Figure 6(d) presents the capacity fading trajectory of Cell 1. The capacity gradually fades from an initial value of about 0.74 Ah (740 mAh) to about 0.53 Ah, decreasing continuously as the cycle number increases.
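The relative changes discussed for Fig. 6 can be checked with a short calculation. The Python sketch below uses only the rounded start and end values quoted in the text for Cell 1, so the last digit may differ slightly from the reported percentages; the helper name `relative_change` is introduced here for illustration.

```python
def relative_change(initial, final):
    """Relative change from the initial to the final value, in percent."""
    return (final - initial) / initial * 100.0

# Start and end values for Cell 1 as quoted in the text (rounded).
endpoints = {
    "R0": (0.0209, 0.0406),    # ohmic resistance, ohm
    "R1": (0.02671, 0.0869),   # polarization resistance, ohm
    "C1": (561.0, 118.0),      # polarization capacitance, F
    "capacity": (0.74, 0.53),  # measured capacity (same unit at both ends)
}

changes = {name: relative_change(a, b) for name, (a, b) in endpoints.items()}
for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
```

On these rounded values, R₁ grows considerably faster than R₀ in relative terms, while C₁ falls by roughly four fifths, far more than the relative loss in capacity over the same period.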

Overall, these parameter identification results not only validate the capability of the first-order equivalent circuit model to characterize the aging state of lithium-ion batteries but also provide important quantitative insights into the underlying degradation mechanisms. The pronounced variations observed in each parameter’s evolution trajectory demonstrate the effectiveness and sensitivity of equivalent circuit model parameters as indicators of battery health status.

3.2.1 Construction of High-Order Health Indicators

To further extract the aging information embedded in the equivalent circuit model parameters, this study constructs six high-order health indicators based on the three identified fundamental parameters (R₀, R₁, C₁). These indicators possess clear physical interpretations and characterize the internal state variations of the battery from multiple perspectives[32][33]. The mathematical definitions and physical meanings of the high-order health indicators are summarized in Table 3.

Table 3

Definitions of high-order health indicators.

Health Indicator	Mathematical Expression	Physical Interpretation
H1: Total Impedance Factor	$H_{1}=R_{0}+R_{1}$	Represents the overall impedance state of the battery, reflecting the combined effects of ohmic and electrochemical polarization.
H2: Polarization Time Constant	$H_{2}=\tau=R_{1}\times C_{1}$	Describes the dynamic response characteristics of the polarization process and is related to the charge transfer rate at the electrode–electrolyte interface.
H3: Internal Resistance Ratio	$H_{3}=\frac{R_{1}}{R_{0}}$	Reflects the relative variation between different types of impedance, indicating the dominant aging mode (ohmic or electrochemical polarization).
H4: Impedance Growth Factor	$H_{4}=\frac{R_{0}-R_{00}}{R_{00}}$	Quantifies the relative increase in ohmic resistance with respect to the initial value R₀₀; directly related to the growth of the SEI layer thickness.
H5: Capacitance Degradation Factor	$H_{5}=\frac{C_{1}-C_{10}}{C_{10}}$	Represents the relative decline in polarization capacitance with respect to the initial value C₁₀; reflects the loss of active surface area of the electrodes.
H6: Comprehensive Aging Index	$H_{6}=\frac{R_{0}}{R_{00}}\times\frac{C_{10}}{C_{1}}$	A composite indicator integrating impedance growth and capacitance degradation, used to comprehensively assess the overall aging state of the battery.
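The definitions in Table 3 translate directly into code. The following Python sketch evaluates the six indicators at the beginning and end of life, using the rounded Cell 1 parameter values quoted in Section 3.2; the function name `health_indicators` and its argument names are assumptions made for illustration.

```python
def health_indicators(r0, r1, c1, r0_init, c1_init):
    """Compute H1-H6 of Table 3 from the identified first-order RC parameters."""
    return {
        "H1": r0 + r1,                            # total impedance factor
        "H2": r1 * c1,                            # polarization time constant (s)
        "H3": r1 / r0,                            # internal resistance ratio
        "H4": (r0 - r0_init) / r0_init,           # impedance growth factor
        "H5": (c1 - c1_init) / c1_init,           # capacitance degradation factor
        "H6": (r0 / r0_init) * (c1_init / c1),    # comprehensive aging index
    }

# Rounded Cell 1 values from the text: (R0, R1, C1) at the first and last cycle.
R0_INIT, C1_INIT = 0.0209, 561.0
fresh = health_indicators(0.0209, 0.02671, 561.0, R0_INIT, C1_INIT)
aged = health_indicators(0.0406, 0.0869, 118.0, R0_INIT, C1_INIT)

for name in fresh:
    print(f"{name}: fresh={fresh[name]:.4f}  aged={aged[name]:.4f}")
```

By construction, H4 and H5 start at zero and H6 starts at one, so all three measure drift relative to the fresh cell. On these endpoint values, H2 decreases with aging because the drop in C₁ outweighs the rise in R₁.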

Based on the feature space constructed from the above health indicators, further evaluation is required to assess the degree of association between each indicator and the battery’s capacity degradation, in order to identify the most predictive subset of features for subsequent modeling.

3.2.2 Correlation Analysis and Feature Selection

To quantitatively evaluate the relationship between the constructed high-order health indicators and battery capacity degradation, this study adopts the Spearman rank correlation coefficient for correlation analysis[34][35]. This method measures the monotonic relationship between variables based on rank differences and is well suited for handling nonlinear and non-normally distributed data, making it particularly applicable to the complex nonlinear characteristics encountered in battery aging processes.

Let the feature variable be denoted as $X_{i}$ and the target capacity as $Y_{i}$. The Spearman correlation coefficient is defined as:

$$\rho=1-\frac{6\sum_{i=1}^{m}D_{i}^{2}}{m\left(m^{2}-1\right)}=1-\frac{6\sum_{i=1}^{m}\left(R_{i}-S_{i}\right)^{2}}{m\left(m^{2}-1\right)}\tag{29}$$

where $R_{i}$ and $S_{i}$ are the ranks of $X_{i}$ and $Y_{i}$, respectively, $D_{i}$ represents the difference between the corresponding ranks, and $m$ denotes the sample size. The correlation results are illustrated in Fig. 7 and summarized in Table 4.
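Eq. (29) can be implemented directly from rank differences. The Python sketch below assumes there are no tied values (the closed form in Eq. (29) is exact only in that case) and uses short, hypothetical sequences rather than the actual Oxford data.

```python
def ranks(values):
    """Rank values from 1 (smallest) to m (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(x, y):
    """Spearman rank correlation via the rank-difference formula of Eq. (29)."""
    m = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (m * (m ** 2 - 1))

# Hypothetical toy sequences: capacity fades while an impedance-type
# indicator grows monotonically and a ratio-type indicator fluctuates.
capacity = [0.74, 0.72, 0.69, 0.65, 0.60, 0.53]
monotone_indicator = [0.00, 0.10, 0.25, 0.45, 0.70, 0.94]
noisy_indicator = [1.28, 1.35, 1.30, 1.60, 1.90, 2.14]

print(spearman(monotone_indicator, capacity))
print(spearman(noisy_indicator, capacity))
```

A strictly monotone indicator gives a coefficient of exactly -1 regardless of how nonlinear the relationship is, whereas a single rank swap in the second sequence already pulls the coefficient toward zero. This is why the measure suits nonlinear but monotone aging trends.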


Fig. 7 Heatmap and bar chart of Spearman correlation coefficients

Table 4

Correlation analysis results between health indicators and battery capacity

Dataset		H1	H2	H3	H4	H5	H6
Oxford Battery Dataset	Cell 1	-0.9953	0.9083	-0.6858	-0.9989	0.9773	-0.9933
	Cell 2	-0.9923	0.9307	-0.5520	-0.9961	0.9791	-0.9924
	Cell 3	-0.9961	0.8724	-0.8035	-0.9987	0.9801	-0.9929
	Cell 4	-0.9978	0.9370	-0.9284	-0.9944	0.9956	-0.9979
	Cell 5	-0.9989	0.9206	-0.8480	-0.9985	0.9978	-0.9984
	Cell 6	-0.9972	0.8849	-0.6985	-0.9962	0.9961	-0.9979
	Cell 7	-0.9976	0.9268	-0.7547	-0.9989	0.9888	-0.9951
	Cell 8	-0.9958	0.9096	-0.8158	-0.9989	0.9835	-0.9929

As shown in Fig. 7 and Table 4, the correlations between the health indicators and capacity exhibit consistent trends across the eight cells: (1) H₁, H₄, and H₆ show a strong negative correlation (ρ < −0.9), indicating that the increase in battery impedance is highly synchronized with capacity degradation. (2) H₂ and H₅ exhibit a significant positive correlation (ρ > 0.87, with H₅ above 0.97 for every cell), indicating that both the polarization time constant and the polarization capacitance decrease as capacity loss progresses. (3) H₃ presents a weaker and less consistent negative correlation (−0.93 < ρ < −0.55), implying that while it provides useful insights into distinguishing between different aging modes (ohmic-dominated or polarization-dominated), it is not a primary determinant of capacity variation.

Based on the above analysis, the constructed health indicators demonstrate complementary characteristics in describing battery degradation behavior. Among them, H₁, H₄, and H₆ are the most sensitive to capacity changes, serving as key indicators for characterizing the aging progression. Meanwhile, H₂ and H₅ reveal the dynamic evolution of polarization from a kinetic perspective, supporting real-time monitoring of battery health states. Therefore, by comprehensively considering correlation strength, physical interpretability, and inter-feature independence, this study ultimately selects five health indicators (H₁, H₂, H₄, H₅, H₆) to construct a multi-scale health feature set. This feature set provides a holistic representation of battery degradation, from impedance growth and polarization dynamics to overall deterioration, serving as a high-quality input foundation for the subsequent capacity prediction model.

4. Experimental Results and Analysis

4.1 Evaluation Metrics

To quantitatively evaluate the predictive performance of the proposed model, three commonly used regression performance metrics are adopted in this study, namely the root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination $R^{2}$[36]. The calculation formulas of these evaluation metrics are given as follows:

$$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{i}-\widehat{y}_{i}\right)^{2}}\tag{30}$$

$$\mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}\left|y_{i}-\widehat{y}_{i}\right|\tag{31}$$

$$R^{2}=1-\frac{\sum_{i=1}^{N}\left(\widehat{y}_{i}-y_{i}\right)^{2}}{\sum_{i=1}^{N}\left(y_{i}-\bar{y}\right)^{2}}\tag{32}$$

where $N$ denotes the total number of lithium-ion battery capacity data samples, $y_{i}$ represents the true capacity value of the lithium-ion battery, $\widehat{y}_{i}$ denotes the corresponding predicted capacity value, and $\bar{y}$ is the mean value of the measured capacity data.
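For reference, Eqs. (30)–(32) can be written as three small Python functions. The capacity values in the example are hypothetical and serve only to show the calculation.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error, Eq. (30)."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error, Eq. (31)."""
    n = len(y_true)
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n

def r2(y_true, y_pred):
    """Coefficient of determination, Eq. (32)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((p - y) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical measured and predicted capacities.
y_true = [0.74, 0.70, 0.65, 0.60]
y_pred = [0.73, 0.71, 0.65, 0.58]

print(f"RMSE={rmse(y_true, y_pred):.4f}  "
      f"MAE={mae(y_true, y_pred):.4f}  "
      f"R2={r2(y_true, y_pred):.4f}")
```

Because RMSE squares the errors before averaging, the single 0.02 deviation in the last sample raises it above the MAE. In addition, $R^{2}$ depends on the spread of the true values, so a cell with a narrow capacity range can show a low $R^{2}$ even when the absolute errors are small.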

4.2 Hyperparameter Settings

In terms of hyperparameter configuration, reasonable search ranges are specified for several key hyperparameters of the TimeXer model, and the IWOA is employed to automatically search for the optimal combination within these ranges. Specifically, the learning rate is searched within the range of 0.0001 to 0.005 to balance training stability and convergence speed. The hidden feature dimension (d_model) is set between 32 and 128 to achieve a trade-off between model representation capability and computational efficiency. The number of attention heads (n_heads) is limited to 1–4 to avoid parameter redundancy caused by an excessive number of heads. The dropout rate is constrained to the range of 0.05–0.30 to mitigate the risk of overfitting. The numbers of encoder layers (e_layers) and decoder layers (d_layers) are searched within the ranges of 1–3 and 1–2, respectively.
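One way to connect a continuous optimizer such as IWOA to these mixed integer and real-valued ranges is to let each whale position lie in the unit hypercube and decode it into a configuration. The Python sketch below illustrates this idea only; the source does not state how positions are encoded, so the names `SEARCH_SPACE` and `decode`, the linear scaling, and the rounding of integer parameters are all assumptions.

```python
# Search ranges as listed in the text: (lower bound, upper bound, type).
SEARCH_SPACE = {
    "learning_rate": (1e-4, 5e-3, float),
    "d_model": (32, 128, int),
    "n_heads": (1, 4, int),
    "dropout": (0.05, 0.30, float),
    "e_layers": (1, 3, int),
    "d_layers": (1, 2, int),
}

def decode(position):
    """Map a position in [0, 1]^6 to a hyperparameter dictionary.

    Assumed encoding: linear scaling per dimension, with integer
    parameters rounded to the nearest integer.
    """
    config = {}
    for u, (name, (low, high, kind)) in zip(position, SEARCH_SPACE.items()):
        u = min(max(u, 0.0), 1.0)          # clamp out-of-range positions
        value = low + u * (high - low)
        config[name] = int(round(value)) if kind is int else value
    return config

low = decode([0.0] * 6)
high = decode([1.0] * 6)
print(low)
print(high)
```

In a full search loop, each decoded configuration would be used to train the model, and the validation MSE would be returned as the fitness value. Note that standard multi-head attention also requires d_model to be divisible by n_heads, a constraint that this sketch does not enforce.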

The above hyperparameter ranges are determined by referring to commonly adopted configurations in existing studies on battery RUL prediction and time-series Transformer-based models[37][38], and are further adjusted according to the scale of the Oxford battery dataset and the overall model complexity. For the IWOA, the population size and the maximum number of iterations are set to 30 and 15, respectively, with the inertia weight nonlinearly decaying from 0.9 to 0.4. During the optimization process, the MSE on the validation set is used as the fitness function. After completing the hyperparameter search on the initially selected training–validation split, the obtained optimal hyperparameter combination is fixed and applied to all subsequent leave-one-battery-out cross-validation experiments. This strategy keeps hyperparameter selection independent of the individual test batteries and supports a fair assessment of the generalization capability of the proposed model.

4.3 RUL Prediction Comparison

This study is conducted based on the publicly available Oxford battery degradation dataset provided by the University of Oxford, which consists of eight lithium-ion batteries aged under identical experimental conditions. To systematically evaluate the cross-battery generalization capability of the proposed model, a rigorous Leave-One-Battery-Out (LOBO) cross-validation strategy is adopted.

Specifically, in each validation round, one battery is selected as the test set, while the remaining seven batteries are further divided into a training set and a validation set. Among these seven batteries, one battery is used for validation and the remaining six batteries are used for model training. Consequently, for each test battery, seven distinct “training–validation–test” configurations are generated. In total, the experimental framework involves 56 (8 × 7) independent model training and evaluation processes. This cross-validation design exhaustively covers all possible cross-battery combinations, enabling a comprehensive assessment of the model’s generalization ability and robustness across different batteries, while effectively preventing data leakage.
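The split scheme described above can be enumerated explicitly. The following Python sketch lists every training–validation–test configuration for the eight cells; the variable names are chosen for illustration.

```python
cells = [f"Cell {i}" for i in range(1, 9)]

splits = []
for test_cell in cells:
    for val_cell in cells:
        if val_cell == test_cell:
            continue
        # The remaining six cells form the training set.
        train_cells = [c for c in cells if c not in (test_cell, val_cell)]
        splits.append({"train": train_cells, "val": val_cell, "test": test_cell})

print(len(splits))   # 8 test cells x 7 validation cells
print(splits[0])
```

Each configuration keeps the test cell out of both the training and the validation set, which is what prevents leakage across batteries. The seven runs that share a test cell can then be averaged to report a single score for that cell.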

To ensure the scientific rigor and reliability of the experimental results, a complete set of comparative and ablation experiments is designed. In the comparative experiments, the proposed IWOA-TimeXer model is benchmarked against several baseline models, including WOA-TimeXer, TimeXer, CNN + LSTM, and the standard Transformer model, to verify the effectiveness of the proposed improvements. The ablation experiments are conducted by progressively removing key components of the model, thereby analyzing the contribution of each module to the overall performance. In addition, each experimental configuration is repeated multiple times, and statistical analysis is performed to mitigate the influence of random factors, ensuring the reproducibility of the results and the reliability of the conclusions. In the figures, the red dashed line indicates the point at which the battery capacity has decreased to 80% of its initial value. The prediction results are shown in Fig. 8.


Fig. 8 RUL prediction results on the Oxford battery dataset: (a) Cell 1, (b) Cell 2, (c) Cell 3, (d) Cell 4, (e) Cell 5, (f) Cell 6, (g) Cell 7, (h) Cell 8

Table 5

Error metrics on the Oxford battery dataset

Fig. 9 illustrates the comprehensive prediction performance of different models across eight batteries (Cell 1-Cell 8), including a radar chart of MAE, a bar chart comparison of RMSE, and a scatter plot distribution of $R^{2}$. By jointly analyzing these three visualizations together with the numerical results reported in Table 5, the superior performance of the proposed IWOA-TimeXer model over the other benchmark models can be clearly observed in terms of accuracy, stability, and consistency.

First, as shown in the MAE radar chart, the IWOA-TimeXer model exhibits the most compact overall contour and remains closest to the center for almost all batteries, indicating the lowest MAE and the smallest overall prediction error. In particular, for Cell 1, Cell 3, Cell 6, and Cell 7, the MAE values of IWOA-TimeXer are only 0.0014, 0.0015, 0.0015, and 0.0008, respectively, which are significantly lower than those obtained by TimeXer, CNN + LSTM, and the Transformer model. Moreover, for Cell 2, which exhibits a more complex degradation trajectory, IWOA-TimeXer still achieves the lowest MAE (0.0057), demonstrating its robust predictive capability under non-stationary operating conditions. It is also observed that for Cell 2 and Cell 5, prediction errors increase noticeably toward the end of the battery life. This phenomenon can be attributed to the sudden acceleration of capacity degradation in the late stage of these batteries, where strong nonlinearity and non-stationary characteristics substantially increase the difficulty of remaining useful life prediction.

Second, the RMSE bar chart provides a direct comparison of error magnitude variations across different batteries. While all models exhibit noticeable performance fluctuations depending on the battery, the RMSE of IWOA-TimeXer consistently remains at the lowest level. Particularly for Cell 4, Cell 6, and Cell 7, which present relatively smooth degradation trends with minor local fluctuations, the RMSE values of IWOA-TimeXer are 0.0023, 0.0030, and 0.0011, respectively, which are markedly lower than those of WOA-TimeXer and the original TimeXer model. These results indicate that the IWOA-based optimization effectively enhances TimeXer’s ability to capture subtle capacity variations, enabling more accurate fitting during dynamically evolving degradation stages.

Finally, the R² scatter plot further highlights the performance differences from the perspective of goodness of fit. As shown in the figure, the R² values of IWOA-TimeXer consistently remain within a high range of 0.96–0.999 for most batteries. In particular, near-perfect fitting performance is achieved for Cell 7 (0.9992), Cell 1 (0.9979), and Cell 3 (0.9978). In contrast, the other models exhibit substantial degradation in fitting performance for Cell 2 and Cell 5; for example, the CNN + LSTM model attains an R² of only 0.8122 for Cell 2, while the TimeXer model drops sharply to 0.6273 for Cell 5. These results reveal the limitations of the baseline models in handling highly nonlinear degradation phases. By comparison, the scatter points of IWOA-TimeXer are more densely clustered and closer to the upper bound of 1.0, indicating superior cross-battery fitting capability and more stable generalization performance.

Overall, the comparative visualizations of the three evaluation metrics confirm that the proposed IWOA-TimeXer model achieves the best performance across all dimensions, including error magnitude (MAE and RMSE) and goodness of fit (R²). The model not only consistently outperforms the unoptimized TimeXer on all batteries, but also demonstrates clear advantages over WOA-TimeXer, CNN + LSTM, and the standard Transformer model. The combined trends observed in the three figures validate that IWOA-TimeXer offers lower prediction errors, stronger robustness, and higher cross-battery consistency, making it a highly suitable and reliable model for battery health prediction tasks.

5. Conclusion

This study addresses the limitations of traditional data-driven models in lithium-ion battery RUL prediction, including the lack of physical mechanism support, insufficient prediction accuracy, and limited interpretability. To this end, a TimeXer-based prediction framework incorporating an equivalent circuit model (ECM) and an IWOA is proposed. By integrating physical modeling with deep learning, the proposed approach enables accurate characterization and efficient prediction of battery degradation behavior. The main conclusions are summarized as follows:

(1) Effectiveness of physical features: The electrochemical parameters identified through ECM modeling can effectively reflect the internal degradation mechanisms of lithium-ion batteries. When these physically meaningful features are introduced as exogenous inputs into the TimeXer model, the model becomes more sensitive to capacity degradation trends, and the prediction results exhibit improved physical consistency. As a result, the interpretability of the prediction model is significantly enhanced.

(2) Advantages of the IWOA optimization strategy: The Improved Whale Optimization Algorithm demonstrates excellent stability and convergence efficiency in global hyperparameter search. Compared with conventional heuristic optimization algorithms, IWOA exhibits stronger global search capability and requires fewer control parameters, enabling it to effectively avoid local optima. The incorporation of IWOA leads to a more robust training process of the TimeXer model across different battery samples, thereby further improving prediction accuracy.

(3) Prediction performance and generalization capability: Experimental results on the Oxford battery dataset indicate that the proposed ECM-IWOA-TimeXer model outperforms WOA-TimeXer, TimeXer, CNN + LSTM, and the standard Transformer model in terms of MAE and RMSE. These results demonstrate the clear advantages of the proposed model in cross-battery generalization and complex degradation modeling, and verify the effectiveness of the physics-informed deep learning fusion strategy in enhancing battery RUL prediction accuracy.

Future work will focus on: (1) Incorporating higher-order physical models (e.g., second- or third-order RC models) to enrich the feature space and improve physical consistency; (2) Integrating multi-source signals (voltage, current, temperature, etc.) for multi-modal modeling, enhancing adaptability to complex operating conditions; (3) Embedding physical constraint equations within model architectures to build intelligent prediction systems that maintain both mechanistic consistency and generalization capability.

Authors’ Contribution

Pei Tang: Conceptualization and research design. Lihui Liu: Data collection, analysis, and initial manuscript drafting. Zhongran Yao: Provision of experimental materials, equipment, and technical support. Xiaoyong Gu and Zetao Qiu: Assistance in data collection and analysis. Changcheng Sun and Wenbo Lei: Contribution to writing and revising the manuscript.

Funding

This research was supported by the Natural Science Research of Jiangsu Higher Education Institutions of China (Grant No. 23KJB430038).

Data Availability

The data used in this study are from the Oxford Battery Degradation Dataset 1, available from the Oxford University Research Archive (ORA).

Declarations

The authors declare that they have no competing interests.

ORCID iDs

Pei Tang: https://orcid.org/0009-0000-7356-610X

Lihui Liu: https://orcid.org/0009-0005-6286-4968

Wenbo Lei: https://orcid.org/0009-0003-8641-3120

References

1. Tao J, Wang S, Cao W et al (2024) A comprehensive review of state-of-charge and state-of-health estimation for lithium-ion battery energy storage systems. Ionics 30(10):5903–5927. https://doi.org/10.1007/s11581-024-05686-z

2. Ansari S, Ayob A, Lipu MSH et al (2022) Remaining useful life prediction for lithium-ion battery storage system: A comprehensive review of methods, key factors, issues and future outlook. Energy Rep 8:12153–12185. https://doi.org/10.1016/j.egyr.2022.09.043

3. Xu W, Mao R, Han P et al (2025) A comprehensive review of lithium-ion battery remaining useful life prediction: methodologies, datasets, performance metrics, and future perspectives. Meas Sci Technol. https://doi.org/10.1088/1361-6501/adfb97

4. Yang K, Wang S, Zhou L et al (2025) A Critical Review of AI-Based Battery Remaining Useful Life Prediction for Energy Storage Systems. Batteries 11(10):376. https://doi.org/10.3390/batteries11100376

5. Hussain A, Mao Z, Li M et al (2025) A Comprehensive Review of the Pseudo-Two-Dimensional (P2D) Model: Model Development, Solutions Methods, and Applications. Adv Theory Simul 8(5):2401016. https://doi.org/10.1002/adts.202401016

6. Zheng B, Deng Z, Luo Z et al (2025) A comprehensive review of lithium-ion battery modelling research and prospects: in-depth analysis of current research and future directions. Appl Energy 401:126688. https://doi.org/10.1016/j.apenergy.2025.126688

7. Guha A, Patra A (2018) Online estimation of the electrochemical impedance spectrum and remaining useful life of lithium-ion batteries. IEEE Trans Instrum Meas 67(8):1836–1849. https://doi.org/10.1109/TIM.2018.2809138

8. Li J, Zhao S, Miah MS et al (2023) Research on the remaining useful life prediction method for lithium-ion batteries by fusion of feature engineering and deep learning. Energy Rep 10:3629–3638. https://doi.org/10.1016/j.egyr.2023.10.030

9. Zhao B, Zhang W, Zhang Y et al (2024) Research on the remaining useful life prediction method for lithium-ion batteries by fusion of feature engineering and deep learning. Appl Energy 358:122325. https://doi.org/10.1016/j.apenergy.2023.122325

10. Zhao J, Qu X, Li Y et al (2025) Real-time prediction of battery remaining useful life using hybrid-fusion deep neural networks. Energy 328:136618. https://doi.org/10.1016/j.energy.2025.136618

11. Ren L, Dong J, Wang X et al (2020) A data-driven auto-CNN-LSTM prediction model for lithium-ion battery remaining useful life. IEEE Trans Industr Inf 17(5):3478–3487. https://doi.org/10.1109/TII.2020.3008223

12. Hongyang N, Nana F, Ming Y et al (2025) Prediction of Lithium-ion Battery’s Remaining Useful Life Using an Improved Approach Combining Variational Mode Decomposition, Gaussian Process Regression, and Gated Recurrent Unit. Int J Electrochem Sci 100973. https://doi.org/10.1016/j.ijoes.2025.100973

13. Mou J, Yang Q, Tang Y et al (2024) Prediction of the Remaining Useful Life of Lithium-Ion Batteries Based on the 1D CNN-BLSTM Neural Network. Batteries 10:152. https://doi.org/10.3390/batteries10050152

14. Wen Q, Zhou T, Zhang C et al (2022) Transformers in time series: A survey. arXiv preprint arXiv:2202.07125. https://doi.org/10.48550/arXiv.2202.07125

15. Saleem U, Liu W, Riaz S et al (2024) TransRUL: A Transformer-Based Multihead Attention Model for Enhanced Prediction of Battery Remaining Useful Life. Energies 17(16):3976. https://doi.org/10.3390/en17163976

16. Zhu X, Li L, Wang G et al (2025) A Lithium-Ion Battery Remaining Useful Life Prediction Method Based on Mode Decomposition and Informer-LSTM. Electronics 14(19):3886. https://doi.org/10.3390/electronics14193886

17. Tayal K, Renganathan A, Jia X et al (2024) ExoTST: Exogenous-Aware Temporal Sequence Transformer for Time Series Prediction. IEEE International Conference on Data Mining, pp 857–862. https://doi.org/10.1109/ICDM59182.2024.00105

18. Wang Y, Wu H, Dong J et al (2024) TimeXer: Empowering transformers for time series forecasting with exogenous variables. Adv Neural Inf Process Syst 37:469–498. https://doi.org/10.52202/079017-0015

19. Lu K, Gao S, Li J et al (2025) Prediction of Power System Ramping Demand Using Meteorological Features. IEEE Access. https://doi.org/10.1109/ACCESS.2025.3567145

20. Xia F, Yu Y, Chen J (2024) SOH and RUL prediction of lithium batteries based on fusions of RLOESS filtered electrochemical and thermal features by bidirectional gated recurrent unit network. J Energy Storage 102:114134. https://doi.org/10.1016/j.est.2024.114134

21. Durmus F, Karagol S (2024) Lithium-Ion Battery Capacity Prediction with GA-Optimized CNN, RNN, and BP. Appl Sci 14:5662. https://doi.org/10.3390/app14135662

22. Fu L, Jiang B, Zhu J et al (2025) Early Remaining Useful Life Prediction for Lithium-Ion Batteries Using a Gaussian Process Regression Model Based on Degradation Pattern Recognition. Batteries 11(6):221. https://doi.org/10.3390/batteries11060221

23. Sun C, Qu A, Zhang J et al (2023) Remaining useful life prediction for lithium-ion batteries based on improved variational mode decomposition and machine learning algorithm. Energies 16(1):313. https://doi.org/10.3390/en16010313

24. Li J, Ye M, Wang Y et al (2023) A hybrid framework for predicting the remaining useful life of battery using Gaussian process regression. J Energy Storage 66:107513. https://doi.org/10.1016/j.est.2023.107513

25. Chen C, Wei J, Li Z (2023) Remaining useful life prediction for lithium-ion batteries based on a hybrid deep learning model. Processes 11(8):2333. https://doi.org/10.3390/pr11082333

26. Dai Nguyen C, Bae SJ (2023) Equivalent circuit simulated deep network architecture and transfer learning for remaining useful life prediction of lithium-ion batteries. J Energy Storage 71:108042. https://doi.org/10.1016/j.est.2023.108042

27. Hong S, Qin C, Lai X et al (2023) State-of-health estimation and remaining useful life prediction for lithium-ion batteries based on an improved particle filter algorithm. J Energy Storage 64:107179. https://doi.org/10.1016/j.est.2023.107179

28. Li Z, Li A, Bai F et al (2023) Remaining useful life prediction of lithium battery based on ACNN-Mogrifier LSTM-MMD. Meas Sci Technol 35(1):016101. https://doi.org/10.1088/1361-6501/ad006d

29. Wang G et al (2025) State of health estimation for lithium-ion batteries based on incremental capacity analysis and Mamba model optimized by improved whale optimization algorithm. Ionics 31(9):9291–9311. https://doi.org/10.1007/s11581-025-06564-y

30. Li Y, Shi H, Huang Q et al (2025) Enhanced multi-scale signal decomposition transformer neural network for state of health estimation of lithium-ion batteries. J Energy Storage 134:118191. https://doi.org/10.1016/j.est.2025.118191

31. Birkl CR (2017) Diagnosis and Prognosis of Degradation in Lithium-Ion Batteries. PhD thesis, University of Oxford, Department of Engineering Science. https://ora.ox.ac.uk/objects/uuid:7d8ccb9c-1469-4209-9995-5871fc908b54

32. Li D, Xu L, Cheng Z (2023) The co-estimation of states for lithium-ion batteries based on segment data. J Energy Storage 62:106787. https://doi.org/10.1016/j.est.2023.106787

33. Lazanas AC, Prodromidis MI (2023) Electrochemical impedance spectroscopy: a tutorial. ACS Meas Sci Au 3(3):162–193. https://doi.org/10.1021/acsmeasuresciau.2c00070

34. Li L, Li Y, Zhang J (2024) A hybrid remaining useful life prediction method for lithium-ion batteries based on transfer learning with CDRSN-BiGRU-AM. Meas Sci Technol 35(5):056124. https://doi.org/10.1088/1361-6501/ad282e

35. Wang Y, Zhao Y (2023) Three-stage feature selection approach for deep learning-based RUL prediction methods. Qual Reliab Eng Int 39(4):1223–1247. https://doi.org/10.1002/qre.3288

36. Zheng D, Zhang Y, Guo X et al (2025) Research on the remaining useful life prediction method for lithium-ion batteries based on feature engineering and CNN-BiGRU-AM model. Ionics 31:5717–5736. https://doi.org/10.1007/s11581-025-06293-2

37. Han Y, Li C, Zheng L et al (2023) Remaining useful life prediction of lithium-ion batteries by using a denoising transformer-based neural network. Energies 16(17):6328. https://doi.org/10.3390/en16176328

38. Rastegarpanah A, Asif ME, Stolkin R (2024) Hybrid neural networks for enhanced predictions of remaining useful life in lithium-ion batteries. Batteries 10(3):106. https://doi.org/10.3390/batteries10030106
