Innovative Transformer-Driven Remaining Useful Life (RUL) Prediction Enhanced by Adaptive Multi-Scale Feature Engineering

Lydia Hsiao-Mei Lin ^a, Fang-Kai Ting ^b, Shu-Han Liao ^{c, *}, Simon Hung-Yi Lu ^d, Richard Tzong-Han Tsai ^e

Lydia.hmlin@gmail.com, cadid109502525@g.ncu.edu.tw, shliao@gms.tku.edu.tw, simon.lu@harbortech.com.tw, thtsai@g.ncu.edu.tw

^a Department of Business Administration, National Taiwan University of Science and Technology, Taipei City, Taiwan

^b Department of Computer Science and Information Engineering, National Central University, Taoyuan City, Taiwan

^c Department of Electrical and Computer Engineering, Tamkang University, Tamsui 25137, Taiwan

^d Harbor Technology Solutions Ltd., Taipei City, Taiwan

^e Center for GIS, Research Center for Humanities and Social Sciences, Academia Sinica, Taipei City, Taiwan

* Corresponding author: shliao@gms.tku.edu.tw (S.H. Liao)

Abstract

Remaining useful life (RUL) prediction is widely applied to mechanical equipment to ensure reliable, efficient, and optimal performance. A vast amount of industrial measurement data can significantly enhance the effectiveness of data-driven methods for RUL prediction. This study uses the Transformer architecture to predict the RUL of bearings in the FEMTO-ST dataset, generated on PRONOSTIA, an experimental platform for accelerated bearing degradation testing. The proposed method, called Multi-scale Feature DAST (MFDAST), introduces four key improvements to the Dual Aspect Self-Attention Transformer (DAST) framework: (1) multi-scale feature extraction for enhanced performance, (2) an advanced attention mechanism, (3) the use of a Health Index (HI) for precise degradation tracking, and (4) model optimization through a genetic algorithm. By augmenting the DAST model with a multi-scale feature encoder layer, the method analyzes both global and local features within extensive datasets. This methodological advancement reduces RMSE by 50%, outperforming traditional Recurrent Neural Network (RNN) approaches.

Keywords:

Remaining Useful Life prediction

Transformer

Multi-scale feature

DAST

FEMTO-ST

Introduction

In modern manufacturing environments, the reliability and performance of machinery heavily depend on the health and longevity of critical components such as bearings. Accurately predicting the Remaining Useful Life (RUL) of these components is essential for minimizing downtime and optimizing maintenance schedules. Traditionally, the estimated service life of bearings is determined based on the average lifespan of the bearing model, assuming uniformity across all bearings in a batch. However, variations in material properties, operational conditions, and installation quality can lead to significant differences in lifespan among bearings of the same type. Therefore, real-time prediction of the lifespan for individual bearings would help reduce resource wastage caused by conservative estimations.

The advent of the Industrial Internet of Things (IIoT) has revolutionized predictive maintenance by enabling real-time monitoring and analysis of machinery health through a network of interconnected devices. IIoT refers to the application of IoT technologies within industrial settings, where machine-generated data from various sensors is intelligently managed and analyzed. This data can include information such as vibration signals, temperature, humidity, pressure, and rotational speed, collected continuously from sensors installed on equipment. The integration of IIoT with advanced analytics allows for a more granular and dynamic understanding of individual bearing conditions, facilitating more accurate RUL predictions. One of the primary benefits of IIoT in predictive maintenance is its ability to collect vast amounts of real-time data that can be used to build more robust and accurate models for RUL prediction [1]. For instance, by employing IIoT-enabled sensors and edge devices, factories can collect continuous streams of high-frequency data, which can then be transmitted to centralized or cloud-based platforms for further analysis using machine learning models. This approach enables the development of predictive maintenance strategies that are not only more accurate but also more responsive to changing conditions. For example, IIoT systems can detect early signs of bearing degradation by analyzing patterns in the collected data, allowing for preemptive maintenance actions before a failure occurs.

In the context of machine learning, the integration of IIoT data has led to significant advancements in the development of RUL prediction models. Traditional approaches, such as Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs), have been widely used to predict the RUL of bearings based on sensor data [2]. These models have shown promising results but often struggle with capturing long-term dependencies and complex patterns in the data. Moreover, they typically require extensive manual feature engineering and may not fully leverage the continuous and high-dimensional nature of IIoT data.

Recent studies have attempted to address these challenges by using advanced deep learning techniques that can automatically learn features from raw data and account for temporal dependencies. One of the most promising approaches in this area is the use of the Transformer model, which is known for its powerful attention mechanism and ability to handle long-range dependencies [3]. Unlike traditional models, the Transformer can process the entire sequence of data in parallel, allowing it to capture complex relationships across time without being constrained by the limitations of sequential processing. This parallel processing capability not only improves the model's ability to learn from the data but also significantly reduces training time, making it a more efficient choice for real-time applications. The combination of IIoT and Transformer-based models provides a powerful framework for real-time and accurate RUL prediction of bearings [4]. By utilizing the data collected from IIoT sensors and processing it with a sophisticated deep learning model, we can achieve a more individualized assessment of bearing health, thereby enhancing the overall efficiency and reliability of industrial operations. Furthermore, the use of IIoT in predictive maintenance extends beyond just RUL prediction; it also enables the development of smart maintenance strategies that can be dynamically adapted based on real-time data, reducing maintenance costs and improving the longevity of equipment.

Recent studies have explored various applications of IIoT in predictive maintenance. For example, Singh et al. demonstrated the use of IIoT devices for predicting the RUL of machinery in industrial settings using advanced models such as Convolutional Neural Networks (CNNs) and Gated Recurrent Units (GRUs) [5–6]. Their study highlighted the potential of IIoT not only in predicting the RUL but also in improving the resilience of predictive maintenance systems against threats such as False Data Injection Attacks (FDIA). This highlights the importance of integrating IIoT with robust machine learning models to enhance both the accuracy and security of predictive maintenance systems.

In conclusion, the integration of IIoT and advanced deep learning models represents a significant advancement in the field of predictive maintenance. By leveraging the continuous, real-time data provided by IIoT sensors, along with the powerful learning capabilities of models like the Transformer, it is possible to develop more accurate and efficient RUL prediction systems [7][8]. This not only helps in reducing maintenance costs and downtime but also contributes to the overall improvement of industrial efficiency and reliability.

The complete code and model data for this study are publicly available in a GitHub repository. This repository includes all necessary components for data preprocessing, model training, visualization, and more, providing a comprehensive framework for researchers and practitioners interested in advancing the field of predictive maintenance.

1. Materials and Methods

A: Transformer with Multi-Scale Features and Dual Aspect Self-Attention for RUL Prediction

Figure 1 illustrates the MFDAST framework for RUL prediction, which has two main parts: preprocessing and model prediction. The preprocessing part includes multi-scale feature extraction and time-sliding windows. The feature set obtained after preprocessing is fed into the model, which outputs the predicted RUL values for the bearings.

Fig. 1

System Architecture.

B: Multi-Scale Feature Extraction

According to the dataset description, each CSV file contains 2560 records. Without feature extraction, each bearing would therefore contribute 2560 × n data points, where n, the number of CSV files, ranges from roughly 500 to 2800. Moreover, the raw vibration signal within a single time step may not sufficiently represent degradation information: as depicted in Fig. 2, predictions made from data without preprocessing are of negligible use. Working on the raw records would also significantly increase training time. Therefore, preprocessing of data features is necessary for datasets with a large number of detailed records.

Twenty candidate features are computed with the feature extraction formulas (1)–(20) given below. Of these, p10, p18, p19, and p20 are confirmed to be independent of the time step, as shown in Fig. 3, so these four features are removed, leaving 16 features. Furthermore, existing RUL prediction studies mostly perform feature extraction on the original signals at a single scale. To address this issue, we segment the sampled data at multiple scales and calculate several time-domain and frequency-domain features within each sub-segment, achieving multi-scale feature extraction of the original signal.

Since the extracted multi-scale features contain richer degradation information and simultaneously consider both global and local features of the sampled point data, they can effectively suppress information loss caused by feature extraction at a single scale.

The specific process of multi-scale feature extraction is illustrated in Fig. 4. Assuming the original bearing signal has a length of 2560, we process it at three scales: unsegmented (scale 1), split into two segments (scale 2), and split into four segments (scale 4). At each scale, we compute the 16 feature indicators within each sub-segment, resulting in three sets of input features for scales 1, 2, and 4, denoted z1, z2, and z4, with lengths of 16, 32, and 64, respectively.
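The segmentation step described above can be sketched as follows. This is a minimal illustration rather than the released implementation: `multi_scale_features` and `basic_features` are hypothetical names, and `basic_features` is a stand-in that returns only three indicators (maximum, minimum, RMS) so the example stays short. With the 16 indicators used in this study, the same code would yield vectors of length 16, 32, and 64.

```python
import numpy as np


def basic_features(segment):
    """Hypothetical stand-in for the per-segment feature function (3 indicators only)."""
    return np.array([segment.max(), segment.min(), np.sqrt(np.mean(segment ** 2))])


def multi_scale_features(signal, feature_fn, scales=(1, 2, 4)):
    """Split one snapshot into `s` equal sub-segments per scale and
    concatenate the features of every sub-segment."""
    signal = np.asarray(signal, dtype=float)
    features = {}
    for s in scales:
        segments = np.array_split(signal, s)  # s contiguous sub-segments
        features[s] = np.concatenate([feature_fn(seg) for seg in segments])
    return features


# Synthetic snapshot of 2560 samples (random data, for illustration only).
rng = np.random.default_rng(0)
snapshot = rng.normal(size=2560)

z = multi_scale_features(snapshot, basic_features)
print({s: v.shape[0] for s, v in z.items()})  # {1: 3, 2: 6, 4: 12}
```

The feature vector at scale s is s times longer than at scale 1, which is why the three scales are kept as separate inputs rather than stacked into one array.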

Fig. 5(a) and Fig. 5(b) show schematic diagrams of the data after feature extraction.

$p_{1} = \max x(i) \quad (1), \qquad p_{2} = \min x(i) \quad (2)$

$p_{3} = \max \left|x(i)\right| \quad (3), \qquad p_{4} = \max x(i) - \min x(i) \quad (4)$

$p_{5} = \frac{1}{N}\sum_{i=1}^{N}\left|x(i)\right| \quad (5), \qquad p_{6} = \left(\frac{1}{N}\sqrt{\sum_{i=1}^{N}\left|x(i)\right|}\right)^{2} \quad (6)$

$p_{7} = \frac{1}{N-1}\sum_{i=1}^{N}\left[x(i)-\bar{x}\right]^{2} \quad (7), \qquad p_{8} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left[x(i)-\bar{x}\right]^{2}} \quad (8)$

$p_{9} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[x(i)\right]^{2}} \quad (9), \qquad p_{10} = \frac{\sum_{i=1}^{N}\left[x(i)-\bar{x}\right]^{3}}{(N-1)\,p_{8}^{3}} \quad (10)$

$p_{11} = \frac{N p_{9}}{\sum_{i=1}^{N}\left|x(i)\right|} \quad (11), \qquad p_{12} = \frac{p_{9}}{p_{5}} \quad (12)$

$p_{13} = \frac{p_{3}}{p_{9}} \quad (13), \qquad p_{14} = \frac{p_{3}}{p_{5}} \quad (14)$

$p_{15} = \frac{p_{3}}{p_{6}} \quad (15), \qquad p_{16} = \frac{p_{3}}{p_{9}^{2}} \quad (16)$

$p_{17} = \frac{1}{M}\sum_{j=1}^{M}s(j) \quad (17), \qquad p_{18} = \frac{\sum_{j=1}^{M}f_{j}\,s(j)}{\sum_{j=1}^{M}s(j)} \quad (18)$

$p_{19} = \sqrt{\frac{\sum_{j=1}^{M}f_{j}^{2}\,s(j)}{\sum_{j=1}^{M}s(j)}} \quad (19), \qquad p_{20} = \sqrt{\frac{\sum_{j=1}^{M}\left(f_{j}-p_{18}\right)^{2}s(j)}{\sum_{j=1}^{M}s(j)}} \quad (20)$
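As a rough illustration, formulas (1)–(20) can be evaluated on one segment as sketched below. Several points are assumptions made for this sketch and are not confirmed by the text: the spectrum s(j) is taken as the magnitude of the one-sided FFT, f_j as the corresponding frequency in Hz at a 25.6 kHz sampling rate, and the function names `extract_features` and `select_features` are hypothetical. Each formula is implemented as printed, including (6). The default list of dropped indices follows the sentence stating that p10, p18, p19, and p20 are removed.

```python
import numpy as np


def extract_features(x, fs=25600.0):
    """Evaluate p1..p20 on one segment; returns an array where index k-1 holds p_k."""
    x = np.asarray(x, dtype=float)
    n = x.size
    abs_x = np.abs(x)
    mean = x.mean()

    p1, p2 = x.max(), x.min()
    p3 = abs_x.max()
    p4 = p1 - p2
    p5 = abs_x.mean()
    p6 = (np.sqrt(abs_x.sum()) / n) ** 2          # as printed in (6)
    p7 = np.sum((x - mean) ** 2) / (n - 1)
    p8 = np.sqrt(p7)
    p9 = np.sqrt(np.mean(x ** 2))
    p10 = np.sum((x - mean) ** 3) / ((n - 1) * p8 ** 3)
    p11 = n * p9 / abs_x.sum()
    p12 = p9 / p5
    p13 = p3 / p9
    p14 = p3 / p5
    p15 = p3 / p6
    p16 = p3 / p9 ** 2

    s = np.abs(np.fft.rfft(x))                    # assumed definition of s(j)
    f = np.fft.rfftfreq(n, d=1.0 / fs)            # assumed definition of f_j (Hz)
    p17 = s.mean()
    p18 = np.sum(f * s) / s.sum()
    p19 = np.sqrt(np.sum(f ** 2 * s) / s.sum())
    p20 = np.sqrt(np.sum((f - p18) ** 2 * s) / s.sum())

    return np.array([p1, p2, p3, p4, p5, p6, p7, p8, p9, p10,
                     p11, p12, p13, p14, p15, p16, p17, p18, p19, p20])


def select_features(p, dropped=(10, 18, 19, 20)):
    """Remove the features named in `dropped` (1-based indices)."""
    keep = [k - 1 for k in range(1, 21) if k not in dropped]
    return p[keep]


# Synthetic 1 kHz sine sampled for 0.1 s at 25.6 kHz (illustration only).
t = np.arange(2560) / 25600.0
x = np.sin(2 * np.pi * 1000.0 * t)
p = extract_features(x)
selected = select_features(p)
print(p.shape, selected.shape)  # (20,) (16,)
```

The sketch does not guard against an all-zero segment, for which several ratios would divide by zero; a production version would need to handle that case.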

Fig. 2

Prediction without preprocessing.

Fig. 3

p17, p18, p19, p20 independent of the time step.

Fig. 4

Multi-scale feature extraction.

Fig. 5

Extracted data features: (a) before normalization; (b) after normalization.

C: Sliding Window Processing

During the bearing’s operation, sampling is conducted C times, with each sampling containing K feature values. To enlarge the model’s training set and account for the time dependency between adjacent sampling points, an overlapping sliding window is moved step by step over the standardized input feature set. Assuming the window size is T, this results in (C - T + 1) time window sequences, as depicted in Fig. 6.
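The windowing step can be sketched in a few lines. The example below is illustrative only: `sliding_windows` is a hypothetical helper, a stride of one sampling is assumed (which is what the count C - T + 1 implies), and the input is synthetic. The window size T = 40 is the value reported in the hyperparameter settings, while C = 100 and K = 16 are chosen arbitrarily.

```python
import numpy as np


def sliding_windows(features, window):
    """Turn a (C, K) feature matrix into (C - T + 1, T, K) overlapping windows."""
    features = np.asarray(features)
    c = features.shape[0]
    if window > c:
        raise ValueError("window size T must not exceed the number of samplings C")
    # Stride of one sampling between consecutive windows (assumed).
    return np.stack([features[i:i + window] for i in range(c - window + 1)])


# C = 100 samplings with K = 16 features each (synthetic values).
features = np.arange(100 * 16, dtype=float).reshape(100, 16)
windows = sliding_windows(features, window=40)
print(windows.shape)  # (61, 40, 16)
```

Each window would then be paired with the RUL or HI label of its last sampling; that pairing is a common convention and is an assumption here.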

D: HI (Health Index) Reconstruction

Since this paper focuses on the acceleration data during bearing degradation, where acceleration rapidly increases upon fault occurrence, the exponential function is considered the most suitable form for simulating the Health Index. This function models the accumulated degradation over time, as shown in (21). Here, a is a hyperparameter representing the convergence rate of the exponential function. By solving (21) with condition (22), the parameters d and τ can be determined.

$HI(t) = d - e^{t\tau} + a \quad (21)$

We assume that the bearing operates normally when HI = 1 and is in a completely faulty mode when HI = 0:

$\left\{\begin{array}{l} HI(t_{min}) = 1 \\ HI(t_{max}) = 0 \end{array}\right. \quad (22)$

The simulated Health Index (HI) for the training and testing sets is depicted in Fig. 7.
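The way the two boundary conditions in (22) fix two unknowns can be illustrated with a short sketch. It uses an illustrative variant in which a scales time inside the exponent, HI(t) = d - exp(a·t + τ); this variant is an assumption made for the sketch, not the confirmed form of (21). The function names are hypothetical, and time is normalized to [0, 1] (also an assumption) to keep the exponentials well behaved.

```python
import math


def fit_health_index(t_min, t_max, a):
    """Solve d and tau for the assumed form HI(t) = d - exp(a * t + tau)
    under HI(t_min) = 1 and HI(t_max) = 0."""
    if t_max <= t_min or a <= 0:
        raise ValueError("require t_max > t_min and a > 0")
    # Subtracting the two conditions gives
    #   exp(tau) * (exp(a * t_max) - exp(a * t_min)) = 1
    tau = -math.log(math.exp(a * t_max) - math.exp(a * t_min))
    # HI(t_max) = 0 then gives d = exp(a * t_max + tau)
    d = math.exp(a * t_max + tau)
    return d, tau


def health_index(t, d, tau, a):
    """Evaluate the assumed exponential Health Index at time t."""
    return d - math.exp(a * t + tau)


a = 5.0  # illustrative value for the convergence-rate hyperparameter
d, tau = fit_health_index(0.0, 1.0, a)
for t in (0.0, 0.5, 0.9, 1.0):
    print(t, round(health_index(t, d, tau, a), 4))
```

With this form, a larger a keeps the index close to 1 for most of the run and concentrates the drop near the end of life, which matches the qualitative behaviour described above.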

Fig. 6

Sliding window.

Fig. 7

The simulated Health Index (HI) for the training and testing set.

2. Experimental setup

A: Datasets

The PRONOSTIA platform is a dedicated accelerated degradation system designed to provide real-time information on the health condition of bearings under steady-state or changing operating conditions [9–11]. Two accelerometers are placed on the ball bearing’s vertical and horizontal axes to collect signals as the system progresses towards failure. Table 2 lists its main features. The collected data has been used in the IEEE PHM 2012 Challenge.

The measured data is stored in separate "*.csv" files organized by time interval: a new file is recorded every 10 seconds, and each file contains 0.1 seconds of signal sampled at 25.6 kHz, that is, 2560 samples [12]. The files generated from each tested bearing are organized in one folder. In total, 17 bearings underwent degradation tests under different speeds and load conditions. No prior assumption is made about the fault mode: faults in the bearing balls, cages, and raceways may occur naturally and may coincide.

The retrieved data is organized into three main sets. Each set contains training and testing subsets divided based on different operating conditions (i.e., rotational speed and load), as indicated in Table 1. The raw vibration signals for the entire experiment are depicted in Fig. 8.

Table 1 Three types of operating conditions

Fig. 8

Raw vibration signals.

B: Hyperparameters

The DAST model's parameters, such as Input Embedding, Feature Encoder, and other embedding parameters, were optimized through grid search, which selected a value of 20. As the Output FFN layer precedes a flatten layer, the parameter is 7 times the feature quantity. Detailed parameters are listed in Table 2. The sliding window size is set to 40, consistent with the DAST paper [13]. Training uses the Adam optimizer with 100 epochs, a learning rate of 0.001, and a batch size of 256. The software environment is Python 3.7 and PyTorch 1.8.1, with the random seed set to 42. The experiments were conducted on a workstation with 32 GB RAM, an Intel Core (TM) i7-7800X CPU, and a GeForce GTX 1080 GPU running Ubuntu 18.04.

C: Comparison with Other Methods

We compared the performance of the multi-scale feature DAST (MFDAST) with a deep learning-based RUL prediction method [14]. This experiment was conducted with the random seed fixed at 42 so that the results are reproducible.

3. Experimental results

The experimental results for Root Mean Square Error (RMSE) and training time using the multi-scale feature DAST (MFDAST) are presented in Table 3. The RMSE values obtained using MFDAST are significantly lower than those obtained using RNN for all datasets, with the RMSE reduced by 50%. The result represents a significant improvement compared to the RMSE reported in the reference paper [14]. These results suggest that MFDAST, combined with feature engineering, is a feasible and superior approach for predicting Remaining Useful Life (RUL). Examples of the training and testing loss and of the RUL prediction are shown in Fig. 9 and Fig. 10, respectively.
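For reference, the RMSE metric and a relative reduction between two models can be computed as sketched below. The helper names are hypothetical, and the label and prediction values are made-up numbers chosen only to make the arithmetic easy to follow; they are not the results of this study.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error between target and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` lowers the error relative to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# Made-up normalized targets and two sets of made-up predictions.
y_true = np.linspace(1.0, 0.0, 6)
pred_baseline = y_true + 0.2   # constant offset of 0.2
pred_proposed = y_true + 0.1   # constant offset of 0.1

rmse_baseline = rmse(y_true, pred_baseline)
rmse_proposed = rmse(y_true, pred_proposed)
print(round(rmse_baseline, 3), round(rmse_proposed, 3))            # 0.2 0.1
print(round(relative_reduction(rmse_baseline, rmse_proposed), 1))  # 50.0
```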

Table 3 Proposed MFDAST Hyperparameters

Fig. 9

Training and Testing loss example for Bearing1.

Fig. 10

RUL prediction example for Bearing1.

4. Conclusions

This paper presents a novel approach, called MFDAST, to analyze the FEMTO-ST bearing dataset by adopting the Transformer architecture combined with preprocessing techniques. We have extended the Dual Aspect Self-Attention Transformer (DAST) model with a multi-scale feature encoder layer, which enables a nuanced analysis of both global and local features of the dataset. This approach addresses the challenges associated with processing large datasets. In fact, our approach has shown a 50% reduction in RMSE when compared to the traditional Recurrent Neural Network (RNN)-based methodologies.

Acknowledgments

This paper was partly supported by the National Science and Technology Council, Taiwan, under Grant NSTC 112-2813-C-001-015-E.

Funding:

This research was funded by the National Science and Technology Council, R.O.C., under Program NSTC 112-2813-C-001-015-E.

Conflicts of Interest:

The authors declare no conflicts of interest.

Data availability statement

The data that support the findings of this study are openly available in C-MAPSS at https://github.com/zzzsdu/dast.

References

1. Wang Y, Zhao Y, Addepalli S (2020) Remaining Useful Life Prediction using Deep Learning Approaches: A Review. Procedia Manuf 49:81–88

2. Jin R, Wu M, Wu K, Gao K, Chen Z, Li X (August 2022) Position Encoding Based Convolutional Neural Networks for Machine Remaining Useful Life Prediction. IEEE/CAA J Automatica Sinica 9(8):1427–1439

3. Vaswani A (2017) Attention Is All You Need. Advances in Neural Information Processing Systems

4. Wang B, Lei Y, Yan T, Li N, Guo L (2020) Recurrent convolutional neural network: A new framework for remaining useful life prediction of machinery. Neurocomputing 379:117–129

5. Singh S, Singh K, Saxena A (2021) Remaining useful life (RUL) prediction for FDIA on IoT sensor data using CNN and GRU. 2021 International Conference on Advances in Technology, Management & Education (ICATME), Bhopal, India, pp 112–116

6. Zhang Z et al (2022) Dual Aspect Self-Attention based on Transformer for Remaining Useful Life Prediction. IEEE Transactions on Instrumentation and Measurement

7. Berghout T (June 2022) A Semi-Supervised Deep Transfer Learning Approach for Rolling-Element Bearing Remaining Useful Life Prediction. IEEE Transactions on Energy Conversion

8. Berghout T (April 2021) Leveraging Label Information in a Knowledge-Driven Approach for Rolling-Element Bearings Remaining Useful Life Prediction. Energies

9. Liu H (2020) Remaining Useful Life Prediction Using a Novel Feature-Attention-Based End-to-End Approach. IEEE Transactions on Industrial Informatics

10. Nectoux P (2012) PRONOSTIA: An Experimental Platform for Bearings Accelerated Degradation Tests. Proc. IEEE Int. Conf. Prognostics Health Manage

11. Wang Y (June 2016) A Two-Stage Data-Driven-Based Prognostic Approach for Bearing Degradation Problem. IEEE Trans. Ind. Informat

12. Song Y et al (2021) Distributed Attention-Based Temporal Convolutional Network for Remaining Useful Life Prediction. IEEE Internet Things J 8:9594–9602

13. Zhang Z et al (2022) Dual Aspect Self-Attention based on Transformer for Remaining Useful Life Prediction. IEEE Transactions on Instrumentation and Measurement

14. Yoo Y (July 2018) A Novel Image Feature for the Remaining Useful Lifetime Prediction of Bearings Based on Continuous Wavelet Transform and Convolutional Neural Network. Sci