1. Introduction
With the advancement of Industry 4.0 and the rapid development of intelligent manufacturing technologies, the manufacturing industry is undergoing a transformation in production modes. As an advanced technology, Metal Additive Manufacturing (AM) is becoming a vital component of advanced manufacturing systems. Due to its ability to form complex geometric structures without molds and its high material utilization efficiency, it has emerged as a key technology in the manufacturing field[1]. Among various metal AM technologies, Wire-Arc Directed Energy Deposition (WA-DED), which is based on the principle of DED, utilizes the arc as a high-energy heat source to melt metal wire, depositing molten metal layer by layer under computer numerical control. This capability to construct complex geometries without tooling, combined with low manufacturing costs and high deposition efficiency[2–4], has demonstrated broad industrial application prospects in high-end fields such as aerospace, marine engineering, and heavy mold manufacturing [5, 6].
Despite these advantages, WA-DED is inherently a complex thermal-mechanical process involving non-equilibrium rapid solidification and cyclic thermal loading[7–9]. Extreme temperature gradients induce incompatible thermal expansion and contraction within the material, leading to significant residual stress and distortion within the deposited layers[10]. From a mechanical perspective, this macroscopic distortion behavior is essentially the dynamic superposition of elastic, plastic, and thermal strains across spatiotemporal scales, where the evolution of internal stress is determined by constitutive relations[11]. Specifically, thermal strain undergoes transient reversible changes with temperature fluctuations, whereas irreversible plastic strain accumulates layer by layer along the deposition path. This complex strain interaction mechanism ultimately results in residual stress, permanent distortion, or even cracks in components[12–14]. Therefore, the ability to analyze and predict strain and stress fields in a decoupled manner forms the physical foundation for high-fidelity digital twins, while also being a prerequisite for accurately characterizing the component’s complex mechanical state. Precisely predicting and decoupling each strain component is required to overcome single-physics analysis limitations. This allows for an accurate thermodynamic characterization of the deposition process, which is critical for controlling final component dimensions and mechanical properties.
Currently, methods for addressing the complex thermal-mechanical evolution during AM generally fall into two categories: numerical methods and data-driven methods. Numerical methods are based on continuum mechanics theory, solving the Partial Differential Equations (PDEs) governing heat transfer, fluid flow, and solid mechanics through discretization to determine the distribution of multi-physics fields within the spatiotemporal domain under specific Boundary Conditions (BCs) [15, 16]. In the field of AM, the most commonly used numerical methods are the Finite Element Method (FEM) [17] and Computational Fluid Dynamics (CFD) [18]. FEM discretizes the domain into a mesh and simulates the thermodynamic evolution of the AM process to solve for the stress, strain, and distortion fields [19]. Unlike FEM, which focuses on macroscopic mechanical behavior, CFD emphasizes the fine-scale simulation of fluid behavior, and is mainly used for analyzing metal flow, heat transfer, and solidification within the molten pool [20]. However, a complete, high-precision thermal-mechanical simulation often requires days to complete [21], making it difficult to meet the needs of real-time monitoring and online process optimization. Consequently, researchers are seeking more efficient surrogates, such as data-driven methods [22].
Unlike numerical simulation methods, data-driven methods bypass the tedious solution of physical equations, directly mining the non-linear mapping relationship between inputs and outputs from massive experimental or simulation data, effectively resolving the issues of high computational cost and low time efficiency associated with traditional FEM[23, 24]. In early studies, Farias et al. [25] utilized Artificial Neural Networks (ANN) to establish a rapid mapping between process parameters and interlayer temperatures. However, ANNs often ignore the spatial topological structure of data, making it difficult to characterize full-field distribution features. To address this, Xia et al. [26] utilized Convolutional Neural Networks (CNN), which possess strong spatial feature extraction capabilities, to achieve real-time state diagnosis based on molten pool images. Although the CNN performs well in the spatial dimension, WA-DED is essentially a dynamic process with strong historical dependence. Addressing this characteristic, Nalajam et al. [27] employed Recurrent Neural Networks (RNN) to capture the temporal correlation of thermal cycles, achieving dynamic tracking of temperature history at key points.
In addition, Scientific Machine Learning (SciML) methods, represented by Physics-Informed Neural Networks (PINN), Neural Ordinary Differential Equations (Neural ODE), and Deep Operator Networks (DeepONet), have in recent years provided new methodologies for problems involving data scarcity, high dimensionality, complex geometries, inverse solutions, and multi-physics field decoupling. The three paradigms differ fundamentally in their underlying mechanisms: PINN and Neural ODE focus on instance solutions of equations [28, 29], whereas DeepONet performs operator learning [30]. When facing multi-scale, strongly coupled problems such as WA-DED, PINN and Neural ODE reveal limitations: the governing equations of WA-DED exhibit stiffness, which makes the loss-function optimization of PINN difficult [31] and renders the time-step integration of Neural ODE extremely slow or even divergent [32]. In contrast, by learning infinite-dimensional operators, DeepONet can establish an end-to-end mapping from the complete thermal history to the mechanical response, avoiding the iterative solution of differential equations required by traditional methods [30] and effectively capturing the state evolution on which yield criteria depend. However, existing DeepONet architectures mostly employ a single parameter-sharing network for multi-output tasks [33], ignoring the differences in evolution mechanisms between physical fields. This oversight easily induces feature interference, limiting the model's precision in characterizing complex multi-physics fields.
To this end, this paper introduces a physics-informed operator learning (PIOL) surrogate model oriented towards real-time digital twins. The model adopts a differentiated operator learning architecture based on the evolution characteristics of the WA-DED multi-physics fields: a shared trunk network learns universal basis functions of the geometric domain, while the branch networks are specialized, with a lightweight CNN-MLP encoder for the transient, reversible elastic field and a CNN-LSTM branch for the plastic and stress fields with strong historical dependencies. Through this hybrid architecture, the model ensures full-field prediction accuracy while achieving optimal computational adaptation to the different physical constitutive behaviors. The PIOL framework therefore has the following key features:
1. Field-specific model architecture: heterogeneous branch networks are designed to adapt to the constitutive behavior of each target, distinguishing transient reversible responses from history-dependent cumulative effects.
2. Physics constraints: a constitutive regularization term based on thermal expansion theory is explicitly embedded in the learning objective, ensuring the predictions strictly comply with thermodynamic laws.
3. Decoupling of the multi-output task into single-output problems: the complex multi-physics task is transformed into independent learning pathways in parameter space, effectively mitigating feature interference between different physical quantities.
4. Synergistic enhancement: a shared trunk network extracts universal geometric basis functions, enabling plastic strain to serve as auxiliary supervision that enhances residual stress prediction.
The remainder of this paper is organized as follows: Section 2 details the methodology and the established benchmark. Section 3 describes the experimental calibration, dataset construction, and the setup for model comparison. The analysis in Section 4 focuses on prediction accuracy, multi-output synergy, and computational efficiency. Finally, Section 5 presents the conclusions.
2. Methodology
2.1 System overview
This paper establishes a systematic benchmark organized into three main modules: "data generation", "decoupling the physics field", and "PIOL scheme summary", as shown in Fig. 1. First, a WA-DED experimental platform is established, with a thermal camera collecting the transient temperature history during the deposition process. Second, the experimental data are used to rigorously calibrate the heat source parameters and BCs of the offline FEM. Given the extreme difficulty of in-situ stress field measurement, and considering that classical thermal-mechanical FEM models rigorously calibrated against thermal trends have been widely verified as reliable, this study establishes the calibrated FEM model as the digital ground truth. Finally, unlike traditional end-to-end learning, the established benchmark provides a feasible PIOL scheme that explicitly decouples the learning pathways for transient and history-dependent physics.
2.2 Finite element model
To construct a high-fidelity dataset for verifying the multi-physics decoupling prediction model, this study used ANSYS software to perform thermal-mechanical coupled simulation of the WA-DED process, followed by extraction and preprocessing of the relevant data. The heat source employs the Goldak double-ellipsoid model [34]:

q_f(x, y, z) = \frac{6\sqrt{3}\, f_f\, Q}{a_f\, b\, c\, \pi\sqrt{\pi}} \exp\left(-\frac{3x^2}{a_f^2} - \frac{3y^2}{b^2} - \frac{3z^2}{c^2}\right)

with an analogous expression for the rear ellipsoid using the energy fraction f_r and semi-axis a_r, where Q is the effective heat input and a_f, a_r, b, c are the shape parameters. These parameters are calibrated by comparing simulated molten pool dimensions with experimental measurements. The substrate and filler wire are Q235B low-carbon steel and ER70S-6 welding wire, respectively, with material properties set as temperature-dependent non-linear parameters. Figure 2(a) shows the established FEM model of the thin-wall structure and the meshing strategy. The model includes a substrate and a deposited wall consisting of 20 layers with a deposition length of 100 mm. To ensure physical fidelity, the specific geometric dimensions (i.e., width and layer height) of the simulated wall were configured to be consistent with the experimental measurements corresponding to the process parameters of each case (as detailed in Section 3.2). The simulation employs element birth-and-death techniques to model the layer-by-layer material deposition, achieving dynamic material addition by sequentially activating elements and applying the corresponding thermal loads. Crucially, the FEM model outputs not only the temperature field but also high-fidelity full-field data, including elastic strain, plastic strain, thermal strain, and Von Mises stress derived from continuum mechanics theory, as shown in Fig. 2(c). These physical fields, solved from the governing equations, constitute the high-quality labels for surrogate model training.
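As an illustration of the heat-source description above, the following NumPy sketch evaluates the double-ellipsoid flux. The function name and all default parameter values (power, energy fractions, semi-axes) are placeholders for illustration, not the calibrated values used in this work.

```python
import numpy as np

def goldak_flux(x, y, z, Q=2000.0, ff=0.6, fr=1.4,
                af=2.0, ar=4.0, b=3.0, c=3.0):
    """Volumetric heat flux of the Goldak double-ellipsoid model.

    Q: effective arc power; ff/fr: front/rear energy fractions (ff + fr = 2);
    af, ar, b, c: shape parameters (mm). All defaults are placeholders.
    """
    front = np.asarray(x) >= 0.0
    a = np.where(front, af, ar)   # front ellipsoid ahead of the arc, rear behind
    f = np.where(front, ff, fr)
    coeff = 6.0 * np.sqrt(3.0) * f * Q / (a * b * c * np.pi * np.sqrt(np.pi))
    return coeff * np.exp(-3.0 * (np.asarray(x)**2 / a**2
                                  + np.asarray(y)**2 / b**2
                                  + np.asarray(z)**2 / c**2))
```

The flux peaks at the arc center and decays Gaussianly along each semi-axis, which is what makes the shape parameters calibratable against measured molten pool dimensions.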
2.3 Theoretical analysis of multi-physics field decomposition
Within the framework of continuum mechanics for metal AM, assuming the material undergoes small distortions, the total strain \varepsilon can be decoupled into three independent physical components according to the additive decomposition theorem [35]:

\varepsilon = \varepsilon^{e} + \varepsilon^{p} + \varepsilon^{th}

where \varepsilon^{e}, \varepsilon^{p}, and \varepsilon^{th} represent the elastic, plastic, and thermal strain, respectively. Although these three components are controlled by the same set of spatiotemporal BCs and thermal history constraints, they follow distinctly different physical evolution laws. This intrinsic difference in physical mechanisms constitutes the physical basis for the differentiated hybrid branch architecture adopted in this study: decoupled learning of the different physical constitutive behaviors is achieved in parameter space through parallel network channels.
Elastic strain and thermal strain manifest primarily as transient, reversible responses to the current thermodynamic input. Specifically, thermal strain follows [35]:

\varepsilon^{th} = \alpha(T)\,(T - T_{ref})\,\mathbf{I}

where \alpha(T) is the temperature-dependent thermal expansion coefficient, T_{ref} is the reference temperature, and \mathbf{I} is the second-order identity tensor. Elastic strain is primarily determined by the current stress state and temperature-dependent material properties, exhibiting rapid reversible changes with temperature fluctuations. This implies that these two physical quantities are mainly determined by the current temperature field and geometric constraints, and possess weak historical memory.
In sharp contrast, plastic strain and the stress field exhibit strong historical dependence and irreversible cumulative evolution. The constitutive relation between the Cauchy stress tensor \sigma and the elastic strain tensor \varepsilon^{e} can be expressed as [36]:

\sigma = \mathbf{C}(T) : \varepsilon^{e} = \frac{E\nu}{(1+\nu)(1-2\nu)}\,\mathrm{tr}(\varepsilon^{e})\,\mathbf{I} + \frac{E}{1+\nu}\,\varepsilon^{e}

where \mathbf{C} is the fourth-order elasticity tensor, E and \nu are the temperature-dependent Young's modulus and Poisson's ratio, respectively, and \mathbf{I} is the second-order identity tensor. Although this relation links stress to elastic strain, in the complex reciprocating thermal cycles of WA-DED the macroscopic residual stress field is essentially the result of the accumulation of incompatible plastic distortion over time and space. Plastic strain is an irreversible distortion produced after material yielding; its value is not defined solely by the current temperature or stress state, but is the time-domain integral of the plastic strain rate over the entire distortion history:

\varepsilon^{p}(t) = \int_{0}^{t} \dot{\varepsilon}^{p}(\tau)\, d\tau
2.4 Physics field prediction via single-output models
Before constructing the WA-DED thermal-mechanical coupling surrogate model, it is essential to clarify the machine learning theories supporting spatiotemporal physical field prediction: the operator network for solving infinite-dimensional function space mapping problems, and neural ordinary differential equations for modeling continuous time evolution processes.
2.4.1 Neural operator learning
To achieve efficient prediction, this study adopts the neural operator framework [30]. Its goal is to learn an operator G that maps an input function u (representing the temperature history) to an output function G(u)(y), where y denotes the spatiotemporal coordinates of the output. To realize this mapping, the branch network encodes the input function u through a limited set of sensor points, generating a latent vector summarizing the global behavior of the input field, while the trunk network processes the query coordinates y, outputting a feature vector representing the local response basis at that location. The final prediction is obtained as the inner product of these two feature vectors:

G(u)(y) \approx \sum_{k=1}^{p} b_k(u)\, t_k(y)

Even when the geometric structure is relatively fixed, the thermal history u experienced by points at different spatial locations in a WA-DED component varies significantly. The core advantage of DeepONet lies in learning the universal constitutive mapping operator from thermal history to mechanical response, allowing the model to capture complex non-linear physical laws and perform inference at arbitrary continuous spatiotemporal coordinates.
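A minimal sketch of the branch-trunk inner product described above, assuming plain MLP sub-networks and illustrative layer sizes (the actual branches in this work are CNN-based):

```python
import torch
import torch.nn as nn

class TinyDeepONet(nn.Module):
    """Minimal DeepONet: the branch encodes the sampled input function u,
    the trunk encodes the query coordinate y, and the prediction is their
    inner product. Sizes are illustrative, not the paper's."""

    def __init__(self, n_sensors=100, latent=64):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(n_sensors, 128), nn.Tanh(),
                                    nn.Linear(128, latent))
        self.trunk = nn.Sequential(nn.Linear(4, 128), nn.Tanh(),
                                   nn.Linear(128, latent))

    def forward(self, u, y):
        # u: (batch, n_sensors) thermal history sampled at sensor points
        # y: (batch, 4) spatiotemporal query (x, y, z, t)
        b = self.branch(u)   # basis coefficients b_k(u)
        t = self.trunk(y)    # basis functions t_k(y)
        return (b * t).sum(dim=-1, keepdim=True)  # G(u)(y) ~ sum_k b_k t_k

model = TinyDeepONet()
out = model(torch.randn(8, 100), torch.randn(8, 4))
```

Because the trunk takes continuous coordinates, the trained operator can be queried at arbitrary spatiotemporal locations, not just at the mesh nodes seen during training.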
2.4.2 Neural ODE
Neural ODEs provide a continuous-system perspective for analyzing the sequential evolution dynamics of WA-DED [29]. Unlike traditional RNNs, which discretize the time axis into fixed sequence steps, a Neural ODE models the trajectory of the hidden state h(t) as a parameterized initial value problem:

\frac{dh(t)}{dt} = f_{\theta}(h(t), t), \quad h(t_0) = h_0

The state h(t) at any time t can then be obtained by integrating from the initial state h_0 with a numerical solver. However, thermal history data in the WA-DED process exhibit numerical stiffness, characterized by severe thermal shocks within extremely short periods and slow cooling over long time scales. This multi-scale temporal characteristic forces the ODE solver into difficult trade-offs between extremely small step sizes and numerical stability, resulting in high computational cost. This dilemma also constitutes the core motivation for the subsequent chapters of this study, which examine the applicability boundaries of discrete gating mechanisms and continuous integration mechanisms for modeling stiff systems.
2.4.3 Baseline single-output model architectures
To systematically evaluate the effectiveness of temporal feature encoding mechanisms, five baseline models are constructed covering traditional ML and operator learning architectures, as shown in Fig. 3. These models are strictly divided into non-operator baselines and DeepONet operator variants based on their different handling of physical field mapping mechanisms, aiming to investigate optimal feature interaction methods.
The first category of architectures employs traditional feature concatenation and direct decoding strategies (Fig. 3a-b), characterized by the lack of an explicit branch-trunk decoupling structure. The CNN-LSTM model (Fig. 3a) follows the classic sequence-regression paradigm: a multi-scale CNN encoder first extracts high-dimensional features from the thermal history, and LSTM layers then capture discrete time dependencies. The hidden state vector output by the LSTM is directly concatenated with the spatiotemporal coordinates and fed into an MLP decoder to regress physical field values end to end. As the continuous-time counterpart, the CNN-Neural ODE architecture (Fig. 3b) retains the front-end CNN feature extractor but replaces the LSTM module with a neural differential equation solver. It models the continuous evolution of latent states by numerically integrating dh(t)/dt = f_{\theta}(h(t), t), exploring the performance of continuous dynamical systems on non-steady thermal history data.
The second category of architectures strictly follows the DeepONet branch-trunk inner-product structure (Fig. 3c-e). These models share an MLP trunk network for spatiotemporal feature extraction, but their branch network designs differ in order to test different encoding hypotheses. As the basic control group, the CNN-MLP DeepONet (Fig. 3e) uses a branch network composed solely of fully connected layers, treating the thermal history as a static vector and ignoring the local correlation of time-series data. The CNN-Neural ODE DeepONet (Fig. 3d) introduces continuous dynamics into operator learning, using an ODE module in the branch network to generate dynamically changing basis-function coefficients. The core architecture of this study, the CNN-LSTM DeepONet (Fig. 3c), integrates the feature-compression capability of the CNN and the long-term memory of the LSTM within the branch network. The dot product of the branch latent basis vector and the trunk latent coordinates then yields an efficient operator mapping for the complex thermal-mechanical process of WA-DED.
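A sketch of the CNN-LSTM branch encoder described above; channel counts, kernel sizes, and the latent dimension are illustrative choices, not the paper's configuration:

```python
import torch
import torch.nn as nn

class HistoryBranch(nn.Module):
    """CNN-LSTM branch sketch: a 1-D CNN compresses the thermal history,
    then an LSTM accumulates it into a latent basis-coefficient vector
    suitable for the DeepONet dot product."""
    def __init__(self, latent=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=5, stride=2, padding=2), nn.ReLU())
        self.lstm = nn.LSTM(input_size=32, hidden_size=latent, batch_first=True)

    def forward(self, u):
        # u: (batch, T) thermal history at one node
        z = self.cnn(u.unsqueeze(1))   # (batch, 32, T/4) compressed sequence
        z = z.transpose(1, 2)          # (batch, T/4, 32) for the LSTM
        _, (h_n, _) = self.lstm(z)
        return h_n[-1]                 # (batch, latent) final hidden state

branch = HistoryBranch()
coeffs = branch(torch.randn(8, 1500))  # 1500 time steps, as in the dataset
```

The final LSTM hidden state plays the role of the basis coefficients b_k(u), so the gating mechanism can act as a learned surrogate for the time-domain accumulation in the plastic constitutive law.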
2.5 Multi-output prediction via physics-informed DeepONet
As the core of the established PIOL framework in this study, a physics-informed hybrid multi-output DeepONet architecture has been developed to predict the spatiotemporal evolution of multi-physics fields, as shown in Fig. 4. The model employs heterogeneous temporal feature encoders specifically tailored to the constitutive characteristics of the physical quantities. Specifically, for the transient, reversible elastic strain, a lightweight CNN-MLP branch is adopted to efficiently encode the instantaneous thermal state lacking historical memory. Conversely, for plastic strain and stress, which exhibit strong historical cumulative effects and irreversible evolution characteristics, the model retains the CNN-LSTM architecture as the history-dependent branch. The gating mechanism of the LSTM is used to simulate the time-varying integration process, effectively capturing plastic hardening and residual stress accumulation behaviors throughout the thermal cycle history. This architecture is verified with the results of section 2.4 (as shown in section 4.1–4.2).
Considering that all mechanical field quantities are subject to the same geometric entity and thermal history constraints, the model introduces a “shared trunk” network strategy to extract universal geometric spatiotemporal modes. Through multi-output collaborative dot-product operations, the unique evolution mechanism of each physical field is explicitly decoupled while common spatial distribution features are shared. This architecture is fundamentally designed to enhance prediction accuracy and explainability regarding material thermal-mechanical behavior under variable temperature fields.
To systematically analyze the synergistic mechanisms, four typical task configurations were designed for comparative experiments, as illustrated in Fig. 5. These configurations, which include the “all-branch” synergistic model and three dual-task subsets, maintain the fixed network topology while varying the output task combinations. The primary objective is to explore how the shared spatial basis functions act as bridges between different physical fields, and to evaluate how the joint training of specific physical quantities affects the final prediction performance through inductive bias.
The rationale behind this experimental design lies in the common basis hypothesis implemented by the shared trunk network. Although the differential equations governing elastic strain, plastic strain, and stress differ substantially—where the former is controlled by the current thermal state and the latter involves complex historical accumulation—they are all constrained by the same geometric entity and thermal BCs. Therefore, the shared trunk utilizes geometric homology to introduce a physical constraint, based on the hypothesis that distinct physical fields, despite having different weight coefficients, are all linearly expanded within the same basis function space reflecting universal geometric characteristics.
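The shared-trunk, multi-branch idea can be sketched as follows. For brevity all branches here are MLPs; the actual model uses a CNN-MLP branch for the elastic field and CNN-LSTM branches for the history-dependent fields, and all sizes are illustrative:

```python
import torch
import torch.nn as nn

class MultiOutputDeepONet(nn.Module):
    """Shared-trunk sketch: one trunk supplies common spatial basis
    functions; each physical field owns its own branch, so the fields
    share the basis but keep independent coefficient pathways."""
    def __init__(self, n_sensors=100, latent=64,
                 fields=("elastic", "plastic", "stress")):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(4, 128), nn.Tanh(),
                                   nn.Linear(128, latent))
        self.branches = nn.ModuleDict({
            f: nn.Sequential(nn.Linear(n_sensors, 128), nn.Tanh(),
                             nn.Linear(128, latent))
            for f in fields})

    def forward(self, u, y):
        t = self.trunk(y)  # shared basis functions at each query point
        return {f: (br(u) * t).sum(-1) for f, br in self.branches.items()}

model = MultiOutputDeepONet()
preds = model(torch.randn(8, 100), torch.randn(8, 4))
```

Dropping entries from `fields` reproduces the dual-task subsets used in the combination experiments, since only the branch dictionary changes while the trunk topology stays fixed.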
2.6 Model training
The models are trained using a supervised learning paradigm, with the objective of achieving high-precision regression from temperature history to multi-physics fields. To balance the learning of the multi-physics components and effectively apply constitutive constraints, the total objective function is defined as:

\mathcal{L} = \mathcal{L}_{data} + \lambda\, \mathcal{L}_{phys}

where the data-driven loss \mathcal{L}_{data} covers all collaboratively predicted target variables:

\mathcal{L}_{data} = \sum_{i} w_i \,\| \hat{y}_i - y_i \|^2, \quad i \in \{\varepsilon^{e}, \varepsilon^{p}, \varepsilon^{th}, \sigma_{vM}\}

The weight coefficients w_i are dynamically adjusted according to the magnitude of each physical quantity. The constitutive constraint term \mathcal{L}_{phys} is constructed from Eq. 5 (the thermal expansion constitutive relationship), forcing the network-predicted thermal strain to adhere strictly to the theoretical value determined by the thermal expansion coefficient and the temperature variation. This constraint mechanism embeds the material's thermophysical properties as prior knowledge in the neural network, enhancing the model's generalization robustness under sparse training data.
Model training was performed under the PyTorch deep learning framework, using the Adam optimizer to adapt to the non-stationary objective function, with the initial learning rate set to
. All computational tasks were completed on a personal workstation equipped with an Intel Core i9 processor, 32GB RAM, and a single NVIDIA GeForce RTX 4060 (8GB) graphics card. Thanks to memory mapping techniques and an efficient parallel computing architecture, model training on the large-scale dataset containing millions of samples maintained an efficient convergence speed.
To strictly evaluate the surrogate model's generalization capability under unseen operating conditions, this study adopted a leave-one-out dataset partition strategy, rather than traditional random shuffling. The experimental dataset contains 5 typical cases with distinct thermal histories. Each case contains 1,071 spatial nodes, with each node recording an evolution history of 1,500 time steps. In the experiment, 4 cases (approximately 3.2 million samples) were used as the training set to cover diverse thermodynamic characteristics; the remaining 1 case (approximately 0.8 million samples) was strictly isolated as an independent test set. This partition method is highly challenging as it requires the model not merely to memorize data distributions but to truly learn the physical operator laws governing strain evolution, thereby achieving high-precision extrapolation prediction under entirely new operating conditions.
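The leave-one-case-out partition can be sketched as boolean masks over per-sample case identifiers; the array sizes here are tiny stand-ins for the roughly 4 million samples described above:

```python
import numpy as np

def leave_one_case_out(case_ids, test_case):
    """Boolean masks for a case-wise split: every sample whose case id
    equals test_case is held out, with no random shuffling across cases,
    so the test case's thermal history is never seen during training."""
    case_ids = np.asarray(case_ids)
    test_mask = case_ids == test_case
    return ~test_mask, test_mask

# 5 cases in the paper (~1071 nodes x 1500 steps each); 10 samples per case here
ids = np.repeat([1, 2, 3, 4, 5], 10)
train_mask, test_mask = leave_one_case_out(ids, test_case=5)
```

Masking by case id rather than by random index is what forces the model to extrapolate to an entirely unseen operating condition instead of interpolating within memorized distributions.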
3. Experimental setup
To ensure the FEM model used as digital ground truth possesses sufficient physical credibility, this study employs a hybrid strategy of experimental calibration and numerical generation. First, a physical experimental platform for WA-DED is established, utilizing collected in-situ thermal-mechanical data to rigorously calibrate the parameters of the FEM model described in Section 2.2. Subsequently, multi-physics training data covering 5 operating conditions is generated based on the validated numerical model.
3.1 Experimental settings and model calibration
To obtain ground-truth data for calibrating the numerical model, an integrated robotic WA-DED system was constructed. The system uses an ABB IRB 2600 six-axis industrial robot as the motion execution unit, coordinated with a Cold Metal Transfer (CMT) welding power source for material deposition. A Q235B low-carbon steel plate served as the substrate, with ER70S-6 solid wire of 1.2 mm diameter as the filler material. To prevent high-temperature oxidation and ensure arc stability, a shielding gas mixture of 80% Ar + 20% CO2 with a flow rate of 17 L/min was used. For data acquisition, the platform was equipped with a high-resolution thermal camera (FLIR A655sc) recording the surface temperature field of the molten pool and heat-affected zone in real time at a sampling rate of 6.25 Hz. The experimental system is shown in Fig. 6.
Based on this experimental platform, single-pass multi-layer thin-walled samples were prepared, and the experimental measurements were used to calibrate the key heat source parameters of the offline FEM model. Since the shape parameters (a_f, a_r, b, c) of the Goldak double-ellipsoid heat source model directly determine the spatial distribution of heat input, geometric morphology comparison was performed. By fine-tuning these parameters, good agreement was achieved between the simulated and experimental molten pool dimensions. Furthermore, thermocouple measurements at key characteristic points on the substrate and infrared thermal imaging data were extracted and aligned with the time-temperature curves of the corresponding nodes in the simulation model.
3.2 Dataset construction strategy
To endow the deep learning model with generalization robustness under different energy inputs, a single set of process parameters is insufficient to cover the possible thermal history space. Based on the strictly experimentally validated FEM model mentioned above, this study designed and calculated a library of 5 simulation cases with distinct thermal histories. These cases were generated by systematically varying the Wire Feed Speed (WFS) and Travel Speed (TS), resulting in distinct layer geometries (i.e., width and height) for each case. The coupling of these two parameters directly determines the deposition amount per unit length and the degree of heat accumulation. Specifically, a higher WFS is typically accompanied by greater welding current and heat input. This results in a wider molten pool and deeper heat-affected zone, thereby inducing more severe thermal softening effects and high-temperature plastic flow. Meanwhile, changes in TS directly alter the cooling rate, consequently affecting the accumulation mode of residual stress. The parameter settings for these 5 cases cover a typical process window from low to high heat input, with specific parameters detailed in Table 1.
Table 1
Process parameters and corresponding bead geometries for the thermal-mechanical FEM of WA-DED
| Case | WFS (m/min) | TS (m/min) | Width (mm) | Height (mm) | Heat input (J/cm) |
|--------|------|------|-----|-----|---------|
| Case 1 | 6 | 0.4 | 8 | 3 | 2266.87 |
| Case 2 | 6 | 0.48 | 6 | 2 | 1889.06 |
| Case 3 | 5 | 0.4 | 8 | 2.5 | 1881.00 |
| Case 4 | 5 | 0.45 | 6 | 2.5 | 1672.00 |
| Case 5 | 5 | 0.48 | 5 | 2 | 1567.50 |
The final constructed training dataset contains 4 training cases and 1 completely independent test case, totaling approximately 4 million spatiotemporal sample points. For each sample point, the dataset extracts not only the input temperature history vector u and the spatiotemporal coordinates y, but also the four target components explicitly decoupled and computed by the finite element constitutive equations: elastic strain \varepsilon^{e}, plastic strain \varepsilon^{p}, thermal strain \varepsilon^{th}, and Von Mises stress \sigma_{vM}.
This refined data structure, containing full-field stress and decoupled strains, is the foundation of the collaborative prediction method introduced in this study. It allows DeepONet to break through the black-box limitations of traditional end-to-end models, utilizing the shared basis function mechanism to capture implicit constitutive correlations between different physical quantities through supervised learning. In particular, using the unseen 5th case as an independent test set forces the model to learn the underlying physical operator laws rather than simply memorizing data distributions, thereby validating its generalized prediction capability under unknown process parameters.
4. Results and discussion
The feasibility of the benchmark is evaluated by comparing its resulting PIOL architecture in the synergistic prediction of WA-DED multi-physics fields. To systematically verify the effectiveness of the framework and reveal its underlying physical mechanisms, this chapter first establishes the superiority of the differentiated operator learning strategies through benchmarking against single time-series models and operator variants: specifically, verifying the necessity and effectiveness of a CNN-MLP branch for the transient reversible elastic field and a CNN-LSTM branch for the history-dependent plastic and stress fields. Furthermore, multi-output combination experiments analyze how the shared trunk mechanism leverages simple physical quantities to assist the prediction of complex non-conservative fields, revealing the synergistic enhancement mechanism among physical fields. Finally, transient evolution curves at key characteristic points visually demonstrate the model's ability to capture peak responses and residual accumulation during complex thermal cycles.
4.1 Quantitative analysis of single-output model performances
To establish the optimal operator learning architecture, this study conducted comparative evaluations of five baseline models in independent single-task settings (listed in Table 2). Prediction accuracy was quantified using Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). Experimental results reveal the decisive influence of feature extraction mechanisms and time-series modeling strategies on prediction accuracy of complex thermal-mechanical fields.
Table 2
Comparative evaluation of prediction performance in terms of MAE and RMSE across five baseline models under independent single-task settings.
| Model | Elastic Strain MAE (‰) | Elastic Strain RMSE (‰) | Plastic Strain MAE (‰) | Plastic Strain RMSE (‰) | Thermal Strain MAE (‰) | Thermal Strain RMSE (‰) | Von Mises Stress MAE (MPa) | Von Mises Stress RMSE (MPa) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CNN-LSTM | 0.192 | 0.306 | 0.248 | 0.389 | 0.316 | 0.388 | 13.646 | 18.364 |
| CNN-Neural ODE | 0.186 | 0.313 | 0.283 | 0.447 | 0.433 | 0.512 | 14.944 | 21.505 |
| CNN-LSTM DeepONet | 0.201 | 0.331 | 0.245 | 0.359 | 0.421 | 0.500 | 12.766 | 17.995 |
| CNN-Neural ODE DeepONet | 0.218 | 0.340 | 0.389 | 0.582 | 0.278 | 0.371 | 18.080 | 24.906 |
| CNN-MLP DeepONet | 0.171 | 0.294 | 0.264 | 0.359 | 0.363 | 0.433 | 16.807 | 23.167 |
For elastic strain, which is primarily governed by the current thermodynamic state and exhibits transient reversible characteristics, the structurally simpler CNN-MLP DeepONet achieved the lowest error among all models. With an MAE of 0.171‰ and an RMSE of 0.294‰, it outperformed the more complex LSTM variants. This indicates that for physical quantities lacking significant historical dependence, fully connected networks possess sufficient non-linear mapping capabilities and effectively mitigate the risk of overfitting potentially introduced by complex temporal modules.
In the plastic strain prediction task, the performance of the CNN-MLP DeepONet declined, whereas the CNN-LSTM DeepONet, incorporating Long Short-Term Memory mechanisms, demonstrated superior performance. By achieving the lowest RMSE of 0.359‰ and MAE of 0.245‰ across all categories, it outperformed the traditional end-to-end CNN-LSTM model. This demonstrates the necessity of gating mechanisms in capturing material non-linear hardening and historical cumulative effects.
Regarding the prediction of Von Mises stress, which comprehensively reflects thermo-mechanical coupling effects, the CNN-LSTM DeepONet exhibited the highest predictive accuracy. Its MAE of 12.766 MPa and RMSE of 17.995 MPa were both superior to those of the traditional end-to-end CNN-LSTM model, which recorded an MAE of 13.646 MPa and an RMSE of 18.364 MPa, as well as the Neural ODE and MLP-based baselines. This result highlights the core advantage of the LSTM-integrated operator architecture in capturing complex plastic evolution and residual stress characteristics.
It is worth noting that the inclusion of thermal strain prediction in this study aims to verify the feasibility of decoupling spatiotemporal coordinates from temperature features within the model. Given that the physical nature of thermal strain is equivalent to fitting the thermal expansion equations of materials under varying temperature fields, which represents a strongly deterministic physical law, it will not be the primary focus of the subsequent in-depth discussion on decoupling mechanisms.
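Since thermal strain reduces to the thermal-expansion law, its pointwise value follows directly from the temperature field. A minimal sketch, assuming a constant (temperature-independent) expansion coefficient; the coefficient and reference temperature below are illustrative values, not calibrated material properties:

```python
def thermal_strain(T, T_ref=20.0, alpha=1.2e-5):
    """Thermal strain eps_th = alpha * (T - T_ref).

    alpha: linear thermal expansion coefficient (1/K); illustrative value.
    T_ref: stress-free reference temperature (degC); illustrative value.
    """
    return alpha * (T - T_ref)

# At 520 degC relative to a 20 degC reference:
eps = thermal_strain(520.0)  # 1.2e-5 * 500 = 6e-3, i.e. 6 permille
```

In a real WA-DED material model alpha itself varies with temperature, which is precisely the deterministic relation the network is asked to fit.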
4.2 Qualitative analysis of single-output model performances
To further dissect the specific performance of each model in capturing the dynamic evolution of complex physical fields in WA-DED, this section conducts visualization analysis from both time-domain evolution and spatial distribution dimensions.
4.2.1 Time-domain evolution characteristics analysis
Figures 7–10 display the prediction curves of different models at four representative node locations (Node 200, 400, 600, 800). Based on the mesh topology of 51 nodes per layer, these nodes correspond spatially to the 4th, 8th, 12th, and 16th layers along the build direction, respectively. This selection evaluates model robustness across varying thermal history lengths: Node 200 (4th layer) in the bottom region undergoes the most extensive cyclic thermal shocks and plastic accumulation, whereas Node 800 (16th layer) in the top region represents newly deposited material with a shorter history.
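The node-to-layer correspondence above follows from the 51-nodes-per-layer mesh topology; a sketch of the mapping, assuming 1-based node numbering:

```python
import math

NODES_PER_LAYER = 51  # mesh topology stated in the text

def layer_of(node_id):
    """Layer index along the build direction for a 1-based node id."""
    return math.ceil(node_id / NODES_PER_LAYER)

print([layer_of(n) for n in (200, 400, 600, 800)])  # [4, 8, 12, 16]
```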
Among the physical quantities, plastic strain (Fig. 8), as a typical history-dependent variable, exhibits irreversible cumulative evolution that increases layer by layer, manifesting as stepped growth. Observing the curves for Node 200, which requires the longest temporal integration, it is evident that the CNN-Neural ODE DeepONet exhibits non-physical oscillations and physically implausible decreases during the cooling phase. This failure at the bottom layer reveals the limitation of continuous ODE solvers: modeling the evolution of Node 200 requires integrating over a long, severely stiff time horizon, leading to significant cumulative numerical drift.
In contrast, the curves generated by CNN-LSTM DeepONet and CNN-LSTM are smooth and closely adhere to the true values, confirming that the gating mechanism effectively captures the plastic increment even after prolonged thermal cycles. In Von Mises stress prediction (Fig. 10), the stress presents pronounced sawtooth fluctuations as deposition layers accumulate. Neural ODE-based models show clearly insufficient fitting capability at peaks and valleys, and in particular exhibit significant cumulative drift in the later stages (t > 1000 s) at Node 200. This further corroborates that continuous differential equations struggle to adapt to such non-smooth dynamic processes over long sequences. Conversely, CNN-LSTM DeepONet reconstructs the rapid response of stress to temperature variations, confirming its capability to capture high-frequency thermal-mechanical responses. Additionally, the predictions for elastic strain (Fig. 7) and thermal strain (Fig. 9) further verify the capability of the DeepONet architecture in handling reversible thermodynamic behaviors.
4.2.2 Spatial physical field distribution analysis
This section selected the Von Mises stress field at the end of deposition (t = 1460s) as the evaluation object (Fig. 11), as this indicator can comprehensively reflect the model's overall modeling capability for thermal-mechanical coupling mechanisms. At this point, the ground truth stress field displays a clear layered stripe structure, reflecting the periodic gradient distribution brought by layer-by-layer deposition, and there is a significant stress concentration area at the junction of the substrate and the deposited layer. Comparing the prediction contours of various models, it can be found that the prediction of CNN-Neural ODE is the most blurred, with the largest error value, and the high gradient information at the interlayer interface is almost completely lost, presenting an over-smoothed state that fails to reflect the mesoscopic features of the WA-DED process. Although CNN-MLP DeepONet recovers some layered structures, its predicted values in the stress concentration area at the bottom (Z < 10mm) are low, failing to capture the high-stress core region. In contrast, CNN-LSTM DeepONet achieves the best reconstruction effect, not only accurately restoring the stress concentration amplitude at the bottom but also clearly preserving the high-frequency interlayer stress gradient stripes along the Z-axis. This result indicates that the model possesses not only memory capabilities in time sequence but also effectively approximates the high-frequency geometric features of the physical field in spatial operator mapping, further corroborating its reliability as a high-performance surrogate model.
Overall, the temporal feature extraction mechanism has a decisive impact on the reconstruction quality of the spatial field. Both CNN-Neural ODE and its DeepONet variant yield the most blurred predictions, with high-gradient information at interlayer interfaces almost completely lost and the largest overall error, underscoring the limitations of continuous integration solvers for non-continuous fields with drastic spatial variation, while CNN-LSTM DeepONet again shows the most uniform error distribution. This proves that for deposition processes with strong historical dependence such as WA-DED, long short-term memory mechanisms should be introduced into the operator learning framework to achieve precise approximation of the high-frequency geometric features and historical cumulative distributions of complex physical fields.
4.3 Physics-informed DeepONet performance in multi-output prediction
This section delves into the core role of the shared trunk network in multi-physics prediction and its underlying mathematical mechanism. The independent-branch, shared-trunk multi-output structure constructed in this study can be expressed as

G^(k)(u)(y) = Σ_{i=1}^{p} b_i^(k)(u) · t_i(y),  k ∈ {elastic, plastic, stress},

where {t_i(y)}_{i=1}^{p} constitutes a set of universal geometric spatiotemporal basis functions reused across tasks. This formula clarifies the mechanism of the model: the specificity of each physical evolution is represented by the independent branch coefficients b_i^(k)(u), while the commonality of the geometric space is represented by the unified basis functions t_i(y).
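The inner-product mechanism of the independent-branch, shared-trunk structure can be sketched numerically: each output field is the dot product of its own branch coefficients with one shared set of trunk basis values. The sketch below uses random numbers in place of trained network outputs; only the wiring is the point, and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 64          # number of shared basis functions
n_points = 5    # spatiotemporal query points (x, y, z, t)

# Trunk output: basis functions t_i(y) evaluated at each query point (shared).
t_basis = rng.normal(size=(n_points, p))

# Independent branch coefficients b_i^(k)(u), one vector per physical field.
b = {"elastic": rng.normal(size=p),
     "plastic": rng.normal(size=p),
     "stress":  rng.normal(size=p)}

# G^(k)(u)(y) = sum_i b_i^(k)(u) * t_i(y): one dot product per field.
predictions = {k: t_basis @ coeffs for k, coeffs in b.items()}
for k, v in predictions.items():
    print(k, v.shape)  # each field is predicted at all 5 query points
```

The key property visible here is that all fields consume the same `t_basis`, so supervision on any one field reshapes the basis used by all of them.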
4.3.1 Quantitative evaluation and comparison of multi-output configurations
The experimental data in Table 3 illustrate the specific impact of multi-output synergy on prediction accuracy from a quantitative dimension.
Table 3
Quantitative comparison of prediction performance in terms of MAE and RMSE across different task configuration strategies.
| Model | Elastic Strain MAE (‰) | Elastic Strain RMSE (‰) | Plastic Strain MAE (‰) | Plastic Strain RMSE (‰) | Von Mises Stress MAE (MPa) | Von Mises Stress RMSE (MPa) |
| --- | --- | --- | --- | --- | --- | --- |
| All-Branch | 0.305 | 0.412 | 0.364 | 0.506 | 20.030 | 26.722 |
| Elastic-Plastic | 0.254 | 0.339 | 0.303 | 0.462 | — | — |
| Elastic-Stress | 0.193 | 0.325 | — | — | 16.986 | 23.946 |
| Plastic-Stress | — | — | 0.266 | 0.397 | 12.935 | 18.424 |
Comparison reveals that the Plastic-Stress combination achieved the lowest error among all experimental groups in Von Mises stress prediction, with an MAE of 12.935 MPa and an RMSE of 18.424 MPa. In contrast, the Elastic-Stress combination resulted in an MAE of 16.986 MPa. The accuracy gap of approximately 24% between the two demonstrates a synergistic effect at the physical level: since residual stress fundamentally stems from the accumulation of incompatible plastic deformation within the material, introducing the plastic branch provides key auxiliary information for stress prediction, thereby improving prediction accuracy.
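The approximately 24% figure corresponds to the relative MAE reduction between the two configurations in Table 3:

```python
mae_elastic_stress = 16.986  # MPa, Elastic-Stress combination (Table 3)
mae_plastic_stress = 12.935  # MPa, Plastic-Stress combination (Table 3)

# Relative reduction in MAE from adding the plastic branch instead of elastic.
gap = (mae_elastic_stress - mae_plastic_stress) / mae_elastic_stress
print(f"{gap:.1%}")  # 23.8%
```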
4.3.2 Synergistic enhancement and feature competition
The above quantitative results can be explained by the mathematical essence of the DeepONet operator. In the independent branch, shared trunk architecture, the prediction result is essentially the inner product of independent branch coefficients and universal trunk basis functions.
Comparing the error fields in Fig. 13, it is evident that in the Elastic-Stress model, which lacks the assistance of the plastic branch, the stress prediction error surface exhibits fluctuations, with high error peaks at the edges. Combined with the temporal error curve analysis in Fig. 14, the error of the Elastic-Stress model tends to spike during the transient process of interlayer switching. When the Plastic-Stress combination is adopted, the error trend becomes significantly smoother. This reflects the physical essence of multi-output synergy: the plastic branch acts as an explicit constraint on historical cumulative features, forcing the shared trunk network to prioritize encoding those spatial modes highly relevant to historical accumulation and non-linear hardening. This mechanism allows the learning process of the plastic branch to provide key supervision for stress prediction, helping the model more accurately reconstruct stress fields with historical dependencies.
On the other hand, a feature competition phenomenon also exists in the shared trunk network. The All-Branch model, which includes all physical quantities, did not show the expected advantage; its stress prediction error (MAE 20.030 MPa) was actually higher than that of the Plastic-Stress combination. This is because plastic strain and stress are both irreversible quantities shaped by the loading history, and they can facilitate each other when sharing the trunk network; in contrast, elastic strain changes rapidly with the current temperature and is a reversible transient quantity. If the same trunk network is forced to simultaneously adapt to both elastic and plastic fields, the optimization directions of the network parameters conflict, leaving the model unable to balance features at different time scales and ultimately dragging down overall performance.
Therefore, the sharing strategy of DeepONet should follow the similarity of physical evolution laws. Joint training of plasticity and stress, which share similar laws, yields the best results, while forcibly introducing the elastic field with different properties will cause interference.
4.4 Computational efficiency comparison
Beyond prediction accuracy, computational efficiency is a key indicator of whether a surrogate model can support industrial digital twin systems. This section compares the essential difference in time consumption between traditional FEM and the PIOL developed in this study. The computational bottleneck of traditional FEM simulation lies in its high online cost; every time step requires constructing a massive stiffness matrix and performing complex non-linear iterative solutions. The computational load increases sharply with the number of mesh elements, causing the simulation of a single deposition layer to often take hours, making it difficult to meet online monitoring needs.
The proposed model adopts an offline-training, online-inference mode, transforming heavy physical equation solving into efficient neural network weight-matrix multiplications. In particular, the hybrid architecture designed in this study adopts a lightweight CNN-MLP branch for the transient elastic field, avoiding the unnecessary recurrent computation overhead of full LSTM architectures. Under the same hardware environment, the inference time for full-field physical quantities is compressed to the millisecond level. This acceleration of several orders of magnitude relative to traditional numerical methods breaks through the time bottleneck of physical solving. The extremely low response latency allows the model to be embedded in digital twin systems, compensating for FEM's limitation to offline pre-planning. It can instantly infer the invisible stress and plastic states inside the component from real-time temperature measurements, thereby providing a real-time basis for online process adjustment of industrial robots and truly realizing closed-loop manufacturing control.
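The online-inference cost can be illustrated concretely: once the trunk basis values have been evaluated at all mesh nodes, a full-field prediction collapses to a single matrix-vector product between that precomputed basis matrix and the branch coefficients inferred from live temperature data. A sketch with illustrative sizes and random weights (actual latency depends on hardware):

```python
import time
import numpy as np

rng = np.random.default_rng(1)
n_nodes, p = 50_000, 128           # illustrative mesh size and basis count

T = rng.normal(size=(n_nodes, p))  # trunk basis, evaluated once offline
b = rng.normal(size=p)             # branch coefficients from live sensor input

t0 = time.perf_counter()
field = T @ b                      # full-field prediction in one pass
elapsed_ms = (time.perf_counter() - t0) * 1e3
print(f"{elapsed_ms:.2f} ms for {n_nodes} nodes")
```

No stiffness matrix is assembled and no non-linear iteration is performed, which is the source of the orders-of-magnitude speedup over FEM claimed above.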
4.5 Discussion and PIOL framework summary
This study establishes a PIOL framework for strain-stress decoupling prediction in WA-DED. When predicting transient reversible fields (i.e., elastic strain), structural decoupling from history-dependent fields (i.e., plastic strain and stress) is required to avoid feature interference. Conversely, when predicting residual stress, plastic strain should be introduced as a synergistic auxiliary task to enhance physical consistency. This framework is supported by the differentiated operator learning strategy validated in our experiments. For memoryless physical processes governed by instantaneous thermodynamic states (such as elastic strain), the CNN-MLP architecture demonstrates superiority, as introducing recurrent units would unnecessarily increase model complexity and overfitting risks. In sharp contrast, for plastic strain and stress, the CNN-LSTM architecture is essential. Its gating mechanism effectively captures the irreversible cumulative effects during thermal cycles, successfully overcoming the numerical oscillation and divergence issues encountered by continuous Neural ODE solvers when processing stiff thermal history data.
The benchmark constructed in this paper reveals the profound mechanism of feature interaction within the shared operator space. Although different physical quantities follow distinct evolution laws, they share universal basis functions constrained by the same geometric domain. The superior performance of the "Plastic-Stress" configuration indicates that the plastic branch functions as a constraint, forcing the shared trunk network to prioritize encoding spatial modes relevant to historical accumulation, thereby reducing stress prediction error. However, the failure of the "All-Branch" configuration highlights a critical limitation: feature competition. Forcing the shared trunk network to simultaneously encode high-frequency transient elastic modes and low-frequency cumulative plastic trends leads to conflicting optimization directions. This results in an over-smoothed basis space that degrades overall accuracy. Therefore, the sharing strategy in DeepONet should follow the similarity of physical evolution laws.
Based on these findings, a physically decoupled dual-trunk DeepONet is established as the optimal PIOL architecture, as summarized in Fig. 15. This topology implements a mechanism of selective structural coupling. Distinct from the single-trunk sharing strategy, this architecture employs two independent trunk networks: one dedicated to the elastic branch to capture transient reversible modes, and another shared exclusively between the plastic and stress branches to encode history-dependent cumulative modes. This design avoids the interference of transient elastic features with cumulative history learning, while forcing the plastic-stress shared operator to prioritize the encoding of spatial modes governing irreversible material hardening and residual distortion. Consequently, the shared trunk for the history-dependent task acts as a physical regularization term, utilizing the high-fidelity plastic strain gradients to guide the accurate reconstruction of the stress field, thereby effectively filtering out high-frequency elastic noise and achieving the highest physical consistency.
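The dual-trunk topology summarized in Fig. 15 can be sketched as two disjoint basis sets with selective coupling; the numbers and weights below are random placeholders, and only the wiring reflects the architecture:

```python
import numpy as np

rng = np.random.default_rng(2)
p_e, p_h, n = 32, 64, 10   # elastic basis, history basis, query points

# Trunk 1: transient reversible modes, used by the elastic branch only.
t_elastic = rng.normal(size=(n, p_e))
# Trunk 2: history-dependent cumulative modes, shared by plastic and stress.
t_history = rng.normal(size=(n, p_h))

b_elastic = rng.normal(size=p_e)
b_plastic = rng.normal(size=p_h)
b_stress = rng.normal(size=p_h)

# Selective structural coupling: plastic and stress reuse t_history,
# while the elastic field is isolated on its own trunk.
out = {
    "elastic": t_elastic @ b_elastic,
    "plastic": t_history @ b_plastic,
    "stress":  t_history @ b_stress,
}
```

Because `t_history` never receives gradients from the elastic task, the transient elastic modes cannot compete with the cumulative plastic-stress modes, which is the regularization effect described above.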