Introduction
In the wood processing industry, solid wood lumber is a key raw material, and its surface quality directly determines the quality of the final product. As market demand for high-quality wood products grows, the requirements on the accuracy and efficiency of surface defect inspection for solid wood panels are becoming increasingly stringent. Traditional manual inspection is affected by subjective factors such as inspector fatigue and differences in experience, which lead to missed detections and misjudgments, and it is slow. It therefore struggles to meet the needs of large-scale industrial production and has become a bottleneck that limits both production efficiency and the stability of product quality 1.
Surface defect detection of solid wood panels faces several difficulties. First, existing detection models perform poorly: they struggle to identify the various types of wood surface defects (such as cracks, knots, and insect holes) accurately, which results in insufficient detection accuracy2. Second, detection efficiency is low; traditional methods cannot meet the needs of large-scale production and have difficulty inspecting large numbers of panels in a short time. In addition, few models have been developed specifically for wood surface defect detection: most existing deep learning models target other domains (e.g., medical images and industrial parts), and models optimized for wood surface defects are lacking3. Finally, traditional detection methods usually rely on hand-designed feature extraction algorithms, which often perform poorly on complex wood textures and diverse defect types and adapt poorly to the complex scenarios of actual production4.
To address these problems, this paper proposes an improved TL-ResNet50-SE network for surface defect detection of solid wood panels. The main contributions are as follows:
• To address the inefficient detection of surface defects on solid wood panels, this paper proposes an improved TL-ResNet50 model that incorporates an attention mechanism. Built on ResNet50, the model introduces transfer learning together with two different attention modules, SE and CBAM. Experimental results show that the improved models outperform the baseline; in particular, the TL-ResNet50 model with the SE module performs best and improves defect detection accuracy.
• To address the scarcity of defect detection models tailored to wood, this paper systematically compares classical models such as ResNet50, VGG16, AlexNet, and EfficientNetV2 with the TL-ResNet50 variants that incorporate attention mechanisms. The development and use of each model in wood defect detection are discussed in detail, and experimental data such as accuracy and loss are analyzed in depth to show the strengths and weaknesses of each model in this field, providing a comprehensive and targeted reference for model selection in practical applications.
The current state of research on attention mechanisms
The attention mechanism mimics human attention, enabling a model to focus on relevant parts of the input and ignore irrelevant information when processing data such as sequences or images 11. In wood panel surface defect detection, for example, an image contains a large amount of information, and not all of it is relevant to the defects. Traditional neural network models treat all image regions equally, which makes it difficult to extract defect features accurately against complex backgrounds. Because of its flexibility, an attention mechanism can be added in various ways to any deep learning architecture that models complex systems 12. With attention, the model computes importance weights for different regions or features, concentrates on regions where defects may exist, and suppresses irrelevant background information. In this way it extracts features related to wood panel surface defects more effectively and improves its ability to detect various defects.
The SE attention mechanism enhances the model's focus on defect areas by adaptively adjusting the weights of feature channels. For example, Jun et al. proposed an improved residual block based on the SE module, which adaptively rescales features by modeling the interdependencies between feature channels and thereby improves the representational ability of the network 13. The CBAM attention mechanism combines channel attention and spatial attention, so it can capture the local and global features of defect areas more comprehensively 14. For example, Fu et al. added a CBAM module to each branch of the YOLOv4 feature fusion network. The CBAM modules assign weights to the channel and spatial features of the feature map, increasing the weights of useful features and suppressing those of invalid ones, so that the network concentrates on target areas containing important information and its overall detection accuracy improves 15.
To further improve wood panel defect detection, researchers have begun to explore combining the SE and CBAM attention mechanisms. However, although both mechanisms have shown great potential in other image domains, their combined use in wood defect detection is still at an early stage. Because wood defects are diverse, complex, and domain-specific, integrating the two attention mechanisms and applying them to wood defect detection poses many technical challenges. Relevant research is still limited, and no widely influential results have yet been achieved.
The current state of research on transfer learning
Transfer learning aims to transfer knowledge learned on one or more source tasks to a target task. In deep learning, pre-trained models learn rich image features and patterns on large-scale general-purpose datasets. Transferring this pre-trained knowledge to a specific target task can reduce the need for large-scale labeled data, accelerate model convergence on the new task, and improve generalization 16–18. In practice, transfer learning usually takes the model pre-trained on the source task as a starting point and fine-tunes it for the target task. Let $\theta$ denote the model parameters, $L$ the loss function, and $D_s$ and $D_t$ the source and target datasets, respectively. Pre-training on the source dataset updates the parameters by minimizing the loss:

$$\theta_s = \arg\min_{\theta} L(\theta; D_s)$$

Fine-tuning on the target dataset then takes the pre-trained parameters $\theta_s$ as the initial value and further optimizes them by minimizing the loss on $D_t$:

$$\theta^{*} = \arg\min_{\theta} L(\theta; D_t), \quad \text{starting from } \theta^{(0)} = \theta_s$$

where $\theta^{*}$ is the optimal parameter after fine-tuning.
Transfer learning has markedly improved model performance in wood panel defect detection. For example, in 2021 Gao et al. proposed TL-ResNet34, which combines ResNet-34 with transfer learning; its detection accuracy on their dataset was significantly higher than that of the other methods compared, indicating that transfer learning can improve the final prediction accuracy of knot defect detection 19. When environments, lighting conditions, or panel materials differ, a transfer-learning-based model can draw on general features learned in related domains to adapt to these changes. It can therefore maintain stable, high detection accuracy across a range of practical scenarios, which improves the reliability and practicality of wood panel defect detection.
In this paper, transfer learning is introduced into wood defect detection to exploit the rich features that pre-trained models learn on large-scale general-purpose datasets, reduce the demand for large-scale labeled data, accelerate convergence on the wood defect detection task, and improve generalization. With transfer learning, the proposed TL-ResNet50 model recognizes multiple types of wood surface defects more accurately against complex backgrounds, which improves the accuracy and efficiency of defect detection.
The TL-ResNet50 model incorporating the attention mechanism
The TL-ResNet50-SE model
The TL-ResNet50-SE model adds a Squeeze-and-Excitation (SE) module to TL-ResNet50. The SE module applies global average pooling to the feature maps, compressing each channel into a single value to capture global information across channels. A bottleneck structure of two fully connected layers then recalibrates the importance of each channel, enhancing channel features related to panel surface defects and suppressing the responses of irrelevant channels. Figure 4 shows the SE attention module.
The cube X on the left of the figure is the input feature map with dimensions H' × W' × C', where H' and W' are its height and width and C' is the number of channels. A transformation $F_{tr}$ (usually a convolution) converts X into a feature map U with dimensions H × W × C, where H and W are the new height and width and C is the number of channels. A Squeeze operation, denoted $F_{sq}(\cdot)$, is then applied to U. It is usually implemented by global average pooling (GAP), which compresses the 2D spatial information of each channel into a scalar and yields a 1 × 1 × C vector. Next, the Excitation operation, denoted $F_{ex}(\cdot, W)$, learns the dependencies between channels through fully connected layers (usually two, with an activation function in between); its output is still a 1 × 1 × C vector. Finally, the Scale operation, denoted $F_{scale}(\cdot, \cdot)$, multiplies the channel weight vector obtained from the Excitation operation with the feature map U channel by channel to produce the final output feature map $\tilde{X}$, whose dimensions are H × W × C 22.
The SE module thus captures global information across channels through global average pooling and then recalibrates channel importance with fully connected layers, so that defect-related channel features stand out more prominently. The computation proceeds in the following steps.
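1. Squeeze operation (global average pooling):
$$z_c = F_{sq}(u_c) = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} u_c(i, j)$$
where $u_c(i, j)$ is the value of the feature map at position (i, j) on the c-th channel and $z_c$ is the global feature of the c-th channel.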
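2. Excitation operation (channel weight calculation):
$$s = F_{ex}(z, W) = \sigma\left(W_2\, \delta(W_1 z)\right)$$
where $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$ and $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$ are the weight matrices of the two fully connected layers (r is the reduction ratio of the bottleneck), $\delta$ is the ReLU activation function, and $\sigma$ is the Sigmoid activation function, which maps the weights into (0, 1).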
3. Scale operation (feature map recalibration):
$$\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c$$
where $s_c$ is the weight of the c-th channel and $\tilde{x}_c$ is the recalibrated feature map of the c-th channel.
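The three steps can be traced on toy numbers with the sketch below. It is a plain-Python illustration under stated assumptions, not the paper's implementation: the feature map is a nested list of C channels, the weights W1 and W2 are hand-picked hypothetical values with C = 4 and r = 2, and biases are omitted. A real model learns W1 and W2, and a PyTorch version would typically be built from layers such as `nn.AdaptiveAvgPool2d` and `nn.Linear`.

```python
import math


def relu(v):
    return [max(0.0, a) for a in v]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def matvec(W, v):
    """Matrix-vector product for a list-of-rows matrix W."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def se_block(U, W1, W2):
    """Squeeze-and-Excitation on U, a list of C channels, each an H x W nested list.

    W1 has shape (C/r) x C and W2 has shape C x (C/r); biases are omitted.
    Returns (recalibrated feature map, channel weights s).
    """
    # Squeeze: z_c = average of u_c over all H * W positions (global average pooling)
    z = [sum(sum(row) for row in u_c) / (len(u_c) * len(u_c[0])) for u_c in U]
    # Excitation: s = sigmoid(W2 . relu(W1 . z))
    s = [sigmoid(a) for a in matvec(W2, relu(matvec(W1, z)))]
    # Scale: x~_c = s_c * u_c
    X_tilde = [[[s_c * val for val in row] for row in u_c] for s_c, u_c in zip(s, U)]
    return X_tilde, s


# Hypothetical toy input: C = 4 channels of size 2 x 2, reduction ratio r = 2
U = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [1.0, 1.0]],
    [[-1.0, 1.0], [0.0, 0.0]],
]
W1 = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]
W2 = [[1.0, 0.0], [-1.0, 0.0], [0.5, 0.0], [0.0, 1.0]]

X_tilde, s = se_block(U, W1, W2)
print([round(w, 3) for w in s])
```

The output keeps the shape of U; only the scale of each channel changes, which is why an SE module can be inserted after a residual block without altering the rest of the network.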
In principle, the SE module adaptively adjusts the weights of different channels through channel-wise attention. In solid wood panel surface defect detection, different types of defects may have more distinctive characteristics in specific channels, and the SE module can strengthen the signals of these key channels while weakening the influence of irrelevant ones. Taking wormhole defects as an example, wormhole regions may produce distinctive responses in channels that are sensitive to particular colors; the SE module can increase the weights of these wormhole-related channels, making wormholes easier to identify and helping the model maintain high detection accuracy even when defect features are inconspicuous or background interference is strong.
The TL-ResNet50-CBAM model
The TL-ResNet50-CBAM model integrates the Convolutional Block Attention Module (CBAM), which processes the input feature map in two parts: channel attention and spatial attention. In the channel attention part, max pooling and average pooling are applied to the input feature map, each pooled result is fed into a shared multilayer perceptron, and the two outputs are summed and passed through a sigmoid function to produce channel attention weights that highlight important channels. In the spatial attention part, the channel-refined feature map is compressed along the channel dimension, convolved, and passed through a sigmoid function to produce spatial attention weights that focus on key spatial locations. The channel weights are first multiplied element-wise with the input feature map, and the spatial weights are then multiplied with the result, recalibrating the features along both the channel and spatial dimensions. This helps the model capture key information and improves performance in object detection and related tasks. Figure 5 shows the module.
In complex wood panel surface defect detection scenarios, the TL-ResNet50-CBAM model attends to both the channel characteristics and the spatial locations of defects, allowing it to detect cracks, dead knots, live knots, wormholes, and resin defects accurately.
The CBAM module computes attention along both the channel and spatial dimensions. The channel attention part obtains channel weights through two pooling operations and a multilayer perceptron; the spatial attention part compresses the feature map along the channel dimension and then applies a convolution to generate a spatial attention map. Together these locate defects in the image more accurately and improve detection. The calculation proceeds in the following steps.
1. Channel attention
Suppose the input feature map is $F \in \mathbb{R}^{C \times H \times W}$, where C is the number of channels, H is the height, and W is the width.
(1) Global average pooling and max pooling:
Average pooling over the spatial dimensions (H, W) of F yields a global average feature vector $F_{avg}^{c} \in \mathbb{R}^{C \times 1 \times 1}$:
$$F_{avg}^{c}(c) = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} F(c, i, j)$$
where F(c, i, j) is the value of F at channel c and position (i, j), and $F_{avg}^{c}(c)$ is the value of $F_{avg}^{c}$ at channel c.
Max pooling over the spatial dimensions (H, W) of F yields a global maximum feature vector $F_{max}^{c} \in \mathbb{R}^{C \times 1 \times 1}$:
$$F_{max}^{c}(c) = \max_{1 \le i \le H,\; 1 \le j \le W} F(c, i, j)$$
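(2) Channel attention map generation:
$$M_c(F) = \sigma\left(W_1\, \delta(W_0 F_{avg}^{c}) + W_1\, \delta(W_0 F_{max}^{c})\right)$$
where $W_0 \in \mathbb{R}^{\frac{C}{r} \times C}$ and $W_1 \in \mathbb{R}^{C \times \frac{C}{r}}$ are the weight matrices of the shared fully connected layers (r is the reduction ratio), $\delta$ is the ReLU activation function, and $\sigma$ is the Sigmoid activation function.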
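2. Spatial attention module
(1) Channel compression: let $F' = M_c(F) \otimes F$ be the channel-refined feature map, where $\otimes$ denotes element-wise multiplication broadcast over the spatial dimensions. Average pooling and max pooling along the channel dimension give two H × W maps:
$$F_{avg}^{s}(i, j) = \frac{1}{C}\sum_{c=1}^{C} F'(c, i, j), \qquad F_{max}^{s}(i, j) = \max_{1 \le c \le C} F'(c, i, j)$$
(2) Spatial weighting calculation:
$$M_s(F') = \sigma\left(\mathrm{Conv}\left([F_{avg}^{s}; F_{max}^{s}]\right)\right)$$
where Conv is a convolution applied to the two maps concatenated along the channel dimension (a 7 × 7 kernel in the original CBAM design) and $\sigma$ is the Sigmoid activation function.
(3) Feature map recalibration:
$$F'' = M_s(F') \otimes F'$$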
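The sketch below traces the channel and spatial attention equations on a small feature map. It is a plain-Python illustration, not the paper's implementation: the MLP weights W0 and W1 and the 2 × 3 × 3 kernel are hypothetical hand-picked values (the original CBAM uses a 7 × 7 kernel), biases are omitted, and the convolution is computed as the zero-padded cross-correlation that deep learning frameworks use.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def mlp(v, W0, W1):
    """Shared two-layer MLP: W1 . relu(W0 . v), no biases."""
    hidden = [max(0.0, sum(w * a for w, a in zip(row, v))) for row in W0]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W1]


def channel_attention(F, W0, W1):
    """M_c(F) = sigmoid(MLP(avg-pool(F)) + MLP(max-pool(F))), one weight per channel."""
    avg = [sum(map(sum, f_c)) / (len(f_c) * len(f_c[0])) for f_c in F]
    mx = [max(map(max, f_c)) for f_c in F]
    return [sigmoid(a + b) for a, b in zip(mlp(avg, W0, W1), mlp(mx, W0, W1))]


def spatial_attention(Fp, kernel):
    """M_s(F') from channel-wise avg and max maps; kernel is 2 x k x k, zero padding keeps H x W."""
    C, H, W = len(Fp), len(Fp[0]), len(Fp[0][0])
    avg = [[sum(Fp[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    mx = [[max(Fp[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    maps, k = (avg, mx), len(kernel[0])
    p = k // 2
    Ms = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for m in range(2):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - p, j + dj - p
                        if 0 <= y < H and 0 <= x < W:
                            acc += kernel[m][di][dj] * maps[m][y][x]
            Ms[i][j] = sigmoid(acc)
    return Ms


def cbam(F, W0, W1, kernel):
    """CBAM on F (C x H x W nested lists); returns (F'', M_c, M_s)."""
    Mc = channel_attention(F, W0, W1)
    # F' = M_c ⊗ F: scale every value of channel c by Mc[c]
    Fp = [[[m * v for v in row] for row in f_c] for m, f_c in zip(Mc, F)]
    Ms = spatial_attention(Fp, kernel)
    # F'' = M_s ⊗ F': scale every channel at position (i, j) by Ms[i][j]
    Fpp = [[[Ms[i][j] * v for j, v in enumerate(row)] for i, row in enumerate(f_c)] for f_c in Fp]
    return Fpp, Mc, Ms


# Hypothetical toy input: C = 2, H = W = 3, reduction to one hidden unit
F = [
    [[1.0, 2.0, 0.0], [0.0, 5.0, 1.0], [1.0, 0.0, 2.0]],
    [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]],
]
W0 = [[0.5, 0.5]]
W1 = [[1.0], [-1.0]]
kernel = [[[1.0 / 18] * 3 for _ in range(3)] for _ in range(2)]

out, Mc, Ms = cbam(F, W0, W1, kernel)
print([round(m, 3) for m in Mc])
```

Applying the channel weights before the spatial weights mirrors the sequential order in the equations above; each output value equals $M_s(i, j) \cdot M_c(c) \cdot F(c, i, j)$.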
Studies show that the CBAM module markedly improves the model's adaptability to complex scenes: its channel and spatial attention mechanisms raise detection accuracy for different defects on wood surfaces and allow defects to be recognized accurately under complex lighting and background conditions. The spatial attention mechanism lets the model lock onto defect regions within complex backgrounds and reduces the influence of background noise, while the channel attention mechanism refines the extraction of features for different defect types, improving detection accuracy and reliability. In terms of computational efficiency, although the CBAM module adds some computation, its lightweight attention design lets the model focus quickly on key information, and the overall detection speed is reportedly not significantly affected. In real-time inspection of large numbers of solid wood panel images, the TL-ResNet50-CBAM model can therefore focus quickly on the key features of the target and improve detection accuracy while maintaining high detection speed, which makes it practical 23.
TL-ResNet50-SE + CBAM model
The TL-ResNet50-SE + CBAM model combines the SE and CBAM modules to recalibrate features along both the channel and spatial dimensions. For solid wood panel surface defect detection, the model first applies global average pooling to the feature map through the SE module to obtain global information across channels, then uses the bottleneck structure of two fully connected layers to recalibrate channel importance, highlighting defect-related channel features and suppressing irrelevant channel responses. The CBAM module then applies its channel and spatial attention to the recalibrated feature map.
However, some studies have shown that integrating multiple attention modules does not always improve accuracy. In image detection experiments that combined the SE and CBAM modules, the added model complexity and possible conflicts between the modules sometimes made the combined model less accurate than a single module. Likewise, although the combined model in this paper theoretically inherits the advantages of both modules, the fusion did not improve accuracy in actual solid wood panel surface defect detection. When SE and CBAM modules were combined in a convolutional neural network for material defect detection, the diversity of extracted features increased, but so did the number of parameters and the amount of computation; with limited training data the model overfit, and detection accuracy failed to improve or even declined.
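The growth in parameters mentioned above can be estimated with simple arithmetic. The sketch assumes a hypothetical placement that the paper does not specify: one attention module after each of ResNet50's 16 bottleneck blocks (3, 4, 6, and 3 blocks with 256, 512, 1024, and 2048 output channels), reduction ratio r = 16 (the default in the original SE and CBAM papers), a 7 × 7 spatial kernel for CBAM, and no bias terms.

```python
# Hypothetical placement: one attention module after every ResNet50 bottleneck block.
RESNET50_STAGES = [(256, 3), (512, 4), (1024, 6), (2048, 3)]  # (output channels, blocks)


def se_params(c, r=16):
    """SE bottleneck: W1 is (C/r) x C and W2 is C x (C/r); biases omitted."""
    return 2 * c * (c // r)


def cbam_params(c, r=16, k=7):
    """CBAM: shared channel MLP plus a k x k conv over the 2 pooled maps; biases omitted."""
    return 2 * c * (c // r) + 2 * k * k


def extra_params(per_block):
    """Total parameters added when per_block(c) is inserted after every block."""
    return sum(per_block(c) * n for c, n in RESNET50_STAGES)


se = extra_params(se_params)
cbam = extra_params(cbam_params)
print(f"SE: {se:,}  CBAM: {cbam:,}  SE + CBAM: {se + cbam:,}")
```

Under these assumptions, SE and CBAM each add about 2.5 million parameters, and stacking both adds about 5.0 million on top of the roughly 25.6 million in a standard ResNet50, an increase of nearly 20% that must be fitted from the same limited training set.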
Experimental results and analysis
Experimental setup
The experimental environment
The hardware and software configurations used in this experiment are shown in Table 1.
Table 1
Experimental environment configuration
| Name | Configuration |
|---|---|
| Operating system | Windows 11 |
| CPU | 12th Gen Intel Core i7-12700H |
| Memory | 16 GB |
| GPU | NVIDIA GeForce RTX 3050 |
| Deep learning framework | PyTorch 2.5.1 |
| Programming language | Python 3.9 |
Data processing
The dataset used in this study is primarily sourced from the large-scale image dataset of wood surface defects available on Kaggle, which encompasses various types of solid wood boards and a range of surface defects under different production stages and conditions. To ensure the model's generalization capability, the dataset includes images captured under various lighting conditions, shooting angles, and levels of defect severity.
The dataset was expanded through data augmentation, employing specific methods such as: geometric transformations, which involve translating, rotating, scaling, and flipping the original images to simulate different postures and positional changes of the boards on the production line; color transformations, adjusting image brightness, contrast, saturation, and hue to mimic different lighting and imaging environments; noise addition, incorporating Gaussian noise, salt-and-pepper noise, etc., to simulate interference factors in actual production environments; and cropping and splicing, randomly cropping and combining images to increase the variety of defect combinations and contextual information24. The augmented dataset comprises a total of 4016 images, divided into training and validation sets in an 8:2 ratio25. The effect is illustrated in Fig. 6.
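The augmentation types and the 8:2 split can be sketched as follows. The paper gives no parameter values, so the brightness factor, noise levels, seed, and the truncating split rule below are hypothetical; images are treated as 2-D grayscale lists in [0, 255] for brevity. A production pipeline would more likely use library transforms such as `torchvision.transforms.RandomHorizontalFlip` and `ColorJitter`.

```python
import random


def hflip(img):
    """Horizontal flip (a geometric transformation)."""
    return [row[::-1] for row in img]


def adjust_brightness(img, factor):
    """Scale intensities and clamp to [0, 255] (a color transformation)."""
    return [[min(255.0, max(0.0, v * factor)) for v in row] for row in img]


def gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma, then clamp."""
    return [[min(255.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row] for row in img]


def salt_and_pepper(img, prob, rng):
    """With probability prob, replace a pixel by 0 (pepper) or 255 (salt), half each."""
    out = []
    for row in img:
        new_row = []
        for v in row:
            u = rng.random()
            new_row.append(0.0 if u < prob / 2 else 255.0 if u < prob else v)
        out.append(new_row)
    return out


def split_8_2(items, seed=0):
    """Shuffle reproducibly and split 80% / 20%; the training size is truncated to an int."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(0.8 * len(items))
    return items[:n_train], items[n_train:]


rng = random.Random(42)
img = [[10.0, 20.0], [30.0, 250.0]]
augmented = [hflip(img), adjust_brightness(img, 1.2),
             gaussian_noise(img, 5.0, rng), salt_and_pepper(img, 0.1, rng)]

train_ids, val_ids = split_8_2(range(4016))
print(len(train_ids), len(val_ids))
```

Since 4016 × 0.8 is not an integer, the exact split sizes depend on the rounding rule; truncation gives 3212 training and 804 validation images.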
Parameter settings
The initial learning rate is chosen in the range 0.001 to 0.1 (for example, 0.01). A step decay strategy is used: after a fixed number of training epochs (e.g., 10), the learning rate is multiplied by a decay factor less than 1 (e.g., 0.1). This gradually reduces the step size during training: a relatively large rate in the early stages speeds up training, while a smaller rate in the later stages prevents oscillation that would hinder convergence26.
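The step decay rule can be written as $\eta_e = \eta_0 \cdot \gamma^{\lfloor e / s \rfloor}$ for epoch e, initial rate $\eta_0$, decay factor $\gamma$, and step size s. The sketch below uses the example values from the text ($\eta_0 = 0.01$, s = 10, $\gamma = 0.1$); in PyTorch the same schedule is provided by `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)`.

```python
def step_decay_lr(epoch, lr0=0.01, step_size=10, gamma=0.1):
    """Learning rate for a 0-indexed epoch: lr0 * gamma ** floor(epoch / step_size)."""
    return lr0 * gamma ** (epoch // step_size)


for epoch in (0, 9, 10, 25, 99):
    print(epoch, step_decay_lr(epoch))
```

With these example values and the 100 training epochs described below, the rate would fall to about 1e-11 by the last epoch, so in practice the step size or decay factor needs to be tuned to the training length.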
For training, the number of epochs is set to 100 and adjusted according to the dataset size, its complexity, and model convergence. The batch size is set to 64 as a balance against hardware constraints: a larger batch makes better use of GPU parallelism but consumes more memory. Model parameters are updated with stochastic gradient descent (SGD) or one of its variants, such as the Adam optimizer. Adam adapts the learning rate of each parameter automatically and often performs well in practice.
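To show what "adaptive learning rates" means, the sketch below implements the standard Adam update for a single scalar parameter and applies it to the hypothetical toy objective $f(x) = (x - 3)^2$. The default hyperparameters (beta1 = 0.9, beta2 = 0.999, eps = 1e-8) are the usual Adam defaults; lr = 0.1 is chosen only for this toy problem and is not the paper's setting.

```python
import math


def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Run Adam on a scalar parameter, given a function returning the gradient at x."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients (first moment)
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients (second moment)
        m_hat = m / (1 - beta1 ** t)           # bias corrections for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)  # step size adapted by gradient magnitude
    return x


# Gradient of the toy objective f(x) = (x - 3)^2
x_star = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(round(x_star, 4))
```

Dividing by $\sqrt{\hat{v}}$ normalizes each parameter's step by the typical size of its gradients, which is why Adam needs less per-parameter tuning of the learning rate than plain SGD.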
Evaluation metrics
Accuracy, loss, precision, recall, and F1 score are used as evaluation metrics to assess the model's performance on solid wood surface defect detection comprehensively. Accuracy reflects the overall classification performance of the model and is the ratio of correctly predicted samples to the total number of samples27:
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$
At the end of each training epoch or during the testing phase, the accuracy is calculated on the validation or test set using this formula to observe the model's training effectiveness and generalization capability.
Loss: This study uses the cross-entropy loss to measure the difference between the model's predictions and the true labels. For multi-class classification, it is:
$$Loss = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{C} y_{ij}\log(p_{ij})$$
where N is the number of samples, C is the number of classes, $y_{ij}$ is the true label of sample i for class j (typically one-hot encoded), and $p_{ij}$ is the predicted probability that sample i belongs to class j. During training, the loss is minimized to adjust the model parameters and bring the predictions closer to the true labels.
Precision: Precision is the proportion of correctly predicted positive samples among all samples predicted as positive, indicating how reliable the model's positive predictions are:
$$Precision = \frac{TP}{TP + FP}$$
Recall: Recall measures the model's ability to detect actual positive samples, i.e., the proportion of actual positive samples that are correctly detected:
$$Recall = \frac{TP}{TP + FN}$$
F1 score: The F1 score is the harmonic mean of precision and recall, providing a more balanced evaluation of model performance:
$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$
A higher F1 score indicates a better balance between precision and recall and thus better model performance.
Here, TP (true positives) is the number of samples in which the model correctly detects wood defects; FP (false positives) is the number of samples in which the model incorrectly identifies normal regions as defects; TN (true negatives) is the number of samples in which the model correctly identifies normal regions; and FN (false negatives) is the number of samples in which the model fails to detect actual defects. These metrics enable a quantitative comparison of different models on solid wood surface defect detection.
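The metrics above can be computed as in the sketch below. For a multi-class task, TP, FP, and FN are counted per class in a one-vs-rest manner; the paper does not state how per-class values are combined, so macro averaging (an unweighted mean over classes) is an assumption here, as are the toy labels and probabilities.

```python
import math


def cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean multi-class cross-entropy -(1/N) sum_i sum_j y_ij log p_ij.

    y_true: N one-hot rows; p_pred: N rows of predicted class probabilities.
    eps guards against log(0).
    """
    total = 0.0
    for y_row, p_row in zip(y_true, p_pred):
        total -= sum(y * math.log(max(p, eps)) for y, p in zip(y_row, p_row))
    return total / len(y_true)


def classification_metrics(y_true, y_pred, classes):
    """Accuracy plus macro-averaged precision, recall, and F1 (one-vs-rest per class)."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    k = len(classes)
    return accuracy, sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


# Hypothetical toy predictions for two defect classes
labels = ["crack", "crack", "knot", "knot"]
preds = ["crack", "knot", "knot", "knot"]
acc, prec, rec, f1 = classification_metrics(labels, preds, classes=["crack", "knot"])
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")

y_onehot = [[1, 0, 0], [0, 1, 0]]
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
print(f"cross-entropy={cross_entropy(y_onehot, probs):.4f}")
```

Note that `torch.nn.CrossEntropyLoss` expects raw logits and applies log-softmax internally, so in a PyTorch training loop it should not be given probabilities that have already passed through a softmax.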
Ablation experiments
To validate the effectiveness of transfer learning in the ResNet50 model and the attention mechanism in the TL-ResNet50 model, the following ablation experiments were designed. The experiments use ResNet50 as the baseline model. First, transfer learning (TL-ResNet50) is applied to enhance the model's performance. Subsequently, based on TL-ResNet50, the SE module, CBAM module, and a combination of SE + CBAM modules are introduced separately to further explore the impact of attention mechanisms on the model's performance.
Table 2 shows the effects of transfer learning (TL), the SE module, and the CBAM module on the ResNet50 model, using ResNet50 as the baseline.
Table 2
Evaluation metrics of ablation experiments in wood defect detection
| TL | SE | CBAM | Accuracy | Loss | Precision | Recall | F1 score |
|---|---|---|---|---|---|---|---|
|  |  |  | 0.747 | 0.632 | 0.747 | 0.748 | 0.746 |
| √ |  |  | 0.849 | 0.598 | 0.848 | 0.852 | 0.846 |
| √ | √ |  | 0.882 | 0.473 | 0.881 | 0.884 | 0.881 |
| √ |  | √ | 0.860 | 0.393 | 0.858 | 0.861 | 0.859 |
| √ | √ | √ | 0.774 | 0.619 | 0.793 | 0.776 | 0.772 |
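The ablation comparisons can be made explicit by differencing Table 2 against a reference row. The snippet below only reuses the accuracy and loss values reported in Table 2; the dictionary layout and the `deltas` helper are illustrative.

```python
# model -> (accuracy, loss), copied from Table 2
TABLE2 = {
    "ResNet50": (0.747, 0.632),
    "TL-ResNet50": (0.849, 0.598),
    "TL-ResNet50-SE": (0.882, 0.473),
    "TL-ResNet50-CBAM": (0.860, 0.393),
    "TL-ResNet50-SE+CBAM": (0.774, 0.619),
}


def deltas(reference):
    """Change in (accuracy, loss) of every other model relative to the reference row."""
    ref_acc, ref_loss = TABLE2[reference]
    return {name: (round(acc - ref_acc, 3), round(loss - ref_loss, 3))
            for name, (acc, loss) in TABLE2.items() if name != reference}


for ref in ("ResNet50", "TL-ResNet50"):
    print(ref, deltas(ref))
```

Relative to ResNet50, every variant improves accuracy, but relative to TL-ResNet50 only the single-module variants do: SE adds 0.033 and CBAM adds 0.011, whereas the SE + CBAM combination loses 0.075.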
After the SE module is added, classification accuracy rises from 0.849 to 0.882 and the loss falls from 0.598 to 0.473 (Table 2), with the gains most apparent in extracting and classifying the features of small targets. The SE module adaptively reweights the feature channels, which strengthens the learning of key features and suppresses irrelevant information, thereby improving the model's recognition of small targets.
After the CBAM module is added, accuracy rises to 0.860 and the loss falls to 0.393, the lowest among the variants, and the model localizes defect boundaries better. The CBAM module applies attention along both the channel and spatial dimensions: it weighs the importance of different feature channels and also screens the spatial positions of the feature maps. This enables the model to focus more accurately on the boundaries of target objects, thereby improving defect detection accuracy 28.
With both SE and CBAM added, the model should in theory have the strongest feature extraction ability and attend to all aspects of defects more comprehensively than a model with either module alone. In the experiments, however, its accuracy fluctuates between 0.78 and 0.84, and the accuracy reported in Table 2 is only 0.774, which indicates a stability problem. Further analysis suggests that this instability may stem from insufficient coordination between the two attention modules: when handling different defect types and diverse backgrounds, the SE and CBAM modules sometimes conflict, producing unstable decisions. For example, on boards with both slight discoloration and complex texture interference, the SE module emphasizes discoloration-related channels, while the CBAM module, when locating and enhancing defect features, may drift away from the discolored region because of the texture interference, reducing the stability of the overall result. Some studies have tried adjusting the connection order and weight allocation of the two modules, but no satisfactory solution has yet been found, and the problem requires further investigation.
In summary, the ablation experiments confirm that transfer learning and a single attention module (SE or CBAM) each improve the performance of the ResNet50-based model, providing a solid basis for further improvements. They also underline the value of attention mechanisms in complex image defect detection tasks and help explain how the model works, enabling targeted optimization of its structure to improve detection accuracy and stability. At the same time, stacking attention mechanisms can be counterproductive, as the SE + CBAM results show.
Experimental results and analysis
In solid wood surface defect detection, model choice strongly affects detection accuracy and efficiency. This section compares the performance of several models experimentally. First, the TL-ResNet50 model obtained through transfer learning is compared with the baseline ResNet50 to show the benefit of transfer learning. Next, the models with different attention mechanisms and other classic models such as EfficientNetV2, VGG16, and AlexNet are compared in terms of accuracy, loss, and stability. Finally, the predictions of each model on different defect types are examined to identify differences in their capabilities, providing specific guidance for model selection in actual production29.
Comparison of results between ResNet50 and TL-ResNet50 models
In this study, the TL-ResNet50 model was obtained by applying transfer learning to ResNet50: the parameters of a ResNet50 pre-trained on a large-scale general dataset serve as the initialization, followed by fine-tuning on the solid wood surface defect dataset. This lets the model reuse general image features learned on the source task, such as edges and textures, reducing the need for large-scale annotated data and accelerating convergence on the new task. As a result, accuracy improves, loss decreases, and generalization is enhanced, making the model better suited to solid wood surface defect detection.
As shown in Figs. 7 and 8, the validation accuracy of ResNet50 fluctuates around 0.75, and its validation loss fluctuates around 0.7.
As shown in Figs. 9 and 10, the accuracy of TL-ResNet50 improves to 0.80–0.85, and the loss decreases from 0.632 to 0.598 (Table 2). These results show that transfer learning plays an important role in detecting surface defects on solid wood boards.
Prediction results of various models
As shown in Fig. 11, the defect predictions of the different models reveal differences in their ability to detect different defect types. The TL-ResNet50-CBAM and TL-ResNet50-SE models perform best on small and irregular defects; for instance, they accurately identify fine cracks on wood surfaces and irregularly shaped wormholes with high prediction probabilities. VGG16 performs relatively well on defects with distinct texture features, such as live knots, since its stacked convolutional layers help extract their texture features. AlexNet detects larger defects quickly but less accurately: it identifies larger dead knots rapidly, yet in some cases its prediction probabilities are lower than those of the other models. EfficientNetV2 locates defects quickly in complex backgrounds but classifies them less accurately; for example, on boards with both complex textures and defects it may misclassify the defect type. These results offer specific guidance for choosing a model according to the defect types common in actual production: manufacturers can select the detection model suited to the predominant defect types in their processes, thereby improving detection efficiency and accuracy.
Comparison of experimental results for each model
As shown in Figs. 12 and 13, the line graphs compare the accuracy and loss of the different models. The TL-ResNet50-CBAM and TL-ResNet50-SE models, which incorporate attention mechanisms, perform best during most training phases: their validation accuracy stabilizes around 0.85 with an upward trend, while their loss decreases steadily. This is attributed to the attention mechanisms, which let the models automatically focus on the most informative parts of the input by recalibrating features along the channel dimension (SE) or along both the channel and spatial dimensions (CBAM), thereby strengthening defect feature extraction. The TL-ResNet50 model alone also performs reliably, with accuracy fluctuating between 0.80 and 0.85. The TL-ResNet50-SE + CBAM model achieves relatively high accuracy, but it fluctuates considerably between 0.78 and 0.84, leaving room for improvement in stability. This instability may stem from insufficient coordination between the two attention modules, possibly leading to conflicts when handling different defect types and background conditions. ResNet50 generalizes slightly worse than the TL-ResNet50 series, with accuracy fluctuating around 0.75. EfficientNetV2 shows lower and more variable accuracy, between 0.65 and 0.80, and is unstable when generalizing to new data; despite its innovative design, it underperforms on this task and would require further tuning and optimization. VGG16 and AlexNet reach relatively low accuracy, 0.70–0.75 and 0.65–0.70 respectively, and improve only slowly. This is likely because VGG16's large number of parameters and high computational complexity make it prone to overfitting, while AlexNet's relatively shallow structure limits its feature extraction capability; both constraints reduce their ability to handle complex data.
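The channel recalibration performed by an SE module can be sketched in NumPy for a single feature map. This sketch assumes a `(C, H, W)` layout, a reduction ratio `r` (16 in the original SENet paper; the value used in this study is not stated), and explicit weight matrices; `se_block` is a hypothetical helper, not the paper's implementation. In the original SE-ResNet design the block rescales the output of each residual branch before the identity shortcut is added; where the SE modules sit in TL-ResNet50-SE is not specified in this section.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-Excitation recalibration of a feature map x of shape (C, H, W).

    w1: (C // r, C), b1: (C // r,)  reduction fully connected layer
    w2: (C, C // r), b2: (C,)       expansion fully connected layer
    Returns the rescaled feature map and the per-channel weights.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = x.mean(axis=(1, 2))                # (C,)
    # Excitation: bottleneck MLP with ReLU, then sigmoid gates in (0, 1).
    s = np.maximum(w1 @ z + b1, 0.0)       # (C // r,)
    a = sigmoid(w2 @ s + b2)               # (C,)
    # Scale: reweight every channel of x by its attention weight.
    return x * a[:, None, None], a
```

A quick sanity check: with all weights and biases set to zero, every gate equals sigmoid(0) = 0.5, so the block simply halves the input; trained weights instead amplify channels that respond to defects and suppress those dominated by background grain.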
The models differ in how well they detect surface defects in solid wood panels. Table 3 lists the evaluation metrics of the different models for wood defect detection.
Table 3
Evaluation metrics of different models for wood defect detection
| Model | Accuracy | Loss | Precision | Recall | F1 score |
|---|---|---|---|---|---|
| AlexNet | 0.662 | 1.800 | 0.661 | 0.660 | 0.661 |
| VGG16 | 0.740 | 0.671 | 0.777 | 0.772 | 0.774 |
| EfficientNetV2 | 0.792 | 0.509 | 0.799 | 0.796 | 0.795 |
| TL-ResNet50-SE | 0.882 | 0.473 | 0.881 | 0.884 | 0.881 |
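The accuracy, precision, recall, and F1 columns in Table 3 can be computed from a confusion matrix, as sketched below. The text does not state the averaging scheme, so this sketch assumes macro averaging (an unweighted mean over defect classes); `classification_metrics` is a hypothetical helper. Under macro averaging the F1 score is the mean of per-class F1 values, so it need not equal the harmonic mean of the averaged precision and recall, which is consistent with small gaps such as EfficientNetV2's 0.799 / 0.796 / 0.795.

```python
import numpy as np


def classification_metrics(cm):
    """Macro-averaged metrics from a confusion matrix.

    cm[i, j] is the number of samples whose true class is i and whose
    predicted class is j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                 # correctly classified samples per class
    predicted = cm.sum(axis=0)       # column sums: samples predicted as each class
    actual = cm.sum(axis=1)          # row sums: samples truly in each class
    zeros = np.zeros_like(tp)
    # Guard against classes that are never predicted or never present.
    precision = np.divide(tp, predicted, out=zeros.copy(), where=predicted > 0)
    recall = np.divide(tp, actual, out=zeros.copy(), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=zeros.copy(), where=denom > 0)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),             # mean of per-class F1, not F1 of the means
    }
```

For example, the two-class matrix `[[8, 2], [1, 9]]` gives an accuracy of 0.85 and a macro recall of 0.85 (up to floating-point rounding).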
According to Table 3, the TL-ResNet50-based model with an attention mechanism performs best overall: it identifies defects more accurately and trains effectively and stably. EfficientNetV2 has some detection capability, but its generalization and stability are insufficient, while VGG16 and AlexNet, constrained by their architectures, underperform on complex data. The training curves further show that the TL-ResNet50-SE + CBAM model, which integrates two attention modules, is theoretically more powerful but is unstable in practice, apparently because the two modules do not work well together.
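Unlike SE, which only reweights channels, CBAM applies channel attention followed by spatial attention; the sketch below shows that sequence in NumPy. It follows the published CBAM design (a shared MLP over average- and max-pooled channel descriptors, then a convolution, commonly 7 × 7, over the channel-wise average and max maps), but omits biases, and `cbam`, `channel_attention`, and `spatial_attention` are hypothetical helper names rather than the paper's code. How SE and CBAM are combined in TL-ResNet50-SE + CBAM is not described here, so this sketch does not model the dual-attention variant.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x, w1, w2):
    """Channel gates from a shared MLP applied to avg- and max-pooled descriptors.

    x: (C, H, W); w1: (C // r, C); w2: (C, C // r). Returns shape (C,).
    """
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(mlp(x.mean(axis=(1, 2))) + mlp(x.max(axis=(1, 2))))


def spatial_attention(x, kernel):
    """Spatial gates from a k x k convolution over channel-wise avg and max maps.

    kernel: (2, k, k) with odd k; zero 'same' padding keeps the H x W size.
    Returns shape (H, W).
    """
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])            # (2, H, W)
    pad = kernel.shape[-1] // 2
    padded = np.pad(desc, ((0, 0), (pad, pad), (pad, pad)))
    k = kernel.shape[-1]
    windows = sliding_window_view(padded, (k, k), axis=(1, 2))  # (2, H, W, k, k)
    conv = np.einsum("chwij,cij->hw", windows, kernel)          # cross-correlation
    return sigmoid(conv)


def cbam(x, w1, w2, kernel):
    """Sequential CBAM: channel attention first, then spatial attention."""
    x = x * channel_attention(x, w1, w2)[:, None, None]
    return x * spatial_attention(x, kernel)[None, :, :]
```

With all weights at zero, both gates equal 0.5 and the output is a quarter of the input, which makes a convenient shape and wiring check before plugging in trained parameters.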
Conclusion
This study focuses on the detection of surface defects in solid wood boards, systematically evaluating the performance of various deep learning models in this task. The research reveals that different models have their own strengths and weaknesses when handling surface defect detection in solid wood boards.
Traditional models such as VGG16 and AlexNet exhibit certain limitations. The VGG16 model, with its large number of parameters and high computational complexity, is prone to overfitting. The AlexNet model, with its relatively shallow structure, has limited feature extraction capabilities, making it difficult to handle complex and variable defect scenarios. Although the EfficientNetV2 model incorporates innovative design concepts, its generalization ability and stability in this experiment need improvement. When faced with special defect types or defects arising from new production processes, its detection accuracy fluctuates significantly.
In contrast, models based on TL-ResNet50, which combine transfer learning and attention mechanisms, demonstrate superior performance. Transfer learning enables the model to leverage knowledge acquired from large-scale general datasets, allowing it to converge quickly on a smaller dataset of wood board images, thereby improving training efficiency and generalization ability. Attention mechanisms enable the model to focus on defect regions, enhancing its ability to extract defect features and thus improving detection accuracy. Experimental results show that TL-ResNet50 models incorporating attention modules such as SE and CBAM achieve significant improvements in metrics like accuracy. Specifically, the TL-ResNet50-CBAM and TL-ResNet50-SE models achieve stable validation accuracy around 0.85, with strong generalization capabilities. Ablation experiments further confirm the critical role of attention mechanisms in enhancing model performance. After adding the SE or CBAM modules, the model's accuracy significantly improves, and the loss function value decreases. However, although the TL-ResNet50-SE + CBAM model theoretically has the strongest feature extraction capability, it suffers from stability issues, with significant fluctuations in accuracy, necessitating further optimization for practical applications.
Nevertheless, this study has some limitations. On the one hand, the scale and diversity of the dataset need further expansion. The current dataset may not fully represent all possible defect scenarios, leading to degraded detection performance when encountering unseen defect types or boards under special production conditions. In the future, the dataset should be expanded to include images of wood boards from different origins, tree species, and processing techniques, as well as more types and severity levels of defect samples, to enhance the model's generalization ability and better adapt to complex real-world production scenarios. On the other hand, the application of multimodal data fusion techniques in surface defect detection for solid wood boards is still in its infancy. Integrating multimodal data such as infrared images and texture depth information could provide richer feature information for the model, further improving detection accuracy. Additionally, most current deep learning models are black-box models, making it difficult for production personnel to understand the decision-making process, which limits their practical application to some extent. Therefore, enhancing research on model interpretability and developing visualization tools and explanatory algorithms to help production personnel intuitively understand how the model identifies defects will improve the model's credibility and applicability. Furthermore, the deep integration of detection technology with actual production processes requires further research to achieve real-time and efficient online detection.
In the future, we will focus on optimizing model architectures and exploring more advanced network structures and improvement strategies, such as incorporating Transformer structures to better capture long-range dependencies among image features, thereby further improving performance on complex defects. At the same time, we will strengthen research on and application of multimodal data fusion techniques and promote the deep integration of detection technology with production processes, further enhancing detection performance and practicality and contributing to the intelligent development of the wood processing industry.
Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.