TL-ResNet50-SE: An Attention-Enhanced Transfer Learning Model for Surface Defect Detection in Solid Wood Panels
Haojie Chai1, Jiayu Xu1, Yanru Qiao1, Yuwen Lu1, Ruiqi Shi1, Yanyan Wang1,*, Ran Xue1, and Jianfeng Zheng1
1School of Artificial Intelligence, Henan Institute of Science and Technology, Xinxiang 453003, China
* wang_yy@hist.edu.cn
ABSTRACT
Traditional surface defect detection methods for solid wood panels suffer from poor model performance, low detection efficiency, and a scarcity of detection models designed for wood, which makes it difficult to meet increasing production demand. To address these problems, this paper proposes TL-ResNet50, a transfer learning (TL) method for surface defect detection of solid wood panels that integrates an attention mechanism. First, the TL-ResNet50 model is constructed by introducing transfer learning, which improves feature extraction capability and detection accuracy compared with the ResNet50 model. Second, the SE (Squeeze-and-Excitation) and CBAM (Convolutional Block Attention Module) modules are integrated; ablation experiments show that the TL-ResNet50-SE model effectively captures the local and global features of the wood surface, thereby improving defect detection accuracy. Finally, the classical models ResNet50, VGG16, AlexNet, and EfficientNetV2 are systematically compared with the improved TL-ResNet50 models on the wood defect detection task. The proposed model achieves a detection accuracy of 88.2%, outperforming the other models in surface defect detection of solid wood panels.
Introduction
In the wood processing industry, solid wood lumber is a key raw material, and its surface quality directly determines the quality of the final product. As market demand for high-quality wood products continues to grow, the requirements on the accuracy and efficiency of surface defect inspection of solid wood panels are becoming increasingly strict. Traditional manual inspection is susceptible to subjective factors such as inspector fatigue and differences in experience, which lead to missed and false detections. It is also slow, cannot meet the needs of large-scale industrial production, and has become a bottleneck that restricts production efficiency, product quality, and stability 1.
Surface defect detection of solid wood panels faces several difficulties. First, existing detection models perform poorly: they struggle to accurately identify the various types of wood surface defects (such as cracks, knots, and wormholes), resulting in insufficient detection accuracy 2. Second, detection efficiency is low: traditional methods cannot inspect a large number of panels in a short period of time and therefore cannot meet the needs of large-scale production. In addition, models for wood surface defect detection are scarce; most existing deep learning models target other domains (e.g., medical images and industrial parts), and models optimized specifically for wood surface defects are lacking 3. Traditional detection methods usually rely on hand-designed feature extraction algorithms, which often perform poorly on complex wood textures and diverse defect types and adapt badly to the complex scenarios of actual production 4.
To address these problems, this paper proposes an improved surface defect detection network for solid wood panels, TL-ResNet50-SE. The main contributions are as follows:
• To address the inefficient detection of surface defects on solid wood panels, this paper proposes an improved TL-ResNet50 model that incorporates an attention mechanism. Built on ResNet50, the model introduces transfer learning and two attention modules, SE and CBAM. Experimental results show that the improved models perform better, and that the TL-ResNet50 model with the SE module performs best and improves defect detection accuracy.
• To address the shortage of defect detection models for wood, this paper systematically compares the classical models ResNet50, VGG16, AlexNet, and EfficientNetV2 with the attention-enhanced TL-ResNet50 models. The use of each model in wood defect detection is discussed in detail, and experimental data such as accuracy and loss are analyzed to present the advantages and shortcomings of the different models, providing a targeted reference for model selection in practical applications.
Related work
The current state of research on deep learning for defect detection on wood panel surfaces
Deep learning has made significant progress in the field of target detection, and many models are widely used in various scenarios. In the detection of surface defects of solid wood panels, different deep learning models show their own characteristics and advantages.
The VGG16 model has a relatively simple and regular structure: a deep network built by stacking multiple small convolutional layers and pooling layers, which effectively extracts image features such as textures and edges. Existing research has applied VGG16 to wood surface defect detection, where its multi-layer convolutional operations progressively extract effective feature representations; however, its large number of parameters and high computational complexity result in low efficiency in practical applications 5.
AlexNet extends the network depth of the LeNet architecture, enabling it to learn richer, higher-dimensional features. The model employs data augmentation and dropout regularization to reduce overfitting and performs well with limited sample sizes 6. As one of the first deep convolutional networks to adopt the ReLU activation function, it mitigates the vanishing gradient problem and trains faster. Its local response normalization layer improves generalization, giving good adaptability when processing wood surface images under varying lighting and texture conditions 7. For instance, Urbonas et al. used a pre-trained AlexNet for veneer surface defect detection and found it capable of rapidly identifying various surface defects. However, its shallow network structure limits its feature extraction capacity, making it difficult to capture subtle defects on wood surfaces 8.
The EfficientNetV2 model innovatively combines training-aware neural architecture search with scaling strategies, along with progressive learning approaches, significantly improving training efficiency while maintaining detection accuracy 9. It achieves superior performance in defect detection against complex backgrounds. For example, Zhang et al. applied EfficientNetV2 to printed circuit board (PCB) defect detection, enhancing detailed information extraction capabilities for PCB components 10.
In summary, different deep learning models show their respective features and advantages in wood surface defect detection, but there are still some limitations. Future research can combine the advantages of multiple models to further improve the accuracy and efficiency of wood surface defect detection.
The current state of research on the mechanisms of attention
The attention mechanism mimics human attention, enabling models to focus on relevant parts of the input and ignore irrelevant information when processing data such as sequences or images 11. In wood panel surface defect detection, for example, images contain a large amount of information, not all of which is relevant to the defects. Traditional neural network models treat all image regions equally, which makes it difficult to extract defect features accurately against complex backgrounds. Because of its flexibility, the attention mechanism can be added in various ways to any deep learning architecture that models complex systems 12. With attention, the model computes importance weights for different regions or features, focuses on regions where defects may exist, and ignores irrelevant background information. It can therefore extract features related to wood panel surface defects more effectively and improve its ability to detect various defects.
The SE attention mechanism enhances the model's ability to focus on the defect area by adaptively adjusting the weights of feature channels. For example, Jun et al. proposed an improved residual block based on the SE module, which adaptively rescales features by considering the interdependencies between feature channels and thereby improves the representational ability of the network 13. The CBAM attention mechanism captures the local and global features of the defect area more comprehensively by introducing channel attention and spatial attention together 14. For example, Fu et al. added a CBAM module to each branch of the YOLOv4 feature fusion network to assign weights to the channel features and spatial features of the feature map. The weights of useful features are increased and those of invalid features are suppressed, so the network pays more attention to target areas containing important information and the overall accuracy of target detection improves 15.
In order to further improve the performance of wood panel defect detection, researchers have begun to explore the integration of SE and CBAM attention mechanisms. However, although the SE and CBAM attention mechanisms have shown great potential in other image fields, in the specific field of wood defect detection, the application of this integration method is still in a relatively primary stage. Due to the diversity, complexity, and particularity of wood defects, there are many technical challenges in successfully integrating these two attention mechanisms and applying them to wood defect detection. Currently, relevant research work is still limited, and no satisfactory and widely influential research results have been achieved yet.
The current state of research on Transfer learning
Transfer learning aims to transfer knowledge learned on one or more source tasks to a target task. In deep learning, pre-trained models learn rich image features and patterns from large-scale general-purpose datasets. Transferring this pre-trained knowledge to a specific target task reduces the need for large-scale labeled data, accelerates convergence on the new task, and improves the generalization ability of the model 16-18. In practice, transfer learning usually takes the model pre-trained on the source task as a starting point and fine-tunes it according to the demands of the target task. Let $\theta$ be the model parameters, $L$ the loss function, and $D_s$ and $D_t$ the source dataset and the target dataset, respectively. During pre-training on the source dataset, the parameters are updated by minimizing the loss $L(\theta; D_s)$, which gives the pre-trained parameters $\theta_s$. During fine-tuning on the target dataset, $\theta_s$ is used as the initial value and the parameters are further optimized by minimizing $L(\theta; D_t)$, which can be expressed as

$$\theta^{*} = \arg\min_{\theta} L(\theta; D_t),$$

where $\theta^{*}$ is the optimal parameter after fine-tuning.
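As a toy illustration of this two-stage optimization, the sketch below pre-trains a single scalar parameter on a hypothetical source task and then fine-tunes it on a related target task. The quadratic loss, the targets 2.0 and 2.5, and all function names are assumptions chosen for illustration; they are not the paper's training setup.

```python
def gradient_descent(theta, target, lr=0.1, steps=50):
    """Minimize the toy loss (theta - target)^2 by plain gradient descent."""
    for _ in range(steps):
        grad = 2.0 * (theta - target)  # derivative of the toy loss
        theta -= lr * grad
    return theta


def loss(theta, target):
    """Toy loss standing in for the loss on a dataset."""
    return (theta - target) ** 2


# Stage 1: pre-train on the (hypothetical) source task, starting from zero.
theta_s = gradient_descent(theta=0.0, target=2.0, steps=50)

# Stage 2: optimize on the target task for only a few steps,
# once from the pre-trained value and once from scratch.
few_steps = 3
theta_finetuned = gradient_descent(theta=theta_s, target=2.5, steps=few_steps)
theta_scratch = gradient_descent(theta=0.0, target=2.5, steps=few_steps)

print("fine-tuned loss:", loss(theta_finetuned, 2.5))
print("from-scratch loss:", loss(theta_scratch, 2.5))
```

Because the source and target optima are close, starting from the pre-trained value leaves a smaller loss after the same number of steps, which is the convergence benefit the paragraph describes.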
The application of transfer learning in wood panel defect detection has significantly improved model performance. For example, in 2021, Gao et al. proposed TL-ResNet34, a method that combines ResNet-34 with transfer learning. Their results show that the detection accuracy achieved by TL-ResNet34 is significantly higher than that of other methods, indicating that transfer learning can improve the final prediction accuracy of knot defect detection 19. When environments, lighting conditions, or wood panel materials differ, a model based on transfer learning can rely on general features learned in other related fields to adapt to these changes. It can therefore maintain stable, high detection accuracy in various practical scenarios, further enhancing the reliability and practicality of wood panel defect detection.
In this paper, transfer learning is applied to wood defect detection to exploit the rich features learned by a pre-trained model on large-scale general-purpose datasets, reduce the demand for large-scale labeled data, accelerate convergence on the wood defect detection task, and improve the generalization ability of the model. With transfer learning, the TL-ResNet50 model proposed in this paper recognizes multiple types of defects on the wood panel surface more accurately under complex backgrounds, which improves both the accuracy and the efficiency of defect detection.
The proposed method
The method proposed in this section consists of three key parts; the overall flow is shown in Fig. 1. The first part is dataset preprocessing: the defect images in the wood panel dataset are preprocessed, and the processed dataset is split into 80% for training and 20% for validation to support model training and performance evaluation. The second part is model construction: the classic ResNet50 model serves as the base architecture, and transfer learning is introduced so that the model can reuse the parameters and feature representations of the pre-trained model, reducing training time and data requirements. To further strengthen the model's ability to capture key features, attention modules are added at the corresponding positions of the ResNet50 model: embedding the SE attention mechanism yields the TL-ResNet50-SE model, and adding CBAM yields the TL-ResNet50-CBAM model. Both models then process the preprocessed wood panel images. The layers inside each model, including the convolutional layers, pooling layers, and attention modules, work together to extract progressively deeper image features, and the fully connected layer makes the classification decision. The third part is the output: the model produces a prediction that identifies whether the surface of a solid wood panel is defective and, if so, the defect type, providing a reliable basis for the quality inspection of solid wood panels.
Fig. 1
General flowchart of the method for detecting surface defects in wood panels
TL-ResNet50 network model
ResNet50 network architecture
ResNet50 is a deep convolutional neural network. Its core innovation lies in the introduction of the residual block structure, which is used to address the degradation problem in deep networks. Meanwhile, this structure enables the network to be deepened as much as possible, allowing it to learn more representative features 20.
Figure 2 shows the architecture of the ResNet50 residual network and the structure of its residual block (RESBLOCK). The residual block is the basic module of ResNet, and stacking residual blocks forms the complete ResNet network. ResNet is a deep convolutional neural network for computer vision tasks such as image recognition. Processing starts from the input image data and passes through a convolutional layer (COV), batch normalization (BN), the ReLU activation function, max pooling, and multiple residual blocks that gradually extract image features. Through its shortcut connection, the residual block alleviates vanishing gradients and the training difficulty of deep neural networks; it takes a different processing path depending on whether the input and output dimensions are the same. Finally, average pooling (AveragePool), a fully connected layer (FC1000), and a softmax layer complete the classification task.
Fig. 2
Diagram of the ResNet50 network structure
Through residual connectivity, ResNet50 is able to maintain the effective propagation of gradients during the training process, thus supporting deeper network structure and enhancing the expressive power of the model. This design enables ResNet50 to perform well in tasks such as image classification and target detection, especially in capturing more detailed features when processing wood surface images with complex textures and diverse defects 21.
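The role of the shortcut connection can be made explicit with a short derivation. Write a residual block as the sum of a learned mapping and an identity shortcut; the symbols $x$, $y$, $\mathcal{F}$, $W_i$, and $\mathcal{L}$ are notation introduced here, and the form applies to the case where the input and output dimensions match:

```latex
y = \mathcal{F}(x, \{W_i\}) + x,
\qquad
\frac{\partial \mathcal{L}}{\partial x}
  = \frac{\partial \mathcal{L}}{\partial y}
    \left( \frac{\partial \mathcal{F}}{\partial x} + I \right)
```

The identity term $I$ passes the gradient directly to earlier layers even when $\partial \mathcal{F} / \partial x$ is small, which is why gradients continue to propagate through a deep stack of such blocks.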
TL-ResNet50 network architecture
This model applies transfer learning to a pre-trained ResNet50, as shown in Fig. 3. The image data are obtained from Kaggle. The data are first processed by a convolutional layer (COV_7X7_64(s2)), followed by batch normalization (BN) and the ReLU activation function, and initial features are then extracted by max pooling (Max_pooling, 3X3(s2)). Some layers are frozen to prevent overfitting during fine-tuning. Features are then extracted by a sequence of residual blocks (RESBLOCK) organized in stages 1 to 4. Stage 1 introduces residual blocks to address the gradient problem and improve feature extraction efficiency; stage 2 deepens the feature mining; stage 3 extracts more abstract features; and stage 4 extracts the most abstract, high-level features, which are mainly used for classification. Finally, global average pooling (Global AveragePool) reduces the dimensionality, the output dimension of the fully connected layer is set to 5, and a softmax function completes the five-class classification task.
Fig. 3
TL-ResNet50 network structure diagram
This model uses transfer learning, combining pre-trained parameters with task-specific fine-tuning to classify images efficiently, adapt quickly to new detection tasks, and improve both training efficiency and detection accuracy. For example, when processing images of solid wood panels with different textures and colors, the pre-trained parameters help the model capture defect-related features faster, which reduces training time and improves the recognition of different defect types. Through fine-tuning, the model further optimizes its parameters according to the characteristics of surface defects on solid wood panels, improving its ability to detect specific defects.
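The final classification step, a 5-unit fully connected layer followed by softmax, can be sketched in plain Python as follows. This is a minimal illustration only: the class order, the example logits, and the helper names are assumptions, and the paper's model is implemented in PyTorch rather than with these functions.

```python
import math

# The five defect types named in the paper; their order here is an assumption.
CLASSES = ["crack", "dead knot", "live knot", "wormhole", "resin"]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return (class name, probability) for the most likely class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]


# Hypothetical logits, as they might come out of the 5-unit output layer.
example_logits = [0.2, 1.5, 0.3, 3.1, -0.4]
label, confidence = predict(example_logits)
print(label, round(confidence, 3))
```

Subtracting the maximum logit before exponentiation does not change the result but avoids overflow for large logits.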
The TL-ResNet50 model incorporating the attention mechanism
The TL-ResNet50-SE model
The TL-ResNet50-SE model adds the Squeeze-and-Excitation (SE) module to TL-ResNet50. The module performs global average pooling on the feature maps, compressing the features of each channel into a single value to obtain global information across channels. A bottleneck structure composed of two fully connected layers then recalibrates the importance of the channels, enhancing the channel features related to panel surface defects and suppressing the responses of irrelevant channels. Figure 4 shows the SE attention module.
Fig. 4
Diagram of the Squeeze-and-Excitation (SE) Attention Module
The cube X on the left side of the figure represents the input feature map with dimensions H' × W' × C', where H' and W' are the height and width of the feature map and C' is the number of channels. A transformation $F_{tr}$ (usually a convolution operation) converts the input feature map X into a feature map U with dimensions H × W × C, where H and W are the new height and width and C is the number of channels. A Squeeze operation, denoted $F_{sq}(\cdot)$, is then applied to U. This step is usually realized by global average pooling (GAP), which compresses the 2D spatial information of each channel into a scalar to obtain a 1 × 1 × C vector. Next, the Excitation operation, denoted $F_{ex}(\cdot, W)$, learns the dependencies between channels through fully connected layers (usually two, possibly with an activation function in between); its output is still a 1 × 1 × C vector. Finally, a Scale operation, denoted $F_{scale}(\cdot, \cdot)$, multiplies the channel weight vector obtained from the Excitation operation with the original feature map U channel by channel to obtain the final output feature map $\tilde{X}$, whose dimensions are H × W × C 22.
The SE module obtains global information across channels through global average pooling and then recalibrates channel importance with fully connected layers, so that the model highlights defect-related channel features. The computation can be divided into the following steps.
1. Squeeze operation (global average pooling):

$$z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j) \tag{1}$$

where $u_c(i, j)$ is the value of the feature map at position (i, j) on the c-th channel, and $z_c$ is the global feature of the c-th channel.

2. Excitation operation (channel weight calculation):

$$s = F_{ex}(z, W) = \sigma\left(W_2\, \delta(W_1 z)\right) \tag{2}$$

where $W_1$ and $W_2$ are the weight matrices of the two fully connected layers, $\delta$ is the ReLU activation function, and $\sigma$ is the Sigmoid activation function, which normalizes the weights to values between 0 and 1.

3. Scale operation (feature map recalibration):

$$\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c \tag{3}$$

where $s_c$ is the weight of the c-th channel and $\tilde{x}_c$ is the feature map after recalibration.
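The three steps (squeeze, excitation, scale) can be traced on a tiny feature map with the plain-Python sketch below. It is written without a framework so that each arithmetic step stays visible; the toy input, the reduction ratio of 2, the weight values, and the function names are all assumptions made for illustration, and in a trained network the weights would be learned.

```python
import math


def squeeze(U):
    """Global average pooling: one scalar per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in U]


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def excitation(z, W1, W2):
    """Channel weights: sigmoid(W2 * relu(W1 * z))."""
    hidden = [max(0.0, h) for h in matvec(W1, z)]                     # ReLU
    return [1.0 / (1.0 + math.exp(-o)) for o in matvec(W2, hidden)]  # Sigmoid


def scale(U, s):
    """Multiply every value of channel c by its weight s[c]."""
    return [[[s_c * v for v in row] for row in ch] for ch, s_c in zip(U, s)]


def se_block(U, W1, W2):
    s = excitation(squeeze(U), W1, W2)
    return scale(U, s), s


# Toy input (assumed): C = 4 channels of size 2 x 2; channel c is filled with c + 1.
U = [[[float(c + 1)] * 2 for _ in range(2)] for c in range(4)]
W1 = [[0.5, -0.5, 0.25, 0.0],
      [0.0, 0.25, -0.25, 0.5]]                             # shape (C/r) x C, r = 2
W2 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0], [0.5, 0.5]]     # shape C x (C/r)

X_tilde, s = se_block(U, W1, W2)
print("channel weights:", [round(w, 3) for w in s])
```

The output keeps the shape of the input; only the relative strength of the channels changes.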
In principle, the SE module adaptively adjusts the weights of different channels through attention along the channel dimension. In surface defect detection of solid wood panels, different types of defects may be more pronounced in specific channels, and the SE module strengthens the signals of these key channels while weakening the influence of irrelevant ones. Taking wormhole defects as an example, the wormhole region may show specific changes in certain color channels; the SE module can increase the weights of the channels related to wormholes, making these defects easier for the model to identify and maintaining high detection accuracy even when defect features are inconspicuous or background interference is strong.
The TL-ResNet50-CBAM model
The TL-ResNet50-CBAM model integrates the Convolutional Block Attention Module (CBAM), which processes the input feature map in two parts: channel attention and spatial attention. In the channel attention part, max pooling and average pooling are applied to the input feature map, each pooling result is fed into the multilayer perceptron, and the two outputs are fused and passed through a sigmoid activation function to generate channel attention weights that highlight the important channels. In the spatial attention part, the feature map produced by channel attention is first compressed along the channel dimension, a convolution is then applied, and a sigmoid activation function generates spatial attention weights that focus on the key spatial locations. The channel attention weights and the spatial attention weights are multiplied element by element with the feature map to obtain the output feature map. This recalibrates the features along both the channel and spatial dimensions, helps the model capture key information, and improves performance in target detection and other tasks. The module is shown in Fig. 5.
Fig. 5
CBAM Attention Mechanisms
In a complex wood panel surface defect detection scenario, the TL-ResNet50-CBAM model can focus on both the channel characteristics and spatial location information of the defects, and accurately detect the defect types of cracks, dead knots, live knots, wormholes, and resins.
The CBAM module calculates the attention from both channel and spatial dimensions. In the channel attention part, the channel weights are obtained through different pooling operations and multilayer perceptron, and in the spatial attention part, the spatial attention map is generated by convolution after channel compression of the feature map, so as to more accurately locate the defects in the image, and to enhance the ability to detect the defects. The calculation process can be divided into the following steps.
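1. Channel attention

Suppose the input feature map is $F \in \mathbb{R}^{C \times H \times W}$, where C is the number of channels, H is the height, and W is the width.

(1) Global average pooling and maximum pooling:

An average pooling operation is performed on the input feature map F over the spatial dimensions (H, W) to obtain the global average feature vector in the channel dimension, $F_{avg}$:

$$F_{avg}(c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} F(c, i, j) \tag{4}$$

where F(c, i, j) denotes the value of the feature map F at channel c and position (i, j), $F_{avg}(c)$ is the value of $F_{avg}$ at channel c, and $F_{avg} \in \mathbb{R}^{C}$.

A maximum pooling operation is performed on the input feature map F over the spatial dimensions (H, W) to obtain the global maximum feature vector in the channel dimension, $F_{max}$:

$$F_{max}(c) = \max_{i, j} F(c, i, j) \tag{5}$$

where $F_{max} \in \mathbb{R}^{C}$.

(2) Channel attention map generation:

$$M_c(F) = \sigma\left(W_1(W_0(F_{avg})) + W_1(W_0(F_{max}))\right) \tag{6}$$

where $W_0$ and $W_1$ are the weight matrices of the fully connected layers and $\sigma$ is the Sigmoid activation function.

(3) Channel weighting:

$$F' = M_c(F) \otimes F \tag{7}$$

where $F' \in \mathbb{R}^{C \times H \times W}$.

2. Spatial attention

(1) Channel compression:

$$F'_{avg}(i, j) = \frac{1}{C} \sum_{c=1}^{C} F'(c, i, j) \tag{8}$$

where $F'_{avg} \in \mathbb{R}^{1 \times H \times W}$.

$$F'_{max}(i, j) = \max_{c} F'(c, i, j) \tag{9}$$

where $F'_{max} \in \mathbb{R}^{1 \times H \times W}$.

(2) Spatial weighting calculation:

$$M_s(F') = \sigma\left(\mathrm{Conv}\left([F'_{avg}; F'_{max}]\right)\right) \tag{10}$$

where Conv is a convolution operation and $\sigma$ is the Sigmoid activation function.

(3) Feature map recalibration:

$$F'' = M_s(F') \otimes F' \tag{11}$$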
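The channel-then-spatial sequence described above can be followed on a small example with the plain-Python sketch below. It is an illustration under stated assumptions, not the paper's implementation: the 3 × 3 kernel, the toy weights, and the function names are hypothetical, and the multilayer perceptron applies ReLU after its first layer, as in the original CBAM design.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mlp(v, W_first, W_second):
    """Shared two-layer perceptron with ReLU after the first layer."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in W_first]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W_second]


def channel_attention(F, W_first, W_second):
    """Pool over space, pass both vectors through the MLP, fuse, apply sigmoid."""
    f_avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in F]
    f_max = [max(map(max, ch)) for ch in F]
    a, m = mlp(f_avg, W_first, W_second), mlp(f_max, W_first, W_second)
    return [sigmoid(x + y) for x, y in zip(a, m)]


def spatial_attention(F, kernel):
    """Pool over channels, then a zero-padded k x k convolution and sigmoid."""
    C, H, W = len(F), len(F[0]), len(F[0][0])
    avg = [[sum(F[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    mx = [[max(F[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    maps = [avg, mx]              # 2-channel input; kernel has shape 2 x k x k
    k = len(kernel[0])
    pad = k // 2
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for ch in range(2):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < H and 0 <= x < W:   # zero padding outside
                            acc += kernel[ch][di][dj] * maps[ch][y][x]
            out[i][j] = sigmoid(acc)
    return out


def cbam(F, W_first, W_second, kernel):
    m_c = channel_attention(F, W_first, W_second)
    # Channel weighting: scale each channel by its attention weight.
    F1 = [[[m_c[c] * v for v in row] for row in F[c]] for c in range(len(F))]
    m_s = spatial_attention(F1, kernel)
    # Spatial weighting: scale each position by its attention weight.
    return [[[m_s[i][j] * v for j, v in enumerate(row)]
             for i, row in enumerate(ch)] for ch in F1]


# Toy input (assumed): C = 2 channels of size 3 x 3.
F = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
     [[9.0, 8.0, 7.0], [6.0, 5.0, 4.0], [3.0, 2.0, 1.0]]]
W_first = [[0.1, 0.2]]                # shape 1 x C
W_second = [[0.3], [-0.2]]            # shape C x 1
kernel = [[[0.05] * 3 for _ in range(3)] for _ in range(2)]

F_out = cbam(F, W_first, W_second, kernel)
print([round(v, 3) for v in F_out[0][0]])
```

Both attention maps take values strictly between 0 and 1, so for a positive input every output value is a damped copy of the corresponding input value, with the damping decided per channel and per position.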
The study shows that the CBAM module markedly improves the model's ability to adapt to complex scenes: its channel and spatial attention mechanisms improve detection accuracy for different defects on the wood panel surface and allow defects to be recognized under complex lighting and background conditions. Through spatial attention, the model locates the defective region in a complex background and reduces the influence of background noise; channel attention further refines the extraction of features for different defects, improving the accuracy and reliability of detection. In terms of computational efficiency, the CBAM module adds some computation, but because it lets the model focus quickly on key information, the overall detection speed is not significantly affected. In real-time detection over large numbers of solid wood panel images, the TL-ResNet50-CBAM model improves detection accuracy while maintaining a high detection speed, which makes it practical 23.
TL-ResNet50-SE + CBAM model
The TL-ResNet50-SE + CBAM model combines the SE module and the CBAM module to recalibrate the features along both the channel and spatial dimensions. In solid wood panel surface defect detection, the model first applies global average pooling to the feature map through the SE module to obtain global information across channels, and then recalibrates channel importance with a bottleneck structure composed of two fully connected layers, highlighting defect-related channel features and suppressing irrelevant channel responses.
However, some studies have shown that integrating multiple attention modules does not always improve accuracy. In image detection experiments that integrated both the SE module and the CBAM module, the increase in model complexity and possible conflicts between the modules caused the detection accuracy to fall below that of a single module in some cases. Although the combined model theoretically unites the advantages of the two modules, it does not effectively improve accuracy in the actual surface defect detection of solid wood panels. When the SE and CBAM modules were combined in a convolutional neural network for material defect detection, the model extracted more diverse features, but the larger number of parameters and the higher computational cost led to overfitting with limited training data, so the detection accuracy did not improve and in some cases declined.
Experimental results and analysis
Experimental setup
The experimental environment
The hardware and software configurations used in this experiment are shown in Table 1.
Table 1
Experimental environment configuration

| Name | Configuration Information |
| --- | --- |
| Operating System | Windows 11 |
| CPU | 12th Gen Intel Core i7-12700H |
| Memory | 16 GB |
| GPU | NVIDIA GeForce RTX 3050 |
| Deep Learning Framework | PyTorch 2.5.1 |
| Programming Language | Python 3.9 |
Data processing
The dataset used in this study is primarily sourced from the large-scale image dataset of wood surface defects available on Kaggle, which encompasses various types of solid wood boards and a range of surface defects under different production stages and conditions. To ensure the model's generalization capability, the dataset includes images captured under various lighting conditions, shooting angles, and levels of defect severity.
The dataset was expanded through data augmentation using the following methods: geometric transformations, which translate, rotate, scale, and flip the original images to simulate different postures and positions of the boards on the production line; color transformations, which adjust image brightness, contrast, saturation, and hue to mimic different lighting and imaging environments; noise addition, which incorporates Gaussian noise, salt-and-pepper noise, and similar disturbances to simulate interference in actual production environments; and cropping and splicing, which randomly crop and combine images to increase the variety of defect combinations and contextual information 24. The augmented dataset comprises a total of 4016 images, divided into training and validation sets in an 8:2 ratio 25. The effect is illustrated in Fig. 6.
Fig. 6
Rotated and luminance-varied images
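The 8:2 split and two of the augmentations described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the fixed seed, the floor rounding of the training size, and the representation of a grayscale image as nested lists of 0-255 values are all assumptions. Only the image count (4016) and the 8:2 ratio come from the text.

```python
import random

def split_indices(n_images, train_fraction=0.8, seed=0):
    """Shuffle image indices and split them into training and validation sets.

    The seed and the floor rounding are assumptions; the paper only gives the ratio.
    """
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)      # reproducible shuffle
    n_train = int(n_images * train_fraction)
    return indices[:n_train], indices[n_train:]

def hflip(image):
    """Horizontal flip of a grayscale image stored as rows of pixel values."""
    return [row[::-1] for row in image]

def adjust_brightness(image, factor):
    """Scale pixel values by `factor` and clip the result to the 0-255 range."""
    return [[min(255, max(0, round(v * factor))) for v in row] for row in image]

train_idx, val_idx = split_indices(4016)
print(len(train_idx), len(val_idx))           # 3212 804
```

In a PyTorch workflow the same operations would normally be delegated to an image transformation library rather than written by hand; the sketch only makes the arithmetic of the split and the per-pixel effect of the transformations explicit.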
Parameter settings
The initial learning rate is chosen from the range 0.001 to 0.1 (for example, 0.01). A step decay strategy is employed: after a fixed number of training epochs (e.g., 10 epochs), the learning rate is multiplied by a decay factor less than 1 (e.g., 0.1). This gradually reduces the learning step size during training, so that the learning rate is neither too large in the later stages, which could hinder convergence, nor too small in the early stages, which could slow down training [26].
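The step decay rule can be written as a one-line function. The sketch below uses the example values quoted in the text (initial rate 0.01, decay every 10 epochs, factor 0.1); the function name `step_decay_lr` is hypothetical, and the text does not confirm that these example values were the ones finally used.

```python
def step_decay_lr(epoch, initial_lr=0.01, step_size=10, gamma=0.1):
    """Learning rate in effect during `epoch` (0-based) under step decay."""
    return initial_lr * gamma ** (epoch // step_size)

for epoch in (0, 9, 10, 25, 99):
    print(epoch, step_decay_lr(epoch))
```

In PyTorch the same schedule is provided by `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)`. Note that with these example values the rate would fall to about 1e-11 by epoch 99 of a 100-epoch run, so a longer step or a gentler factor may be needed in practice.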
In terms of training settings, the number of training epochs is set to 100, a value chosen according to the dataset size, task complexity, and model convergence. The batch size is set to 64, balanced against hardware constraints: a larger batch size exploits the parallel computing capability of the GPU but consumes more memory. During training, stochastic gradient descent (SGD) or one of its variants (e.g., the Adam optimizer) is used to update the model parameters. The Adam optimizer has the advantage of adaptive learning rates: it automatically adjusts the step size for each parameter and performs well in many scenarios.
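To make the "adaptive learning rate" of Adam concrete, the sketch below applies one Adam update to a single scalar parameter. It is an illustration of the standard update rule, not the training code of this study: the function name `adam_step` is hypothetical, `lr=0.01` is the example rate from the text, and the remaining hyperparameters are the commonly used defaults (an assumption).

```python
import math

def adam_step(param, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; returns (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad            # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                  # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Two parameters whose gradients differ by four orders of magnitude
for grad in (10.0, 0.001):
    new_param, _, _ = adam_step(param=1.0, grad=grad, m=0.0, v=0.0, t=1)
    print(grad, 1.0 - new_param)                  # both step sizes are close to lr
```

Because the update divides by the root of the squared-gradient average, the first step has nearly the same size for both parameters, whereas plain SGD would move them by amounts proportional to their gradients.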
Evaluation metrics
Accuracy, loss, precision, recall, and F1 score are adopted as evaluation metrics to comprehensively assess the model's performance in solid wood surface defect detection. Accuracy: This metric intuitively reflects the overall performance of the model in classification tasks and is calculated as the ratio of correctly predicted samples to the total number of samples [27]. The formula is:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{12}$$
At the end of each training epoch or during the testing phase, the accuracy is calculated on the validation or test set using this formula to observe the model's training effectiveness and generalization capability.
Loss: This study employs the cross-entropy loss to measure the difference between the model's predictions and the true labels. For multi-class classification problems, the formula is:
$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{C} y_{ij}\log\left(p_{ij}\right) \tag{13}$$
where 𝑁 is the number of samples, 𝐶 is the number of classes, 𝑦𝑖𝑗 represents the true label of sample 𝑖 for class 𝑗 (typically one-hot encoded), and 𝑝𝑖𝑗 represents the predicted probability of sample 𝑖 belonging to class 𝑗. During model training, the loss function is minimized to adjust the model parameters, bringing the predictions closer to the true labels.
Precision: Precision reflects the proportion of correctly predicted positive samples among all samples predicted as positive, indicating the accuracy of the model's predictions. The formula is:
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{14}$$
Recall: Recall measures the model's ability to detect actual positive samples, i.e., the proportion of actual positive samples that are correctly detected. The formula is:
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{15}$$
F1 score: The F1 score considers both precision and recall, providing a more balanced evaluation of the model's performance. The formula is:
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{16}$$
A higher F1 score indicates a better balance between precision and recall, reflecting superior model performance.
Here, 𝑇𝑃 (True Positives) represents the number of samples where the model correctly detects wood defects; 𝐹𝑃 (False Positives) represents the number of samples where the model incorrectly identifies normal regions as defects; 𝑇𝑁 (True Negatives) represents the number of samples where the model correctly identifies normal regions; and 𝐹𝑁 (False Negatives) represents the number of samples where the model fails to detect actual defects. These evaluation metrics enable a quantitative assessment of the performance of different models in the task of solid wood surface defect detection.
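The metrics defined in this subsection can be computed as sketched below. The code assumes the standard definitions named in the text (cross-entropy with one-hot labels, and accuracy, precision, recall, and F1 from the four confusion counts). The function names, the one-vs-rest treatment of a multi-class confusion matrix, and the example counts are illustrative assumptions, not values or procedures taken from the paper.

```python
import math

def cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean cross-entropy: y_true holds one-hot rows, p_pred probability rows."""
    total = 0.0
    for y_row, p_row in zip(y_true, p_pred):
        for y, p in zip(y_row, p_row):
            if y:  # with one-hot labels only the true class contributes
                total -= y * math.log(max(p, eps))  # eps guards against log(0)
    return total / len(y_true)

def one_vs_rest_counts(confusion, k):
    """TP, FP, TN, FN for class k (rows = true class, columns = predicted class)."""
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp
    fp = sum(row[k] for row in confusion) - tp
    return tp, fp, total - tp - fn - fp, fn

def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from the four confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts for one defect class (not data from the paper)
print(classification_metrics(tp=80, fp=20, tn=90, fn=10))
print(cross_entropy([[1, 0, 0], [0, 1, 0]], [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]))
```

For a multi-class task, `one_vs_rest_counts` can be applied to each class in turn and the per-class metrics averaged. The text does not state which averaging scheme was used for the reported values.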
Ablation experiments
To validate the effectiveness of transfer learning in the ResNet50 model and the attention mechanism in the TL-ResNet50 model, the following ablation experiments were designed. The experiments use ResNet50 as the baseline model. First, transfer learning (TL-ResNet50) is applied to enhance the model's performance. Subsequently, based on TL-ResNet50, the SE module, CBAM module, and a combination of SE + CBAM modules are introduced separately to further explore the impact of attention mechanisms on the model's performance.
Table 2 shows the effects of transfer learning (TL), the SE module, and the CBAM module on the ResNet50 model, using the ResNet50 model as the baseline.
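Table 2
Evaluation metrics of ablation experiments in wood defect detection
| TL | SE | CBAM | Accuracy | Loss | Precision | Recall | F1 score |
| --- | --- | --- | --- | --- | --- | --- | --- |
| - | - | - | 0.747 | 0.632 | 0.747 | 0.748 | 0.746 |
| ✓ | - | - | 0.849 | 0.598 | 0.848 | 0.852 | 0.846 |
| ✓ | ✓ | - | 0.882 | 0.473 | 0.881 | 0.884 | 0.881 |
| ✓ | - | ✓ | 0.860 | 0.393 | 0.858 | 0.861 | 0.859 |
| ✓ | ✓ | ✓ | 0.774 | 0.619 | 0.793 | 0.776 | 0.772 |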
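The contribution of each component can be read off Table 2 by subtracting the baseline row. The short script below does this for accuracy and loss. The numbers are copied from the table; the row labels assume that the five rows correspond, in order, to ResNet50, TL, TL + SE, TL + CBAM, and TL + SE + CBAM, as described in the ablation design.

```python
# (accuracy, loss) per configuration, copied from Table 2
table2 = {
    "ResNet50":            (0.747, 0.632),
    "TL-ResNet50":         (0.849, 0.598),
    "TL-ResNet50-SE":      (0.882, 0.473),
    "TL-ResNet50-CBAM":    (0.860, 0.393),
    "TL-ResNet50-SE+CBAM": (0.774, 0.619),
}

base_acc, base_loss = table2["ResNet50"]
# Change of each configuration relative to the ResNet50 baseline
changes = {name: (acc - base_acc, loss - base_loss)
           for name, (acc, loss) in table2.items()}

for name, (d_acc, d_loss) in changes.items():
    print(f"{name:<20} accuracy {d_acc:+.3f}   loss {d_loss:+.3f}")
```

Relative to the baseline, transfer learning alone adds 0.102 in accuracy, TL + SE adds 0.135 (the largest gain), TL + CBAM adds 0.113 with the lowest loss (0.393), and the combined SE + CBAM variant adds only 0.027.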
After the SE module is added, the classification accuracy of the model improves, especially in the extraction and classification of small-target features. The SE module adaptively adjusts the importance of each feature channel by weighting along the channel dimension, which strengthens the learning of key features and suppresses irrelevant information, thereby improving the model's ability to recognize small targets.
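The channel reweighting performed by an SE block can be made explicit with a small dependency-free sketch. It follows the usual squeeze, excitation, and scale steps on a feature map stored as nested lists. The function name, the omission of bias terms, and the example weights are assumptions for illustration and are not the configuration used in this study.

```python
import math

def se_recalibrate(feature_maps, w1, w2):
    """Squeeze-and-Excitation on a [C][H][W] nested list.

    w1: [C_reduced][C] weights of the reduction layer
    w2: [C][C_reduced] weights of the expansion layer (biases omitted)
    """
    # Squeeze: global average pooling gives one descriptor per channel
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_maps]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives a weight in (0, 1) per channel
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    weights = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
               for row in w2]
    # Scale: multiply every value in a channel by that channel's weight
    scaled = [[[v * a for v in row] for row in ch]
              for ch, a in zip(feature_maps, weights)]
    return scaled, weights

x = [[[4.0, 4.0], [4.0, 4.0]],   # channel 0
     [[2.0, 2.0], [2.0, 2.0]]]   # channel 1
w1 = [[1.0, -1.0]]               # hypothetical weights: reduce 2 channels to 1
w2 = [[2.0], [-2.0]]             # hypothetical weights: expand back to 2 channels
scaled, weights = se_recalibrate(x, w1, w2)
print(weights)                   # channel 0 is emphasized, channel 1 suppressed
```

In a PyTorch model the same three steps are typically expressed with `nn.AdaptiveAvgPool2d(1)`, two `nn.Linear` layers, and a broadcast multiplication, with the weights learned during training rather than fixed by hand.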
After the CBAM module is added, the model shows better boundary localization ability in the image segmentation task, and the segmentation accuracy improves. The CBAM module applies attention in both the channel and spatial dimensions: it not only weighs the importance of different feature channels but also screens the spatial positions of the feature maps. This enables the model to focus more accurately on the boundaries of the target objects, thereby improving the accuracy of defect detection [28].
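The two attention stages of CBAM can be sketched without any framework as follows. The code follows the commonly described structure of the module (channel attention from average-pooled and max-pooled descriptors passed through a shared MLP, then spatial attention from a convolution over the channel-wise mean and max maps). The function names, the omission of biases, and the free choice of an odd kernel size are assumptions for illustration, not the configuration used in this study.

```python
import math

def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(x, w1, w2):
    """x: [C][H][W]; shared MLP weights w1: [C_r][C], w2: [C][C_r]."""
    def mlp(v):
        hidden = [max(0.0, sum(w * s for w, s in zip(row, v))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    mx = [max(map(max, ch)) for ch in x]
    return [_sigmoid(a + b) for a, b in zip(mlp(avg), mlp(mx))]

def spatial_attention(x, kernel):
    """kernel: [2][k][k] weights applied to the channel-wise mean and max maps."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    k = len(kernel[0])
    pad = k // 2
    mean_map = [[sum(x[ci][i][j] for ci in range(c)) / c for j in range(w)]
                for i in range(h)]
    max_map = [[max(x[ci][i][j] for ci in range(c)) for j in range(w)]
               for i in range(h)]
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            acc = 0.0
            for fmap, kern in zip((mean_map, max_map), kernel):
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < h and 0 <= jj < w:   # zero padding
                            acc += kern[di][dj] * fmap[ii][jj]
            row.append(_sigmoid(acc))
        out.append(row)
    return out

def cbam(x, w1, w2, kernel):
    """Apply channel attention first, then spatial attention."""
    ca = channel_attention(x, w1, w2)
    x = [[[v * a for v in row] for row in ch] for ch, a in zip(x, ca)]
    sa = spatial_attention(x, kernel)
    return [[[v * sa[i][j] for j, v in enumerate(row)]
             for i, row in enumerate(ch)] for ch in x]
```

Calling `cbam(x, w1, w2, kernel)` returns a feature map of the same shape as `x`, in which each value has been scaled first by its channel weight and then by the weight of its spatial position.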
After the combined SE + CBAM module is added, the model should in theory have the strongest feature extraction and analysis ability and attend to defects more comprehensively than a model with the SE or CBAM module alone. In the experiments, however, although the accuracy can reach a high level, it fluctuates between 0.78 and 0.84, which indicates a stability problem. Further study suggests that this instability may stem from insufficient synergy between the two attention modules. When dealing with different types of defects and diverse backgrounds, the SE and CBAM modules sometimes conflict, resulting in unstable model decisions. For example, when detecting boards with both minor discoloration and complex texture interference, the SE module focuses on the discoloration-related channels, while the CBAM module, in locating and enhancing the defect features, may drift away from the discolored region because of the texture interference, which affects the stability of the overall detection results. Some studies have tried to solve this problem by adjusting the connection order and weight allocation of the two modules, but no ideal solution has been obtained yet, and the issue needs further exploration.
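One simple way to quantify the kind of fluctuation described above is to summarize the validation accuracy over the last epochs of training. The sketch below is an assumption about how this could be done, not the procedure used in this study, and the two accuracy lists are hypothetical values chosen only to illustrate a stable and an unstable curve.

```python
from statistics import mean, pstdev

def stability_summary(val_accuracy, last_n=20):
    """Mean, standard deviation and range of the last `last_n` validation accuracies."""
    tail = val_accuracy[-last_n:]
    return {"mean": mean(tail), "std": pstdev(tail), "range": max(tail) - min(tail)}

# Hypothetical per-epoch validation accuracies, NOT values from the paper
stable = [0.84, 0.85, 0.85, 0.86, 0.85]
unstable = [0.78, 0.84, 0.79, 0.83, 0.78]

print(stability_summary(stable, last_n=5))
print(stability_summary(unstable, last_n=5))
```

A larger standard deviation or range at a similar mean indicates a less stable model, which gives a numeric counterpart to the visual impression from the accuracy curves.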
In summary, the ablation experiments fully validate the importance of incorporating transfer learning and the SE and CBAM modules in enhancing the performance of the TL-ResNet50 model, providing a strong basis for subsequent model improvements. They also further emphasize the core value of attention mechanisms in complex image defect detection tasks, helping researchers gain a deeper understanding of the model's working principles and enabling targeted optimization of the model structure to improve detection accuracy and stability. However, it is also important to note that excessive use of attention mechanisms may have negative impacts.
Experimental results and analysis
In solid wood surface defect detection, the choice and performance of the model are crucial to detection accuracy and efficiency. This research analyzes several models experimentally and compares their performance. First, the TL-ResNet50 model, obtained through transfer learning, is compared with the baseline ResNet50 model to demonstrate the advantages of transfer learning. Subsequently, the experimental results of models incorporating different attention mechanisms, as well as other classic models such as EfficientNetV2, VGG16, and AlexNet, are compared and analyzed in terms of accuracy, loss, and stability. Finally, based on the predictions of the different models for various types of defects, the differences in each model's ability to detect different defects are explored, providing specific guidance for model selection in actual production [29].
Comparison of results between ResNet50 and TL-ResNet50 models
In this study, the TL-ResNet50 model was obtained by applying transfer learning to the ResNet50 model. Transfer learning utilizes the parameters of the ResNet50 model pre-trained on a large-scale general dataset as initialization, followed by fine-tuning on the solid wood surface defect detection dataset. This approach allows the model to reuse general image features, such as edges and textures, learned from the source task, reducing the need for large-scale annotated data and accelerating the model's convergence speed on the new task. As a result, the model's accuracy is effectively improved, the loss rate is reduced, and its generalization capability is enhanced, making it better suited for the task of solid wood surface defect detection.
As shown in Figs. 7 and 8, the validation accuracy of ResNet50 fluctuates around 0.75, and the validation loss fluctuates around 0.7.
Fig. 7
Line graph of ResNet50 model accuracy
Fig. 8
Line graph of loss rates for the ResNet50 model
As shown in Figs. 9 and 10, the accuracy of TL-ResNet50 improves to between 0.80 and 0.85, and the loss decreases from 0.632 to 0.598. These results show that transfer learning plays an important role in detecting surface defects on solid wood boards.
Fig. 9
Line graph of TL-ResNet50 model accuracy
Fig. 10
Line graph of loss rates for the TL-ResNet50 model
Prediction results of various models
As shown in Fig. 11, the defect prediction results reveal differences in how well the models detect different types of defects. The TL-ResNet50-CBAM and TL-ResNet50-SE models perform best on small and irregular defects: they accurately identify minor cracks on wood surfaces and irregularly shaped wormhole defects, with high prediction probabilities. VGG16 performs relatively well on defects with distinct texture features, such as live knots, because its stacked convolutional layers help extract the texture features of live knots. AlexNet detects larger defects faster, but its accuracy needs improvement: it can quickly identify larger dead knots, yet in some cases its prediction probability is lower than that of the other models. EfficientNetV2 can rapidly locate defects in complex backgrounds but falls short in classification accuracy; for example, on boards with both complex textures and defects, it may misclassify the defect type. These results provide specific guidance for model selection: enterprises can choose the detection model suited to the predominant defect types in their production processes, thereby improving detection efficiency and accuracy.
Fig. 11
Plot of defect prediction effect of different models
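The "prediction probability" attached to each detected defect is, in a typical classification network, the largest softmax output. The sketch below shows this step. The class names are defect types mentioned in the text, but the assumption that they form the model's label set, the example logits, and the function names are all hypothetical.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)                               # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits, class_names):
    """Return the most likely class and its predicted probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]

# Hypothetical label set and logits, for illustration only
classes = ("live knot", "dead knot", "crack", "wormhole")
label, prob = predict([0.5, 0.2, 3.0, 1.0], classes)
print(label, round(prob, 3))
```

A model that is confident about a defect produces one logit well above the others and therefore a probability close to 1, while similar logits across classes yield a low maximum probability.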
Comparison of experimental results for each model
As shown in Figs. 12 and 13, the line graphs compare the accuracy and loss of the different models. The TL-ResNet50-CBAM and TL-ResNet50-SE models, which incorporate attention mechanisms, perform best in most training phases: their validation accuracy stabilizes around 0.85 with an upward trend, while the loss values steadily decrease. This is attributed to the attention mechanisms, which enable the models to focus automatically on the most critical parts of the input data and recalibrate features along both the channel and spatial dimensions, thereby enhancing the extraction of defect features. The TL-ResNet50 model itself performs reliably, with accuracy fluctuating between 0.80 and 0.85. The TL-ResNet50-SE + CBAM model, while achieving relatively high accuracy, fluctuates significantly between 0.78 and 0.84, indicating room for improvement in stability. This instability may stem from insufficient coordination between the two attention modules, leading to conflicts when handling different types of defects and background conditions. ResNet50 generalizes slightly worse than the TL-ResNet50 series models, with accuracy fluctuating around 0.75. EfficientNetV2 exhibits lower accuracy and greater fluctuations, ranging between 0.65 and 0.80, and is unstable when generalizing to new data. Despite its innovative design, it underperforms in this experimental task and requires further tuning and optimization. VGG16 and AlexNet show relatively low accuracy, in the ranges 0.70–0.75 and 0.65–0.70, respectively, with slow upward trends. This is because VGG16 has a large number of parameters and high computational complexity, while AlexNet has a relatively shallow structure, which limits their ability to handle complex data.
Fig. 12
Line graph comparing the accuracy of different models
Fig. 13
Line graph comparing loss rates of different models
The performance of the different models varies in detecting surface defects in solid wood panels. Table 3 shows the evaluation metrics of the different models for wood defect detection.
Table 3
Evaluation metrics of different models for wood defect detection
| Model | Accuracy | Loss | Precision | Recall | F1 score |
| --- | --- | --- | --- | --- | --- |
| AlexNet | 0.662 | 1.800 | 0.661 | 0.660 | 0.661 |
| VGG16 | 0.740 | 0.671 | 0.777 | 0.772 | 0.774 |
| EfficientNetV2 | 0.792 | 0.509 | 0.799 | 0.796 | 0.795 |
| TL-ResNet50-SE | 0.882 | 0.473 | 0.881 | 0.884 | 0.881 |
According to the table, the TL-ResNet50-based model incorporating the attention mechanism performs best overall: it identifies defects more accurately and shows good training behavior and stability. EfficientNetV2 has a certain detection ability, but its generalization and stability are insufficient. VGG16 and AlexNet, constrained by their structural limitations, underperform when handling complex data. The TL-ResNet50-SE + CBAM model with the integrated dual-attention module is theoretically powerful, but its stability in practical application is poor because of the module synergy problem.
Conclusion
This study focuses on the detection of surface defects in solid wood boards, systematically evaluating the performance of various deep learning models in this task. The research reveals that different models have their own strengths and weaknesses when handling surface defect detection in solid wood boards.
Traditional models such as VGG16 and AlexNet exhibit certain limitations. The VGG16 model, with its large number of parameters and high computational complexity, is prone to overfitting. The AlexNet model, with its relatively shallow structure, has limited feature extraction capabilities, making it difficult to handle complex and variable defect scenarios. Although the EfficientNetV2 model incorporates innovative design concepts, its generalization ability and stability in this experiment need improvement. When faced with special defect types or defects arising from new production processes, its detection accuracy fluctuates significantly.
In contrast, models based on TL-ResNet50, which combine transfer learning and attention mechanisms, demonstrate superior performance. Transfer learning enables the model to leverage knowledge acquired from large-scale general datasets, allowing it to converge quickly on a smaller dataset of wood board images, thereby improving training efficiency and generalization ability. Attention mechanisms enable the model to focus on defect regions, enhancing its ability to extract defect features and thus improving detection accuracy. Experimental results show that TL-ResNet50 models incorporating attention modules such as SE and CBAM achieve significant improvements in metrics like accuracy. Specifically, the TL-ResNet50-CBAM and TL-ResNet50-SE models achieve stable validation accuracy around 0.85, with strong generalization capabilities. Ablation experiments further confirm the critical role of attention mechanisms in enhancing model performance. After adding the SE or CBAM modules, the model's accuracy significantly improves, and the loss function value decreases. However, although the TL-ResNet50-SE + CBAM model theoretically has the strongest feature extraction capability, it suffers from stability issues, with significant fluctuations in accuracy, necessitating further optimization for practical applications.
Nevertheless, this study has some limitations. On the one hand, the scale and diversity of the dataset need further expansion. The current dataset may not fully represent all possible defect scenarios, leading to degraded detection performance when encountering unseen defect types or boards under special production conditions. In the future, the dataset should be expanded to include images of wood boards from different origins, tree species, and processing techniques, as well as more types and severity levels of defect samples, to enhance the model's generalization ability and better adapt to complex real-world production scenarios. On the other hand, the application of multimodal data fusion techniques in surface defect detection for solid wood boards is still in its infancy. Integrating multimodal data such as infrared images and texture depth information could provide richer feature information for the model, further improving detection accuracy. Additionally, most current deep learning models are black-box models, making it difficult for production personnel to understand the decision-making process, which limits their practical application to some extent. Therefore, enhancing research on model interpretability and developing visualization tools and explanatory algorithms to help production personnel intuitively understand how the model identifies defects will improve the model's credibility and applicability. Furthermore, the deep integration of detection technology with actual production processes requires further research to achieve real-time and efficient online detection.
In the future, we will focus on optimizing model architectures, exploring more advanced network structures and improvement strategies, such as incorporating Transformer structures to enhance the model's ability to process long-sequence features, thereby further improving its performance in detecting complex defects. At the same time, we will strengthen research and application of multimodal data fusion techniques, promote the deep integration of detection technology with production processes, and further enhance detection performance and practicality, contributing to the intelligent development of the wood processing industry.
Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Author Contribution
All authors reviewed the manuscript and contributed to the research direction and ideas. H.C, Y.W, R.X, and J.Z reviewed and edited the original document. J.X, Y.Q, Y.L, and R.S participated in the general draft preparation and were involved in the experimental process. All authors have read and agreed to the submitted version of the manuscript.
References
1. Shen Y, Ying L. Surface Defect Detection of Solid Wood Panels Based on the Improved YOLOv5 Algorithm. Forestry Mach Woodworking Equip. 2024;52(3):24–9.
2. Qiang R, Qichuan T. Real-time Detection Method of Wood Surface Defects Based on Improved Yolov5s. China For Prod Ind. 2025;62(01):64–71.
3. Wang M, Xiaoyang X, Wenyan C, et al. Research Progress and Prospect of Intelligent Detection of Wood Defects Based on Deep Learning. China For Prod Ind. 2024;61(03):38–44.
4. Wang Zheng J, Ying Y, Fei, et al. Research on the Wood Defect Detection Model Wood-Net Based on YOLOv7. J Forestry Eng. 2024;9(01):132–40.
5. Ergun H. Wood identification based on macroscopic images using deep and transfer learning approaches. PeerJ. 2024;12:e17021.
6. Zhang H, Ahmad W, Rong Y, et al. A gas sensors detection system for real-time monitoring of changes in volatile organic compounds during Oolong tea processing. Foods. 2024;13(11):1721.
7. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Adv Neural Inf Process Syst. 2012;25:1097–105.
8. Urbonas A, Raudonis V, Maskeliūnas R, et al. Automated identification of wood veneer surface defects using faster region-based convolutional neural network with data augmentation and transfer learning. Appl Sci. 2019;9(22):4898.
9. Tan M, Le QV. EfficientNetV2: Smaller models and faster training. Proceedings of the 38th International Conference on Machine Learning, 2021: 10096–10106.
10. Jiachao Y, Yaowen L, Ke S. Huang Xi. PCB Defect Detection Algorithm Based on EfficientNetV2. J Computer-Aided Des Comput Graphics, 1–10.
11. Huan R, Xuguang W. A Review of the Attention Mechanism. J Comput Appl. 2021;41(S1):1–6.
12. Hernández A, Amigó JM. Attention Mechanisms and Their Applications to Complex Systems. Entropy. 2021;23(3):283.
13. Gu J, Sun X, Zhang Y, Fu K, Wang L. Deep Residual Squeeze and Excitation Network for Remote Sensing Image Super-Resolution. Remote Sens. 2019;11(15):1817.
14. Ji W, Pan Y, Xu B, Wang J. A Real-Time Apple Targets Detection Method for Picking Robot Based on ShufflenetV2-YOLOX. Agriculture. 2022;12(6):856.
15. Fu H, Song G, Wang Y. Improved YOLOv4 Marine Target Detection Combined with CBAM. Symmetry. 2021;13(4):623.
16. Zhu H, Wang D, Wei Y, Zhang X, Li L. Combining Transfer Learning and Ensemble Algorithms for Improved Citrus Leaf Disease Classification. Agriculture. 2024;14(9):1549.
17. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Adv Neural Inf Process Syst. 2012;25:1097–105.
18. Shafiq M, Gu Z. Deep Residual Learning for Image Recognition: A Survey. Appl Sci. 2022;12(18):8972.
19. Gao M, Qi D, Mu H, Chen J. A Transfer Residual Neural Network Based on ResNet-34 for Detection of Wood Knot Defects. Forests. 2021;12(2):212.
20. Peng Y, Zhao S, Liu J. Fused Deep Features-Based Grape Varieties Identification Using Support Vector Machine. Agriculture. 2021;11(09):869.
21. Zhang R, Zhu Y, Ge Z, Mu H, Qi D, Ni H. Transfer Learning for Leaf Small Dataset Using Improved ResNet50 Network with Mixed Activation Functions. Forests. 2022;13(12):2072.
22. Zhao S, Peng Y, Liu J, Wu S. Tomato Leaf Disease Diagnosis Based on Improved Convolution Neural Network by Attention Module. Agriculture. 2021;11(7):651.
23. Islam W, Jones M, Faiz R, Sadeghipour N, Qiu Y, Zheng B. Improving Performance of Breast Lesion Classification Using a ResNet50 Model Optimized with a Novel Attention Mechanism. Tomography. 2022;8(5):2411–25.
24. Tao K, Wang A, Shen Y, Lu Z, Peng F, Wei X. Peach Flower Density Detection Based on an Improved CNN Incorporating Attention Mechanism and Multi-Scale Feature Fusion. Horticulturae. 2022;8(10):904.
25. Liu J, Abbas I, Noor RS. Development of Deep Learning-Based Variable Rate Agrochemical Spraying System for Targeted Weeds Control in Strawberry Crop. Agronomy. 2021;11(8):1480.
26. Zhengben S, Jing N, Liang W. Applied Research on an Improved ResNet Network Based on Transfer Learning. Comput Inform Technol. 2024;32(06):50–5.
27. Sun J, He X, Ge X, Wu X, Shen J, Song Y. Detection of Key Organs in Tomato Based on Deep Migration Learning in a Complex Background. Agriculture. 2018;8(12):196.
28. Wang Xiliang W, Runqi, Qu Zunhao. Research on Pavement Crack Segmentation Method Based on Res2Unet-CBAM Network. J Shijiazhuang Tiedao Univ (Natural Sci Edition). 2024;37(03):69–74.
29. Du X, Si L, Li P, Yun Z. A method for detecting the quality of cotton seeds based on an improved ResNet50 model. PLoS ONE. 2023;18(2):e0273057.
Acknowledgements
This work was supported by the Science and Technology Project of Henan Province [252102210107, 242102210057, 252102111186].
Competing interests
The authors declare no competing interests.
Declarations
This study is not a clinical trial, so clinical trial registration is not applicable. Consent to Publish declaration: not applicable. Ethics and Consent to Participate declarations: not applicable.