Introduction
Detection of road cracks is crucial to prevent their deterioration into more severe damage \cite{ref1,ref2}. Early crack segmentation methods relied primarily on traditional image processing techniques, including threshold segmentation \cite{ref3}, edge detection \cite{ref4}, and mathematical morphology \cite{ref5}, but their robustness in complex real-world scenarios remains limited \cite{ref6,ref7}.
In recent years, deep learning has gradually replaced traditional segmentation methods owing to its powerful feature extraction capability \cite{ref8,ref9}. Deep-learning-based crack segmentation techniques can be broadly categorized into CNN-based models and hybrid models that combine CNNs with Transformers. CNN-based approaches have been widely adopted because of their strong ability to capture local features, achieving relatively satisfactory segmentation results \cite{ref10,ref11}. \cite{ref12} first introduced fully convolutional networks (FCNs) for semantic segmentation. Subsequently, U-Net \cite{ref13} and its variants, through encoder-decoder architectures and skip connections, alleviated the spatial information loss caused by downsampling \cite{ref14,ref15}. \cite{ref16} proposed skip-level round-trip sampling blocks for cross-level feature interaction, while \cite{ref17} replaced the U-Net encoder with diverse CNNs to reduce computation and memory usage without sacrificing accuracy. Multi-scale feature fusion was further employed to aggregate contextual information from different receptive fields, balancing the capture of fine crack details and overall structure. For example, \cite{ref18} proposed DeepCrack, which fuses multi-scale convolutional features to enhance perception of crack details and global structure. However, multi-scale fusion still relies on local convolution, which limits long-range dependency modeling \cite{ref19}. Hence, attention mechanisms have been introduced to enhance global context and suppress noise. \cite{ref20} proposed MST-Net, jointly modeling spatial, channel, and pixel dimensions to improve global context awareness and segmentation accuracy, while \cite{ref21} introduced AHC-Net, employing convolutional block attention in the encoder and criss-cross attention in the decoder. Despite their effectiveness in capturing global correlations, attention mechanisms are often less capable of modeling fine crack details \cite{ref22,ref23}.
With the development of Transformers in visual tasks, self-attention mechanisms have been applied to crack segmentation to enhance long-range dependency modeling \cite{ref24,ref25}. \cite{ref26} proposed MSDCrack, which enhances global dependency modeling through self-attention pooling while preserving local feature extraction. \cite{ref27} introduced ISTD-CrackNet, a hierarchical Transformer-based model that strengthens local detail representation using deformable convolutions and multi-scale convolutions. Although these hybrid models balance local and global representation \cite{ref28}, they lack specialized modules for two key challenges: false detections arising from crack-like textures and discontinuous predictions due to inadequate continuity modeling.
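For reference, the self-attention operation underlying these Transformer-based models is the standard scaled dot-product attention. The notation below ($\mathbf{X}$, $\mathbf{W}_Q$, $\mathbf{W}_K$, $\mathbf{W}_V$, $N$, $d$, $d_k$) is introduced here for illustration only and is not taken from the cited works:
\[
\mathbf{Q} = \mathbf{X}\mathbf{W}_Q,\quad
\mathbf{K} = \mathbf{X}\mathbf{W}_K,\quad
\mathbf{V} = \mathbf{X}\mathbf{W}_V,\qquad
\mathrm{Attention}(\mathbf{Q},\mathbf{K},\mathbf{V}) = \mathrm{softmax}\!\left(\frac{\mathbf{Q}\mathbf{K}^{\top}}{\sqrt{d_k}}\right)\mathbf{V},
\]
where $\mathbf{X}$ is an $N \times d$ matrix of flattened feature tokens, $\mathbf{W}_Q$, $\mathbf{W}_K$, and $\mathbf{W}_V$ are learned projections, and $d_k$ is the key dimension. Because each of the $N$ positions attends to all others, distant crack pixels can interact within a single layer, which gives these models their long-range modeling ability. The $N \times N$ attention map is also why the cost grows quadratically with the number of positions.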
Therefore, this paper proposes a Complementary Synergistic Fusion Network (CSF-Net). It includes a Multi-Scale Attention Residual module (MSAR) that processes features with multi-directional convolutions, followed by a crack-like filtering unit to suppress interference, and employs a residual connection to preserve crack details. In addition, a Context-Enhanced Attention Module (CEAM) optimizes self-attention via an asymmetric attention structure and dual-path feature enhancement mechanism, effectively addressing broken or blurry predictions while modeling crack continuity and suppressing background noise. This design mitigates false detection and improves global structural continuity, preventing cracks from being segmented into fragments.
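Read schematically, the MSAR description above amounts to a residual block of the following form. This is only an illustrative sketch: the symbols $\mathbf{X}$, $\mathbf{Y}$, $\mathcal{D}_k$, $\mathcal{A}$, $\mathcal{U}$, and $K$ are notation assumed here, and the aggregation and filtering operators are left abstract rather than taken as the exact definition of the module:
\[
\mathbf{Y} = \mathbf{X} + \mathcal{U}\bigl(\mathcal{A}\bigl(\mathcal{D}_1(\mathbf{X}), \dots, \mathcal{D}_K(\mathbf{X})\bigr)\bigr),
\]
where $\mathcal{D}_k$ denotes the convolution along the $k$-th direction, $\mathcal{A}$ aggregates the $K$ directional responses, $\mathcal{U}$ is the crack-like filtering unit, and the additive term $\mathbf{X}$ is the residual connection that carries the input details through to the output $\mathbf{Y}$.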
The main contributions of this paper are summarized as follows:
1. We propose CSF-Net, which achieves collaborative modeling of crack details and overall structure. CSF-Net adopts a dual-branch encoder architecture consisting of a local texture branch and a global structure branch, and employs a cross-branch fusion block (CFB) to enable semantic complementarity and feature fusion between the two branches.
2. We design the MSAR module for the local texture branch. MSAR combines multi-directional convolutions with crack-like filtering units, reducing false detections caused by crack-like texture interference and enhancing the model's ability to distinguish genuine cracks.
3. We propose CEAM for the global structure branch. CEAM optimizes the self-attention mechanism by employing an asymmetric attention structure and a dual-path feature enhancement mechanism. This enhances the model's ability to model the connectivity of the overall crack structure, thereby preventing continuous cracks from being broken into fragments.
Declarations
\small\textbf{Funding} Not applicable.
\vspace{0.5em}\textbf{Competing Interests} The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
\vspace{0.5em}\textbf{Ethics Approval} Not applicable.
\vspace{0.5em}\textbf{Consent to Participate} Not applicable.
\vspace{0.5em}\textbf{Consent for Publication} Not applicable.
\vspace{0.5em}\textbf{Data Availability} Data associated with this research are available and can be obtained by contacting the corresponding author.
\vspace{0.5em}\textbf{Materials Availability} Not applicable.
\vspace{0.5em}\textbf{Code Availability} Not applicable.
\vspace{0.5em}\textbf{Author Contributions} J.L. designed the research framework, developed the CSF-Net model, and conducted the experiments. Y.L. contributed to methodology refinement, result validation, and manuscript revision. Y.T.L. prepared Figures 1--5 and assisted with data preprocessing and visualization. All authors discussed the results and reviewed the final manuscript.
\normalsize\bibliography{sn-bibliography}
References:
Zumrawi, Magdi ME (2016) Investigating causes of pavement deterioration in Khartoum State. Int J Civ Eng Technol 7(2): 203--214
Shi, Yong and Cui, Limeng and Qi, Zhiquan and Meng, Fan and Chen, Zhensong (2016) Automatic road crack detection using random structured forests. IEEE Transactions on Intelligent Transportation Systems 17(12): 3434--3445 IEEE
Tsai, Yi-Chang and Chatterjee, Anirban (2017) Comprehensive, quantitative crack detection algorithm performance evaluation system. Journal of Computing in Civil Engineering 31(5): 04017047 American Society of Civil Engineers
Salman, Muhammad and Mathavan, Senthan and Kamal, Khurram and Rahman, Mujib (2013) Pavement crack detection using the Gabor filter. IEEE, 2039--2044, 16th international IEEE conference on intelligent transportation systems (ITSC 2013)
Hu, Yong and Zhao, Chun-xia (2010) A novel LBP based methods for pavement crack detection. Journal of Pattern Recognition Research 5(1): 140--147
Jiang, Chenglong and Tsai, Yichang James (2016) Enhanced crack segmentation algorithm using 3D pavement data. Journal of Computing in Civil Engineering 30(3): 04015050 American Society of Civil Engineers
Gopalakrishnan, Kasthurirangan and Khaitan, Siddhartha K and Choudhary, Alok and Agrawal, Ankit (2017) Deep convolutional neural networks with transfer learning for computer vision-based data-driven pavement distress detection. Construction and Building Materials 157: 322--330 Elsevier
Panella, Fabio and Lipani, Aldo and Boehm, Jan (2022) Semantic segmentation of cracks: Data challenges and architecture. Automation in Construction 135: 104110 Elsevier
Kheradmandi, Narges and Mehranfar, Vida (2022) A critical review and comparative study on image segmentation-based techniques for pavement crack detection. Construction and Building Materials 321: 126162 Elsevier
Liu, Chuanqi and Zhu, Chengguang and Xia, Xuan and Zhao, Jiankang and Long, Haihui (2022) FFEDN: Feature fusion encoder decoder network for crack detection. IEEE Transactions on Intelligent Transportation Systems 23(9): 15546--15557 IEEE
Qu, Zhong and Chen, Wen and Wang, Shi-Yan and Yi, Tu-Ming and Liu, Ling (2021) A crack detection algorithm for concrete pavement based on attention mechanism and multi-features fusion. IEEE Transactions on Intelligent Transportation Systems 23(8): 11710--11719 IEEE
Long, Jonathan and Shelhamer, Evan and Darrell, Trevor (2015) Fully convolutional networks for semantic segmentation. 3431--3440, Proceedings of the IEEE conference on computer vision and pattern recognition
Ronneberger, Olaf and Fischer, Philipp and Brox, Thomas (2015) U-net: Convolutional networks for biomedical image segmentation. Springer, 234--241, International Conference on Medical image computing and computer-assisted intervention
Ren, Yupeng and Huang, Jisheng and Hong, Zhiyou and Lu, Wei and Yin, Jun and Zou, Lejun and Shen, Xiaohua (2020) Image-based concrete crack detection in tunnels using deep fully convolutional networks. Construction and Building Materials 234: 117367 Elsevier
Li, Yongshang and Ma, Ronggui and Liu, Han and Cheng, Gaoli (2023) Real-time high-resolution neural network with semantic guidance for crack segmentation. Automation in Construction 156: 105112 Elsevier
Han, Chengjia and Ma, Tao and Huyan, Ju and Huang, Xiaoming and Zhang, Yanning (2021) CrackW-Net: A novel pavement crack image segmentation convolutional neural network. IEEE Transactions on Intelligent Transportation Systems 23(11): 22135--22144 IEEE
Liu, Fangyu and Wang, Linbing (2022) UNet-based model for crack detection integrating visual explanations. Construction and Building Materials 322: 126265 Elsevier
Zou, Qin and Zhang, Zheng and Li, Qingquan and Qi, Xianbiao and Wang, Qian and Wang, Song (2018) DeepCrack: Learning hierarchical convolutional features for crack detection. IEEE Transactions on Image Processing 28(3): 1498--1512 IEEE
Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina (2018) BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805
Yang, Lei and Bai, Suli and Liu, Yanhong and Yu, Hongnian (2023) Multi-scale triple-attention network for pixelwise crack segmentation. Automation in Construction 150: 104853 Elsevier
Shi, Lin and Zhang, Ruijun and Wu, Yafeng and Cui, Dongyan and Yuan, Na and Liu, Jinyun and Ji, Zhanlin (2024) AHC-Net: a road crack segmentation network based on dual attention mechanism and multi-feature fusion. Signal, Image and Video Processing 18(6): 5311--5322 Springer
Yang, Lei and Bai, Suli and Liu, Yanhong and Yu, Hongnian (2023) Multi-scale triple-attention network for pixelwise crack segmentation. Automation in Construction 150: 104853 Elsevier
Sun, Xinzi and Xie, Yuanchang and Jiang, Liming and Cao, Yu and Liu, Benyuan (2022) DMA-Net: DeepLab with multi-scale attention for pavement crack segmentation. IEEE Transactions on Intelligent Transportation Systems 23(10): 18392--18403 IEEE
Srinivas, Aravind and Lin, Tsung-Yi and Parmar, Niki and Shlens, Jonathon and Abbeel, Pieter and Vaswani, Ashish (2021) Bottleneck transformers for visual recognition. 16519--16529, Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
Houlsby, Neil and Weissenborn, Dirk (2020) Transformers for image recognition at scale. Online: https://ai.googleblog.com/2020/12/transformers-for-image-recognitionat.html
Wang, Jing and Yao, Haizhou and Hu, Jinbin and Ma, Yafei and Wang, Jin (2025) Dual-encoder network for pavement concrete crack segmentation with multi-stage supervision. Automation in Construction 169: 105884 Elsevier
Zhang, Zaiyan and Zhuang, Yangyang and Song, Weidong and Wu, Jiachen and Ye, Xin and Zhang, Hongyue and Xu, Yanli and Shi, Guoli (2025) ISTD-CrackNet: Hybrid CNN-transformer models focusing on fine-grained segmentation of multi-scale pavement cracks. Measurement 251: 117215 Elsevier
Srinivas, Aravind and Lin, Tsung-Yi and Parmar, Niki and Shlens, Jonathon and Abbeel, Pieter and Vaswani, Ashish (2021) Bottleneck transformers for visual recognition. 16519--16529, Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
Zhang, Jianming and Li, Dianwen and Zeng, Zhigao and Zhang, Rui and Wang, Jin (2025) Dual-branch crack segmentation network with multi-shape kernel based on convolutional neural network and Mamba. Engineering Applications of Artificial Intelligence 150: 110536 Elsevier
Chollet, Fran{\c{c}}ois (2017) Xception: Deep learning with depthwise separable convolutions. 1251--1258, Proceedings of the IEEE conference on computer vision and pattern recognition
Wang, Jin and Zeng, Zhigao and Sharma, Pradip Kumar and Alfarraj, Osama and Tolba, Amr and Zhang, Jianming and Wang, Lei (2024) Dual-path network combining CNN and transformer for pavement crack segmentation. Automation in Construction 158: 105217 Elsevier
Hu, Jie and Shen, Li and Sun, Gang (2018) Squeeze-and-excitation networks. 7132--7141, Proceedings of the IEEE conference on computer vision and pattern recognition
Liu, Yahui and Yao, Jian and Lu, Xiaohu and Xie, Renping and Li, Li (2019) DeepCrack: A deep hierarchical feature learning architecture for crack segmentation. Neurocomputing 338: 139--153 Elsevier
Shi, Yong and Cui, Limeng and Qi, Zhiquan and Meng, Fan and Chen, Zhensong (2016) Automatic road crack detection using random structured forests. IEEE Transactions on Intelligent Transportation Systems 17(12): 3434--3445 IEEE
Yang, Fan and Zhang, Lei and Yu, Sijia and Prokhorov, Danil and Mei, Xue and Ling, Haibin (2019) Feature pyramid and hierarchical boosting network for pavement crack detection. IEEE Transactions on Intelligent Transportation Systems 21(4): 1525--1535 IEEE
Goo, June Moh and Milidonis, Xenios and Artusi, Alessandro and Boehm, Jan and Ciliberto, Carlo (2025) Hybrid-Segmentor: Hybrid approach for automated fine-grained crack segmentation in civil infrastructure. Automation in Construction 170: 105960 Elsevier
Ronneberger, Olaf and Fischer, Philipp and Brox, Thomas (2015) U-net: Convolutional networks for biomedical image segmentation. Springer, 234--241, International Conference on Medical image computing and computer-assisted intervention
Gupta, Shreyansh and Shrivastwa, Shivam and Kumar, Sunny and Trivedi, Ashutosh (2023) Self-attention-based efficient U-Net for crack segmentation. Springer, Cham, 103--114, Computer Vision and Robotics: Proceedings of CVR 2022
Byun, Hoon and Kim, Jineon and Yoon, Dongyoung and Kang, Il-Seok and Song, Jae-Joon (2021) A deep convolutional neural network for rock fracture image segmentation. Earth Science Informatics 14(4): 1937--1951 Springer