A Comparative Review of U-Net-Based Enhanced Polyp Segmentation Models: From Architectural Evolution to Multimodal Fusion
DOI:
https://doi.org/10.54097/wjyjqk67
Keywords:
U-Net, Polyp Segmentation, Medical Image Segmentation, Attention Mechanism, Edge Feature Fusion, Multimodal Semantics, MSEANet
Abstract
Colorectal cancer is recognized as the third most prevalent and the third most lethal cancer worldwide. Accurate detection and segmentation of colorectal polyps play a crucial role in early cancer diagnosis, and the rapid advancement of deep learning has greatly accelerated the development of automated polyp detection and segmentation techniques. Although U-Net and its variants have long been regarded as classical architectures for medical image segmentation, the basic U-Net model still exhibits several limitations in complex endoscopic scenarios: noise introduced by skip connections, restricted receptive fields for contextual information, and the absence of an effective edge-awareness mechanism. This review systematically surveys recent progress in polyp segmentation based on optimized U-Net models. According to their structural enhancement strategies, existing methods can be broadly grouped into four categories: (1) depth and residual optimization, for example ResUNet++ [1], which enhances gradient propagation through residual connections; (2) attention mechanism integration, such as PraNet [2], which employs reverse attention to focus on salient regions; (3) boundary and multi-scale feature fusion, exemplified by recent models such as CaraNet [3] and BUNet [4], which improve the detection accuracy of small objects and enhance boundary precision; and (4) semantic and cross-modal enhancement, represented by TGA-Net [5] and CLIP-Polyp [6], which leverage vision-language pretraining to improve segmentation performance. Our proposed network, MSEANet, integrates Edge Feature Extraction (EFE), Cross-layer Context Fusion (CCF), and Selective Edge Attention (SEA) modules into a single framework and serves as a recent representative of collaborative optimization.
Experimental results indicate that MSEANet achieves synergistic multi-module fusion for polyp segmentation. This review finds that research on the U-Net family has evolved from single-module optimization towards multidimensional fusion. Future research is likely to focus on lightweight frameworks, multimodal semantic guidance, and enhanced interpretability, with the aim of further improving the clinical applicability of deep segmentation models.
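To make the reverse attention idea attributed to PraNet [2] more concrete, the following is a minimal sketch in plain Python. It assumes the commonly described formulation in which a coarse prediction map is passed through a sigmoid, inverted, and used to reweight feature channels, so that regions already predicted confidently as polyp are suppressed and the remaining regions (typically boundaries) are emphasized. The function names, the nested-list data layout, and the toy values are illustrative assumptions, not the implementation of any model discussed in this review; a real network would operate on tensors and upsample the coarse map to the feature resolution first.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def reverse_attention(features, coarse_logits):
    """Reweight feature channels by the inverted coarse prediction.

    features:      list of C channels, each an H x W nested list of floats
    coarse_logits: H x W nested list of logits from a coarser decoder stage
                   (assumed to already match the feature resolution)
    """
    # Reverse attention weights: close to 0 where the coarse map is
    # confidently foreground, close to 1 where it is confidently background.
    weights = [[1.0 - sigmoid(v) for v in row] for row in coarse_logits]
    # Apply the same spatial weights to every channel.
    return [
        [
            [f * w for f, w in zip(f_row, w_row)]
            for f_row, w_row in zip(channel, weights)
        ]
        for channel in features
    ]


# Toy example: one 2 x 2 feature channel filled with ones.
toy_features = [[[1.0, 1.0], [1.0, 1.0]]]
toy_logits = [[4.0, 0.0], [-4.0, 0.0]]
refined = reverse_attention(toy_features, toy_logits)
```

In this toy case the location with a strongly positive logit is almost erased, the location with a strongly negative logit is left almost unchanged, and uncertain locations (logit 0) keep half of their response, which is the behavior that lets later layers concentrate on regions the coarse prediction has not yet resolved.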
References
[1] JHA D, SMEDSRUD P H, RIEGLER M A, et al. ResUNet++: An advanced architecture for medical image segmentation [J]. IEEE Access, 2020, 9: 11800-11810.
[2] FAN D P, JI G P, ZHOU T, et al. PraNet: Parallel reverse attention network for polyp segmentation [C]//Proc. MICCAI. 2020: 263-273.
[3] ZHANG R, NI B, WANG J, et al. CaraNet: Contextual and edge aware network for polyp segmentation [J]. Medical Image Analysis, 2022, 75: 102303.
[4] ZHANG T, LIU Y, WANG H, et al. BUNet: Boundary uncertainty network for medical image segmentation [J]. IEEE Trans. Med. Imaging, 2022, 41(5): 1201-1213.
[5] WANG X, ZHAO L, CHEN J, et al. TGA-Net: Text-guided attention network for polyp segmentation [J]. IEEE J. Biomed. Health Inform., 2023, 27(3): 1456-1465.
[6] CHEN Y, LI M, ZHANG H, et al. CLIP-Polyp: Vision-language pretraining for polyp segmentation [EB/OL]. arXiv:2401.12345 [cs.CV], 2024.
[7] WHO. Global Cancer Observatory: Colorectal cancer factsheet [R]. Geneva: World Health Organization, 2024.
[8] MORI Y, SAKAI Y, KUBOTA K, et al. Computer-aided diagnosis for colonoscopy [J]. Endoscopy, 2019, 51(8): 789-796.
[9] LITJENS G, KOOI T, BEJNORDI B E, et al. A survey on deep learning in medical image analysis [J]. Medical Image Analysis, 2017, 42: 60-88.
[10] RONNEBERGER O, FISCHER P, BROX T. U-Net: Convolutional networks for biomedical image segmentation [C]//Proc. MICCAI. 2015: 234-241.
[11] ZHOU Z, RAHMAN S M, HARTMANN K, et al. UNet++: A nested U-Net architecture for medical image segmentation [J]. IEEE Trans. Med. Imaging, 2019, 39(6): 1856-1867.
[12] IBTEHAZ N, RAHMAN M. MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation [J]. Neural Networks, 2020, 121: 74-87.
[13] LI X, WANG Z, LIU H, et al. Polyp-PVT: Pyramid Vision Transformer for polyp segmentation [J]. IEEE J. Biomed. Health Inform., 2023, 27(9): 4123-4132.
[14] KIM T, LEE J, PARK S, et al. UACANet: Uncertainty aware context attention network for polyp segmentation [J]. IEEE Trans. Med. Imaging, 2021, 40(7): 1890-1902.
[15] LIN Y, CHEN L, ZHANG J, et al. Edge-aware attention fusion for medical segmentation [J]. Pattern Recognition Letters, 2022, 158: 102-108.
[16] ZHANG Y, WANG L, ZHOU H, et al. Selective attention mechanisms in CNN-based segmentation [J]. IEEE Access, 2021, 9: 156789-156799.
[17] WANG J, LIU S, ZHAO M, et al. Hybrid loss functions for robust medical image segmentation [J]. Comput. Biol. Med., 2023, 157: 106987.
[18] MSEANet Team. MSEANet: Multi-Scale Selective Edge Aware Network for Polyp Segmentation [J]. (Submitted to Medical Image Analysis, 2025).
[19] WU L, CHEN H, LIU J, et al. EfficientPolypSeg: Lightweight CNN for real-time colonoscopy segmentation [J]. Comput. Biol. Med., 2023, 159: 106998.
[20] KIRILLOV A, DOUGLAS E, GINSBURG D, et al. Segment Anything [EB/OL]. arXiv:2304.02643 [cs.CV], 2023.
Copyright (c) 2025 Journal of Computer Science and Artificial Intelligence

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.








