A Comparative Review of the Next-Generation YOLO Models: YOLOv10 and YOLO11
DOI: https://doi.org/10.54097/22zsmc10

Keywords: Object detection, YOLO, YOLOv10, YOLO11

Abstract
In recent years, the YOLO (You Only Look Once) series has remained a mainstream framework in object detection, continually advancing the balance between lightweight design and high accuracy. This paper focuses on two pivotal versions, YOLOv10 and YOLO11, and provides a systematic comparison and analysis in terms of architectural design, core modules, performance characteristics, and application scenarios. YOLOv10 introduces a unified end-to-end architecture that eliminates anchors and NMS post-processing, thereby simplifying the detection pipeline and significantly improving deployment efficiency. In contrast, YOLO11 builds upon YOLOv8, reconstructing its modules around the C3k structure and an optimized feature-fusion pathway to further enhance detection accuracy and representational capacity. This review outlines the similarities and differences in structural philosophy and application orientation between the two models, summarizes their technological evolution, and explores potential future directions for the YOLO series in multi-task integration, adaptive modeling, and lightweight deployment. The findings of this study aim to serve as a reference for the design and selection of object detection systems.
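To make concrete what YOLOv10's end-to-end design removes, the following is a minimal, illustrative sketch of the greedy non-maximum suppression (NMS) step that earlier YOLO versions run as post-processing at inference time. This is plain Python for exposition only, not the actual implementation of any YOLO release; box coordinates and thresholds are made-up example values.

```python
def iou(a, b):
    # Intersection-over-Union of two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    # Greedy NMS: repeatedly keep the highest-scoring remaining box
    # and discard every box that overlaps it above the IoU threshold.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep  # indices of the surviving detections
```

Because this pruning step is sequential and threshold-dependent, folding its effect into the network itself (as YOLOv10's NMS-free training does) both simplifies deployment and removes a source of inference-latency variance.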
References
[1] Zhao Z Q, Zheng P, Xu S, et al. Object detection with deep learning: A review [J]. IEEE Transactions on Neural Networks and Learning Systems, 2019, 30(11): 3212-3232.
[2] Wu J. Introduction to convolutional neural networks [J]. National Key Lab for Novel Software Technology. Nanjing University. China, 2017, 5(23): 495.
[3] Hussain M. YOLO-v1 to YOLO-v8, the rise of YOLO and its complementary nature toward digital manufacturing and industrial defect detection [J]. Machines, 2023, 11(7): 677.
[4] Terven J, Córdova-Esparza D M, Romero-González J A. A comprehensive review of YOLO architectures in computer vision: From YOLOv1 to YOLOv8 and YOLO-NAS [J]. Machine Learning and Knowledge Extraction, 2023, 5(4): 1680-1716.
[5] Zhao L, Li S. Object detection algorithm based on improved YOLOv3 [J]. Electronics, 2020, 9(3): 537.
[6] Yang L, Chen G, Ci W. Multiclass objects detection algorithm using DarkNet-53 and DenseNet for intelligent vehicles [J]. EURASIP Journal on Advances in Signal Processing, 2023, 2023(1): 85.
[7] Bochkovskiy A, Wang C Y, Liao H Y M. YOLOv4: Optimal speed and accuracy of object detection [J]. arXiv preprint arXiv:2004.10934, 2020.
[8] Guo G, Zhang Z. Road damage detection algorithm for improved YOLOv5 [J]. Scientific Reports, 2022, 12(1): 15523.
[9] Norkobil Saydirasulovich S, Abdusalomov A, Jamil M K, et al. A YOLOv6-based improved fire detection approach for smart city environments [J]. Sensors, 2023, 23(6): 3161.
[10] Wang C Y, Bochkovskiy A, Liao H Y M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023: 7464-7475.
[11] Zhai X, Huang Z, Li T, et al. YOLO-Drone: an optimized YOLOv8 network for tiny UAV object detection [J]. Electronics, 2023, 12(17): 3664.
[12] Chen Y, Zhan S, Cao G, et al. C2f-Enhanced YOLOv5 for Lightweight Concrete Surface Crack Detection [C]//Proceedings of the 2023 International Conference on Advances in Artificial Intelligence and Applications. 2023: 60-64.
[13] Wang A, Chen H, Liu L, et al. YOLOv10: Real-time end-to-end object detection [J]. Advances in Neural Information Processing Systems, 2024, 37: 107984-108011.
[14] He L, Zhou Y, Liu L, et al. Research on object detection and recognition in remote sensing images based on YOLOv11 [J]. Scientific Reports, 2025, 15(1): 14032.
[15] Hosang J, Benenson R, Schiele B. Learning non-maximum suppression [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 4507-4515.
[16] Chen X, Jiang N, Yu Z, et al. Citrus leaf disease detection based on improved YOLO11 with C3K2 [C]//International Conference on Computer Graphics, Artificial Intelligence, and Data Processing (ICCAID 2024). SPIE, 2025, 13560: 746-751.
[17] Aflaki P, Hannuksela M M, Häkkinen J, et al. Impact of downsampling ratio in mixed-resolution stereoscopic video [C]//2010 3DTV-Conference: The True Vision-Capture, Transmission and Display of 3D Video. IEEE, 2010: 1-4.
[18] Mutlag W K, Ali S K, Aydam Z M, et al. Feature extraction methods: a review [C]//Journal of Physics: Conference Series. IOP Publishing, 2020, 1591(1): 012028.
[19] Zheng Z, Wang P, Liu W, et al. Distance-IoU loss: Faster and better learning for bounding box regression [C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2020, 34(07): 12993-13000.
[20] Zhang C, Bengio S, Hardt M, et al. Understanding deep learning (still) requires rethinking generalization [J]. Communications of the ACM, 2021, 64(3): 107-115.
License
Copyright (c) 2025 Journal of Computer Science and Artificial Intelligence

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.