Efficient Visual Region Recognition in the Open World Scenarios

Authors

  • Jing Wang
  • Yonghua Cao

DOI:

https://doi.org/10.54097/j9maj395

Keywords:

Open world object detection, Multimodal visual detection, Incremental learning

Abstract

Open-World Object Detection (OWOD) addresses a core limitation of traditional detectors: they cannot adapt dynamically to unknown objects or continuously learn new categories. Most existing methods discover new objects through visual similarity, yet they still suffer from semantic disconnection and catastrophic forgetting in complex cross-domain scenarios such as medical imaging and autonomous robotics. This paper proposes a causality-driven open-world object detection framework. By decoupling the functional (causal) features of objects from their appearance features, it discovers unknown objects and learns new categories incrementally through reasoning about physical properties. The core contributions are threefold: (1) a feature enhancement model that improves the quality and expressiveness of image features; (2) a similarity calculation mechanism that measures the similarity between text and images to curb over-confident predictions, improving the accuracy and reliability of detection; and (3) a multi-granularity visual stream that processes image features at several levels of detail to exploit the multi-level information in images. We evaluate the method on existing open-world benchmarks built from public datasets. The experimental results show that it achieves clear absolute gains in detecting unknown categories, indicating strong generalization ability. We expect this work to serve as a useful reference for further research on open-world object detection.
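
The abstract does not specify how the text-image similarity is computed or how it curbs over-confidence. The following is a minimal sketch of one common pattern, not the authors' method: it assumes region embeddings and class-name text embeddings already exist (for example from a CLIP-style encoder), scores each region against each class name by cosine similarity, and reports "unknown" when even the best raw similarity falls below a threshold. The function `classify_region` and its parameters `temperature` and `min_similarity` are hypothetical names chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_region(
    region_emb: Sequence[float],
    text_embs: Sequence[Sequence[float]],
    labels: Sequence[str],
    temperature: float = 0.1,     # hypothetical softmax temperature
    min_similarity: float = 0.3,  # hypothetical "unknown" threshold
) -> Tuple[str, float]:
    """Assign a region to a known class or to 'unknown' (hypothetical rule).

    The softmax always sums to 1, so it can be peaked even when every match
    is weak. Checking the raw cosine similarity first keeps such regions from
    being forced into a known class with an over-confident probability.
    """
    sims = [cosine_similarity(region_emb, t) for t in text_embs]
    probs = softmax([s / temperature for s in sims])
    best = max(range(len(sims)), key=lambda i: sims[i])
    if sims[best] < min_similarity:
        return "unknown", probs[best]
    return labels[best], probs[best]


if __name__ == "__main__":
    labels = ["cat", "dog"]
    text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(classify_region([0.9, 0.1, 0.0], text_embs, labels)[0])    # cat
    print(classify_region([0.2, 0.05, 0.97], text_embs, labels)[0])  # unknown
```

In the second call the softmax still assigns "cat" a probability of roughly 0.82, but the raw cosine similarity (about 0.20) is below the threshold, so the region is reported as unknown rather than confidently mislabeled.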

Published

27-03-2025

Section

Articles