Infrared and Visible Image Fusion Algorithm Based on Cross-Modal Attention Mechanism

Authors

  • Zhiyuan Wang

DOI:

https://doi.org/10.54097/7j2r3z20

Keywords:

Image fusion, attention mechanism, Swin Transformer

Abstract

As a critical branch of multi-modal image processing, infrared and visible image fusion has high application value in intelligent security and has drawn widespread research attention. Enhancing a model's feature extraction capability is a core scientific challenge in this field. This paper presents a dual-branch infrared and visible image fusion algorithm based on feature decomposition. In fusion tasks, shared features characterize global information while private features focus on local details. To strengthen feature representation, we design a feature decomposition module that splits shallow features into shared and private components: a coarse-grained branch with medium-to-large receptive fields extracts global shared features, and a fine-grained branch with small receptive fields extracts local private details. Processing the decomposed features in parallel branches allows the underlying data structure to be mined precisely, reduces redundancy, and captures key information efficiently. Experiments on four mainstream public datasets show that the proposed algorithm surpasses state-of-the-art methods in information extraction, detail preservation, and overall fusion performance.
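The shared/private split described in the abstract can be illustrated, in spirit, with a simple low-pass/residual decomposition. The sketch below is a toy NumPy illustration under stated assumptions, not the paper's learned network: `box_filter` stands in for the coarse-grained branch (medium-to-large receptive field), the residual stands in for the fine-grained branch, and the averaging/max-absolute fusion rule is a hypothetical placeholder for the paper's fusion strategy.

```python
import numpy as np

def box_filter(x, win):
    """Mean filter with a win x win receptive field, implemented with
    edge padding and a sliding-window sum (win should be odd)."""
    pad = win // 2
    xp = np.pad(x, pad, mode="edge")
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(win):
        for dx in range(win):
            out += xp[dy:dy + h, dx:dx + w]
    return out / (win * win)

def decompose(feat, win=7):
    """Split one feature channel into a shared (global, smooth) component
    and a private (local, detail) residual; shared + private = feat."""
    shared = box_filter(feat, win)   # stand-in for the coarse-grained branch
    private = feat - shared          # stand-in for the fine-grained branch
    return shared, private

def fuse(ir, vis, win=7):
    """Toy fusion rule: average the shared parts, keep the stronger
    private detail at each pixel, then recombine."""
    ir_s, ir_p = decompose(ir, win)
    vi_s, vi_p = decompose(vis, win)
    shared = 0.5 * (ir_s + vi_s)
    private = np.where(np.abs(ir_p) >= np.abs(vi_p), ir_p, vi_p)
    return shared + private
```

Because the private component is defined as the residual of the shared one, the decomposition is lossless (`shared + private` reconstructs the input exactly), which mirrors why processing the two streams in parallel branches avoids redundant representation of the same information.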


References

[1] Xu H, Ma J, Le Z, et al. FusionDN: A Unified Densely Connected Network for Image Fusion [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(07): 12484-12491.

[2] Zhang Q, Fu Y, Li H, et al. Dictionary learning method for joint sparse representation-based image fusion [J]. Optical Engineering, 2013, 52(5): 057006.

[3] Li H, Wu X J, Kittler J. MDLatLRR: A Novel Decomposition Method for Infrared and Visible Image Fusion [J]. IEEE Transactions on Image Processing, 2020, 29: 4733-4746.

[4] Yang B, Yang C, Huang G. Efficient image fusion with approximate sparse representation [J]. International Journal of Wavelets, Multiresolution and Information Processing, 2016, 14(04): 1650024.

[5] Song C, Gao X, Qiao Y L, et al. Infrared and visible image fusion based on oversampled graph filter banks [J]. Journal of Electronic Imaging, 2020, 29(2): 023016.

[6] He K, Zhou D, Zhang X, et al. Infrared and visible image fusion based on target extraction in the nonsubsampled contourlet transform domain [J]. Journal of Applied Remote Sensing, 2017, 11(1): 015011.

[7] Zhi-She W, Feng-Bao Y, Zhi-Hao P, et al. Multi-sensor image enhanced fusion algorithm based on NSST and top-hat transformation [J]. Optik, 2015, 126(23): 4184-4190.

[8] Lu M, Jiang M, Kong J, et al. LDRepFM: A Real-Time End-to-End Visible and Infrared Image Fusion Model Based on Layer Decomposition and Re-Parameterization [J]. IEEE Transactions on Instrumentation and Measurement, 2023, 72: 1-12.

[9] Liu Y, Chen X, Peng H, et al. Multi-focus image fusion with a deep convolutional neural network [J]. Information Fusion, 2017, 36: 191-207.

[10] Zhang Y, Liu Y, Sun P, et al. IFCNN: A general image fusion framework based on convolutional neural network [J]. Information Fusion, 2020, 54: 99-118.

[11] Xu H, Ma J, Jiang J, et al. U2Fusion: A Unified Unsupervised Image Fusion Network [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(1): 502-518.

[12] Li H, Wu X J. DenseFuse: A Fusion Approach to Infrared and Visible Images [J]. IEEE Transactions on Image Processing, 2019, 28(5): 2614-2623.

[13] Li H, Wu X J, Kittler J. RFN-Nest: An end-to-end residual fusion network for infrared and visible images [J]. Information Fusion, 2021, 73: 72-86.

[14] Hong Y, Wu X J, Xu T. MEFuse: end-to-end infrared and visible image fusion method based on multibranch encoder [J]. Journal of Electronic Imaging, 2022, 31(3): 033043.

[15] Ma J, Yu W, Liang P, et al. FusionGAN: A generative adversarial network for infrared and visible image fusion [J]. Information Fusion, 2019, 48: 11-26.

[16] Ma J, Xu H, Jiang J, et al. DDcGAN: A Dual-Discriminator Conditional Generative Adversarial Network for Multi-Resolution Image Fusion [J]. IEEE Transactions on Image Processing, 2020, 29: 4980-4995.

[17] Le Z, Huang J, Xu H, et al. UIFGAN: An unsupervised continual-learning generative adversarial network for unified image fusion [J]. Information Fusion, 2022, 88: 305-318.

[18] Vs V, Jose Valanarasu J M, Oza P, et al. Image Fusion Transformer [C]//2022 IEEE International Conference on Image Processing (ICIP). 2022: 3566-3570.

[19] Li H, Wu X J, Durrani T. NestFuse: An Infrared and Visible Image Fusion Architecture Based on Nest Connection and Spatial/Channel Attention Models [J]. IEEE Transactions on Instrumentation and Measurement, 2020, 69(12): 9645-9656.

[20] Wang Z, Wu Y, Wang J, et al. Res2Fusion: Infrared and Visible Image Fusion Based on Dense Res2net and Double Nonlocal Attention Models [J]. IEEE Transactions on Instrumentation and Measurement, 2022, 71: 1-12.

[21] Tang W, He F, Liu Y, et al. DATFuse: Infrared and Visible Image Fusion via Dual Attention Transformer [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(7): 3159-3172.

[22] Liu Z, Lin Y, Cao Y, et al. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows [C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 10012-10022.

[23] Wang Z, Chen Y, Shao W, et al. SwinFuse: A Residual Swin Transformer Fusion Network for Infrared and Visible Images [J]. IEEE Transactions on Instrumentation and Measurement, 2022, 71: 1-12.

[24] Ma J, Tang L, Fan F, et al. SwinFusion: Cross-domain Long-range Learning for General Image Fusion via Swin Transformer [J]. IEEE/CAA Journal of Automatica Sinica, 2022, 9(7): 1200-1217.

[29] Zhang H, Ma J. SDNet: A Versatile Squeeze-and-Decomposition Network for Real-Time Image Fusion [J]. International Journal of Computer Vision, 2021, 129(10): 2761-2785.

[34] Tang L, Xiang X, Zhang H, et al. DIVFusion: Darkness-free infrared and visible image fusion [J]. Information Fusion, 2023, 91: 477-493.

Published

29-03-2026

Issue

Section

Articles