Survey of Deep Feature Instance Level Image Retrieval Algorithms

doi:10.3778/j.issn.1673-9418.2210125

Abstract

Abstract: Content-based image retrieval (CBIR) aims to find, by analyzing visual content, the images in a large database that semantically match or resemble a query image. Obtaining a discriminative image representation through feature extraction is critical to the retrieval results. With the continued development of deep learning, the image representations used in retrieval have gradually shifted from hand-crafted features to deep features. This paper reviews and traces recent deep-feature-based image retrieval algorithms from the perspective of their feature extraction methods. The algorithms are surveyed in two groups, those based on deep global features and those based on deep local features, with particular attention in the latter group to deep convolutional feature aggregation techniques. The widely used retrieval methods that fuse deep global and local features are also summarized. The paper then discusses practical applications of deep-feature instance-level image retrieval in remote sensing image retrieval, e-commerce product retrieval, and medical image retrieval, and compares the retrieval accuracy of these feature extraction algorithms. Finally, future research trends for deep feature extraction in instance-level image retrieval are outlined.

Key words: instance-level image retrieval, deep learning, deep global features, deep local features
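As a rough illustration of what aggregating deep convolutional features into a global descriptor can look like, the sketch below pools a toy feature map over its spatial positions, L2-normalizes the result, and ranks database images by cosine similarity to the query. It is not taken from the surveyed paper: the function names, the tiny hand-written feature maps, and the choice of plain sum-pooling and max-pooling are all assumptions made for illustration. In practice the feature maps would come from a convolutional network.

```python
import math


def sum_pool(feature_map):
    """Sum each channel of a C x H x W feature map (nested lists) into one value."""
    return [sum(sum(row) for row in channel) for channel in feature_map]


def max_pool(feature_map):
    """Take the maximum activation of each channel instead of the sum."""
    return [max(max(row) for row in channel) for channel in feature_map]


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def cosine(a, b):
    """Dot product of two vectors that are assumed to be L2-normalized already."""
    return sum(x * y for x, y in zip(a, b))


def rank(query_map, database_maps, pool=sum_pool):
    """Return database image names ordered from most to least similar to the query."""
    q = l2_normalize(pool(query_map))
    scored = []
    for name, fmap in database_maps.items():
        d = l2_normalize(pool(fmap))
        scored.append((cosine(q, d), name))
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical toy data: 2 channels, 2 x 2 spatial positions each.
query = [[[1, 0], [0, 1]], [[0, 0], [0, 0]]]
database = {
    "a": [[[2, 1], [0, 0]], [[0, 0], [1, 0]]],
    "b": [[[0, 0], [0, 0]], [[1, 1], [1, 1]]],
    "c": [[[1, 0], [0, 0]], [[1, 0], [0, 0]]],
}

ranking_sum = rank(query, database, pool=sum_pool)
ranking_max = rank(query, database, pool=max_pool)
```

Passing a different `pool` function is the only change needed to compare aggregation strategies on the same feature maps, which is the kind of comparison the survey carries out at a much larger scale.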

JI Changqing, WANG Bingbing, QIN Jing, WANG Zumin. Survey of Deep Feature Instance Level Image Retrieval Algorithms[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(7): 1565-1575.
