[1] XU M, ZHU Z, ZHAO Y, et al. Subspace learning by kernel dependence maximization for cross-modal retrieval[J]. Neuro-computing, 2018, 309: 94-105.
[2] CAO G Q, WARIS M A, IOSIFIDIS A, et al. Multi-modal subspace learning with dropout regularization for cross-modal recognition and retrieval[C]//Proceedings of the 2016 6th International Conference on Image Processing Theory, Tools and Applications,?Oulu, Dec 12-15, 2016. Piscataway: IEEE, 2016: 1-6.
[3] HE R, ZHANG M, WANG L, et al. Cross-modal learning via pairwise constraints[J]. IEEE Transactions on Image Pro-cessing, 2014, 24(12): 5543-5556.
[4] HARDOON D R, SZEDMAK S, SHAWE-TAYLOR J. Can-onical correlation analysis: an overview with application to learning methods[J]. Neural Computation, 2004, 16(12): 2639-2664.
[5] ANDREW G, ARORA R, BILMES J A, et al. Deep can-onical correlation analysis[C]//Proceedings of the 30th Inter-national Conference on Machine Learning, Atlanta, Jun 16-21, 2013: 1247-1255.
[6] ROSIPAL R, KR?MER N. Overview and recent advances in partial least squares[C]//LNCS 3940: Proceedings of the Subspace, Latent Structure and Feature Selection, Statistical and Optimization, Perspectives Workshop, Bohinj, Feb 23-25,?2005. Berlin, Heidelberg: Springer, 2005: 34-51.
[7] SHARMA A, KUMAR A, DAUMé H, et al. Generalized multiview analysis: a discriminative latent space[C]//Pro-ceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, Jun 16-21, 2012. Wash-ington: IEEE Computer Society, 2012: 2160-2167.
[8] WANG K Y, HE R, WANG W, et al. Learning coupled feature spaces for cross-modal matching[C]//Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, Dec 1-8, 2013. Washington: IEEE Computer Society, 2013: 2088-2095.
[9] QU G Z, XIAO J, ZHU J, et al. Cross-modal learning to rank with adaptive listwise constraint[C]//Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, Calgary, Apr 15-20, 2018. Piscataway: IEEE, 2018: 1658-1662.
[10] XU Q Z, LI M, YU M J. Learning to rank with relational graph and pointwise constraint for cross-modal retrieval[J]. Soft Computing, 2019, 23(19): 9413-9427.
[11] FROME A, CORRADO G S, SHLENS J, et al. DeViSE: a deep visual-semantic embedding model[C]//Proceedings of the 27th Annual Conference on Neural Information Process-ing Systems, Lake Tahoe, Dec 5-10, 2013. Red Hook: Curran Associates, 2013: 2121-2129.
[12] JIAN Y W, XIAO J, CAO Y, et al. Deep pairwise ranking with multi-label information for cross-modal retrieval[C]// Proceedings of the 2019 IEEE International Conference on Multimedia and Expo, Shanghai, Jul 8-12, 2019. Piscataway: IEEE,?2019: 1810-1815.
[13] YUAN Z Q, SANG J T, LIU Y, et al. Latent feature learning in social media network[C]//Proceedings of the 2013 ACM Multimedia Conference, Barcelona, Oct 21-25, 2013. New York: ACM, 2013: 253-262.
[14] WANG J, HE Y H, KANG C C, et al. Image-text cross-modal retrieval via modality-specific feature learning[C]// Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, Shanghai, Jun 23-26, 2015. New York: ACM, 2015: 347-354.
[15] JI Z, WANG H R, HAN J G, et al. Saliency-guided atten-tion network for image-sentence matching[C]//Proceedings of the 2019 IEEE/CVF International Conference on Com-puter Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 5754-5763.
[16] JI Z, LIN Z G, WANG H R, et al. Multi-modal memory enhancement attention network for image-text matching[J]. IEEE Access, 2020, 8: 38438-38447.
[17] WANG H R, JI Z, LIN Z G, et al. Stacked squeeze and excitation recurrent residual network for visual semantic matching[J]. Pattern Recognition, 2020, 105: 107359.
[18] GU W, GU X Y, GU J Z, et al. Adversary guided asymme-tric Hashing for cross-modal retrieval[C]//Proceedings of the 2019 International Conference on Multimedia Retrieval, Ottawa, Jun 10-13, 2019. New York: ACM, 2019: 159-167.
[19] LI K, DING Z M, LI K P, et al. Support neighbor loss for person re-identification[C]//Proceedings of the 26th ACM International Conference on Multimedia, Seoul, Oct 22-26, 2018. New York: ACM, 2018: 1492-1500.
[20] ZHENG Z D, ZHENG L, YANG Y. A discriminatively learned CNN embedding for person reidentification[J]. ACM Tran-sactions on Multimedia Computing, Communications, and Applications, 2017, 14(1): 1-20.
[21] RASIWASIA N, COSTA PEREIRA J, COVIELLO E, et al. A new approach to cross-modal multimedia retrieval[C]// Proceedings of the 18th ACM International Conference on Multimedia, Firenze, Oct 25-29, 2010. New York: ACM, 2010: 251-260.
[22] JIA Y, SHELHAMER E, DONAHUE J, et al. Caffe: con-volutional architecture for fast feature embedding[C]//Pro-ceedings of the 22nd ACM International Conference on Multimedia, Orlando, Nov 3-7, 2014. New York: ACM, 2014: 675-678.
[23] HWANG S J, GRAUMAN K. Reading between the lines: object localization using implicit cues from image tags[J]. IEEE Transactions on Pattern Analysis and Machine Intelli-gence, 2011, 34(6): 1145-1158.
[24] RANJAN V, RASIWASIA N, JAWAHAR C V. Multi-label cross-modal retrieval[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Dec 7-13, 2015. Piscataway: IEEE, 2015: 4094-4102.
[25] KANG C, XIANG S, LIAO S, et al. Learning consistent feature representation for cross-modal multimedia retrieval[J]. IEEE Transactions on Multimedia, 2015, 17(3): 370-381.
[26] PENG Y, HUANG X, QI J. Cross-media shared representa-tion by hierarchical learning with multiple deep networks[C]//Proceedings of the 25th International Joint Conference on Artificial Intelligence, New York, Jul 9-15, 2016. Palo Alto: AAAI Press, 2016: 3846-3853.
[27] BELHUMEUR P N, HESPANHA J P, KRIEGMAN D J. Eigenfaces vs. fisherfaces: recognition using class specific linear projection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, 19(7): 711-720.
[28] YAN S, XU D, ZHANG B, et al. Graph embedding and ex-tensions: a general framework for dimensionality reduction[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 29(1): 40-51.
[29] ZHANG L, MA B, LI G, et al. Metric based on multi-order spaces for cross-modal retrieval[C]//Proceedings of the 2017 IEEE International Conference on Multimedia and Expo, Hong Kong, Jul 10-14, 2017. Piscataway: IEEE, 2017: 1374-1379.
[30] WU F, JIANG X, LI X, et al. Cross-modal learning to rank via latent joint representation[J]. IEEE Transactions on Image Processing, 2015, 24(5): 1497-1509. |