[1] KARPATHY A, TODERICI G, SHETTY S, et al. Large-scale video classification with convolutional neural networks[C]//Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition, Columbus, Jun 20-23, 2014. Piscataway: IEEE, 2014: 1725-1732.
[2] HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Piscataway: IEEE, 2017: 4700-4708.
[3] HE J, CHEN J N, LIU S, et al. TransFG: a transformer architecture for fine-grained recognition[J]. arXiv:2103.07976, 2021.
[4] DU R Y, CHANG D L, BHUNIA A K, et al. Fine-grained visual classification via progressive multi-granularity training of jigsaw patches[C]//Proceedings of the 16th European Conference on Computer Vision, Glasgow, Aug 23-28, 2020. Cham: Springer, 2020: 153-168.
[5] ZHUANG P Q, WANG Y L, QIAO Y. Learning attentive pairwise interaction for fine-grained classification[C]//Proceedings of the 34th AAAI Conference on Artificial Intelligence, the 32nd Innovative Applications of Artificial Intelligence Conference, the 10th AAAI Symposium on Educational Advances in Artificial Intelligence, New York, Feb 7-12, 2020. Menlo Park: AAAI, 2020: 13130-13137.
[6] ZHANG H, XU T, ELHOSEINY M, et al. SPDA-CNN: unifying semantic part detection and abstraction for fine-grained recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Piscataway: IEEE, 2016: 1143-1152.
[7] KRAUSE J, JIN H L, YANG J C, et al. Fine-grained recognition without part annotations[C]//Proceedings of the 28th IEEE Conference on Computer Vision and Pattern Recognition, Boston, Jun 7-12, 2015. Piscataway: IEEE, 2015: 5546-5555.
[8] WANG Y M, CHOI J, MORARIU V, et al. Mining discriminative triplets of patches for fine-grained classification[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Piscataway: IEEE, 2016: 1163-1172.
[9] LIN T Y, ROYCHOWDHURY A, MAJI S. Bilinear CNN models for fine-grained visual recognition[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Dec 13-16, 2015. Piscataway: IEEE, 2015: 1449-1457.
[10] JI R Y, WEN L Y, ZHANG L B, et al. Attention convolutional binary neural tree for fine-grained visual categorization[C]//Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition, Jun 13-19, 2020. Piscataway: IEEE, 2020: 10468-10477.
[11] ZHANG F, LI M, ZHAI G S, et al. Multi-branch and multi-scale attention learning for fine-grained visual categorization[C]//LNCS 12572: Proceedings of the 27th International Conference on Multimedia Modeling, Prague, Jun 22-24, 2021. Cham: Springer, 2021: 136-147.
[12] CHEN Y, BAI Y L, ZHANG W, et al. Destruction and construction learning for fine-grained image recognition[C]//Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 15-20, 2019. Piscataway: IEEE, 2019: 5157-5166.
[13] YANG Z, LUO T G, WANG D, et al. Learning to navigate for fine-grained classification[C]//LNCS 11218: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 438-454.
[14] ZHANG N, DONAHUE J, GIRSHICK R, et al. Part-based R-CNNs for fine-grained category detection[C]//LNCS 8689: Proceedings of the 13th European Conference on Computer Vision, Zurich, Sep 6-12, 2014. Cham: Springer, 2014: 834-849.
[15] LIU X, XIA T, WANG J, et al. Fully convolutional attention localization networks: efficient attention localization for fine-grained recognition[J]. arXiv:1603.06765, 2016.
[16] BRANSON S, VAN HORN G, BELONGIE S, et al. Bird species categorization using pose normalized deep convolutional nets[J]. arXiv:1406.2952, 2014.
[17] HUANG S L, XU Z, TAO D C, et al. Part-stacked CNN for fine-grained visual categorization[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Piscataway: IEEE, 2016: 1173-1182.
[18] GAO Y, HAN X T, WANG X, et al. Channel interaction networks for fine-grained image categorization[C]//Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, Feb 7-12, 2020. Menlo Park: AAAI, 2020: 10818-10825.
[19] CHANG D L, DING Y F, XIE J Y, et al. The devil is in the channels: mutual-channel loss for fine-grained image classification[J]. IEEE Transactions on Image Processing, 2020, 29: 4683-4695.
[20] ZHANG L B, HUANG S L, LIU W, et al. Learning a mixture of granularity-specific experts for fine-grained categorization[C]//Proceedings of the 26th IEEE International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 8330-8339.
[21] XU S, CHANG D L, XIE J Y, et al. Grad-CAM guided channel-spatial attention module for fine-grained visual classification[C]//Proceedings of the 2021 IEEE 31st International Workshop on Machine Learning for Signal Processing, Gold Coast, Oct 25-28, 2021. Piscataway: IEEE, 2021: 1-6.
[22] HADSELL R, CHOPRA S, LECUN Y. Dimensionality reduction by learning an invariant mapping[C]//Proceedings of the 19th IEEE Conference on Computer Vision and Pattern Recognition, New York, Jun 17-22, 2006. Piscataway: IEEE, 2006: 1735-1742.
[23] GRILL J B, STRUB F, ALTCHE F, et al. Bootstrap your own latent—a new approach to self-supervised learning[C]//Advances in Neural Information Processing Systems 33, Dec 6-12, 2020: 21271-21284.
[24] SHARMA V, TAPASWI M, SARFRAZ M S, et al. Clustering based contrastive learning for improving face representations[C]//Proceedings of the 2020 IEEE International Conference on Automatic Face and Gesture Recognition, Buenos Aires, Nov 16-20, 2020. Piscataway: IEEE, 2020: 109-116.
[25] DOSOVITSKIY A, SPRINGENBERG J T, RIEDMILLER M, et al. Discriminative unsupervised feature learning with convolutional neural networks[C]//Advances in Neural Information Processing Systems 27, Montreal, Dec 8-13, 2014: 766-774.
[26] SCHROFF F, KALENICHENKO D, PHILBIN J. FaceNet: a unified embedding for face recognition and clustering[C]//Proceedings of the 28th IEEE Conference on Computer Vision and Pattern Recognition, Boston, Jun 7-12, 2015. Piscataway: IEEE, 2015: 815-823.
[27] LI Y F, HU P, LIU Z T, et al. Contrastive clustering[C]//Proceedings of the 35th AAAI Conference on Artificial Intelligence, the 33rd Conference on Innovative Applications of Artificial Intelligence, the 11th Symposium on Educational Advances in Artificial Intelligence, Feb 2-9, 2021. Menlo Park: AAAI, 2021: 8547-8555.
[28] DANG Z Y, DENG C, YANG X, et al. Doubly contrastive deep clustering[J]. arXiv:2103.05484, 2021.
[29] WAH C, BRANSON S, WELINDER P, et al. The Caltech-UCSD Birds-200-2011 dataset[R]. Pasadena: California Institute of Technology, 2011.
[30] MAJI S, RAHTU E, KANNALA J, et al. Fine-grained visual classification of aircraft[J]. arXiv:1306.5151, 2013.
[31] KRAUSE J, STARK M, DENG J, et al. 3D object representations for fine-grained categorization[C]//Proceedings of the 2013 IEEE International Conference on Computer Vision Workshops, Sydney, Dec 1-8, 2013. Piscataway: IEEE, 2013: 554-561.
[32] KHOSLA A, JAYADEVAPRAKASH N, YAO B P, et al. Novel dataset for fine-grained image categorization: Stanford dogs[C]//Proceedings of the 1st Workshop on Fine-Grained Visual Categorization, IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, Jun 20-25, 2011.
[33] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]//Advances in Neural Information Processing Systems 25, Lake Tahoe, Dec 3-6, 2012: 1106-1114.
[34] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 770-778.
[35] RAO Y M, CHEN G Y, LU J W, et al. Counterfactual attention learning for fine-grained visual categorization and re-identification[C]//Proceedings of the 28th IEEE International Conference on Computer Vision, Oct 10-17, 2021. Piscataway: IEEE, 2021: 1025-1034.
[36] WANG D Q, SHEN Z Q, SHAO J, et al. Multiple granularity descriptors for fine-grained categorization[C]//Proceedings of the 22nd IEEE International Conference on Computer Vision, Santiago, Dec 13-16, 2015. Piscataway: IEEE, 2015: 2399-2406.
[37] WANG Y M, MORARIU V I, DAVIS L S. Learning a discriminative filter bank within a CNN for fine-grained recognition[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Piscataway: IEEE, 2018: 4148-4157.
[38] FU J L, ZHENG H L, MEI T. Look closer to see better: recurrent attention convolutional neural network for fine-grained image recognition[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 4438-4446.
[39] LUO W, YANG X T, MO X J, et al. Cross-X learning for fine-grained visual categorization[C]//Proceedings of the 26th IEEE International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 8241-8250.
[40] ZHANG T, CHANG D L, MA Z Y, et al. Progressive co-attention network for fine-grained visual classification[C]//Proceedings of the 2021 International Conference on Visual Communications and Image Processing, Munich, Dec 5-8, 2021. Piscataway: IEEE, 2021: 1-5.
[41] DUBEY A, GUPTA O, GUO P, et al. Pairwise confusion for fine-grained visual classification[C]//LNCS 11216: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 71-88.
[42] DUBEY A, GUPTA O, RASKAR R, et al. Maximum-entropy fine grained classification[C]//Advances in Neural Information Processing Systems 31, Montréal, Dec 3-8, 2018: 635-645.
[43] SUN M, YUAN Y C, ZHOU F, et al. Multi-attention multi-class constraint for fine-grained image recognition[C]//LNCS 11220: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 834-850.
[44] LUO W, ZHANG H M, LI J, et al. Learning semantically enhanced feature for fine-grained image classification[J]. IEEE Signal Processing Letters, 2020, 27: 1545-1549.
[45] SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[C]//Proceedings of the 24th IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 618-626.