[1] JOHNSON J, KRISHNA R, STARK M, et al. Image retrieval using scene graphs[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, Jun 7-12, 2015. Washington: IEEE Computer Society, 2015: 3668-3678.
[2] MARINO K, SALAKHUTDINOV R, GUPTA A. The more you know: using knowledge graphs for image classification[J]. arXiv:1612.04844, 2017.
[3] FANG Y, KUAN K, LIN J, et al. Object detection meets knowledge graphs[C]//Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Aug 19-25, 2017: 1661-1667.
[4] ZITNICK C L, PARIKH D, VANDERWENDE L. Learning the visual interpretation of sentences[C]//Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, Dec 1-8, 2013. Washington: IEEE Computer Society, 2013: 1681-1688.
[5] YATSKAR M, ZETTLEMOYER L, FARHADI A. Situation recognition: visual semantic role labeling for image understanding[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 5534-5542.
[6] LU C, KRISHNA R, BERNSTEIN M, et al. Visual relationship detection with language priors[J]. arXiv:1608.00187, 2016.
[7] DAI B, ZHANG Y Q, LIN D H. Detecting visual relationships with deep relational networks[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 3298-3308.
[8] CHEN T S, YU W H, CHEN R Q, et al. Knowledge-embedded routing network for scene graph generation[C]//Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 6163-6171.
[9] ZHAN Y B, YU J, YU T, et al. On exploring undetermined relationships for visual relationship detection[C]//Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 5128-5137.
[10] LIN X, TIAN X, JI Y, et al. Scene graph generation based on shuffle residual context information[J]. Journal of Computer Research and Development, 2019, 56(8): 1721-1730. (in Chinese)
[11] XU D F, ZHU Y K, CHOY C B, et al. Scene graph generation by iterative message passing[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 3097-3106.
[12] ZELLERS R, YATSKAR M, THOMSON S, et al. Neural motifs: scene graph parsing with global context[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 5831-5840.
[13] GU J X, ZHAO H D, LIN Z, et al. Scene graph generation with external knowledge and image reconstruction[C]//Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 1969-1978.
[14] KRISHNA R, ZHU Y K, GROTH O, et al. Visual genome: connecting language and vision using crowdsourced dense image annotations[J]. International Journal of Computer Vision, 2017, 123(1): 32-73.
[15] AUER S, BIZER C, KOBILAROV G, et al. DBpedia: a nucleus for a Web of open data[C]//LNCS 4825: Proceedings of the 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference: the Semantic Web, Busan, Nov 11-15, 2007. Berlin, Heidelberg: Springer, 2007: 722-735.
[16] FELLBAUM C. WordNet[M]//Encyclopedia of Language and Linguistics. New York: Elsevier Science Inc., 2012.
[17] LIU H, SINGH P. ConceptNet—a practical commonsense reasoning tool-kit[J]. BT Technology Journal, 2004, 22(4): 211-226.
[18] LEE C W, FANG W, YEH C K, et al. Multi-label zero-shot learning with structured knowledge graphs[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 1576-1585.
[19] DENG J, DING N, JIA Y Q, et al. Large-scale object classification using label relation graphs[C]//LNCS 8689: Proceedings of the 13th European Conference on Computer Vision, Zurich, Sep 6-12, 2014. Cham: Springer, 2014: 48-64.
[20] WU Q, SHEN C, WANG P, et al. Image captioning and visual question answering based on attributes and external knowledge[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(6): 1367-1381.
[21] MIKOLOV T, KARAFIÁT M, BURGET L, et al. Recurrent neural network based language model[C]//Proceedings of the 11th Annual Conference of the International Speech Communication Association, Makuhari, Sep 26-30, 2010: 1045-1048.
[22] YANG J W, LU J S, LEE S, et al. Graph R-CNN for scene graph generation[C]//LNCS 11205: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 690-706.
[23] LI Y K, OUYANG W L, ZHOU B L, et al. Scene graph generation from objects, phrases and region captions[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 1270-1279.
[24] TANG K H, ZHANG H W, WU B Y, et al. Learning to compose dynamic tree structures for visual contexts[J]. arXiv:1812.01880, 2018.
[25] LIN X, DING C X, ZENG J Q, et al. GPS-Net: graph property sensing network for scene graph generation[J]. arXiv:2003.12962, 2020.
[26] YU R C, LI A, MORARIU V I, et al. Visual relationship detection with internal and external linguistic knowledge distillation[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 1068-1076.
[27] ZAREIAN A, KARAMAN S, CHANG S F. Bridging knowledge graphs to generate scene graphs[J]. arXiv:2001.02314, 2020.
[28] REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]//Proceedings of the 28th Annual Conference on Neural Information Processing Systems 2015, Montreal, Dec 7-12, 2015. Red Hook: Curran Associates, 2015: 91-99.
[29] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[30] CHO K, VAN MERRIËNBOER B, GULCEHRE C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[J]. arXiv:1406.1078, 2014.
[31] REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 6517-6525.
[32] RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge[J]. arXiv:1409.0575, 2014.
[33] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. arXiv:1409.1556, 2014.
[34] NEWELL A, DENG J. Pixels to graphs by associative embedding[C]//Proceedings of the Annual Conference on Neural Information Processing Systems 2017, Long Beach, Dec 4-9, 2017. Red Hook: Curran Associates, 2017: 2171-2180.