Journal of Frontiers of Computer Science and Technology, 2022, Vol. 16, Issue (2): 323-336. DOI: 10.3778/j.issn.1673-9418.2106004
RUAN Chenzhao, ZHANG Xiangsen, LIU Ke, ZHAO Zengshun+
+ E-mail: zhaozengshun@163.com
About author:
RUAN Chenzhao, born in 1996, M.S. candidate. His research interests include computer vision and image processing.
RUAN Chenzhao, ZHANG Xiangsen, LIU Ke, ZHAO Zengshun. Progress on Human-Object Interaction Detection of Deep Learning[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(2): 323-336.
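Category | Subcategory | Representative work | Advantages | Limitations | Applicable scenarios |
Two-stage methods | Attention-based | BAR-CNN, iCAN, Wang | Effectively extracts contextual information; accuracy is substantially higher than that of HO-RCNN | Introduces no additional information beyond visual and spatial cues; accuracy still needs improvement | Scenarios with ample training samples, high hardware compute, and low real-time requirements |
Two-stage methods | Graph-model-based | GPNN, Wu, VS-GATs, VSGNet, SAG, DRG | Predicts all interaction pairs in an image simultaneously and can eliminate pairing ambiguity | Rarely introduces information beyond visual and spatial cues to help build the graph model; high hardware requirements | Scenarios with ample training samples, high hardware compute, and low real-time requirements |
Two-stage methods | Body-part- and pose-based | TIN, PMFNet, RPNN, PFNet, MLCNet, PMN | Effectively integrates human pose or body-part information; relatively high accuracy | Computationally heavy and time-consuming; high hardware requirements | Scenarios with ample training samples, high hardware compute, and low real-time requirements |
One-stage methods | - | PPDM, IP-Net, UnionDet, AS-Net | Fast detection, high accuracy, easy to deploy | Model construction and training are relatively complex | Scenarios with high real-time and accuracy requirements |
Table 1 Comparison of different HOI detection methods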
Ground truth | Predicted: Positive | Predicted: Negative |
True | TP (true positive) | FN (false negative) |
False | FP (false positive) | TN (true negative) |
Table 2 Confusion matrix
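The counts in the confusion matrix are the inputs to the precision, recall, and average precision (AP) figures behind the mAP columns of Tables 3 and 4. The following is a minimal sketch of that arithmetic, not the authors' evaluation code. The function names, the `(score, is_true_positive)` input format, and the all-point interpolation of the precision-recall curve are assumptions made for illustration; benchmark toolkits may interpolate differently.

```python
from typing import List, Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


def average_precision(scored_hits: List[Tuple[float, bool]], num_gt: int) -> float:
    """Area under the precision-recall curve for one category.

    scored_hits: (confidence, is_true_positive) per detection (assumed format).
    num_gt: number of ground-truth instances of the category (TP + FN).
    """
    if num_gt == 0:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(scored_hits, key=lambda hit: hit[0], reverse=True)
    tp = fp = 0
    precisions: List[float] = []
    recalls: List[float] = []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Make precision non-increasing as recall grows (all-point interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas between consecutive recall values.
    ap = 0.0
    prev_recall = 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

With 8 true positives, 2 false positives, and 4 false negatives, `precision_recall(8, 2, 4)` gives a precision of 0.8 and a recall of about 0.667. mAP is then the mean of the per-category AP values.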
Method | Backbone | mAP/% |
Gupta | ResNet-50-FPN | 31.8 |
InteractNet | ResNet-50-FPN | 40.0 |
BAR-CNN | Inception-ResNet | 41.1 |
iCAN | ResNet-50 | 45.3 |
Wang | ResNet-50 | 47.3 |
GPNN | Res-DCN-152 | 44.0 |
Wang | ResNet-50-FPN | 52.7 |
Wu | VGG-16 | 44.6 |
VS-GATs | ResNet-50-FPN | 50.6 |
VSGNet | ResNet-152 | 51.8 |
DRG | ResNet-50-FPN | 51.0 |
TIN | ResNet-50 | 47.8 |
PMFNet | ResNet-50-FPN | 52.0 |
RPNN | ResNet-50 | 47.5 |
PFNet | ResNet-50 | 52.8 |
MLCNet | ResNet-50-FPN | 55.2 |
VS-GATs+PMN | ResNet-50-FPN | 51.8 |
IP-Net | Hourglass-104 | 51.0 |
UnionDet | ResNet-50-FPN | 47.5 |
AS-Net | ResNet-50 | 53.9 |
Table 3 Results on the V-COCO dataset
Method | Backbone | Default: full | Default: rare | Default: non-rare | Known Object: full | Known Object: rare | Known Object: non-rare |
HO-RCNN | CaffeNet | 7.81 | 5.37 | 8.54 | 10.41 | 8.94 | 10.85 |
InteractNet | ResNet-50-FPN | 9.94 | 7.16 | 10.77 | - | - | - |
iCAN | ResNet-50 | 14.84 | 10.45 | 16.15 | 16.26 | 11.33 | 17.73 |
Wang | ResNet-50 | 16.24 | 11.16 | 17.75 | 17.33 | 12.78 | 19.21 |
GPNN | Res-DCN-152 | 13.11 | 9.34 | 14.23 | - | - | - |
Wang | ResNet-50-FPN | 17.57 | 16.85 | 17.78 | 21.00 | 20.74 | 21.08 |
Wu | VGG-16 | 13.55 | 9.62 | 15.20 | - | - | - |
VS-GATs | ResNet-50-FPN | 20.27 | 16.03 | 21.54 | - | - | - |
VSGNet | ResNet-152 | 19.80 | 16.05 | 20.91 | - | - | - |
SAG | ResNet-50-FPN | 18.26 | 13.40 | 19.71 | - | - | - |
DRG | ResNet-50-FPN | 19.26 | 17.74 | 19.71 | 23.40 | 21.75 | 23.89 |
TIN | ResNet-50 | 17.03 | 13.42 | 18.11 | 19.17 | 15.51 | 20.26 |
PMFNet | ResNet-50-FPN | 17.46 | 15.65 | 18.00 | 20.34 | 17.47 | 21.20 |
RPNN | ResNet-50 | 17.35 | 12.78 | 18.71 | - | - | - |
PFNet | ResNet-50 | 20.05 | 16.66 | 21.07 | 24.01 | 21.09 | 24.89 |
MLCNet | ResNet-50-FPN | 17.95 | 16.62 | 18.35 | 22.28 | 20.73 | 22.74 |
VS-GATs+PMN | ResNet-50-FPN | 21.21 | 17.60 | 22.29 | - | - | - |
PPDM | Hourglass-104 | 21.73 | 13.78 | 24.10 | 24.58 | 16.65 | 26.84 |
IP-Net | Hourglass-104 | 19.56 | 12.79 | 21.58 | 22.05 | 15.77 | 23.92 |
UnionDet | ResNet-50-FPN | 17.58 | 11.72 | 19.33 | 19.76 | 14.68 | 21.27 |
AS-Net | ResNet-50 | 28.87 | 24.25 | 30.25 | 31.74 | 27.07 | 33.14 |
Table 4 Results on the HICO-DET dataset (mAP/%)
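The full, rare, and non-rare columns of Table 4 are means of per-category AP over different subsets of interaction categories. The sketch below shows one way such a split could be computed. It is an illustration and not the benchmark's official evaluation code. The function name `split_map`, the dictionary inputs, and the example category names are hypothetical, and the threshold of fewer than 10 training instances for a "rare" category is stated here as an assumption about the HICO-DET convention.

```python
from statistics import mean
from typing import Dict, List

# Assumption: a category counts as "rare" when it has fewer than
# 10 training instances.
RARE_THRESHOLD = 10


def split_map(
    ap_per_category: Dict[str, float],
    train_counts: Dict[str, int],
    rare_threshold: int = RARE_THRESHOLD,
) -> Dict[str, float]:
    """Return mAP in percent over all, rare, and non-rare categories.

    ap_per_category: AP in [0, 1] for each interaction category.
    train_counts: number of training instances for each category.
    """
    full: List[float] = list(ap_per_category.values())
    rare = [ap for cat, ap in ap_per_category.items()
            if train_counts[cat] < rare_threshold]
    non_rare = [ap for cat, ap in ap_per_category.items()
                if train_counts[cat] >= rare_threshold]

    def as_percent(values: List[float]) -> float:
        # An empty subset has no defined mean; report NaN instead of failing.
        return 100.0 * mean(values) if values else float("nan")

    return {
        "full": as_percent(full),
        "rare": as_percent(rare),
        "non-rare": as_percent(non_rare),
    }


if __name__ == "__main__":
    # Hypothetical categories and values, for illustration only.
    ap = {"ride bicycle": 0.5, "hold cup": 0.3, "wash elephant": 0.1}
    counts = {"ride bicycle": 500, "hold cup": 120, "wash elephant": 4}
    print(split_map(ap, counts))
```

Because rare categories have few training samples, their AP tends to be lower, which is consistent with the rare column trailing the non-rare column for every method in Table 4.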
[1] | CHAO Y W, WANG Z, HE Y, et al. HICO: a benchmark for recognizing human-object interactions in images[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Dec 11-18, 2015. Washington: IEEE Computer Society, 2015: 1017-1025. |
[2] | ZHOU Y Z. Investigation and application of human-object interaction detection algorithm[D]. Quanzhou: Huaqiao University, 2019. (in Chinese) |
[3] | HUI W S, LI H J, CHEN M, et al. Robotic tactile recognition and adaptive grasping control based on CNN-LSTM[J]. Chinese Journal of Scientific Instrument, 2019, 40(1):211-218. (in Chinese) |
[4] | DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]//Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, Jun 20-26, 2005. Washington: IEEE Computer Society, 2005: 886-893. |
[5] | LOWE D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004, 60(2):91-110. |
[6] | GUPTA A, DAVIS L S. Objects in action: an approach for combining action understanding and object perception[C]//Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Minneapolis, Jun 17-22, 2007. Washington: IEEE Computer Society, 2007: 1-8. |
[7] | GUPTA A, KEMBHAVI A, DAVIS L S. Observing human-object interactions: using spatial and functional compatibility for recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(10):1775-1789. |
[8] | YAO B, LI F F. Grouplet: a structured image representation for recognizing human and object interactions[C]//Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, Jun 13-18, 2010. Washington: IEEE Computer Society, 2010: 9-16. |
[9] | YAO B, LI F F. Modeling mutual context of object and human pose in human-object interaction activities[C]//Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, Jun 13-18, 2010. Washington: IEEE Computer Society, 2010: 17-24. |
[10] | YAO B, JIANG X, KHOSLA A, et al. Human action recognition by learning bases of action attributes and parts[C]//Proceedings of the 2011 IEEE International Conference on Computer Vision, Barcelona, Nov 6-13, 2011. Washington: IEEE Computer Society, 2011: 1331-1338. |
[11] | DELAITRE V, SIVIC J, LAPTEV I. Learning person-object interactions for action recognition in still images[C]//Proceedings of the 25th Annual Conference on Neural Information Processing Systems, Granada, Dec 12-14, 2011. Red Hook: Curran Associates, 2011: 1503-1511. |
[12] | DESAI C, RAMANAN D. Detecting actions, poses, and objects with relational phraselets[C]//LNCS 7575: Proceedings of the 12th European Conference on Computer Vision, Oct 7-13, 2012. Berlin, Heidelberg: Springer, 2012: 158-172. |
[13] | HU J F, ZHENG W S, LAI J, et al. Recognising human-object interaction via exemplar based modelling[C]//Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, Dec 1-8, 2013. Washington: IEEE Computer Society, 2013: 3144-3151. |
[14] | GUPTA S, MALIK J. Visual semantic role labeling[J]. arXiv: 1505.04474, 2015. |
[15] | CHAO Y W, LIU Y, LIU X, et al. Learning to detect human-object interactions[C]//Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision, Lake Tahoe, Mar 12-15, 2018. Washington: IEEE Computer Society, 2018: 381-389. |
[16] | REN S Q, HE K M, GIRSHICK R B, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6):1137-1149. |
[17] | GIRSHICK R B, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the 27th IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Columbus, Jun 23-28, 2014. Washington: IEEE Computer Society, 2014: 580-587. |
[18] | GIRSHICK R B. Fast R-CNN[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Dec 7-13, 2015. Washington: IEEE Computer Society, 2015: 1440-1448. |
[19] | SUTSKEVER I, VINYALS O, LE Q V. Sequence to sequence learning with neural networks[C]//Proceedings of the 28th Annual Conference on Neural Information Processing Systems, Montreal, Dec 8-13, 2014. Red Hook: Curran Associates, 2014: 3104-3112. |
[20] | VINYALS O, TOSHEV A, BENGIO S, et al. Show and tell: a neural image caption generator[C]//Proceedings of the 2015 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Boston, Jun 7-12, 2015. Washington: IEEE Computer Society, 2015: 3156-3164. |
[21] | CHAN W, JAITLY N, LE Q, et al. Listen, attend and spell: a neural network for large vocabulary conversational speech recognition[C]//Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, Shanghai, Mar 20-25, 2016. Piscataway: IEEE, 2016: 4960-4964. |
[22] | GKIOXARI G, TOSHEV A, JAITLY N. Chained predictions using convolutional neural networks[C]//LNCS 9908: Proceedings of the 14th European Conference on Computer Vision, Oct 11-14, 2016. Cham: Springer, 2016: 728-743. |
[23] | GKIOXARI G, GIRSHICK R B, DOLLÁR P, et al. Detecting and recognizing human-object interactions[C]//Proceedings of the 2018 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 8359-8367. |
[24] | KOLESNIKOV A, KUZNETSOVA A, LAMPERT C H, et al. Detecting visual relationships using box attention[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop, Seoul, Oct 27-28, 2019. Piscataway: IEEE, 2019: 1749-1753. |
[25] | GAO C, ZOU Y, HUANG J B. ICAN: instance-centric attention network for human-object interaction detection[J]. arXiv:1808.10437, 2018. |
[26] | CHERON G, LAPTEV I, SCHMID C. P-CNN: pose-based CNN features for action recognition[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Dec 7-13, 2015. Washington: IEEE Computer Society, 2015: 3218-3226. |
[27] | MALLYA A, LAZEBNIK S. Learning models for actions and person-object interactions with transfer to question answering[C]//LNCS 9905: Proceedings of the 14th European Conference on Computer Vision, Amsterdam, Oct 11-14, 2016. Cham: Springer, 2016: 414-428. |
[28] | GKIOXARI G, GIRSHICK R B, MALIK J. Contextual action recognition with R*CNN[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Dec 7-13, 2015. Washington: IEEE Computer Society, 2015: 1080-1088. |
[29] | WANG T C, ANWER R M, KHAN M H, et al. Deep contextual attention for human-object interaction detection[C]//Proceedings of the 2019 IEEE International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 5694-5702. |
[30] | GILMER J, SCHOENHOLZ S S, RILEY P F, et al. Neural message passing for quantum chemistry[C]//Proceedings of the 34th International Conference on Machine Learning, Sydney, Aug 6-11, 2017. New York: ACM, 2017: 1263-1272. |
[31] | JAIN A, ZAMIR A R, SAVARESE S, et al. Structural-RNN: deep learning on spatio-temporal graphs[C]//Proceedings of the 29th IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 5308-5317. |
[32] | LI R Y, TAPASWI M, LIAO R J, et al. Situation recognition with graph neural networks[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 4183-4192. |
[33] | MARINO K, SALAKHUTDINOV R, GUPTA A. The more you know: using knowledge graphs for image classification[J]. arXiv:1612.04844, 2016. |
[34] | XU D F, ZHU Y K, CHOY C B, et al. Scene graph generation by iterative message passing[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 3097-3106. |
[35] | LIANG X D, SHEN X H, FENG J S, et al. Semantic object parsing with graph LSTM[C]//LNCS 9905: Proceedings of the 14th European Conference on Computer Vision, Amsterdam, Oct 11-14, 2016. Cham: Springer, 2016: 125-143. |
[36] | YUAN Y, LIANG X D, WANG X L, et al. Temporal dynamic graph LSTM for action-driven video object detection[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 1819-1828. |
[37] | TENEY D, LIU L Q, VAN DEN HENGEL A. Graph-structured representations for visual question answering[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 3233-3241. |
[38] | QI S Y, WANG W G, JIA B X, et al. Learning human-object interactions by graph parsing neural networks[C]//LNCS 11213: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 407-423. |
[39] | KOPPULA H S, SAXENA A. Anticipating human activities using object affordances for reactive robotic response[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(1):14-29. |
[40] | WANG H, ZHENG W S, LING Y B. Contextual heterogeneous graph network for human-object interaction detection[C]//LNCS 12362: Proceedings of the 16th European Conference on Computer Vision, Glasgow, Aug 23-28, 2020. Cham: Springer, 2020: 248-264. |
[41] | WU W, LIU Z Y. Graph-based human-object interactions recognition[J]. Computer Engineering and Applications, 2021, 57(3):175-181. (in Chinese) |
[42] | LIANG Z J, ROJAS J, LIU J F, et al. Visual-semantic graph attention networks for human-object interaction detection[J]. arXiv:2001.02302, 2020. |
[43] | ULUTAN O, IFTEKHAR A S M, MANJUNATH B S. VSGNet: spatial attention network for detecting human object interactions using graph convolutions[C]//Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition, Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 13614-13623. |
[44] | ZHANG F Z, CAMPBELL D, GOULD S. Spatio-attentive graphs for human-object interaction detection[J]. arXiv: 2012.06060, 2020. |
[45] | GAO C, XU J R, ZOU Y L, et al. DRG: dual relation graph for human-object interaction detection[C]//LNCS 12357: Proceedings of the 16th European Conference on Computer Vision, Glasgow, Aug 23-28, 2020. Cham: Springer, 2020: 696-712. |
[46] | FANG H S, CAO J K, TAI Y W, et al. Pairwise body-part attention for recognizing human-object interactions[C]//LNCS 11214: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 51-67. |
[47] | LI Y L, ZHOU S Y, HUANG X J, et al. Transferable interactiveness knowledge for human-object interaction detection[C]//Proceedings of the 2019 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 3585-3594. |
[48] | WAN B, ZHOU D S, LIU Y F, et al. Pose-aware multi-level feature network for human object interaction detection[C]//Proceedings of the 2019 IEEE International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 9468-9477. |
[49] | ZHOU P H, CHI M M. Relation parsing neural network for human-object interaction detection[C]//Proceedings of the 2019 IEEE International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 843-851. |
[50] | LIU H C, MU T J, HUANG X L. Detecting human-object interaction with multi-level pairwise feature network[J]. Computational Visual Media, 2021, 7(2):229-239. |
[51] | SUN X, HU X W, REN T W, et al. Human object interaction detection via multi-level conditioned network[C]//Proceedings of the 2020 International Conference on Multimedia Retrieval, Dublin, Jun 8-11, 2020. New York: ACM, 2020: 26-34. |
[52] | LIANG Z J, LIU J F, GUAN Y S, et al. Pose-based modular network for human-object interaction detection[J]. arXiv: 2008.02042, 2020. |
[53] | LIAO Y, LIU S, WANG F, et al. PPDM: parallel point detection and matching for real-time human-object interaction detection[C]//Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition, Seattle, Jun 14-19, 2020. Washington: IEEE Computer Society, 2020: 479-487. |
[54] | WANG T, YANG T, MARTIN D, et al. Learning human-object interaction detection using interaction points[C]//Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition, Seattle, Jun 14-19, 2020. Washington: IEEE Computer Society, 2020: 4116-4125. |
[55] | KIM B, CHOI T, KANG J, et al. UnionDet: union-level detector towards real-time human-object interaction detection[C]//LNCS 12360: Proceedings of the 16th European Conference on Computer Vision, Glasgow, Aug 23-28, 2020. Cham: Springer, 2020: 498-514. |
[56] | LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]//Proceedings of the 14th European Conference on Computer Vision, Amsterdam, Oct 11-14, 2016. Cham: Springer, 2016: 21-37. |
[57] | LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 2999-3007. |
[58] | ZHOU P, NI B, GENG C, et al. Scale-transferrable object detection[C]//Proceedings of the 2018 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 528-537. |
[59] | CHEN M, LIAO Y, LIU S, et al. Reformulating HOI detection as adaptive set prediction[J]. arXiv:2103.05983, 2021. |
[60] | LIN T Y, MAIRE M, BELONGIE S J, et al. Microsoft COCO: common objects in context[C]//LNCS 8693: Proceedings of the 13th European Conference on Computer Vision, Zurich, Sep 6-12, 2014. Cham: Springer, 2014: 740-755. |
[61] | HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 770-778. |
[62] | LIN T Y, DOLLÁR P, GIRSHICK R B, et al. Feature pyramid networks for object detection[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 936-944. |
[63] | DAI J F, QI H Z, XIONG Y W, et al. Deformable convolutional networks[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 764-773. |
[64] | JIA Y Q, SHELHAMER E, DONAHUE J, et al. Caffe: convolutional architecture for fast feature embedding[C]//Proceedings of the 2014 ACM Conference on Multimedia, Orlando, Nov 3-7, 2014. New York: ACM, 2014: 675-678. |
[65] | NEWELL A, YANG K Y, DENG J. Stacked hourglass networks for human pose estimation[C]//LNCS 9912: Proceedings of the 14th European Conference on Computer Vision, Amsterdam, Oct 11-14, 2016. Cham: Springer, 2016: 483-499. |
[66] | SHEN L Y, YEUNG S, HOFFMAN J, et al. Scaling human-object interaction recognition through zero-shot learning[C]//Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision, Lake Tahoe, Mar 12-15, 2018. Washington: IEEE Computer Society, 2018: 1568-1576. |
[67] | JI Z, LIU X Y, PANG Y W, et al. Few-shot human-object interaction recognition with semantic-guided attentive prototypes network[J]. IEEE Transactions on Image Processing, 2020, 30:1648-1661. |
[68] | LIU X Y, JI Z, PANG Y W, et al. DGIG-Net: dynamic graph-in-graph networks for few-shot human-object interaction[J]. IEEE Transactions on Cybernetics, 2021: 1-13. DOI: 10.1109/TCYB.2021.3049537. |