Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (2): 323-336. DOI: 10.3778/j.issn.1673-9418.2106004
• Surveys and Frontiers •
Progress on Human-Object Interaction Detection of Deep Learning
RUAN Chenzhao, ZHANG Xiangsen, LIU Ke, ZHAO Zengshun+
Received: 2021-06-01
Revised: 2021-08-06
Online: 2022-02-01
Published: 2021-08-19
About author: RUAN Chenzhao, born in 1996 in Zibo, Shandong, M.S. candidate. His research interests include computer vision and image processing.
+ Corresponding author. E-mail: zhaozengshun@163.com
RUAN Chenzhao, ZHANG Xiangsen, LIU Ke, ZHAO Zengshun. Progress on Human-Object Interaction Detection of Deep Learning[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(2): 323-336.
URL: http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.2106004
| Category | Sub-category | Representative works | Advantages | Limitations | Applicable scenarios |
| --- | --- | --- | --- | --- | --- |
| Two-stage methods | Attention-based | BAR-CNN, iCAN, Wang et al. | Effectively extract contextual information; accuracy greatly improved over HO-RCNN | No information beyond visual and spatial features is introduced; accuracy still limited | Suited to scenarios with sufficient training samples and ample computing power, where real-time performance is not critical |
| | Graph-model-based | GPNN, Wu et al., VS-GATs, VSGNet, SAG, DRG | Predict all interaction pairs in an image at once and resolve pairing ambiguity | Rarely introduce information beyond visual and spatial features to help build the graph; high hardware demands | |
| | Body-part- and pose-based | TIN, PMFNet, RPNN, PFNet, MLCNet, PMN | Effectively integrate human pose or body-part information; relatively high accuracy | Computationally heavy and time-consuming; high hardware demands | |
| One-stage methods | — | PPDM, IP-Net, UnionDet, AS-Net | Fast detection, high accuracy, easy to deploy | Model construction and training are relatively complex | Suited to scenarios demanding high real-time performance and accuracy |

Table 1 Comparison of different HOI detection methods
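To make the contrast in Table 1 concrete, the sketch below illustrates the generic control flow of the two families: two-stage methods run an object detector first and then score every human-object pair, while one-stage methods predict detections and interactions in a single forward pass and associate them afterwards. This is our own illustration rather than the pipeline of any single cited method, and every parameter (`detector`, `pair_features`, `interaction_head`, `joint_network`, `match_interactions`) is a hypothetical placeholder.

```python
from typing import Callable, List, Tuple

# Generic two-stage HOI pipeline: detect instances first, then classify each human-object pair.
def two_stage_hoi(image,
                  detector: Callable,          # returns (human_boxes, object_boxes)
                  pair_features: Callable,     # visual + spatial (+ pose/graph) features of one pair
                  interaction_head: Callable   # returns [(verb, score), ...] for one pair
                  ) -> List[Tuple]:
    humans, objects = detector(image)                     # stage 1: off-the-shelf object detector
    triplets = []
    for h in humans:                                      # stage 2: enumerate candidate pairs
        for o in objects:
            feats = pair_features(image, h, o)
            for verb, score in interaction_head(feats):
                triplets.append((h, verb, o, score))      # <human, verb, object> triplet
    return triplets

# Generic one-stage HOI pipeline: predict boxes and interactions jointly, then associate them,
# e.g. by matching interaction points or set predictions to detected boxes.
def one_stage_hoi(image,
                  joint_network: Callable,        # single forward pass -> (detections, interactions)
                  match_interactions: Callable    # grouping/matching step
                  ) -> List[Tuple]:
    detections, interactions = joint_network(image)
    return match_interactions(interactions, detections)
```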
| Ground truth | Predicted Positive | Predicted Negative |
| --- | --- | --- |
| True | TP (true positive) | FN (false negative) |
| False | FP (false positive) | TN (true negative) |

Table 2 Confusion matrix
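The mean average precision (mAP) reported on V-COCO and HICO-DET below is built from the quantities in Table 2: a detected ⟨human, verb, object⟩ triplet counts as a true positive only if it matches a ground-truth pair. The following is a minimal sketch of the underlying per-class computation; the function name and the area-under-curve form without interpolation are our assumptions, and the benchmarks' official evaluation tools differ in details.

```python
import numpy as np

def average_precision(scores, is_true_positive, num_gt):
    """Illustrative AP for one interaction class from score-ranked detections.

    scores           : confidence of each detected <human, verb, object> triplet
    is_true_positive : 1 if the detection matches an unmatched ground-truth pair
                       (e.g. IoU > 0.5 for both the human and the object box), else 0
    num_gt           : number of ground-truth pairs of this class (TP + FN)
    """
    order = np.argsort(-np.asarray(scores, dtype=float))     # rank detections by score
    tp = np.asarray(is_true_positive, dtype=float)[order]
    fp = 1.0 - tp
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    recall = cum_tp / max(num_gt, 1)                          # TP / (TP + FN)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-9)    # TP / (TP + FP)
    # area under the precision-recall curve (no interpolation)
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap

# mAP is the mean of the per-class AP values over all interaction classes.
```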
| Method | Backbone | mAP/% |
| --- | --- | --- |
| Gupta et al. | ResNet-50-FPN | 31.8 |
| InteractNet | ResNet-50-FPN | 40.0 |
| BAR-CNN | Inception-ResNet | 41.1 |
| iCAN | ResNet-50 | 45.3 |
| Wang et al. | ResNet-50 | 47.3 |
| GPNN | Res-DCN-152 | 44.0 |
| Wang et al. | ResNet-50-FPN | 52.7 |
| Wu et al. | VGG-16 | 44.6 |
| VS-GATs | ResNet-50-FPN | 50.6 |
| VSGNet | ResNet-152 | 51.8 |
| DRG | ResNet-50-FPN | 51.0 |
| TIN | ResNet-50 | 47.8 |
| PMFNet | ResNet-50-FPN | 52.0 |
| RPNN | ResNet-50 | 47.5 |
| PFNet | ResNet-50 | 52.8 |
| MLCNet | ResNet-50-FPN | 55.2 |
| VS-GATs+PMN | ResNet-50-FPN | 51.8 |
| IP-Net | Hourglass-104 | 51.0 |
| UnionDet | ResNet-50-FPN | 47.5 |
| AS-Net | ResNet-50 | 53.9 |

Table 3 Results on V-COCO data set
| Method | Backbone | Default (full) | Default (rare) | Default (non-rare) | Known Object (full) | Known Object (rare) | Known Object (non-rare) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| HO-RCNN | CaffeNet | 7.81 | 5.37 | 8.54 | 10.41 | 8.94 | 10.85 |
| InteractNet | ResNet-50-FPN | 9.94 | 7.16 | 10.77 | — | — | — |
| iCAN | ResNet-50 | 14.84 | 10.45 | 16.15 | 16.26 | 11.33 | 17.73 |
| Wang et al. | ResNet-50 | 16.24 | 11.16 | 17.75 | 17.33 | 12.78 | 19.21 |
| GPNN | Res-DCN-152 | 13.11 | 9.34 | 14.23 | — | — | — |
| Wang et al. | ResNet-50-FPN | 17.57 | 16.85 | 17.78 | 21.00 | 20.74 | 21.08 |
| Wu et al. | VGG-16 | 13.55 | 9.62 | 15.20 | — | — | — |
| VS-GATs | ResNet-50-FPN | 20.27 | 16.03 | 21.54 | — | — | — |
| VSGNet | ResNet-152 | 19.80 | 16.05 | 20.91 | — | — | — |
| SAG | ResNet-50-FPN | 18.26 | 13.40 | 19.71 | — | — | — |
| DRG | ResNet-50-FPN | 19.26 | 17.74 | 19.71 | 23.40 | 21.75 | 23.89 |
| TIN | ResNet-50 | 17.03 | 13.42 | 18.11 | 19.17 | 15.51 | 20.26 |
| PMFNet | ResNet-50-FPN | 17.46 | 15.65 | 18.00 | 20.34 | 17.47 | 21.20 |
| RPNN | ResNet-50 | 17.35 | 12.78 | 18.71 | — | — | — |
| PFNet | ResNet-50 | 20.05 | 16.66 | 21.07 | 24.01 | 21.09 | 24.89 |
| MLCNet | ResNet-50-FPN | 17.95 | 16.62 | 18.35 | 22.28 | 20.73 | 22.74 |
| VS-GATs+PMN | ResNet-50-FPN | 21.21 | 17.60 | 22.29 | — | — | — |
| PPDM | Hourglass-104 | 21.73 | 13.78 | 24.10 | 24.58 | 16.65 | 26.84 |
| IP-Net | Hourglass-104 | 19.56 | 12.79 | 21.58 | 22.05 | 15.77 | 23.92 |
| UnionDet | ResNet-50-FPN | 17.58 | 11.72 | 19.33 | 19.76 | 14.68 | 21.27 |
| AS-Net | ResNet-50 | 28.87 | 24.25 | 30.25 | 31.74 | 27.07 | 33.14 |

Table 4 Results on HICO-DET data set (mAP/%)
[1] | CHAO Y W, WANG Z, HE Y, et al. HICO: a benchmark for recognizing human-object interactions in images[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Dec 11-18, 2015. Washington: IEEE Computer Society, 2015: 1017-1025. |
[2] | ZHOU Y Z. Investigation and application of human-object interaction detection algorithm[D]. Quanzhou: Huaqiao University, 2019. |
[3] | HUI W S, LI H J, CHEN M, et al. Robotic tactile recognition and adaptive grasping control based on CNN-LSTM[J]. Chinese Journal of Scientific Instrument, 2019, 40(1): 211-218. |
[4] | DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]//Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, Jun 20-26, 2005. Washington: IEEE Computer Society, 2005: 886-893. |
[5] | LOWE D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004, 60(2): 91-110. |
[6] | GUPTA A, DAVIS L S. Objects in action: an approach for combining action understanding and object perception[C]//Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Minneapolis, Jun 17-22, 2007. Washington: IEEE Computer Society, 2007: 1-8. |
[7] | GUPTA A, KEMBHAVI A, DAVIS L S. Observing human-object interactions: using spatial and functional compatibility for recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(10): 1775-1789. |
[8] | YAO B, LI F F. Grouplet: a structured image representation for recognizing human and object interactions[C]//Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, Jun 13-18, 2010. Washington: IEEE Computer Society, 2010: 9-16. |
[9] | YAO B, LI F F. Modeling mutual context of object and human pose in human-object interaction activities[C]//Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, Jun 13-18, 2010. Washington: IEEE Computer Society, 2010: 17-24. |
[10] | YAO B, JIANG X, KHOSLA A, et al. Human action recognition by learning bases of action attributes and parts[C]//Proceedings of the 2011 IEEE International Conference on Computer Vision, Barcelona, Nov 6-13, 2011. Washington: IEEE Computer Society, 2011: 1331-1338. |
[11] | DELAITRE V, SIVIC J, LAPTEV I. Learning person-object interactions for action recognition in still images[C]//Proceedings of the 25th Annual Conference on Neural Information Processing Systems, Granada, Dec 12-14, 2011. Red Hook: Curran Associates, 2011: 1503-1511. |
[12] | DESAI C, RAMANAN D. Detecting actions, poses, and objects with relational phraselets[C]//LNCS 7575: Proceedings of the 12th European Conference on Computer Vision, Oct 7-13, 2012. Berlin, Heidelberg: Springer, 2012: 158-172. |
[13] | HU J F, ZHENG W S, LAI J, et al. Recognising human-object interaction via exemplar based modelling[C]//Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, Dec 1-8, 2013. Washington: IEEE Computer Society, 2013: 3144-3151. |
[14] | GUPTA S, MALIK J. Visual semantic role labeling[J]. arXiv: 1505.04474, 2015. |
[15] | CHAO Y W, LIU Y, LIU X, et al. Learning to detect human-object interactions[C]//Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision, Lake Tahoe, Mar 12-15, 2018. Washington: IEEE Computer Society, 2018: 381-389. |
[16] | REN S Q, HE K M, GIRSHICK R B, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. |
[17] | GIRSHICK R B, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the 27th IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Columbus, Jun 23-28, 2014. Washington: IEEE Computer Society, 2014: 580-587. |
[18] | GIRSHICK R B. Fast R-CNN[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Dec 7-13, 2015. Washington: IEEE Computer Society, 2015: 1440-1448. |
[19] | SUTSKEVER I, VINYALS O, LE Q V. Sequence to sequence learning with neural networks[C]//Proceedings of the 28th Annual Conference on Neural Information Processing Systems, Montreal, Dec 8-13, 2014. Red Hook: Curran Associates, 2014: 3104-3112. |
[20] | VINYALS O, TOSHEV A, BENGIO S, et al. Show and tell: a neural image caption generator[C]//Proceedings of the 2015 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Boston, Jun 7-12, 2015. Washington: IEEE Computer Society, 2015: 3156-3164. |
[21] | CHAN W, JAITLY N, LE Q, et al. Listen, attend and spell: a neural network for large vocabulary conversational speech recognition[C]//Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, Shanghai, Mar 20-25, 2016. Piscataway: IEEE, 2016: 4960-4964. |
[22] | GKIOXARI G, TOSHEV A, JAITLY N. Chained predictions using convolutional neural networks[C]//LNCS 9908: Proceedings of the 14th European Conference on Computer Vision, Oct 11-14, 2016. Cham: Springer, 2016: 728-743. |
[23] | GKIOXARI G, GIRSHICK R B, DOLLÁR P, et al. Detecting and recognizing human-object interactions[C]//Proceedings of the 2018 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 8359-8367. |
[24] | KOLESNIKOV A, KUZNETSOVA A, LAMPERT C H, et al. Detecting visual relationships using box attention[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop, Seoul, Oct 27-28, 2019. Piscataway: IEEE, 2019: 1749-1753. |
[25] | GAO C, ZOU Y, HUANG J B. iCAN: instance-centric attention network for human-object interaction detection[J]. arXiv:1808.10437, 2018. |
[26] | CHERON G, LAPTEV I, SCHMID C. P-CNN: pose-based CNN features for action recognition[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Dec 7-13, 2015. Washington: IEEE Computer Society, 2015: 3218-3226. |
[27] | MALLYA A, LAZEBNIK S. Learning models for actions and person-object interactions with transfer to question answering[C]//LNCS 9905: Proceedings of the 14th European Conference on Computer Vision, Amsterdam, Oct 11-14, 2016. Cham: Springer, 2016: 414-428. |
[28] | GKIOXARI G, GIRSHICK R B, MALIK J. Contextual action recognition with R*CNN[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Dec 7-13, 2015. Washington: IEEE Computer Society, 2015: 1080-1088. |
[29] | WANG T C, ANWER R M, KHAN M H, et al. Deep contextual attention for human-object interaction detection[C]//Proceedings of the 2019 IEEE International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 5694-5702. |
[30] | GILMER J, SCHOENHOLZ S S, RILEY P F, et al. Neural message passing for quantum chemistry[C]//Proceedings of the 34th International Conference on Machine Learning, Sydney, Aug 6-11, 2017. New York: ACM, 2017: 1263-1272. |
[31] | JAIN A, ZAMIR A R, SAVARESE S, et al. Structural-RNN: deep learning on spatio-temporal graphs[C]//Proceedings of the 29th IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 5308-5317. |
[32] | LI R Y, TAPASWI M, LIAO R J, et al. Situation recognition with graph neural networks[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 4183-4192. |
[33] | MARINO K, SALAKHUTDINOV R, GUPTA A. The more you know: using knowledge graphs for image classification[J]. arXiv:1612.04844, 2016. |
[34] | XU D F, ZHU Y K, CHOY C B, et al. Scene graph generation by iterative message passing[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 3097-3106. |
[35] | LIANG X D, SHEN X H, FENG J S, et al. Semantic object parsing with graph LSTM[C]//LNCS 9905: Proceedings of the 14th European Conference on Computer Vision, Amsterdam, Oct 11-14, 2016. Cham: Springer, 2016: 125-143. |
[36] | YUAN Y, LIANG X D, WANG X L, et al. Temporal dynamic graph LSTM for action-driven video object detection[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 1819-1828. |
[37] | TENEY D, LIU L Q, VAN DEN HENGEL A. Graph-structured representations for visual question answering[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 3233-3241. |
[38] | QI S Y, WANG W G, JIA B X, et al. Learning human-object interactions by graph parsing neural networks[C]//LNCS 11213: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 407-423. |
[39] | KOPPULA H S, SAXENA A. Anticipating human activities using object affordances for reactive robotic response[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 38(1): 14-29. |
[40] | WANG H, ZHENG W S, LING Y B. Contextual heterogeneous graph network for human-object interaction detection[C]//LNCS 12362: Proceedings of the 16th European Conference on Computer Vision, Glasgow, Aug 23-28, 2020. Cham: Springer, 2020: 248-264. |
[41] | WU W, LIU Z Y. Graph-based human-object interactions recognition[J]. Computer Engineering and Applications, 2021, 57(3): 175-181. |
[42] | LIANG Z J, ROJAS J, LIU J F, et al. Visual-semantic graph attention networks for human-object interaction detection[J]. arXiv:2001.02302, 2020. |
[43] | ULUTAN O, IFTEKHAR A S M, MANJUNATH B S. VSGNet: spatial attention network for detecting human object interactions using graph convolutions[C]//Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition, Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 13614-13623. |
[44] | ZHANG F Z, CAMPBELL D, GOULD S. Spatio-attentive graphs for human-object interaction detection[J]. arXiv: 2012.06060, 2020. |
[45] | GAO C, XU J R, ZOU Y L, et al. DRG: dual relation graph for human-object interaction detection[C]//LNCS 12357: Proceedings of the 16th European Conference on Computer Vision, Glasgow, Aug 23-28, 2020. Cham: Springer, 2020: 696-712. |
[46] | FANG H S, CAO J K, TAI Y W, et al. Pairwise body-part attention for recognizing human-object interactions[C]//LNCS 11214: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 51-67. |
[47] | LI Y L, ZHOU S Y, HUANG X J, et al. Transferable interactiveness knowledge for human-object interaction detection[C]//Proceedings of the 2019 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 3585-3594. |
[48] | WAN B, ZHOU D S, LIU Y F, et al. Pose-aware multi-level feature network for human object interaction detection[C]//Proceedings of the 2019 IEEE International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 9468-9477. |
[49] | ZHOU P H, CHI M M. Relation parsing neural network for human-object interaction detection[C]//Proceedings of the 2019 IEEE International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 843-851. |
[50] | LIU H C, MU T J, HUANG X L. Detecting human-object interaction with multi-level pairwise feature network[J]. Computational Visual Media, 2021, 7(2): 229-239. |
[51] | SUN X, HU X W, REN T W, et al. Human object interaction detection via multi-level conditioned network[C]//Proceedings of the 2020 International Conference on Multimedia Retrieval, Dublin, Jun 8-11, 2020. New York: ACM, 2020: 26-34. |
[52] | LIANG Z J, LIU J F, GUAN Y S, et al. Pose-based modular network for human-object interaction detection[J]. arXiv: 2008.02042, 2020. |
[53] | LIAO Y, LIU S, WANG F, et al. PPDM: parallel point detection and matching for real-time human-object interaction detection[C]//Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition, Seattle, Jun 14-19, 2020. Washington: IEEE Computer Society, 2020: 479-487. |
[54] | WANG T, YANG T, MARTIN D, et al. Learning human-object interaction detection using interaction points[C]//Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition, Seattle, Jun 14-19, 2020. Washington: IEEE Computer Society, 2020: 4116-4125. |
[55] | KIM B, CHOI T, KANG J, et al. UnionDet: union-level detector towards real-time human-object interaction detection[C]//LNCS 12360: Proceedings of the 16th European Conference on Computer Vision, Glasgow, Aug 23-28, 2020. Cham: Springer, 2020: 498-514. |
[56] | LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]//Proceedings of the 14th European Conference on Computer Vision, Amsterdam, Oct 11-14, 2016. Cham: Springer, 2016: 21-37. |
[57] | LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 2999-3007. |
[58] | ZHOU P, NI B, GENG C, et al. Scale-transferrable object detection[C]//Proceedings of the 2018 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 528-537. |
[59] | CHEN M, LIAO Y, LIU S, et al. Reformulating HOI detection as adaptive set prediction[J]. arXiv:2103.05983, 2021. |
[60] | LIN T Y, MAIRE M, BELONGIE S J, et al. Microsoft COCO: common objects in context[C]//LNCS 8693: Proceedings of the 13th European Conference on Computer Vision, Zurich, Sep 6-12, 2014. Cham: Springer, 2014: 740-755. |
[61] | HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 770-778. |
[62] | LIN T Y, DOLLÁR P, GIRSHICK R B, et al. Feature pyramid networks for object detection[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 936-944. |
[63] | DAI J F, QI H Z, XIONG Y W, et al. Deformable convolutional networks[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 764-773. |
[64] | JIA Y Q, SHELHAMER E, DONAHUE J, et al. Caffe: convolutional architecture for fast feature embedding[C]//Proceedings of the 2014 ACM Conference on Multimedia, Orlando, Nov 3-7, 2014. New York: ACM, 2014: 675-678. |
[65] | NEWELL A, YANG K Y, JIA D. Stacked hourglass networks for human pose estimation[C]//LNCS 9912: Proceedings of the 14th European Conference on Computer Vision, Amsterdam, Oct 11-14, 2016. Cham: Springer, 2016: 483-499. |
[66] | SHEN L Y, YEUNG S, HOFFMAN J, et al. Scaling human-object interaction recognition through zero-shot learning[C]//Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision, Lake Tahoe, Mar 12-15, 2018. Washington: IEEE Computer Society, 2018: 1568-1576. |
[67] | JI Z, LIU X Y, PANG Y W, et al. Few-shot human-object interaction recognition with semantic-guided attentive prototypes network[J]. IEEE Transactions on Image Processing, 2020, 30: 1648-1661. |
[68] | LIU X Y, JI Z, PANG Y W, et al. DGIG-Net: dynamic graph-in-graph networks for few-shot human-object interaction[J]. IEEE Transactions on Cybernetics, 2021: 1-13. DOI: 10.1109/TCYB.2021.3049537. |