Open-World Occluded Object Detection: Multi-level Feature Fusion and Enhanced Feature Representation

doi:10.3778/j.issn.1673-9418.2503066

Abstract

Detecting occluded objects is a challenging task in open-world object detection for autonomous driving. This paper studies occluded object detection in open-world autonomous driving scenarios, where the diversity of unknown occlusion patterns makes the features of occluded objects sparse and semantically ambiguous and leaves object shapes uncertain. By exploiting the differences in information carried by feature maps of different dimensions, together with the relationships between an object and its parts and within each part, the paper proposes an occluded object detection algorithm based on multi-level feature fusion and enhanced feature representation. First, a co-occurrence spatial graph module enhances features within and between parts, improving the feature representation of occluded objects and the ability to recover the semantics of occluded regions. Second, a cross-level scale feature pyramid network enhances the contextual information of core features and the correlation among features. Finally, a parallel adaptive feature fusion module fuses features from different levels, alleviating feature loss and improving detection performance. The proposed method is compared with existing detection algorithms on Pascal VOC, BDD100K, and a self-made dataset. Compared with the Top 2 algorithm, unknown-class recall increases by 1.4 to 1.9 percentage points and mAP by 0.5 to 1.5 percentage points (gains measured relative to that algorithm). By learning the feature correlations of occluded objects, the algorithm alleviates the adverse effects of feature sparsity and semantic ambiguity and thereby improves detection accuracy for occluded objects in open-world autonomous driving scenarios. Experiments show that it improves the detection of occluded objects, generalizes well, and strengthens the detection of unknown objects.
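
The reported gains are stated in percentage points of unknown-class recall and mAP. As a rough illustration of how such a figure can be read, the sketch below computes unknown-class recall for two detectors on a toy image and takes the difference in percentage points. It is not the authors' evaluation code: the IoU threshold of 0.5, the greedy one-to-one matching, the function names, and all box coordinates are assumptions made only for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def unknown_recall(gt_unknown, pred_unknown, iou_thresh=0.5):
    """Fraction of ground-truth unknown boxes matched by a predicted unknown box.

    Assumption: greedy matching, each prediction used for at most one box.
    """
    if not gt_unknown:
        return 0.0
    used = set()
    hits = 0
    for gt in gt_unknown:
        best_j, best_iou = None, iou_thresh
        for j, pred in enumerate(pred_unknown):
            if j in used:
                continue
            overlap = iou(gt, pred)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            used.add(best_j)
            hits += 1
    return hits / len(gt_unknown)


def gain_in_points(ours, baseline):
    """Absolute difference between two rates, in percentage points."""
    return (ours - baseline) * 100.0


# Hypothetical toy data: four ground-truth unknown objects in one image.
gt = [(0, 0, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50), (60, 60, 70, 70)]
baseline_pred = [(1, 1, 10, 10), (100, 100, 110, 110)]  # finds one object
ours_pred = [(1, 1, 10, 10), (21, 21, 30, 30)]          # finds two objects

baseline_recall = unknown_recall(gt, baseline_pred)
ours_recall = unknown_recall(gt, ours_pred)
print(baseline_recall, ours_recall, gain_in_points(ours_recall, baseline_recall))
```

In this toy case the recall rises from 0.25 to 0.5, a gain of 25 percentage points. The gains reported in the abstract are differences of the same kind, measured against the Top 2 algorithm.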

Key words: open world, occluded object detection, feature fusion, graph convolutional neural network

JIANG Yanji, CHEN Pengda, DONG Hao, LIU Daqian, FEI Bowen. Open-World Occluded Object Detection: Multi-level Feature Fusion and Enhanced Feature Representation[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(12): 3303-3318.

姜彦吉, 陈鹏达, 董浩, 刘大千, 费博雯. 开放世界遮挡目标检测：多层次特征融合与增强特征表示[J]. 计算机科学与探索, 2025, 19(12): 3303-3318.
