Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (9): 1933-1953. DOI: 10.3778/j.issn.1673-9418.2203070
REN Ning1, FU Yan1,+, WU Yanxia1, LIANG Pengju1, HAN Xi2
+ Corresponding author, E-mail: Fuyan@hrbeu.edu.cn
Received: 2022-03-02
Revised: 2022-04-28
Online: 2022-09-01
Published: 2022-09-15
About author: REN Ning, born in 1996, Ph.D. candidate. Her research interests include deep learning object detection and image processing.
Abstract: Hand-crafted feature schemes for object detection have now been replaced by deep learning, which has greatly advanced object detection; object detection has in turn become one of the most important application areas of deep learning. Object detection simultaneously predicts the category and location of every object instance in a given image, and the technique is widely used in medical imaging, remote sensing, surveillance and security, autonomous driving, and other fields. As the application areas of deep learning diversify, however, the imbalance problems that arise in object detection have become a new entry point for optimizing detection training models. This paper analyzes four classes of imbalance that appear at each training stage when machine learning is applied to object detection: data imbalance, scale imbalance, relative spatial imbalance, and imbalance between classification and regression. It dissects the main causes of each problem, reviews representative classic solutions, and describes the open problems object detection faces across application domains. Building on this analysis and summary, future research directions for imbalance problems in object detection are discussed.
REN Ning, FU Yan, WU Yanxia, LIANG Pengju, HAN Xi. Review of Research on Imbalance Problem in Deep Learning Applied to Object Detection[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(9): 1933-1953.
| Classification | Method | Detector | AP/% | ΔAP |
|---|---|---|---|---|
| Hard sampling | MBS[ | Faster R-CNN | 36.4 | — |
| | OHEM[ | Faster R-CNN | 36.6 | +0.2 |
| | S-OHEM[ | Faster R-CNN | 38.5 | +1.9 |
| Soft sampling | Focal loss[ | RetinaNet | 35.7 | +0.7 |
| | GHM[ | RetinaNet | 35.8 | +0.8 |
| | PISA[ | RetinaNet | 40.4 | +5.4 |
| | | Faster R-CNN | 41.5 | +5.1 |
| Sampling-free | AP Loss[ | RetinaNet | 35.0 | — |
| | DR Loss[ | Faster R-CNN | 37.2 | +0.8 |
| | Sampling-Free[ | RetinaNet | 36.6 | +1.6 |
| | | Faster R-CNN | 38.4 | +2.0 |
| Generative | TADS[ | Fast R-CNN | 32.0 | — |
| | GA-RPN[ | RetinaNet | 37.1 | +2.1 |
| | | Fast R-CNN | 39.4 | +7.4 |
| | | Faster R-CNN | 39.8 | +3.4 |

Table 1 Performance comparison of foreground-background class imbalance optimization strategies
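Several entries in Table 1 reweight samples rather than discard them. As a concrete illustration of the soft-sampling idea, here is a minimal NumPy sketch of the focal loss of Lin et al.; the `alpha`/`gamma` defaults follow the common RetinaNet settings, but this vectorized form is our own illustrative code, not the reference implementation:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss for binary foreground/background classification.

    p: predicted foreground probabilities in (0, 1)
    y: labels, 1 for foreground, 0 for background
    """
    p = np.clip(p, 1e-7, 1 - 1e-7)
    p_t = np.where(y == 1, p, 1 - p)              # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)  # class-balancing weight
    # (1 - p_t)^gamma down-weights easy, well-classified samples,
    # so the abundant easy background anchors stop dominating the loss
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

# An easy background sample contributes far less than a hard one
easy = focal_loss(np.array([0.01]), np.array([0]))
hard = focal_loss(np.array([0.90]), np.array([0]))
```

With `gamma=0` and `alpha_t=1` the expression reduces to ordinary cross-entropy, which is why focal loss is usually described as a modulated cross-entropy rather than a new loss family.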
| Classification | Method | Time | AP/% | APr/% | APc/% | APf/% |
|---|---|---|---|---|---|---|
| Class re-balancing | SimCal[ | 2020 | 23.4 | 16.4 | 22.5 | 27.2 |
| | BALMS[ | 2020 | 27.0 | 19.6 | 28.9 | 27.5 |
| | FVR[ | 2021 | 23.7 | 17.8 | 22.9 | 27.2 |
| | FASA[ | 2021 | 31.5 | 24.1 | 31.9 | 34.0 |

Table 2 Performance comparison of class re-balancing
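The class re-balancing methods in Table 2 all work against the skewed label distribution of LVIS-style long-tailed data. A common baseline they build on is repeat-factor (inverse-frequency) sampling; the sketch below is illustrative only — the threshold `t` and the square-root form follow the repeat-factor sampling heuristic popularized by the LVIS benchmark, and are assumptions rather than details taken from this survey:

```python
import math

def repeat_factors(category_freq, image_categories, t=0.001):
    """Compute per-image repeat factors for class-balanced sampling.

    category_freq: dict mapping category -> fraction of images containing it
    image_categories: list of category sets, one per image
    Returns one repeat factor per image; rare categories get factors > 1,
    so images containing them are oversampled within an epoch.
    """
    # Per-category factor: rare classes (f < t) are repeated sqrt(t / f) times
    r_cat = {c: max(1.0, math.sqrt(t / f)) for c, f in category_freq.items()}
    # An image is repeated as often as its rarest category demands
    return [max(r_cat[c] for c in cats) for cats in image_categories]

freq = {"person": 0.5, "violin": 0.0001}   # hypothetical head and tail classes
imgs = [{"person"}, {"person", "violin"}]
factors = repeat_factors(freq, imgs)
```

The square root deliberately undercorrects: fully inverting the frequency would flood training with near-duplicate tail images and overfit them.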
| Classification | Method | Time | Datasets | ACC/% | ΔACC |
|---|---|---|---|---|---|
| Information augmentation | LEAP[ | 2020 | MSMT17 | 50.50 | — |
| | M2M[ | 2020 | CIFAR-LT-50 | 79.10 | — |
| | | | CIFAR-LT-100 | 43.50 | — |
| | GIST[ | 2021 | iNaturalist 2018 | 70.80 | — |
| Module improvement | KCL[ | 2021 | ImageNet-LT | 45.74 | — |
| | Hybrid-PSC[ | 2021 | CIFAR-LT-10 | 78.82 | — |
| | | | CIFAR-LT-100 | 44.97 | +1.47 |
| | PaCo[ | 2021 | ImageNet-LT | 51.00 | +5.26 |
| | DRO-LT[ | 2021 | CIFAR-LT-100 | 47.31 | +2.34 |

Table 3 Performance comparison of information augmentation and module improvement
| Method | Time | Algorithm | Dataset | AP/% | Relative improvement/% |
|---|---|---|---|---|---|
| GIoU Loss[ | 2019 | YOLOv3 | PASCAL VOC 2007 | 47.73 | — |
| | | SSD | | 51.06 | — |
| | | Faster R-CNN | MSCOCO 2017 | 38.02 | — |
| | | — | COCO val-2017 | 36.50 | — |
| DIoU Loss[ | 2019 | YOLOv3 | PASCAL VOC 2007 | 48.10 | 0.78 |
| | | SSD | | 51.31 | 0.48 |
| | | Faster R-CNN | MSCOCO 2017 | 38.09 | 0.18 |
| CIoU Loss[ | 2019 | YOLOv3 | PASCAL VOC 2007 | 49.21 | 3.10 |
| | | SSD | | 51.44 | 0.74 |
| | | Faster R-CNN | MSCOCO 2017 | 38.65 | 1.66 |
| | | — | COCO val-2017 | 36.70 | 0.55 |
| EIoU Loss[ | 2021 | — | COCO val-2017 | 37.00 | 4.11 |

Table 4 Performance comparison of IoU-based losses
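The losses in Table 4 all extend plain IoU so that non-overlapping boxes still yield a gradient. For reference, a minimal sketch of IoU and GIoU for axis-aligned boxes; this is our own illustrative code, with boxes given as `(x1, y1, x2, y2)`:

```python
def iou_giou(a, b):
    """Return (IoU, GIoU) for two axis-aligned boxes (x1, y1, x2, y2)."""
    # Intersection rectangle (empty if the boxes are disjoint)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest enclosing box C; GIoU subtracts the fraction of C
    # that covers neither box, so separated boxes get a negative score
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    c = (cx2 - cx1) * (cy2 - cy1)
    giou = iou - (c - union) / c
    return iou, giou

# Disjoint boxes: IoU is 0 regardless of distance,
# but GIoU still varies with separation and can drive regression
iou, giou = iou_giou((0, 0, 1, 1), (2, 0, 3, 1))
```

DIoU/CIoU/EIoU then replace the enclosing-area term with center-distance and width/height penalties, which is the convergence-speed difference Table 4 measures.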
| Method | AP/% | Latency/ms |
|---|---|---|
| YOLOv5-S | 36.7 | 8.7 |
| YOLOX-S | 39.6 (+2.9) | 9.8 |
| YOLOv5-M | 44.5 | 11.1 |
| YOLOX-M | 46.4 (+1.9) | 12.3 |
| YOLOv5-L | 48.2 | 13.7 |
| YOLOX-L | 50.0 (+1.8) | 14.5 |
| YOLOv5-X | 50.4 | 16.0 |
| YOLOX-X | 51.2 (+0.8) | 17.3 |

Table 5 Performance comparison of YOLOX
| Imbalance | Category | Method | Advantages | Limitations |
|---|---|---|---|---|
| Data imbalance: foreground/background | Hard sampling | MBS[ | Widely used in two-stage methods | Ignores the influence of hard and easy samples |
| | | OHEM[ | No need to set a positive/negative sample ratio; the larger the dataset, the more accurate the algorithm | Consumes memory and time |
| | | S-OHEM[ | Samples training examples by loss distribution, capturing hard samples to mitigate imbalance | Introduces new hyperparameters; training is time-consuming |
| | Soft sampling | Focal loss[ | Weights foreground/background losses to address positive/negative imbalance | Targets only the classification loss; outliers have excessive influence |
| | | GHM[ | Introduces gradient density to mitigate sample imbalance | Limited improvement on COCO |
| | | PISA[ | IoU-HLR improves recall and accuracy | Does not consider samples with the same IoU |
| | Sampling-free | AP Loss[ | Recasts classification as confidence ranking, improving accuracy | Does not improve inference speed |
| | | DR Loss[ | | Many hyperparameters; low computational efficiency |
| | | Sampling-Free[ | Addresses positive/negative sample imbalance | Introduces hyperparameters; training is time-consuming |
| | Generative | TADS[ | Generates higher-quality hard samples to mitigate imbalance | Sample generation is time-consuming |
| | | GA-RPN[ | Deformable convolutions generate high-quality, low-density proposals, speeding up training; anchors are generated automatically and corrected in real time; high recall | Deformable convolutions reduce speed; unfriendly to images with dense objects |
| Data imbalance: foreground/foreground | | pRoI[ | Automatically generates RoIs by IoU, improving RoI utilization | Poor performance at low IoU |
| | | G-SMOTE[ | GNN-based interpolation generates new minority-class samples and completes relational information to build an augmented balanced graph; end-to-end training avoids introducing noise | Features of the same node are unstable during training; large gradient variance in backpropagation |
| | | BatchFormer[ | Implicitly explores sample relationships to mitigate within-batch class imbalance; plug-and-play | Implicit sample relationships are abstract; training is time-consuming |

Table 6 Summary of foreground/background & foreground/foreground imbalance optimization strategy
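The hard-sampling entries in Table 6 (OHEM, S-OHEM) share one core step: rank the candidate RoIs by their current loss and keep only the hardest ones for backpropagation. A minimal sketch of that selection step follows; it is illustrative only, since a real OHEM pass first computes the losses in a read-only forward pass over all RoIs:

```python
def ohem_select(losses, batch_size):
    """Return indices of the `batch_size` highest-loss RoIs.

    Only these 'hard examples' contribute gradients, so the easy
    background RoIs that dominate the image are skipped automatically.
    """
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return order[:batch_size]

# Hypothetical per-RoI losses from the read-only forward pass
losses = [0.1, 2.3, 0.05, 1.7, 0.4]
hard = ohem_select(losses, 2)   # indices of the two hardest RoIs
```

This also explains the limitation noted in the table: the extra forward pass over every RoI is exactly where OHEM's memory and time cost comes from.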
| Imbalance | Category | Method | Advantages | Limitations |
|---|---|---|---|---|
| Class label imbalance | Resampling | MLSOL[ | Synthetic oversampling of multi-label data; optimizes globally through local information, more efficient; ensemble framework improves robustness | Risk of local information misleading the global optimum |
| | | MMT[ | More robust pseudo labels; improved performance; unsupervised adaptive mode | Direct updating causes inconsistent cluster updates |
| | Classifier adaptation | Tagewise Loss[ | Mitigates imbalance in weakly supervised multi-label learning | Weight selection limits training results |
| | | COCOA[ | Addresses inter-class and intra-class imbalance simultaneously | Randomly coupling learners for each class costs time and space |
| | Ensemble methods | MCHE[ | Heterogeneous ensemble addresses sample imbalance and label correlation simultaneously; strong generalization | Strong dependencies among individual learners |
| | | ECCRU3[ | Improves example utilization; extends ECC's resilience to imbalance | Limited detection performance when the imbalance ratio is small |

Table 7 Summary of category label imbalance optimization strategy
| Imbalance | Category | Method | Advantages | Limitations |
|---|---|---|---|---|
| Long-tail data imbalance | Class re-balancing | SimCal[ | Concise calibration framework; bi-level class-balanced sampling mitigates imbalance; significantly improves multi-stage HTC | Lacks generalization; converges slowly |
| | | BALMS[ | Balanced Softmax corrects the imbalance caused by mismatched training and test label distributions; automatically learns the optimal sampling rate; fast convergence | Sampling by offsets in space and scale introduces error |
| | | FVR[ | Adaptively calibrates classification scores; distribution adjustment mitigates imbalance; widely applicable | Imbalance is optimized at the cost of loss-function performance |
| | | FASA[ | Can be integrated into other single-stage models; automatically selects the best features | Anchor size is not exploited |
| | Information augmentation | LEAP[ | Reduces distortion of inter-class and intra-class feature variance; mitigates head/tail imbalance | Feature clouds increase computation; slow |
| | | M2M[ | Builds a more balanced training set by transferring head-class knowledge to tail classes; improves tail-class generalization | Balanced sampling is time-consuming |
| | | GIST[ | More efficient tail-class performance | Uses a cosine classifier yet shows no improvement over Decouple |
| | Module improvement | KCL[ | Combines the strengths of supervised and contrastive learning to learn discriminative, balanced representations; stronger generalization | Balancedness metric is not proven to reflect feature-space uniformity; results differ from parallel work |
| | | PaCo[ | Novel supervised contrastive learning scheme; optimizes imbalance for each class | More effective class centers require stronger information and longer training |
| | | DRO-LT[69] | Applicable to representations at multiple layers of a deep model; reduces head-class representation bias in feature space; more robust | Direct optimization collapses to zero |

Table 8 Summary of long-tail data imbalance optimization strategy
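BALMS in Table 8 rests on the Balanced Softmax idea: shift each logit by the log of its class frequency during training so that optimizing on the long-tailed training distribution matches the balanced test distribution. A small NumPy sketch of the published formula — our own illustration, not the BALMS code:

```python
import numpy as np

def balanced_softmax(logits, class_counts):
    """Balanced Softmax: add log(n_j) to logit j before normalizing.

    Head classes (large n_j) receive a larger additive shift, so during
    training the model must produce a larger margin for them; at test
    time the shift is dropped and plain softmax is used.
    """
    shifted = logits + np.log(np.asarray(class_counts, dtype=float))
    e = np.exp(shifted - shifted.max())   # subtract max for stability
    return e / e.sum()

# With equal logits, training probability concentrates on the head class,
# so cross-entropy on a tail-class sample yields a larger corrective gradient
probs = balanced_softmax(np.array([0.0, 0.0]), [1000, 10])
```

The class counts here are hypothetical; in practice they come from the training-set label histogram.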
| Imbalance | Category | Method | Advantages | Limitations |
|---|---|---|---|---|
| Scale imbalance | Object instance/bounding-box imbalance | SA-Fast R-CNN[ | Joint prediction with two classifiers; high accuracy | Costs time and space; proposal module is independent |
| | | FPN[ | Resolves imbalance caused by multi-scale variation; high accuracy on small objects | Loses translation invariance; sensitivity differs between large and small objects |
| | | SNIP[ | Builds multi-scale features with an image pyramid; a single-branch network learns deeper features, more efficient | Processing every pixel slows it down |
| | | SNIPER[ | Feeds only the context region around each object, reducing computation; supports multi-scale training; strong generalization; trains three times faster than SNIP | Local-only input introduces detection error; SNIPER is only a sampling strategy |
| | | TridentNet[ | Dilated convolutions obtain more consistent features; detects multi-scale objects; strong generalization | Parallel branches learning objects at different scales slow detection |
| | Feature imbalance | PANet[ | Bidirectional fusion of the backbone enhances representation; an added fully connected branch improves mask quality; no extra learnable parameters, less prone to overfitting; higher accuracy on small and medium instances | Unfriendly to large-scale instances |
| | | ThunderNet[ | Simplifies the FPN structure; uses RPN foreground/background information to optimize feature distribution; the first real-time detector on mobile devices | Relatively lower detection accuracy |
| | | Libra FPN[ | Improved performance; more effective feature extraction; hard negatives are uniformly distributed over IoU | More complex model; cannot meet real-time requirements |
| | | STDN[ | Improves multi-scale detection; the model focuses more on the objects themselves | Low efficiency; large network |
| | | NAS-FPN[ | Feature pyramid found by automated architecture search; simple and efficient; clear advantage in image classification | Large search space; time-consuming |
| | | GraphFPN[ | Graph neural network performs feature interaction across space and scale; convolutional pyramid balances features and feature levels | Feature interaction costs time and space |
| | | Multi-level FPN[ | Fuses features by size to address imbalance | Complex structure; time-consuming |
| | | AdaMixer[ | Dynamically decodes sampled features; faster training; links queries to feature levels more efficiently | Dynamically generating many mixing parameters in linear layers inflates the total parameter count |

Table 9 Summary of scale imbalance optimization strategy
| Imbalance | Category | Method | Advantages | Limitations |
|---|---|---|---|---|
| Relative spatial imbalance | Regression imbalance | GIoU Loss[ | Non-negative; scale-invariant | The enclosing box slows convergence; degenerate cases exist; training may diverge |
| | | DIoU Loss[ | Computes the distance between box centers; fast convergence; accurate regression | Predicted boxes may be wrongly enlarged; degenerate cases exist; breaks scale invariance; inaccurate for small-scale samples |
| | | CIoU Loss[ | Introduces aspect ratio; better detection; low computational cost | Width and height cannot increase or decrease simultaneously |
| | | EIoU Loss[ | Penalizes width and height predictions directly; fast convergence | Unstable performance on some datasets |
| | | Cascade R-CNN[ | Provides enough samples meeting each threshold; a different head per stage adapts to the multi-level distribution; resists noise and overfitting | Does not effectively reuse results from the previous stage |
| | | IoU-uniform R-CNN[ | Feature offsets avoid feature misalignment; high detection accuracy | Perturbed RoIs train only the regression branch, causing train/test inconsistency |
| | Object imbalance | RefineDet[ | Filters negative anchors to shrink the classifier's search space; refined anchors from one module feed the next; improves regression accuracy and predicts multi-class labels | Large search range requires large storage |
| | | RepPoints[ | Finer detection boxes than anchors; no anchor-based sampling of the box space | Spatial sampling lacks interpretability; large storage requirements |
| | | Free anchor[ | The network learns matching autonomously; better localization; fewer hand-designed hyperparameters; strong generalization; accurate on unusually scaled objects | Predicted box confidences are low; accuracy below anchor-based methods |
| | | YOLOv4[ | A fast, accurate detector trainable on a single 1080Ti or 2080Ti; improves SOTA methods to be more effective and better suited to single-GPU training | The Darknet architecture is relatively large; anchor aspect ratios fit only most objects; lacks generalization |

Table 10 Summary of relative spatial imbalance optimization strategy
| Imbalance | Method | Time | Advantages | Limitations |
|---|---|---|---|---|
| Classification-regression imbalance | IoU-Net[ | 2018 | Raises the confidence of high-IoU boxes | Misalignment between classification and regression remains |
| | aLRP[ | 2020 | Only one hyperparameter; higher accuracy; provably balanced | Classification and regression remain independent |
| | Double-Head R-CNN[ | 2020 | Decouples classification and regression | Because features after proposal RoI pooling are shared, the classification-regression imbalance persists |
| | TSD[ | 2020 | Decouples the two tasks by resampling or transforming proposal features | Task-agnostic sample assignment cannot serve both tasks' predictions at once |
| | YOLOX[ | 2021 | Switches Mosaic and Mix-up on and off to boost performance; drops the one-to-one mapping between ground truths and positives, using one-to-many assignment to increase positives and mitigate imbalance | — |
| | DIR[ | 2021 | Learns continuous targets from imbalanced data and handles missing data; label and feature distribution smoothing calibrates both distributions | The Gaussian assumption on the data introduces error |
| | TOOD[ | 2021 | Balances task-interactive and task-specific features; proposes a task-aligned learning detector | Ignores temporal misalignment |
| | RS Loss[ | 2021 | Balances imbalance across tasks; ranks positive samples by localization quality; simpler and more efficient; no parameter tuning required | — |

Table 11 Summary of classification and regression imbalance optimization strategy
[1] FAN Q F, BROWN L M, SMITH J. A closer look at faster R-CNN for vehicle detection[C]// Proceedings of the 2016 IEEE Intelligent Vehicles Symposium, Gothenburg, Jun 19-22, 2016. Piscataway: IEEE, 2016: 124-129.
[2] FU Z H, CHEN Y W, YONG H W, et al. Foreground gating and background refining network for surveillance object detection[J]. IEEE Transactions on Image Processing, 2019, 28(12): 6077-6090.
[3] TEKIN B, SINHA S N, FUA P. Real-time seamless single shot 6D object pose prediction[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 292-301.
[4] GEIGER A, LENZ P, URTASUN R. Are we ready for autonomous driving? The KITTI vision benchmark suite[C]// Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, Jun 16-21, 2012. Washington: IEEE Computer Society, 2012: 3354-3361.
[5] DAI X R. HybridNet: a fast vehicle detection system for autonomous driving[J]. Signal Processing: Image Communication, 2019, 70: 79-88.
[6] SUN J Z, TANG Y M, WANG S Y. Model robust optimization method of using GAN and feature pyramid[J/OL]. Journal of Frontiers of Computer Science and Technology (2021-08-19) [2022-01-14]. https://kns.cnki.net/kcms/detail/11.5602.TP.20210819.1519.004.html.
[7] JAEGER P F, KOHL S A A, BICKELHAUPT S, et al. Retina U-Net: embarrassingly simple exploitation of segmentation supervision for medical object detection[C]// Proceedings of the Machine Learning for Health Workshop, Vancouver, Dec 13, 2019: 171-183.
[8] LEE S, BAE J S, KIM H, et al. Liver lesion detection from weakly-labeled multi-phase CT volumes with a grouped single shot MultiBox detector[C]// LNCS 11071: Proceedings of the 21st International Conference on Medical Image Computing and Computer Assisted Intervention, Granada, Sep 16-21, 2018. Cham: Springer, 2018: 693-701.
[9] ZHANG L, WANG M L, LIU M X, et al. A survey on deep learning for neuroimaging-based brain disorder analysis[J]. arXiv:2005.04573, 2020.
[10] WANG H T, GUO Z H. Target detection of SSD aircraft remote sensing images based on anchor frame strategy matching[J/OL]. Journal of Frontiers of Computer Science and Technology (2021-07-26) [2022-01-14]. https://kns.cnki.net/kcms/detail/11.5602.TP.20210726.1310.002.html.
[11] XUE Y L, SUN Y, MA H R. Lightweight real-time detection of small targets in aerial remote sensing image[J/OL]. Electronics Optics & Control [2022-03-08]. https://kns.cnki.net/kcms/detail/41.1227.TN.20220302.1312.004.html.
[12] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]// LNCS 9905: Proceedings of the 14th European Conference on Computer Vision, Amsterdam, Oct 11-14, 2016. Cham: Springer, 2016: 21-37.
[13] LUO Y T, JIANG P F, DUAN C, et al. Small object detection oriented improved Retina-Net model and its application[J/OL]. Computer Science [2021-10-18]. https://kns.cnki.net/kcms/detail/50.1075.tp.20210628.1551.006.html.
[14] REDMON J, DIVVALA S K, GIRSHICK R, et al. You only look once: unified real-time object detection[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 779-788.
[15] REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 6517-6525.
[16] REDMON J, FARHADI A. YOLOv3: an incremental improvement[J]. arXiv:1804.02767, 2018.
[17] BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[J]. arXiv:2004.10934, 2020.
[18] LITJENS G, KOOI T, BEJNORDI B E, et al. A survey on deep learning in medical image analysis[J]. Medical Image Analysis, 2017, 42: 60-88.
[19] JOHNSON J M, KHOSHGOFTAAR T M. Survey on deep learning with class imbalance[J]. Journal of Big Data, 2019, 6(1): 1-54.
[20] DONG W X, LIANG H T, LIU G Z, et al. Review of deep convolution applied to target detection algorithms[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(5): 1025-1042.
[21] LI K Q, CHEN Y, LIU J C, et al. Survey of deep learning-based object detection algorithms[J/OL]. Computer Engineering [2022-03-12]. https://kns.cnki.net/kcms/detail/31.1289.TP.20211117.1341.001.html.
[22] LIU L, OUYANG W, WANG X, et al. Deep learning for generic object detection: a survey[J]. International Journal of Computer Vision, 2020, 128(2): 261-318.
[23] ZOU Z, SHI Z, GUO Y, et al. Object detection in 20 years: a survey[J]. arXiv:1905.05055, 2019.
[24] LIU Y F, ZHENG Y F, JIANG L Y, et al. Survey on pseudo-labeling methods in deep semi-supervised learning[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(6): 1279-1290.
[25] CHENG X, SONG C, SHI J G, et al. A survey of generic object detection methods based on deep learning[J]. Acta Electronica Sinica, 2021, 49(7): 1428-1438.
[26] DOLLAR P, WOJEK C, SCHIELE B, et al. Pedestrian detection: an evaluation of the state of the art[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 34(4): 743-761.
[27] ZHANG W. A review of scale imbalance problem of object detection[J]. Journal of Beijing Information Science & Technology University, 2020, 35(6): 95-100.
[28] HE Y H, ZHU C C, WANG J R, et al. Bounding box regression with uncertainty for accurate object detection[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 2888-2897.
[29] HENDERSON P, FERRARI V. End-to-end training of object class detectors for mean average precision[C]// LNCS 10115: Proceedings of the 13th Asian Conference on Computer Vision, Taipei, China, Nov 20-24, 2016. Cham: Springer, 2016: 198-213.
[30] TAN Z Y, NIE X C, QIAN Q, et al. Learning to rank proposals for object detection[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 8272-8280.
[31] ZHOU P, NI B B, GENG C, et al. Scale-transferrable object detection[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 528-537.
[32] CAI Z W, VASCONCELOS N. Cascade R-CNN: delving into high quality object detection[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 6154-6162.
[33] ZHENG Z H, WANG P, LIU W, et al. Distance-IoU loss: faster and better learning for bounding box regression[C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence, the 32nd Innovative Applications of Artificial Intelligence Conference, the 10th AAAI Symposium on Educational Advances in Artificial Intelligence, New York, Feb 7-12, 2020. Menlo Park: AAAI, 2020: 12993-13000.
[34] GIDARIS S, KOMODAKIS N. Object detection via a multi-region and semantic segmentation-aware CNN model[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Dec 7-13, 2015. Washington: IEEE Computer Society, 2015: 1134-1142.
[35] GIDARIS S, KOMODAKIS N. Attend refine repeat: active box proposal generation via in-out localization[J]. arXiv:1606.04446, 2016.
[36] CAO J L, PANG Y W, HAN J G, et al. Hierarchical shot detector[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 9704-9713.
[37] ZHANG X, WAN F, LIU C, et al. FreeAnchor: learning to match anchors for visual object detection[J]. arXiv:1909.02466, 2019.
[38] REN S Q, HE K M, GIRSHICK R B, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]// Advances in Neural Information Processing Systems 28, Montreal, Dec 7-12, 2015. Red Hook: Curran Associates, 2015: 91-99.
[39] LIN T Y, DOLLÁR P, GIRSHICK R B, et al. Feature pyramid networks for object detection[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 936-944.
[40] SHRIVASTAVA A, GUPTA A, GIRSHICK R B. Training region-based object detectors with online hard example mining[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 761-769.
[41] LI M N, ZHANG Z N, YU H, et al. S-OHEM: stratified online hard example mining for object detection[C]// Proceedings of the 2nd CCF Chinese Conference on Computer Vision, Tianjin, Oct 11-14, 2017. Cham: Springer, 2017: 166-177.
[42] LIN T Y, GOYAL P, GIRSHICK R B, et al. Focal loss for dense object detection[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 2999-3007.
[43] LI B Y, LIU Y, WANG X G. Gradient harmonized single-stage detector[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence, the 31st Innovative Applications of Artificial Intelligence Conference, the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, Honolulu, Jan 27-Feb 1, 2019. Menlo Park: AAAI, 2019: 8577-8584.
[44] CAO Y H, CHEN K, LOY C C, et al. Prime sample attention in object detection[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 11580-11588.
[45] CHEN K, LIN W, LI J, et al. AP-loss for accurate one-stage object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 43(11): 3782-3798.
[46] QIAN Q, CHEN L, LI H, et al. DR loss: improving object detection by distributional ranking[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 12161-12169.
[47] CHEN J, LIU D, XU T, et al. Is heuristic sampling necessary in training deep object detectors?[J]. IEEE Transactions on Image Processing, 2021, 30: 8454-8467.
[48] TRIPATHI S, CHANDRA S, AGRAWAL A, et al. Learning to generate synthetic data via compositing[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 461-470.
[49] WANG J Q, CHEN K, YANG S, et al. Region proposal by guided anchoring[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 2965-2974.
[50] OKSUZ K, CAM B C, AKBAS E, et al. Generating positive bounding boxes for balanced training of object detectors[C]// Proceedings of the 2020 IEEE Winter Conference on Applications of Computer Vision, Snowmass Village, Mar 1-5, 2020. Piscataway: IEEE, 2020: 883-892.
[51] ZHAO T X, ZHANG X, WANG S H. GraphSMOTE: imbalanced node classification on graphs with graph neural networks[C]// Proceedings of the 14th ACM International Conference on Web Search and Data Mining, Israel, Mar 8-12, 2021. New York: ACM, 2021: 833-841.
[52] HOU Z, YU B, TAO D. BatchFormer: learning to explore sample relationships for robust representation learning[J]. arXiv:2203.01522, 2022.
[53] | LIU B, TSOUMAKAS G. Synthetic oversampling of multi- label data based on local label distribution[C]// LNCS 11907: Proceedings of the 2019 European Conference on Machine Learning and Knowledge Discovery in Databases, Würzburg, Sep 16-20, 2019. Cham: Springer, 2019: 180-193. |
[54] | GE Y, CHEN D, LI H. Mutual mean-teaching: pseudo label refinery for unsupervised domain adaptation on person re-identification[J]. arXiv:2001.01526, 2020. |
[55] |
LUO F F, GUO W Z, CHEN G L. Addressing imbalance in weakly supervised multi-label learning[J]. IEEE Access, 2019, 7: 37463-37472.
DOI URL |
[56] | ZHANG M L, LI Y K, LIU X Y. Towards class-imbalance aware multi-label learning[C]// Proceedings of the 24th Inter-national Joint Conference on Artificial Intelligence, Buenos Aires, Jul 25-31, 2015. Menlo Park: AAAI, 2015: 4041-4047. |
[57] |
TAHIR M A, KITTLER J, BOURIDANE A. Multilabel classification using heterogeneous ensemble of multi-label classifiers[J]. Pattern Recognition Letters, 2012, 33(5): 513-523.
DOI URL |
[58] | LIU B, TSOUMAKAS G. Making classifier chains resilient to class imbalance[C]// Proceedings of the 10th Asian Con-ference on Machine Learning,Beijing, Nov 14-16, 2018: 280-295. |
[59] | WANG T, LI Y, KANG B Y, et al. The devil is in classifica-tion: a simple framework for long-tail instance segmenta-tion[C]// LNCS 12359: Proceedings of the 16th European Con-ference on Computer Vision, Glasgow, Aug 23-28, 2020. Cham: Springer, 2020: 728-744. |
[60] | REN J W, YU C J, SHENG S N, et al. Balanced meta-softmax for long-tailed visual recognition[C]// Advances in Neural Information Processing Systems 33, Dec 6-12, 2020: 4175-4186. |
[61] | ZHANG S Y, LI Z M, YAN S P, et al. Distribution align-ment: a unified framework for long-tail visual recognition[C]// Proceedings of the 2021 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 2361-2370. |
[62] | ZANG Y H, HUANG C, LOY C C. FASA: feature aug-mentation and sampling adaptation for long-tailed instance segmentation[C]// Proceedings of the 2021 IEEE/CVF Inter-national Conference on Computer Vision, Montreal, Oct 10-17, 2021. Piscataway: IEEE, 2021: 3437-3446. |
[63] | LIU J L, SUN Y F, HAN C C, et al. Deep representation learning on long-tailed data: a learnable embedding augmen-tation perspective[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition,Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 2967-2976. |
[64] | KIM J, JEONG J, SHIN J. M2m: imbalanced classification via major-to-minor translation[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Re-cognition, Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 13896-13905. |
[65] | LIU B, LI H X, KANG H, et al. GistNet: a geometric struc-ture transfer network for long-tailed recognition[C]// Procee-dings of the 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Oct 10-17, 2021. Piscataway:IEEE, 2021: 8189-8198. |
[66] | KANG B Y, LI Y, XIE S, et al. Exploring balanced feature spaces for representation learning[C]// Proceedings of the 9th International Conference on Learning Representations, Austria, May 3-7, 2021: 1-15. |
[67] | WANG P, HAN K, WEI X S, et al. Contrastive learning based hybrid networks for long-tailed image classification[C]// Proceedings of the 2021 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 943-952. |
[68] | CUI J Q, ZHONG Z S, LIU S, et al. Parametric contrastive learning[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Oct 10-17, 2021. Piscataway: IEEE, 2021: 695-704. |
[69] | SAMUEL D, CHECHIK G. Distributional robustness loss for long-tail learning[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Oct 10-17, 2021. Piscataway: IEEE, 2021: 9475-9484. |
[70] | DVORNIK N, MAIRAL J, SCHMID C. Modeling visual context is key to augmenting object detection datasets[C]// LNCS 11216: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 375-391. |
[71] | LI J, LIANG X, SHEN S M, et al. Scale-aware fast R-CNN for pedestrian detection[J]. IEEE Transactions on Multimedia, 2018, 20(4): 985-996. |
[72] | SINGH B, DAVIS L S. An analysis of scale invariance in object detection - SNIP[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 3578-3587. |
[73] | SINGH B, NAJIBI M, DAVIS L S. SNIPER: efficient multi-scale training[C]// Advances in Neural Information Processing Systems 31, Montréal, Dec 3-8, 2018: 9333-9343. |
[74] | NOH J, BAE W, LEE W, et al. Better to follow, follow to be better: towards precise supervision of feature super-resolution for small object detection[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 9724-9733. |
[75] | LI Y H, CHEN Y T, WANG N Y, et al. Scale-aware trident networks for object detection[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 6053-6062. |
[76] | LIU S, QI L, QIN H F, et al. Path aggregation network for instance segmentation[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Piscataway: IEEE, 2018: 8759-8768. |
[77] | QIN Z, LI Z M, ZHANG Z N, et al. ThunderNet: towards real-time generic object detection on mobile devices[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 6717-6726. |
[78] | PANG J M, CHEN K, SHI J P, et al. Libra R-CNN: towards balanced learning for object detection[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 821-830. |
[79] | GHIASI G, LIN T Y, LE Q. NAS-FPN: learning scalable feature pyramid architecture for object detection[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 7036-7045. |
[80] | XU H, YAO L W, ZHANG W G, et al. Auto-FPN: automatic network architecture adaptation for object detection beyond classification[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 6648-6657. |
[81] | ZHAO G M, GE W F, YU Y Z. GraphFPN: graph feature pyramid network for object detection[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Oct 10-17, 2021. Piscataway: IEEE, 2021: 2743-2752. |
[82] | ZHOU L, RAO X, LI Y, et al. A lightweight object detection method in aerial images based on dense feature fusion path aggregation network[J]. ISPRS International Journal of Geo-Information, 2022, 11(3): 189. |
[83] | GAO Z T, WANG L M, HAN B, et al. AdaMixer: a fast-converging query-based object detector[J]. arXiv:2203.16507, 2022. |
[84] | REZATOFIGHI H, TSOI N, GWAK J Y, et al. Generalized intersection over union: a metric and a loss for bounding box regression[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 658-666. |
[85] | ZHANG Y F, REN W, ZHANG Z, et al. Focal and efficient IOU loss for accurate bounding box regression[J]. arXiv:2101.08158, 2021. |
[86] | ZHU L, XIE Z H, LIU L M, et al. IoU-uniform R-CNN: breaking through the limitations of RPN[J]. Pattern Recognition, 2021, 112: 107816. |
[87] | DAI J F, QI H Z, XIONG Y W, et al. Deformable convolutional networks[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 764-773. |
[88] | ZHANG S F, WEN L Y, BIAN X, et al. Single-shot refinement neural network for object detection[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 4203-4212. |
[89] | YANG Z, LIU S H, HU H, et al. RepPoints: point set representation for object detection[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 9656-9665. |
[90] | JIANG B R, LUO R X, MAO J Y, et al. Acquisition of localization confidence for accurate object detection[C]// LNCS 11218: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 784-799. |
[91] | KENDALL A, GAL Y, CIPOLLA R. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 7482-7491. |
[92] | ZHANG Z, HE T, ZHANG H, et al. Bag of freebies for training object detection neural networks[J]. arXiv:1902.04103, 2019. |
[93] | OKSUZ K, CAM B C, AKBAS E, et al. A ranking-based, balanced loss function unifying classification and localisation in object detection[C]// Advances in Neural Information Processing Systems 33, Dec 6-12, 2020: 15534-15545. |
[94] | WU Y, CHEN Y, YUAN L, et al. Rethinking classification and localization for object detection[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 10183-10192. |
[95] | SONG G L, LIU Y, WANG X G. Revisiting the sibling head in object detector[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 11560-11569. |
[96] | GE Z, LIU S, WANG F, et al. YOLOX: exceeding YOLO series in 2021[J]. arXiv:2107.08430, 2021. |
[97] | YANG Y Z, ZHA K W, CHEN Y C, et al. Delving into deep imbalanced regression[C]// Proceedings of the 38th International Conference on Machine Learning, Jul 18-24, 2021: 11842-11851. |
[98] | FENG C J, ZHONG Y J, GAO Y, et al. TOOD: task-aligned one-stage object detection[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Oct 10-17, 2021. Piscataway: IEEE, 2021: 3490-3499. |
[99] | OKSUZ K, CAM B C, AKBAS E, et al. Rank & sort loss for object detection and instance segmentation[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Oct 10-17, 2021. Piscataway: IEEE, 2021: 2989-2998. |