Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (3): 489-511. DOI: 10.3778/j.issn.1673-9418.2107076
• Surveys and Frontiers •

Survey of Research on Deep Learning Image-Text Cross-Modal Retrieval
LIU Ying1,2,3,+, GUO Yingying1, FANG Jie1,2,3, FAN Jiulun1,3, HAO Yu1,3, LIU Jiming4
Received: 2021-07-21; Revised: 2021-09-23; Online: 2022-03-01; Published: 2021-09-23
About author: LIU Ying, born in 1972 in Huxian, Shaanxi, Ph.D., professor. Her research interests include image retrieval, image enhancement, etc.
Corresponding author: + E-mail: liuying_ciip@163.com
LIU Ying, GUO Yingying, FANG Jie, FAN Jiulun, HAO Yu, LIU Jiming. Survey of Research on Deep Learning Image-Text Cross-Modal Retrieval[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(3): 489-511.
URL: http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.2107076
| Category | Representative method | Characteristics | Applicable scenarios |
|---|---|---|---|
| Feature-representation-based methods | MSDS | Extracts text features more effectively, but performs poorly when training samples are scarce | Large-scale datasets |
| | DeCAF | The learned visual features have sufficient representational power | Single-label or multi-label image-text cross-modal retrieval |
| | deep-SM | Improves adaptability to the target dataset and effectively narrows the gap between images and their corresponding semantic concepts, but fails to establish a good relationship between the low-level features and high-level semantics of text data | |

Table 1 Summary of representative methods based on feature representation
| Category | Representative method | Characteristics | Applicable scenarios |
|---|---|---|---|
| Image-text alignment methods | Neural-Image-QA | Mitigates the exploding-gradient problem, and its "single word" variant doubles the accuracy obtained, achieving the best performance; however, accuracy drops rapidly for longer answers, and matching tends to fail when training data points are too few or images are heavily occluded | Real-world image question answering tasks |
| | FashionBERT | Can mask irrelevant information in images, and its patches provide non-repetitive and reasonably relevant information; however, it requires a large number of annotated image-text pairs, which are hard to obtain in real scenarios | Fine-grained cross-modal image-text retrieval tasks |
| | CAAN | Fast retrieval with a small model, making it more convenient and practical to deploy and apply; however, combining inter-modal alignment with intra-modal correlation demands more elaborate model design | Words or image regions with multiple semantics in different global contexts |
| | IMRAM | Achieves the best performance on both small- and large-scale datasets and is robust | Cross-modal retrieval tasks that account for semantic complexity |
| Cross-modal reconstruction methods | SC-NLM | After training, the model can be conditioned on image embeddings, and the scoring function for newly generated descriptions can be optimized continuously, enabling ranking of images and captions; however, the vector used to condition the decoder cannot be modified dynamically | Image caption generation |
| | MASLN | Effectively narrows the statistical gap between modalities and maximizes semantic discriminability; however, the trained model may not reflect the pairwise correlations of the target set in the common representation well | Extendable cross-modal retrieval tasks |
| | CyTIR-Net | Performs better on small and medium-sized datasets and enhances final retrieval performance when training data are scarce, without extra annotation cost; however, in some cases it cannot retrieve all detailed correspondences between modalities | Small and medium-sized datasets |
| Joint image-text embedding methods | RE-DNN | Requires little prior knowledge for model training, handles the missing-modality problem, and scales to large datasets | Multimedia information retrieval systems |
| | PVSE | Overcomes the problem that neglected information cannot be recovered once lost at the mapping point | Polysemous instance problems |
| | GXN | Can retrieve images with local similarity or sentences with word-level similarity | |

Table 2 Summary of representative methods based on image-text matching
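The joint-embedding families above all ultimately reduce retrieval to nearest-neighbor search in a shared space: an image encoder and a text encoder map both modalities to vectors, and cross-modal similarity becomes a cosine score. The following minimal NumPy sketch illustrates only that final ranking step; the random arrays are hypothetical stand-ins for trained encoder outputs, not the training procedure of RE-DNN, PVSE, GXN, or any other method in Table 2.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Project embeddings onto the unit sphere so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def rank_texts_for_images(image_emb, text_emb):
    """Rank every text for every image by cosine similarity in the shared space.

    image_emb: (n_images, d) outputs of an image encoder (placeholder here).
    text_emb:  (n_texts, d) outputs of a text encoder (placeholder here).
    Returns (n_images, n_texts) text indices, best match first.
    """
    sim = l2_normalize(image_emb) @ l2_normalize(text_emb).T
    return np.argsort(-sim, axis=1)

# Toy usage: random vectors stand in for trained encoder outputs.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(4, 128))
text_emb = rng.normal(size=(20, 128))
print(rank_texts_for_images(image_emb, text_emb)[:, :5])  # top-5 texts per image
```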
| Category | Representative method | Characteristics | Applicable scenarios |
|---|---|---|---|
| Binary representation learning methods | DVSH | Effectively removes the need of conventional fusion networks for a joint embedding of bimodal objects, and is robust to parameter selection | Applications that require high precision and can tolerate fewer top retrieval results |
| | DCMH | Learns discrete hash codes directly, avoiding a loss of accuracy in the learned hash codes, but requires long training time | Approximate nearest neighbor (ANN) search on large-scale datasets |
| | SSAH | Short training time; captures more precise inter-modal correlations | |
| | CYC-DGH | Effectively compresses the input data while maximally preserving its own information and the relationships between samples from different modalities | Cross-modal retrieval tasks with a limited number of labeled samples |

Table 3 Summary of representative methods of binary representation learning
| Dataset | Year | Number of images | Labels per image | Number of texts (labels) | Description | Source |
|---|---|---|---|---|---|---|
| NUS-WIDE [76] | 2009 | 269,648 (after removing duplicate images) | 2-5 | 5,108 (after removing meaningless tags) | Images annotated with multiple labels | Multimedia retrieval laboratory, National University of Singapore |
| MSCOCO [77] | 2014 | 123,287 | 5 | 616,435 | Image descriptions | Microsoft |
| Flickr30k [78] | 2014 | 31,783 | 5 | 158,915 | Image descriptions | Yahoo website |
| Wikipedia [79] | 2014 | 2,866 | 1 | 2,866 | Image/text data pairs | Wikipedia website |
| IAPRTC-12 [80] | 2006 | 19,627 | 1-5 | 4,576 | Images annotated with multiple labels | CLEF |

Table 4 Introduction of common datasets
| Model | Text retrieval | | | Image retrieval | | |
|---|---|---|---|---|---|---|
| | R@1 | R@5 | R@10 | R@1 | R@5 | R@10 |
| BRNN [50] | — | — | — | 15.2 | 37.7 | 50.0 |
| SCAN [55] | 67.4 | 90.3 | 95.8 | 48.6 | 77.7 | 85.2 |
| CAAN [56] | 70.1 | 91.6 | 97.2 | 52.8 | 79.0 | 87.9 |
| IMRAM [57] | 74.1 | 93.0 | 96.6 | 53.9 | 79.4 | 87.2 |
| CyTIR-Net [61] | 36.9 | 67.8 | 79.2 | 21.6 | 51.8 | 65.5 |

Table 5 R@K comparison of different existing algorithms for Flickr30k
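R@K (recall at K) used in Tables 5 to 7 is the percentage of queries for which a correct match appears among the top K retrieved items. A minimal sketch of the computation follows, assuming for simplicity a single ground-truth item per query; in the benchmarks above each image has five reference captions, and text retrieval counts a hit if any of them is ranked in the top K.

```python
import numpy as np

def recall_at_k(rankings, ground_truth, k):
    """Share of queries whose ground-truth item appears among the top-k results.

    rankings:     (n_queries, n_gallery) gallery indices sorted best-first.
    ground_truth: (n_queries,) index of the correct gallery item for each query.
    Returns the recall as a percentage, matching the tables above.
    """
    hits = [gt in rankings[i, :k] for i, gt in enumerate(ground_truth)]
    return 100.0 * float(np.mean(hits))

# Toy usage: 3 queries ranked against a gallery of 5 items.
rankings = np.array([[2, 0, 1, 4, 3],
                     [1, 3, 0, 2, 4],
                     [4, 2, 3, 1, 0]])
ground_truth = np.array([2, 0, 1])
print(recall_at_k(rankings, ground_truth, 1))  # 33.3... (only the first query hits)
print(recall_at_k(rankings, ground_truth, 5))  # 100.0
```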
| Model | Text retrieval | | | Image retrieval | | |
|---|---|---|---|---|---|---|
| | R@1 | R@5 | R@10 | R@1 | R@5 | R@10 |
| BRNN [50] | — | — | — | 27.4 | 60.2 | 74.8 |
| SCAN [55] | 72.7 | 94.8 | 98.4 | 58.8 | 88.4 | 94.8 |
| CAAN [56] | 75.5 | 95.4 | 98.5 | 61.3 | 89.7 | 95.2 |
| IMRAM [57] | 76.7 | 95.6 | 98.5 | 61.7 | 89.1 | 95.0 |
| CyTIR-Net [61] | 44.7 | 78.0 | 88.1 | 28.9 | 63.8 | 79.4 |
| PVSE [64] | 55.2 | 86.5 | 93.7 | 69.2 | 91.6 | 96.6 |
| GXN [65] | 56.6 | — | 94.5 | 68.5 | — | 97.9 |

Table 6 R@K comparison of different existing algorithms for MSCOCO (1000 test images)
| Model | Text retrieval | | | Image retrieval | | |
|---|---|---|---|---|---|---|
| | R@1 | R@5 | R@10 | R@1 | R@5 | R@10 |
| BRNN [50] | — | — | — | 10.7 | 29.6 | 42.2 |
| SCAN [55] | 50.4 | 82.2 | 90.0 | 38.6 | 69.3 | 80.4 |
| CAAN [56] | 52.5 | 83.3 | 90.9 | 41.2 | 70.3 | 82.9 |
| IMRAM [57] | 53.7 | 83.2 | 91.0 | 39.7 | 69.1 | 79.8 |
| PVSE [64] | 32.4 | 63.0 | 75.0 | 45.2 | 74.3 | 84.5 |
| GXN [65] | 31.7 | — | 74.6 | 42.0 | — | 84.7 |

Table 7 R@K comparison of different existing algorithms for MSCOCO (5000 test images)
| Model | Text query | Image query | Average |
|---|---|---|---|
| ACMR [66] | 0.8710 | 0.9320 | 0.9020 |
| DVSH [72] | 0.7673 | 0.7552 | 0.7613 |
| SSAH [74] | 0.5780 | 0.5770 | 0.5780 |
| CYC-DGH [75] | 0.8590 | 0.7810 | 0.8200 |

Table 8 MAP comparison of different existing algorithms for MSCOCO
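The MAP scores in Tables 8 to 11 average, over all queries, the average precision (AP) of each ranked result list, where a retrieved item counts as relevant if it shares the query's semantic label. The sketch below computes MAP from hand-made relevance lists; it is illustrative only and not tied to any model in the tables.

```python
import numpy as np

def average_precision(ranked_relevance):
    """AP of one ranked list: mean of precision@i taken at each relevant position i."""
    rel = np.asarray(ranked_relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    precision_at_i = np.cumsum(rel) / (np.arange(len(rel)) + 1)
    return float((precision_at_i * rel).sum() / rel.sum())

def mean_average_precision(per_query_relevance):
    """MAP: average of the per-query AP values."""
    return float(np.mean([average_precision(r) for r in per_query_relevance]))

# Toy usage: 1 marks a retrieved item sharing the query's semantic label, 0 otherwise.
runs = [
    [1, 0, 1, 0, 0],  # AP = (1/1 + 2/3) / 2 ≈ 0.833
    [0, 1, 0, 0, 1],  # AP = (1/2 + 2/5) / 2 = 0.450
]
print(mean_average_precision(runs))  # ≈ 0.642
```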
| Model | Text query | Image query | Average |
|---|---|---|---|
| SCM (semantic correlation matching) | 0.226 | 0.227 | 0.252 |
| deep-SM [48] | 0.354 | 0.398 | 0.376 |
| MSFN+TextNet [49] | 0.453 | 0.518 | 0.486 |
| MASLN [60] | 0.499 | 0.623 | 0.561 |
| MASLN [60] | 0.287 | 0.331 | 0.309 |
| DSCMR [63] | 0.478 | 0.521 | 0.499 |
| ACMR [66] | 0.489 | 0.619 | 0.546 |
| CYC-DGH [75] | 0.826 | 0.820 | 0.823 |

Table 9 MAP comparison of different existing algorithms for Wikipedia
| Model | Text query | Image query | Average |
|---|---|---|---|
| DeCAF | 0.409 | 0.486 | 0.448 |
| deep-SM [48] | 0.776 | 0.823 | 0.800 |
| MSFN+TextNet [49] | 0.453 | 0.518 | 0.486 |
| MASLN [60] | 0.552 | 0.550 | 0.551 |
| MASLN [60] | 0.284 | 0.291 | 0.287 |
| DSCMR [63] | 0.615 | 0.611 | 0.613 |
| ACMR [66] | 0.538 | 0.544 | 0.541 |
| DCMH [73] | 0.691 | 0.644 | 0.667 |
| SSAH [74] | 0.683 | 0.639 | 0.661 |

Table 10 MAP comparison of different existing algorithms for NUS-WIDE
| Model | Text query | Image query |
|---|---|---|
| DVSH [72] | 0.6806 | 0.7236 |
| CYC-DGH [75] | 0.8370 | 0.8320 |

Table 11 MAP comparison of different existing algorithms for IAPRTC-12
| Model | Text retrieval | | | | | Image retrieval | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | top@1 | top@2 | top@10 | top@20 | top@100 | top@1 | top@2 | top@10 | top@20 | top@100 |
| MSDS [8] | 0.2560 | 0.3640 | 0.6260 | 0.7400 | 0.9210 | 0.2550 | 0.3620 | 0.6330 | 0.7420 | 0.9190 |
| DBRLM [46] | 0.2070 | 0.2960 | 0.5570 | 0.6730 | 0.8950 | 0.2010 | 0.2920 | 0.5670 | 0.6820 | 0.9040 |

Table 12 top@k comparison of different existing algorithms for IAPRTC-12
[1] MCGURK H, MACDONALD J. Hearing lips and seeing voices[J]. Nature, 1976, 264(5588): 746-748.
[2] ATREY P K, HOSSAIN M A, EL SADDIK A, et al. Multimodal fusion for multimedia analysis: a survey[J]. Multimedia Systems, 2010, 16(6): 345-379.
[3] JIANG X, WU F, ZHANG Y, et al. The classification of multi-modal data with hidden conditional random field[J]. Pattern Recognition Letters, 2015, 51: 63-69.
[4] WANG D X, CUI P, OU M D, et al. Deep multimodal Hashing with orthogonal regularization[C]// Proceedings of the 24th International Joint Conference on Artificial Intelligence, Buenos Aires, Jul 25-31, 2015. Menlo Park: AAAI, 2015: 2291-2297.
[5] BALTRUSAITIS T, AHUJA C, MORENCY L P. Multimodal machine learning: a survey and taxonomy[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(2): 423-443.
[6] RAMACHANDRAM D, TAYLOR G W. Deep multimodal learning: a survey on recent advances and trends[J]. IEEE Signal Processing Magazine, 2017, 34(6): 96-108.
[7] OU W H, LIU B, ZHOU Y H, et al. Survey on the cross-modal retrieval research[J]. Journal of Guizhou Normal University (Natural Sciences), 2018, 36(2): 114-120.
[8] WANG J, HE Y H, KANG C C, et al. Image-text cross-modal retrieval via modality-specific feature learning[C]// Proceedings of the 5th ACM International Conference on Multimedia Retrieval, Shanghai, Jun 23-26, 2015. New York: ACM, 2015: 347-354.
[9] OTTO C, SPRINGSTEIN M, ANAND A, et al. Characterization and classification of semantic image-text relations[J]. International Journal of Multimedia Information Retrieval, 2020, 9(1): 31-45.
[10] HOTELLING H. Relations between two sets of variates[J]. Biometrika, 1936, 28(3): 321-377.
[11] LI D G, DIMITROVA N, LI M K, et al. Multimedia content processing through cross-modal association[C]// Proceedings of the 11th ACM International Conference on Multimedia, Berkeley, Nov 2-8, 2003. New York: ACM, 2003: 604-611.
[12] RASIWASIA N, PEREIRA J C, COVIELLO E, et al. A new approach to cross-modal multimedia retrieval[C]// Proceedings of the 18th ACM International Conference on Multimedia, Firenze, Oct 25-29, 2010. New York: ACM, 2010: 251-260.
[13] JI Z Y, YAO W N, WEI W, et al. Deep multi-level semantic Hashing for cross-modal retrieval[J]. IEEE Access, 2019, 7: 23667-23674.
[14] WANG C, YANG H J, MEINEL C. Deep semantic mapping for cross-modal retrieval[C]// Proceedings of the 27th International Conference on Tools with Artificial Intelligence, Vietri sul Mare, Nov 9-11, 2015. Washington: IEEE Computer Society, 2015: 234-241.
[15] FENG F X, WANG X J, LI R F. Cross-modal retrieval with correspondence autoencoder[C]// Proceedings of the 2014 ACM International Conference on Multimedia, Orlando, Nov 3-7, 2014. New York: ACM, 2014: 7-16.
[16] FENG F X. Deep learning for cross-modal retrieval[D]. Beijing: Beijing University of Posts and Telecommunications, 2015.
[17] WANG K, YIN Q, WANG W, et al. A comprehensive survey on cross-modal retrieval[J]. arXiv:1607.06215, 2016.
[18] PENG Y, HUANG X, ZHAO Y. An overview of cross-media retrieval: concepts, methodologies, benchmarks, and challenges[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2018, 28(9): 2372-2385.
[19] LI Z Y, HUANG Z F, XU X M. A review of the cross-modal retrieval model and feature extraction based on representation learning[J]. Journal of the China Society for Scientific and Technical Information, 2018, 37(4): 422-435.
[20] AYYAVARAIAH M, VENKATESWARLU B. Joint graph regularization based semantic analysis for cross-media retrieval: a systematic review[J]. International Journal of Engineering & Technology, 2018, 7: 257-261.
[21] AYYAVARAIAH M, VENKATESWARLU B. Cross media feature retrieval and optimization: a contemporary review of research scope, challenges and objectives[C]// Proceedings of the 3rd International Conference on Computational Vision and Bio Inspired Computing, Coimbatore, Sep 25-26, 2019. Cham: Springer, 2019: 1125-1136.
[22] SHAO J. Cross-modal retrieval based on deep learning[D]. Beijing: Beijing University of Posts and Telecommunications, 2017.
[23] ZHAO T. Research on the method of cross-modal image and text retrieval based on deep learning[D]. Guilin: Guilin University of Electronic Technology, 2019.
[24] KAUR P, PANNU H S, MALHI A K. Comparative analysis on cross-modal information retrieval: a review[J]. Computer Science Review, 2021, 39(2): 100336.
[25] HINTON G E, SALAKHUTDINOV R R. Reducing the dimensionality of data with neural networks[J]. Science, 2006, 313(5786): 504-507.
[26] HINTON G E, OSINDERO S, TEH Y W. A fast learning algorithm for deep belief nets[J]. Neural Computation, 2006, 18(7): 1527-1554.
[27] SRIVASTAVA N, SALAKHUTDINOV R. Multimodal learning with deep Boltzmann machines[C]// Proceedings of the 26th Annual Conference on Neural Information Processing Systems, Lake Tahoe, Dec 3-6, 2012. Red Hook: Curran Associates, 2012: 2231-2239.
[28] HERMANN K M, KOCISKÝ T, GREFENSTETTE E, et al. Teaching machines to read and comprehend[C]// Proceedings of the Annual Conference on Neural Information Processing Systems, Montreal, Dec 7-12, 2015. Red Hook: Curran Associates, 2015: 1693-1701.
[29] MAO J H, XU W, YANG Y, et al. Explain images with multimodal recurrent neural networks[J]. arXiv:1410.1090, 2014.
[30] GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]// Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, Dec 8-13, 2014. Red Hook: Curran Associates, 2014: 2672-2680.
[31] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[32] GOODFELLOW I, BENGIO Y, COURVILLE A. Deep learning[M]. Cambridge: MIT Press, 2016.
[33] GU J X, WANG Z H, KUEN J, et al. Recent advances in convolutional neural networks[J]. Pattern Recognition, 2018, 77: 354-377.
[34] KRIZHEVSKY A, SUTSKEVER I, HINTON G. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
[35] SZEGEDY C, LIU W, JIA Y Q, et al. Going deeper with convolutions[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, Jun 7-12, 2015. Washington: IEEE Computer Society, 2015: 1-9.
[36] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. arXiv:1409.1556, 2014.
[37] XIE S N, GIRSHICK R B, DOLLAR P, et al. Aggregated residual transformations for deep neural networks[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 1492-1500.
[38] HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 2261-2269.
[39] JIA Y Q, SHELHAMER E, DONAHUE J, et al. Caffe: convolutional architecture for fast feature embedding[C]// Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, Nov 3-7, 2014. New York: ACM, 2014: 675-678.
[40] PASZKE A, GROSS S, MASSA F, et al. PyTorch: an imperative style, high-performance deep learning library[C]// Proceedings of the Annual Conference on Neural Information Processing Systems, Vancouver, Dec 8-14, 2019: 8026-8037.
[41] ABADI M, AGARWAL A, BARHAM P, et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems[J]. arXiv:1603.04467, 2016.
[42] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[43] SCHUSTER M, PALIWAL K K. Bidirectional recurrent neural networks[J]. IEEE Transactions on Signal Processing, 1997, 45(11): 2673-2681.
[44] KIM Y. Convolutional neural networks for sentence classification[J]. arXiv:1408.5882, 2014.
[45] GIRSHICK R, DONAHUE J, DARRELL T, et al. Region-based convolutional networks for accurate object detection and segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 38(1): 142-158.
[46] HE Y, XIANG S, KANG C, et al. Cross-modal retrieval via deep and bidirectional representation learning[J]. IEEE Transactions on Multimedia, 2016, 18(7): 1363-1377.
[47] LI Z, LU W, BAO E, et al. Learning a semantic space by deep network for cross-media retrieval[C]// Proceedings of the 21st International Conference on Distributed Multimedia Systems, Vancouver, Aug 31-Sep 2, 2015. Skokie: Knowledge Systems Institute, 2015: 199-203.
[48] WEI Y, ZHAO Y, LU C, et al. Cross-modal retrieval with CNN visual features: a new baseline[J]. IEEE Transactions on Cybernetics, 2017, 47(2): 449-460.
[49] FAN M D, WANG W M, DONG P L, et al. Cross-media retrieval by learning rich semantic embeddings of multimedia[C]// Proceedings of the 2017 ACM on Multimedia Conference, Mountain View, Oct 23-27, 2017. New York: ACM, 2017: 1698-1706.
[50] KARPATHY A, LI F F. Deep visual-semantic alignments for generating image descriptions[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, Jun 8-10, 2015. Washington: IEEE Computer Society, 2015: 3128-3137.
[51] MALINOWSKI M, ROHRBACH M, FRITZ M. Ask your neurons: a neural-based approach to answering questions about images[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Dec 7-13, 2015. Washington: IEEE Computer Society, 2015: 1-9.
[52] GAO D H, JIN L B, CHEN B, et al. FashionBERT: text and image matching with adaptive loss for cross-modal retrieval[C]// Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Jul 25-30, 2020. New York: ACM, 2020: 2251-2260.
[53] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[J]. arXiv:1810.04805, 2018.
[54] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the Annual Conference on Neural Information Processing Systems, Long Beach, Dec 4-9, 2017. Cambridge: MIT Press, 2017: 5998-6008.
[55] LEE K H, CHEN X, HUA G, et al. Stacked cross attention for image-text matching[C]// LNCS 11208: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 212-228.
[56] ZHANG Q, LEI Z, ZHANG Z X, et al. Context-aware attention network for image-text retrieval[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, Jun 16-20, 2020. Piscataway: IEEE, 2020: 3533-3542.
[57] CHEN H, DING G G, LIU X D, et al. IMRAM: iterative matching with recurrent attention memory for cross-modal image-text retrieval[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, Jun 16-20, 2020. Piscataway: IEEE, 2020: 12652-12660.
[58] NGIAM J, KHOSLA A, KIM M, et al. Multimodal deep learning[C]// Proceedings of the 28th International Conference on Machine Learning, Bellevue, Jun 28-Jul 2, 2011. Madison: Omnipress, 2011: 689-696.
[59] KIROS R, SALAKHUTDINOV R, ZEMEL R S. Unifying visual-semantic embeddings with multimodal neural language models[J]. arXiv:1411.2539, 2014.
[60] XU X, SONG J K, LU H M, et al. Modal-adversarial semantic learning network for extendable cross-modal retrieval[C]// Proceedings of the 2018 ACM International Conference on Multimedia Retrieval, Yokohama, Jun 11-14, 2018. New York: ACM, 2018: 46-54.
[61] CORNIA M, BARALDI L, TAVAKOLI H R, et al. A unified cycle-consistent neural model for text and image retrieval[J]. Multimedia Tools and Applications, 2020, 79(35): 25697-25721.
[62] CASTREJÓN L, AYTAR Y, VONDRICK C, et al. Learning aligned cross-modal representations from weakly aligned data[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 2940-2949.
[63] ZHEN L L, HU P, WANG X, et al. Deep supervised cross-modal retrieval[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 15-20, 2019. Piscataway: IEEE, 2019: 10394-10403.
[64] SONG Y, SOLEYMANI M. Polysemous visual-semantic embedding for cross-modal retrieval[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 15-20, 2019. Piscataway: IEEE, 2019: 1979-1988.
[65] GU J X, CAI J F, JOTY S R, et al. Look, imagine and match: improving textual-visual cross-modal retrieval with generative models[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Piscataway: IEEE, 2018: 7181-7189.
[66] WANG B K, YANG Y, XU X, et al. Adversarial cross-modal retrieval[C]// Proceedings of the 2017 ACM on Multimedia Conference, Mountain View, Oct 23-27, 2017. New York: ACM, 2017: 154-162.
[67] SALAKHUTDINOV R, HINTON G E. Semantic Hashing[J]. International Journal of Approximate Reasoning, 2009, 50(7): 969-978.
[68] XIA R K, PAN Y, LAI H J, et al. Supervised Hashing for image retrieval via image representation learning[C]// Proceedings of the 28th AAAI Conference on Artificial Intelligence, Québec, Jul 27-31, 2014. Menlo Park: AAAI, 2014: 2156-2162.
[69] LIN K, YANG H F, HSIAO J H, et al. Deep learning of binary Hash codes for fast image retrieval[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Boston, Jun 7-12, 2015. Washington: IEEE Computer Society, 2015: 27-35.
[70] ZHAO F, HUANG Y Z, WANG L, et al. Deep semantic ranking based Hashing for multi-label image retrieval[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, Jun 8-10, 2015. Washington: IEEE Computer Society, 2015: 1556-1564.
[71] ZHANG D, WANG F, SI L. Composite Hashing with multiple information sources[C]// Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, Beijing, Jul 25-29, 2011. New York: ACM, 2011: 225-234.
[72] CAO Y, LONG M S, WANG J M, et al. Deep visual-semantic Hashing for cross-modal retrieval[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, Aug 13-17, 2016. New York: ACM, 2016: 1445-1454.
[73] JIANG Q Y, LI W J. Deep cross-modal Hashing[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 3270-3278.
[74] LI C, DENG C, LI N, et al. Self-supervised adversarial Hashing networks for cross-modal retrieval[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Piscataway: IEEE, 2018: 4242-4251.
[75] WU L, WANG Y, SHAO L. Cycle-consistent deep generative Hashing for cross-modal retrieval[J]. IEEE Transactions on Image Processing, 2019, 28(4): 1602-1612.
[76] CHUA T S, TANG J H, HONG R C, et al. NUS-WIDE: a real-world web image database from National University of Singapore[C]// Proceedings of the 8th ACM International Conference on Image and Video Retrieval, Santorini Island, Jul 8-10, 2009. New York: ACM, 2009: 1-9.
[77] LIN T Y, MAIRE M, BELONGIE S J, et al. Microsoft COCO: common objects in context[C]// LNCS 8693: Proceedings of the 13th European Conference on Computer Vision, Zurich, Sep 6-12, 2014. Cham: Springer, 2014: 740-755.
[78] YOUNG P, LAI A, HODOSH M, et al. From image descriptions to visual denotations: new similarity metrics for semantic inference over event descriptions[J]. Transactions of the Association for Computational Linguistics, 2014, 2: 67-78.
[79] PEREIRA J C, COVIELLO E, DOYLE G, et al. On the role of correlation and abstraction in cross-modal multimedia retrieval[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(3): 521-535.
[80] GRUBINGER M, CLOUGH P, MÜLLER H, et al. The IAPR TC-12 benchmark: a new evaluation resource for visual information systems[C]// Proceedings of the 2006 International Conference on Language Resources and Evaluation, Genova, May 24-26, 2006. European Language Resources Association, 2006: 13-23.
[81] LIU D P. Design and implementation of Internet opinions monitoring and analyzing system[D]. Chengdu: University of Electronic Science and Technology of China, 2011.
[82] HE Y J. Social network cross-media national security incidents semantic study and behavioral analysis[D]. Beijing: Beijing University of Posts and Telecommunications, 2018.
[83] SHEN Z Q. Research on public opinion analysis methods based on text and image[D]. Zhenjiang: Jiangsu University, 2018.
[84] HUANG F R, ZHANG X M, ZHAO Z H, et al. Image-text sentiment analysis via deep multimodal attentive fusion[J]. Knowledge-Based Systems, 2019, 167: 26-37.
[85] ZENG Q Q, ZHANG T T. Application of image retrieval technology based on big data in investigation[J]. Electronic Technology & Software Engineering, 2018, 143(21): 175-176.
[86] QIAO F C. Research on the technology of evidence images retrieval using visual and text features[D]. Changsha: National University of Defense Technology, 2013.
[87] VINYALS O, TOSHEV A, BENGIO S, et al. Show and tell: a neural image caption generator[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, Jun 8-10, 2015. Washington: IEEE Computer Society, 2015: 3156-3164.
[88] VENUGOPALAN S, XU H, DONAHUE J, et al. Translating videos to natural language using deep recurrent neural networks[J]. arXiv:1412.4729, 2014.
[89] LAN Z Z, BAO L, YU S I, et al. Multimedia classification and event detection using double fusion[J]. Multimedia Tools and Applications, 2014, 71(1): 333-347.
[90] CAI G Y, XIA B B. Convolutional neural networks for multimedia sentiment analysis[C]// LNCS 9362: Proceedings of the 4th CCF Conference on Natural Language Processing and Chinese Computing, Nanchang, Oct 9-13, 2015. Cham: Springer, 2015: 159-167.
[91] LING H B, MIAO Y Q, ZHANG W Z, et al. Multimedia sentiment analysis on microblog based on multi-feature fusion[J]. Application Research of Computers, 2020, 37(7): 1935-1939.
[92] CHEN X Y, WANG Y H, LIU Q J. Visual and textual sentiment analysis using deep fusion convolutional neural networks[C]// Proceedings of the 2017 IEEE International Conference on Image Processing, Beijing, Sep 17-20, 2017. Piscataway: IEEE, 2017: 1557-1561.
[93] ZHANG Y Y, LI G R, CHU L Y, et al. Cross-media topic detection: a multi-modality fusion framework[C]// Proceedings of the 2013 IEEE International Conference on Multimedia and Expo, San Jose, Jul 15-19, 2013. Washington: IEEE Computer Society, 2013: 1-6.
[94] YUAN Z Q, SANG J T, LIU Y, et al. Latent feature learning in social media network[C]// Proceedings of the 21st ACM International Conference on Multimedia, Barcelona, Oct 21-25, 2013. New York: ACM, 2013: 253-262.
[95] YU Y H, LIN H F, MENG J N, et al. Classification modeling and recognition for cross modal and multi-label biomedical image[J]. Journal of Image and Graphics, 2018, 23(6): 917-927.
[96] COOKE R E, GAETA M G, KAUFMAN D M, et al. Picture archiving and communication system: US6574629 B1[P]. 2003-06-03.
[97] ZHAI X. Medical data retrieval technology based on mode network[D]. Harbin: Harbin Engineering University, 2017.
[98] ZHEN Y, YEUNG D Y. Co-regularized Hashing for multimodal data[C]// Proceedings of the 26th Annual Conference on Neural Information Processing Systems, Lake Tahoe, Dec 3-6, 2012. Red Hook: Curran Associates, 2012: 1385-1393.