[1] LIU Y, GUO Y Y, FANG J, et al. Survey of research on deep learning image-text cross-modal retrieval[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(3): 489-511. (in Chinese)
[2] LU J, BATRA D, PARIKH D, et al. ViLBERT: pretraining task-agnostic visiolinguistic representations for vision-and-language tasks[C]//Advances in Neural Information Processing Systems 32, Vancouver, Dec 8-14, 2019: 13-23.
[3] TAN H, BANSAL M. LXMERT: learning cross-modality encoder representations from transformers[J]. arXiv:1908.07490, 2019.
[4] LI L H, YATSKAR M, YIN D, et al. VisualBERT: a simple and performant baseline for vision and language[J]. arXiv:1908.03557, 2019.
[5] KIRKPATRICK J, PASCANU R, RABINOWITZ N, et al. Overcoming catastrophic forgetting in neural networks[J]. Proceedings of the National Academy of Sciences, 2017, 114(13): 3521-3526.
[6] DUMFORD J, SCHEIRER W. Backdooring convolutional neural networks via targeted weight perturbations[C]//Proceedings of the 2020 IEEE International Joint Conference on Biometrics. Piscataway: IEEE, 2020: 1-9.
[7] COSTALES R, MAO C, NORWITZ R, et al. Live trojan attacks on deep neural networks[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 796-797.
[8] QIAN H W, SUN W S. Survey on backdoor attacks and countermeasures in deep neural network[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(5): 1038-1048. (in Chinese)
[9] CARLINI N, TERZIS A. Poisoning and backdooring contrastive learning[C]//Proceedings of the 10th International Conference on Learning Representations, Apr 25-29, 2022.
[10] WALMER M, SIKKA K, SUR I, et al. Dual-key multimodal backdoors for visual question answering[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 15375-15385.
[11] YAN F, MIKOLAJCZYK K. Deep correlation for matching images and text[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 3441-3450.
[12] FENG F, WANG X, LI R. Cross-modal retrieval with correspondence autoencoder[C]//Proceedings of the 22nd ACM International Conference on Multimedia. New York: ACM, 2014: 7-16.
[13] CASTREJON L, AYTAR Y, VONDRICK C, et al. Learning aligned cross-modal representations from weakly aligned data[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Washington: IEEE Computer Society, 2016: 2940-2949.
[14] GU W, GU X, GU J, et al. Adversary guided asymmetric hashing for cross-modal retrieval[C]//Proceedings of the 2019 International Conference on Multimedia Retrieval. New York: ACM, 2019: 159-167.
[15] YANG D, WU D, ZHANG W, et al. Deep semantic-alignment hashing for unsupervised cross-modal retrieval[C]//Proceedings of the 2020 International Conference on Multimedia Retrieval. New York: ACM, 2020: 44-52.
[16] YAO H L, ZHAN Y W, CHEN Z D, et al. TEACH: attention-aware deep cross-modal hashing[C]//Proceedings of the 2021 International Conference on Multimedia Retrieval. New York: ACM, 2021: 376-384.
[17] KARPATHY A, LI F F. Deep visual-semantic alignments for generating image descriptions[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Washington: IEEE Computer Society, 2015: 3128-3137.
[18] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
[19] SZEGEDY C, LIU W, JIA Y, et al. Going deeper with convolutions[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Washington: IEEE Computer Society, 2015: 1-9.
[20] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Washington: IEEE Computer Society, 2016: 770-778.
[21] BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate[J]. arXiv:1409.0473, 2014.
[22] RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]//Proceedings of the 38th International Conference on Machine Learning, Jul 18-24, 2021: 8748-8763.
[23] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: transformers for image recognition at scale[C]//Proceedings of the 9th International Conference on Learning Representations, Austria, May 3-7, 2021: 1-6.
[24] LI J, LI D, XIONG C, et al. BLIP: bootstrapping language-image pre-training for unified vision-language understanding and generation[C]//Proceedings of the 2022 International Conference on Machine Learning, Baltimore, Jul 17-23, 2022: 12888-12900.
[25] HO J, JAIN A, ABBEEL P. Denoising diffusion probabilistic models[C]//Advances in Neural Information Processing Systems 33, Dec 6-12, 2020: 6840-6851.
[26] SONG J, MENG C, ERMON S. Denoising diffusion implicit models[J]. arXiv:2010.02502, 2020.
[27] ROMBACH R, BLATTMANN A, LORENZ D, et al. High-resolution image synthesis with latent diffusion models[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 10684-10695.
[28] GU T, LIU K, DOLAN-GAVITT B, et al. BadNets: evaluating backdooring attacks on deep neural networks[J]. IEEE Access, 2019, 7: 47230-47244.
[29] LI S, XUE M, ZHAO B Z H, et al. Invisible backdoor attacks on deep neural networks via steganography and regularization[J]. IEEE Transactions on Dependable and Secure Computing, 2020, 18(5): 2088-2105.
[30] ZHONG H, LIAO C, SQUICCIARINI A C, et al. Backdoor embedding in convolutional neural network models via invisible perturbation[C]//Proceedings of the 10th ACM Conference on Data and Application Security and Privacy. New York: ACM, 2020: 97-108.
[31] TURNER A, TSIPRAS D, MADRY A. Label-consistent backdoor attacks[J]. arXiv:1912.02771, 2019.
[32] LIU Y, MA X, BAILEY J, et al. Reflection backdoor: a natural backdoor attack on deep neural networks[C]//Proceedings of the 16th European Conference on Computer Vision, Glasgow, Aug 23-28, 2020. Cham: Springer, 2020: 182-199.
[33] SAHA A, SUBRAMANYA A, PIRSIAVASH H. Hidden trigger backdoor attacks[C]//Proceedings of the 34th AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2020: 11957-11965.
[34] ZHAO S, MA X, ZHENG X, et al. Clean-label backdoor attacks on video recognition models[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 14443-14452.
[35] WANG S, NEPAL S, RUDOLPH C, et al. Backdoor attacks against transfer learning with pre-trained deep learning models[J]. IEEE Transactions on Services Computing, 2020, 15(3): 1526-1539.
[36] RAKIN A S, HE Z, FAN D. TBT: targeted neural network attack with bit trojan[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 13198-13207.
[37] JIA J, LIU Y, GONG N Z. BadEncoder: backdoor attacks to pre-trained encoders in self-supervised learning[C]//Proceedings of the 2022 IEEE Symposium on Security and Privacy. Piscataway: IEEE, 2022: 2043-2059.
[38] KWON H, KIM Y. BlindNet backdoor: attack on deep neural network using blind watermark[J]. Multimedia Tools and Applications, 2022, 81(5): 6217-6234.
[39] YANG Z, IYER N, REIMANN J, et al. Design of intentional backdoors in sequential models[J]. arXiv:1902.09972, 2019.
[40] DAI J, CHEN C, LI Y. A backdoor attack against LSTM-based text classification systems[J]. IEEE Access, 2019, 7: 138872-138878.
[41] CHEN X, SALEM A, CHEN D, et al. BadNL: backdoor attacks against NLP models with semantic-preserving improvements[C]//Proceedings of the Annual Computer Security Applications Conference. New York: ACM, 2021: 554-569.
[42] KURITA K, MICHEL P, NEUBIG G. Weight poisoning attacks on pre-trained models[J]. arXiv:2004.06660, 2020.
[43] SUN L. Natural backdoor attack on text data[J]. arXiv:2006.16176, 2020.
[44] LI L, SONG D, LI X, et al. Backdoor attacks on pre-trained models by layerwise weight poisoning[C]//Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2021: 3023-3032.
[45] CHEN X, LIU C, LI B, et al. Targeted backdoor attacks on deep learning systems using data poisoning[J]. arXiv:1712.05526, 2017.
[46] PLUMMER B A, WANG L, CERVANTES C M, et al. Flickr30k entities: collecting region-to-phrase correspondences for richer image-to-sentence models[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision. Washington: IEEE Computer Society, 2015: 2641-2649.
[47] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]//Proceedings of the 13th European Conference on Computer Vision, Zurich, Sep 6-12, 2014. Cham: Springer, 2014: 740-755.
[48] CAO J, QIAN S, ZHANG H, et al. Global relation-aware attention network for image-text retrieval[C]//Proceedings of the 2021 International Conference on Multimedia Retrieval. New York: ACM, 2021: 19-28.
[49] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Washington: IEEE Computer Society, 2016: 779-788.