Journal of Frontiers of Computer Science and Technology ›› 2025, Vol. 19 ›› Issue (1): 79-96. DOI: 10.3778/j.issn.1673-9418.2404009
• Constructions and Applications of Large Language Models •
YUE Qi, ZHANG Chenkang
Online: 2025-01-01
Published: 2024-12-31
YUE Qi, ZHANG Chenkang. Survey on Applications of AIGC in Multimodal Scenarios[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(1): 79-96.
• Related Articles •
[1] CHANG Baofa, CHE Chao, LIANG Yan. Research on Recommendation Model Based on Multi-round Dialogue of Large Language Model[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(2): 385-395.
[2] WANG Xiaoyu, LI Xin, HU Mianning, XUE Di. CIL-LLM: Incremental Learning Framework Based on Large Language Models for Category Classification[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(2): 374-384.
[3] FENG Tuoyu, WANG Gangliang, QIAO Zijian, LI Weiping, ZHANG Yusong, GUO Qinglang. SbSER: Step-by-Step Enhanced Reasoning Framework for Large Language Model with External Subgraph Generation[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(2): 367-373.
[4] XU Fengru, LI Bohan, XU Shuai. Research Progress on Sequence Recommendation Based on Deep Learning and Large Language Model[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(2): 344-366.
[5] LI Boxin. Method of Retrieval-Augmented Large Language Models with Stable Outputs for Private Question-Answering Systems[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(1): 132-140.
[6] WANG Yong, QIN Jiajun, HUANG Yourui, DENG Jiangzhou. Design of University Research Management Question Answering System Integrating Knowledge Graph and Large Language Models[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(1): 107-117.
[7] MA He, WANG Hairong, WANG Yiyan, SUN Chong, ZHOU Beijing. Multimodal Unsupervised Entity Alignment Approach with Progressive Strategies[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(1): 245-252.
[8] YU Fengrui, DU Yanhui. Research on Generative Techniques for Identifying and Extracting Tactics, Techniques and Procedures[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(1): 118-131.
[9] XU Lei, HU Yahao, CHEN Man, CHEN Jun, PAN Zhisong. Hate Speech Detection Method Integrating Prefix Tuning and Prompt Learning[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(1): 97-106.
[10] XIANG Xiaowei, SHEN Yanguang, HU Minghao, YAN Tianwei, LUO Wei, LUO Zhunchen. Research on Science and Technology Policy and Regulation Q&A System Driven by Large Models[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(9): 2349-2360.
[11] LI Yifei, ZHANG Lingling, DONG Yuxuan, WANG Jiaxin, ZHONG Yujie, WEI Bifan. Large Language Model Augmentation and Feature Alignment Method for Few-Shot Continual Relation Extraction[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(9): 2326-2336.
[12] JI Guiyang, WANG Peiyan, YU Zhuo. Research on Knowledge Injection Method for Large Language Model Oriented to Process Specification Texts[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(9): 2361-2369.
[13] CHEN Longfei, GAO Xin, HOU Haotian, YE Chuyang, LIU Ya'ou, ZHANG Meihui. Application of Generative Large Language Models in Chinese Radiology Domain[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(9): 2337-2348.
[14] LI Mengyun, ZHANG Jing, ZHANG Huanxiang, ZHANG Xiaolin, LIU Luyao. Multimodal Sentiment Analysis Based on Cross-Modal Semantic Information Enhancement[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(9): 2476-2486.
[15] LUO Shijie, JIN Rize, HAN Shuzhen. Research on University Basic Knowledge Question-Answering Using Low-Rank Encoding to Optimize Large Language Model[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(8): 2156-2168.