[1] LEWIS P, PEREZ E, PIKTUS A, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks[C]//Advances in Neural Information Processing Systems 33, Dec 6-12, 2020: 9459-9474.
[2] RAJ H, GUPTA V, ROSATI D, et al. Semantic consistency for assuring reliability of large language models[EB/OL]. [2024-04-03]. https://arxiv.org/abs/2308.09138.
[3] ELAZAR Y, KASSNER N, RAVFOGEL S, et al. Measuring and improving consistency in pretrained language models[J]. Transactions of the Association for Computational Linguistics, 2021, 9: 1012-1031.
[4] ZHAO Z, WALLACE E, FENG S, et al. Calibrate before use: improving few-shot performance of language models[C]//Proceedings of the 38th International Conference on Machine Learning, Jul 18-24, 2021: 12697-12706.
[5] PEREZ E, KIELA D, CHO K. True few-shot learning with language models[C]//Advances in Neural Information Processing Systems 34, Dec 6-14, 2021: 11054-11070.
[6] YOO K M, KIM J, KIM H J, et al. Ground-truth labels matter: a deeper look into input-label demonstrations[C]//Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2022: 2422-2437.
[7] CHEN Y, ZHAO C, YU Z, et al. On the relation between sensitivity and accuracy in in-context learning[C]//Findings of the Association for Computational Linguistics: EMNLP 2023. Stroudsburg: ACL, 2023: 155-167.
[8] WANG X, WEI J, SCHUURMANS D, et al. Self-consistency improves chain of thought reasoning in language models[EB/OL]. [2024-04-03]. https://arxiv.org/abs/2203.11171.
[9] CHEN X, AKSITOV R, ALON U, et al. Universal self-consistency for large language model generation[EB/OL]. [2024-04-03]. https://arxiv.org/abs/2311.17311.
[10] WANG H, PRASAD A, STENGEL-ESKIN E, et al. Soft self-consistency improves language model agents[EB/OL]. [2024-04-03]. https://arxiv.org/abs/2402.13212.
[11] JANG M, LUKASIEWICZ T. Consistency analysis of ChatGPT[C]//Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2023: 15970-15985.
[12] WEI J, WANG X, SCHUURMANS D, et al. Chain-of-thought prompting elicits reasoning in large language models[C]//Advances in Neural Information Processing Systems 35, New Orleans, Nov 28-Dec 9, 2022: 24824-24837.
[13] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2019: 4171-4186.
[14] JANG M, KWON D S, LUKASIEWICZ T. BECEL: benchmark for consistency evaluation of language models[C]//Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Oct 12-17, 2022: 3680-3696.
[15] ZHOU C, HE J, MA X, et al. Prompt consistency for zero-shot task generalization[C]//Findings of the Association for Computational Linguistics: EMNLP 2022. Stroudsburg: ACL, 2022: 2613-2626.
[16] JANG M, KWON D S, LUKASIEWICZ T. Accurate, yet inconsistent? Consistency analysis on language understanding models[EB/OL]. [2024-04-03]. https://arxiv.org/abs/2108.06665.
[17] RABINOVICH E, ACKERMAN S, RAZ O, et al. Predicting question-answering performance of large language models through semantic consistency[EB/OL]. [2024-04-03]. https://arxiv.org/abs/2311.01152.
[18] BACH S, SANH V, YONG Z X, et al. PromptSource: an integrated development environment and repository for natural language prompts[C]//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Stroudsburg: ACL, 2022: 93-104.
[19] MIN S, LEWIS M, ZETTLEMOYER L, et al. MetaICL: learning to learn in context[C]//Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2022: 2791-2809.
[20] CHEN Y, ZHONG R, ZHA S, et al. Meta-learning via language model in-context tuning[C]//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2022: 719-730.
[21] SRIKANTH N, CARPUAT M, RUDINGER R. How often are errors in natural language reasoning due to paraphrastic variability?[EB/OL]. [2024-05-24]. https://arxiv.org/abs/2404.11717.
[22] CHEN A, PHANG J, PARRISH A, et al. Two failures of self-consistency in the multi-step reasoning of LLMs[EB/OL]. [2024-04-03]. https://arxiv.org/abs/2305.14279.
[23] OHMER X, BRUNI E, HUPKES D. From form(s) to meaning: probing the semantic depths of language models using multisense consistency[EB/OL]. [2024-05-24]. https://arxiv.org/abs/2404.12145.
[24] YANG J, CHEN D, SUN Y, et al. Enhancing semantic consistency of large language models through model editing: an interpretability-oriented approach[C]//Findings of the Association for Computational Linguistics: ACL 2024. Stroudsburg: ACL, 2024: 3343-3353.
[25] RADFORD A, WU J, CHILD R, et al. Language models are unsupervised multitask learners[J]. OpenAI Blog, 2019, 1(8): 9.
[26] LIU Y, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach[EB/OL]. [2024-04-03]. https://arxiv.org/abs/1907.11692.
[27] JANG M, LUKASIEWICZ T. Improving language models' meaning understanding and consistency by learning conceptual roles from dictionary[C]//Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2023: 8496-8510.
[28] WANG X, LI Y, FENG S, et al. Integrate the essence and eliminate the dross: fine-grained self-consistency for free-form language generation[EB/OL]. [2024-08-10]. https://arxiv.org/abs/2407.02056.
[29] FAN A, LEWIS M, DAUPHIN Y. Hierarchical neural story generation[EB/OL]. [2024-04-03]. https://arxiv.org/abs/1805.04833.
[30] HOLTZMAN A, BUYS J, DU L, et al. The curious case of neural text degeneration[EB/OL]. [2024-04-03]. https://arxiv.org/abs/1904.09751.
[31] RUDER S. An overview of multi-task learning in deep neural networks[EB/OL]. [2024-04-03]. https://arxiv.org/abs/1706.05098.
[32] YANG C, WANG X, LU Y, et al. Large language models as optimizers[EB/OL]. [2024-04-03]. https://arxiv.org/abs/2309.03409.
[33] LI X L, LIANG P. Prefix-Tuning: optimizing continuous prompts for generation[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. Stroudsburg: ACL, 2021: 4582-4597.
[34] LESTER B, AL-RFOU R, CONSTANT N. The power of scale for parameter-efficient prompt tuning[C]//Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2021: 3045-3059.
[35] LIU X, ZHENG Y, DU Z, et al. GPT understands, too[EB/OL]. [2024-04-03]. https://arxiv.org/abs/2103.10385.
[36] DING N, QIN Y, YANG G, et al. Parameter-efficient fine-tuning of large-scale pre-trained language models[J]. Nature Machine Intelligence, 2023, 5(3): 220-235.
[37] HU E J, SHEN Y, WALLIS P, et al. LoRA: low-rank adaptation of large language models[EB/OL]. [2024-04-03]. https://arxiv.org/abs/2106.09685.
[38] DETTMERS T, PAGNONI A, HOLTZMAN A, et al. QLoRA: efficient finetuning of quantized LLMs[C]//Advances in Neural Information Processing Systems 36, New Orleans, Dec 10-16, 2023.
[39] LIU Y, LAPATA M. Text summarization with pretrained encoders[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Stroudsburg: ACL, 2019: 3730-3740.
[40] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems 30, Long Beach, Dec 4-9, 2017: 5998-6008.