[1] MARKOV A A. An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains[J]. Science in Context, 2006, 19(4): 591-600.
[2] SHANNON C E. A mathematical theory of communication [J]. The Bell System Technical Journal, 1948, 27(3): 379-423.
[3] CHOMSKY N. Syntactic structures[M]. [S.l.]: Walter de Gruyter, 2002.
[4] ZHANG H, XU J, WANG J. Pretraining-based natural language generation for text summarization[J]. arXiv:1902.09243, 2019.
[5] LIU Y, LIN Z. Unsupervised pre-training for natural language generation: a literature review[J]. arXiv:1911.06171, 2019.
[6] QIU X P, SUN T X, XU Y G, et al. Pre-trained models for natural language processing: a survey[J]. Science China Technological Sciences, 2020, 63(10): 1872-1897.
[7] BROWN P F, DELLA PIETRA V J, DESOUZA P V, et al. Class-based n-gram models of natural language[J]. Computational Linguistics, 1992, 18(4): 467-480.
[8] CAVNAR W B, TRENKLE J M. N-gram-based text categorization[R]. Ann Arbor: Environmental Research Institute of Michigan, 2001.
[9] HUANG Z H, THINT M, QIN Z C. Question classification using head words and their hypernyms[C]//Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, Hawaii, Oct 25-27, 2008. Stroudsburg: ACL, 2008: 927-936.
[10] SALTON G, WONG A, YANG C. A vector space model for automatic indexing[J]. Communications of the ACM, 1975, 18(11): 613-620.
[11] LIANG J, CHEN J H, ZHANG X Q, et al. Anomaly detection based on one-hot encoding and convolutional neural network[J]. Journal of Tsinghua University (Science and Technology), 2019, 59(7): 523-529.
梁杰, 陈嘉豪, 张雪芹, 等. 基于独热编码和卷积神经网络的异常检测[J]. 清华大学学报(自然科学版), 2019, 59(7): 523-529.
[12] VAPNIK V, CHERVONENKIS A. A note on one class of perceptrons[J]. Automation and Remote Control, 1964, 25(1).
[13] WANG S C. Artificial neural network[M]//Interdisciplinary Computing in Java Programming. Berlin, Heidelberg: Springer, 2003.
[14] ALTMAN N S. An introduction to kernel and nearest-neighbor nonparametric regression[J]. The American Statistician, 1992, 46(3): 175-185.
[15] JONES K S. A statistical interpretation of term specificity and its application in retrieval[J]. Journal of Documentation, 2004, 60(5): 493-502.
[16] JONES K S. IDF term weighting and IR research lessons[J]. Journal of Documentation, 2004, 60(5): 521-523.
[17] KENT J T. Information gain and a general measure of correlation[J]. Biometrika, 1983, 70(1): 163-173.
[18] WILSON E B, HILFERTY M M. The distribution of chi-square[J]. Proceedings of the National Academy of Sciences of the United States of America, 1931, 17(12): 684.
[19] GIERLICHS B, BATINA L, TUYLS P, et al. Mutual information analysis[C]//LNCS 5154: Proceedings of the 2008 International Workshop on Cryptographic Hardware and Embedded Systems, Washington, Aug 10-13, 2008. Berlin, Heidelberg: Springer, 2008: 426-442.
[20] GORSHKOVA T A, SAL'NIKOV V V, CHEMIKOSOVA S B, et al. The snap point: a transition point in Linum usitatissimum bast fiber development[J]. Industrial Crops and Products, 2003, 18(3): 213-221.
[21] KULICK J, LIECK R, TOUSSAINT M. Active learning of hyperparameters: an expected cross entropy criterion for active model selection[J]. arXiv:1409.7552, 2014.
[22] BLAND J M, ALTMAN D G. The odds ratio[J]. British Medical Journal, 2000, 320(7247): 1468.
[23] MIHALCEA R, TARAU P. TextRank: bringing order into text[C]//Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Jul 25-26, 2004. Stroudsburg: ACL, 2004: 404-411.
[24] PAGE L, BRIN S, MOTWANI R, et al. The PageRank citation ranking: bringing order to the web[R]. Stanford InfoLab, 1999.
[25] REHDER B, SCHREINER M E, WOLFE M B, et al. Using latent semantic analysis to assess knowledge: some technical considerations[J]. Discourse Processes, 1998, 25(2/3): 337-354.
[26] LANDAUER T K, DUMAIS S T. A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge[J]. Psychological Review, 1997, 104(2): 211-240.
[27] BRAND M. Incremental singular value decomposition of uncertain data with missing values[C]//LNCS 2350: Proceedings of the 7th European Conference on Computer Vision, Copenhagen, May 28-31, 2002. Berlin, Heidelberg: Springer, 2002: 707-720.
[28] HOFMANN T. Probabilistic latent semantic analysis[J]. arXiv:1301.6705, 2013.
[29] DEERWESTER S, DUMAIS S T, FURNAS G W, et al. Indexing by latent semantic analysis[J]. Journal of the American Society for Information Science, 1990, 41(6): 391-407.
[30] MOON T K. The expectation-maximization algorithm[J]. IEEE Signal Processing Magazine, 1996, 13(6): 47-60.
[31] BAYES T. An essay towards solving a problem in the doctrine of chances. By the Late Rev. Mr. Bayes, F. R. S. Communicated by Mr. Price, in a Letter to John Canton, A. M. F. R. S.[J]. Philosophical Transactions of the Royal Society of London, 1763, 53: 370-418.
[32] RABINER L R. A tutorial on hidden Markov models and selected applications in speech recognition[J]. Proceedings of the IEEE, 1989, 77(2): 257-286.
[33] SEYMORE K, MCCALLUM A, ROSENFELD R. Learning hidden Markov model structure for information extraction[C]//Proceedings of the 16th National Conference on Artificial Intelligence, Florida, Jul 18-22, 1999. Menlo Park: AAAI, 1999: 37-42.
[34] LAFFERTY J D, MCCALLUM A, PEREIRA F C. Conditional random fields: probabilistic models for segmenting and labeling sequence data[C]//Proceedings of the 18th International Conference on Machine Learning, Williamstown, Jun 28-Jul 1, 2001. San Mateo: Morgan Kaufmann, 2001: 282-289.
[35] CROSS G R, JAIN A K. Markov random field texture models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1983, 5(1): 25-39.
[36] LIU R H, YE X, YUE Z Y. A survey of pre-trained models for natural language processing tasks[J/OL]. Journal of Computer Applications [2021-03-04]. http://kns.cnki.net/kcms/detail/51.1307.TP.20201203.0859.004.html.
刘睿珩, 叶霞, 岳增营. 面向自然语言处理任务的预训练模型综述[J/OL]. 计算机应用 [2021-03-04]. http://kns.cnki.net/kcms/detail/51.1307.TP.20201203.0859.004.html.
[37] YU T R, JIN R, HAN X Z, et al. Review of pre-training models for natural language processing[J]. Computer Engineering and Applications, 2020, 56(23): 12-22.
余同瑞, 金冉, 韩晓臻, 等. 自然语言处理预训练模型的研究综述[J]. 计算机工程与应用, 2020, 56(23): 12-22.
[38] LI Z J, FAN Y, WU X J. Survey of natural language processing pre-training techniques[J]. Computer Science, 2020, 47(3): 162-173.
李舟军, 范宇, 吴贤杰. 面向自然语言处理的预训练技术研究综述[J]. 计算机科学, 2020, 47(3): 162-173.
[39] BENGIO Y, DUCHARME R, VINCENT P, et al. A neural probabilistic language model[J]. Journal of Machine Learning Research, 2003, 3: 1137-1155.
[40] KOMBRINK S, MIKOLOV T, KARAFIÁT M, et al. Recurrent neural network based language modeling in meeting recognition[C]//Proceedings of the 12th Annual Conference of the International Speech Communication Association, Florence, Aug 27-31, 2011: 2877-2880.
[41] MNIH A, HINTON G E. Three new graphical models for statistical language modelling[C]//Proceedings of the 24th International Conference on Machine Learning, Corvallis, Jun 20-24, 2007. New York: ACM, 2007: 641-648.
[42] PODSIADLO P, ARRUDA E M, KHENG E, et al. LBL assembled laminates with hierarchical organization from nano- to microscale: high-toughness nanomaterials and deformation imaging[J]. ACS Nano, 2009, 3(6): 1564-1572.
[43] MNIH A, HINTON G E. A scalable hierarchical distributed language model[C]//Proceedings of the 21st Annual Conference on Neural Information Processing Systems, Vancouver, Dec 8-11, 2008. Red Hook: Curran Associates, 2008: 1081-1088.
[44] COLLOBERT R, WESTON J. A unified architecture for natural language processing: deep neural networks with multitask learning[C]//Proceedings of the 25th International Conference on Machine Learning, Helsinki, Jul 5-9, 2008. New York: ACM, 2008: 160-167.
[45] HUANG E H, SOCHER R, MANNING C D, et al. Improving word representations via global context and multiple word prototypes[C]//Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju Island, Jul 8-14, 2012. Stroudsburg: ACL, 2012: 873-882.
[46] MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[J]. arXiv:1301.3781, 2013.
[47] KENTER T, BORISOV A, DE RIJKE M. Siamese CBOW: optimizing word embeddings for sentence representations [J]. arXiv:1606.04640, 2016.
[48] MCCORMICK C. Word2vec tutorial-the skip-gram model[EB/OL]. [2020-09-26]. http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model.
[49] JOULIN A, GRAVE E, BOJANOWSKI P, et al. Bag of tricks for efficient text classification[J]. arXiv:1607.01759, 2016.
[50] BOJANOWSKI P, GRAVE E, JOULIN A, et al. Enriching word vectors with subword information[J]. Transactions of the Association for Computational Linguistics, 2017, 5: 135-146.
[51] WEINBERGER K Q, DASGUPTA A, LANGFORD J, et al. Feature hashing for large scale multitask learning[C]//Proceedings of the 26th Annual International Conference on Machine Learning, Montreal, Jun 14-18, 2009. New York: ACM, 2009: 1113-1120.
[52] PENNINGTON J, SOCHER R, MANNING C D. GloVe: global vectors for word representation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Oct 25-29, 2014. Stroudsburg: ACL, 2014: 1532-1543.
[53] YOSINSKI J, CLUNE J, BENGIO Y, et al. How transferable are features in deep neural networks?[J]. arXiv:1411.1792, 2014.
[54] PAN S J, YANG Q. A survey on transfer learning[J]. IEEE Transactions on Knowledge and Data Engineering, 2009, 22(10): 1345-1359.
[55] MARGOLIS A. A literature review of domain adaptation with unlabeled data[R]. Washington: University of Washington, 2011: 1-42.
[56] HOWARD J, RUDER S. Universal language model fine-tuning for text classification[J]. arXiv:1801.06146, 2018.
[57] PETERS M E, NEUMANN M, IYYER M, et al. Deep contextualized word representations[J]. arXiv:1802.05365, 2018.
[58] SUNDERMETER M, SCHLüTER R, NEY H. LSTM neural networks for language modeling[C]//Proceedings of the 13th Annual Conference of the International Speech Communication Association, Portland, Sep 9-13, 2012: 194-197.
[59] RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving language understanding by generative pre-training[EB/OL]. [2020-09-26]. https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf.
[60] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 30th Annual Conference on Neural Information Processing Systems, Long Beach, Dec 4-9, 2017. Red Hook: Curran Associates, 2017: 5998-6008.
[61] RADFORD A, WU J, CHILD R, et al. Language models are unsupervised multitask learners[J]. OpenAI, 2019, 1(8): 9.
[62] BROWN T B, MANN B, RYDER N, et al. Language models are few-shot learners[J]. arXiv:2005.14165, 2020.
[63] DEVLIN J, CHANG M, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[J]. arXiv:1810.04805, 2018.
[64] LIU X, HE P, CHEN W, et al. Multi-task deep neural networks for natural language understanding[J]. arXiv:1901.11504, 2019.
[65] SONG K, TAN X, QIN T, et al. MASS: masked sequence to sequence pre-training for language generation[J]. arXiv:1905.02450, 2019.
[66] DONG L, YANG N, WANG W, et al. Unified language model pre-training for natural language understanding and generation[J]. arXiv:1905.03197, 2019.
[67] ZHANG Z, HAN X, LIU Z, et al. ERNIE: enhanced language representation with informative entities[J]. arXiv:1905.07129, 2019.
[68] BORDES A, USUNIER N, GARCIA-DURAN A, et al. Translating embeddings for modeling multi-relational data[C]//Proceedings of the 26th Annual Conference on Neural Information Processing Systems, Lake Tahoe, Dec 5-8, 2013. Red Hook: Curran Associates, 2013: 2787-2795.
[69] SUN Y, WANG S H, LI Y K, et al. ERNIE 2.0: a continual pre-training framework for language understanding[C]//Proceedings of the 34th AAAI Conference on Artificial Intelligence, the 32nd Innovative Applications of Artificial Intelligence Conference, the 10th AAAI Symposium on Educational Advances in Artificial Intelligence, New York, Feb 7-12, 2020. Menlo Park: AAAI, 2020: 8968-8975.
[70] YANG Z, DAI Z, YANG Y, et al. XLNet: generalized autoregressive pretraining for language understanding[C]//Proceedings of the 32nd Annual Conference on Neural Information Processing Systems, Vancouver, Dec 8-14, 2019. Red Hook: Curran Associates, 2019: 5754-5764.
[71] PUSKORIUS G V, FELDKAMP L A. Truncated backpropagation through time and Kalman filter training for neurocontrol[C]//Proceedings of the 1994 IEEE International Conference on Neural Networks, Orlando, Jun 27-Jul 2, 1994. Piscataway: IEEE, 1994: 2488-2493.
[72] CUI Y, CHE W, LIU T, et al. Pre-training with whole word masking for Chinese BERT[J]. arXiv:1906.08101, 2019.
[73] LIU Y, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach[J]. arXiv:1907.11692, 2019.
[74] JOSHI M, CHEN D, LIU Y, et al. SpanBERT: improving pre-training by representing and predicting spans[J]. Transactions of the Association for Computational Linguistics, 2020, 8: 64-77.
[75] LIU W J, ZHOU P, ZHAO Z, et al. K-BERT: enabling language representation with knowledge graph[C]//Proceedings of the 34th AAAI Conference on Artificial Intelligence, the 32nd Innovative Applications of Artificial Intelligence Conference, the 10th AAAI Symposium on Educational Advances in Artificial Intelligence, New York, Feb 7-12, 2020. Menlo Park: AAAI, 2020: 2901-2908.
[76] ZHANG Z S, WU Y W, ZHAO H, et al. Semantics-aware BERT for language understanding[C]//Proceedings of the 34th AAAI Conference on Artificial Intelligence, the 32nd Innovative Applications of Artificial Intelligence Conference, the 10th AAAI Symposium on Educational Advances in Artificial Intelligence, New York, Feb 7-12, 2020. Menlo Park: AAAI, 2020: 9628-9635.
[77] WANG W, BI B, YAN M, et al. StructBERT: incorporating language structures into pre-training for deep language un-derstanding[J]. arXiv:1908.04577, 2019.
[78] CLARK K, LUONG M, LE Q V, et al. ELECTRA: pre-training text encoders as discriminators rather than generators[J]. arXiv:2003.10555, 2020.
[79] GORDON M A, DUH K, ANDREWS N. Compressing BERT: studying the effects of weight pruning on transfer learning[J]. arXiv:2002.08307, 2020.
[80] MICHEL P, LEVY O, NEUBIG G. Are sixteen heads really better than one?[J]. arXiv:1905.10650, 2019.
[81] CORDONNIER J B, LOUKAS A, JAGGI M. Multi-head attention: collaborate instead of concatenate[J]. arXiv:2006.16362, 2020.
[82] MCCARLEY J S, CHAKRAVARTI R, SIL A. Structured pruning of a BERT-based question answering model[J]. arXiv:1910.06360, 2019.
[83] FAN A, GRAVE E, JOULIN A. Reducing transformer depth on demand with structured dropout[J]. arXiv:1909.11556, 2019.
[84] GUO F, LIU S, MUNGALL F S, et al. Reweighted proximal pruning for large-scale language representation[J]. arXiv:1909.12486, 2019.
[85] HUANG S C. An efficient palette generation method for color image quantization[J]. Applied Sciences, 2021, 11(3): 1043.
[86] CHUANG J C, HU Y C, CHEN C M, et al. Joint index coding and reversible data hiding methods for color image quantization[J]. Multimedia Tools and Applications, 2019, 78(24): 35537-35558.
[87] SHEN S, DONG Z, YE J, et al. Q-BERT: Hessian based ultra low precision quantization of BERT[C]//Proceedings of the 34th AAAI Conference on Artificial Intelligence, the 32nd Innovative Applications of Artificial Intelligence Conference, the 10th AAAI Symposium on Educational Advances in Artificial Intelligence, New York, Feb 7-12, 2020. Menlo Park: AAAI, 2020: 8815-8821.
[88] ZAFRIR O, BOUDOUKH G, IZSAK P, et al. Q8BERT: quantized 8bit BERT[J]. arXiv:1910.06188, 2019.
[89] ZHANG W, HOU L, YIN Y, et al. TernaryBERT: distillation-aware ultra-low bit BERT[J]. arXiv:2009.12812, 2020.
[90] LI F, ZHANG B, LIU B. Ternary weight networks[J]. arXiv:1605.04711, 2016.
[91] HOU L, KWOK J T. Loss-aware weight quantization of deep networks[J]. arXiv:1802.08635, 2018.
[92] HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network[J]. arXiv:1503.02531, 2015.
[93] ZHAO S, GUPTA R, SONG Y, et al. Extreme language model compression with optimal subwords and shared projections[J]. arXiv:1909.11687, 2019.
[94] SUN S, CHENG Y, GAN Z, et al. Patient knowledge distillation for BERT model compression[J]. arXiv:1908.09355, 2019.
[95] SANH V, DEBUT L, CHAUMOND J, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter[J]. arXiv:1910.01108, 2019.
[96] WOLF T, DEBUT L, SANH V, et al. Transformers: state-of-the-art natural language processing[C]//Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online, Nov 16-20, 2020. Stroudsburg: ACL, 2020: 38-45.
[97] MUKHERJEE S, AWADALLAH A H. Distilling transformers into simple neural networks with unlabeled transfer data[J]. arXiv:1910.01769, 2019.
[98] WANG W, WEI F, DONG L, et al. MiniLM: deep self-attention distillation for task-agnostic compression of pre-trained transformers[J]. arXiv:2002.10957, 2020.
[99] JIAO X, YIN Y, SHANG L, et al. TinyBERT: distilling BERT for natural language understanding[J]. arXiv:1909.10351, 2019.
[100] CHEN X, HE B, HUI K, et al. Simplified TinyBERT: knowledge distillation for document retrieval[J]. arXiv:2009.07531, 2020.
[101] LEE Y, SAXE J, HARANG R. CATBERT: context-aware tiny BERT for detecting social engineering emails[J]. arXiv:2010.03484, 2020.
[102] SUN Z, YU H, SONG X, et al. MobileBERT: a compact task-agnostic BERT for resource-limited devices[J]. arXiv:2004.02984, 2020.
[103] DE WYNTER A, PERRY D J. Optimal subarchitecture extraction for BERT[J]. arXiv:2010.10499, 2020.
[104] LAN Z, CHEN M, GOODMAN S, et al. ALBERT: a lite BERT for self-supervised learning of language representations[J]. arXiv:1909.11942, 2019.
[105] XU C, ZHOU W, GE T, et al. BERT-of-Theseus: compressing BERT by progressive module replacing[J]. arXiv:2002.02925, 2020.
[106] SUNDHEIM B M. Named entity task definition[C]//Proceedings of the 6th Conference on Message Understanding, Maryland, Nov 6-8, 1995. San Mateo: Morgan Kaufmann, 1995: 319-332.
[107] LIANG C, YU Y, JIANG H M, et al. BOND: BERT-assisted open-domain named entity recognition with distant supervision[C]//Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, Aug 23-27, 2020. New York: ACM, 2020: 1054-1064.
[108] LI X, ZHANG H, ZHOU X H. Chinese clinical named entity recognition with variant neural structures based on BERT methods[J]. Journal of Biomedical Informatics, 2020, 107: 103422.
[109] LUOMA J, PYYSALO S. Exploring cross-sentence contexts for named entity recognition with BERT[J]. arXiv:2006.01563, 2020.
[110] SU L X, GUO J F, FAN Y X, et al. A reading comprehension model for multiple-span answers[J]. Chinese Journal of Computers, 2020, 43(5): 856-867.
苏立新, 郭嘉丰, 范意兴, 等. 面向多片段答案的抽取式阅读理解模型[J]. 计算机学报, 2020, 43(5): 856-867.
[111] EFRAT A, SEGAL E, SHOHAM M. Tag-based multi-span extraction in reading comprehension[J]. arXiv:1909.13375, 2019.
[112] HU M H, PENG Y X, HUANG Z, et al. A multi-type multi-span network for reading comprehension that requires discrete reasoning[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, Nov 3-7, 2019. Stroudsburg: ACL, 2019: 1596-1606.
[113] CHEN D, MA Z, WEI L, et al. MTQA: text-based multi-type question and answer reading comprehension model[J]. Computational Intelligence and Neuroscience, 2021: 1-12.
[114] WENG R, YU H, HUANG S, et al. Acquiring knowledge from pre-trained model to neural machine translation[C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence, the 32nd Innovative Applications of Artificial Intelligence Conference, the 10th AAAI Symposium on Educational Advances in Artificial Intelligence, New York, Feb 7-12, 2020. Menlo Park: AAAI, 2020: 9266-9273.
[115] MAGER M, ASTUDILLO R F, NASEEM T, et al. GPT-too: a language-model-first approach for AMR-to-text generation[J]. arXiv:2005.09123, 2020.
[116] GONZÁLEZ-CARVAJAL S, GARRIDO-MERCHÁN E C. Comparing BERT against traditional machine learning text classification[J]. arXiv:2005.13012, 2020.
[117] SUN C, QIU X, XU Y, et al. How to fine-tune BERT for text classification?[J]. arXiv:1905.05583, 2019.
[118] LU Z B, DU P, NIE J Y. VGCN-BERT: augmenting BERT with graph embedding for text classification[C]//LNCS 12035: Proceedings of the 42nd European Conference on IR Research Advances in Information Retrieval, Lisbon, Apr 14-17, 2020. Berlin, Heidelberg: Springer, 2020: 369-382.
[119] TOPAL M O, BAS A, VAN HEERDEN I. Exploring transformers in natural language generation: GPT, BERT, and XLNet[J]. arXiv:2102.08036, 2021.
[120] QU Y B, LIU P H, SONG W, et al. A text generation and prediction system: pre-training on new corpora using BERT and GPT-2[C]//Proceedings of the IEEE 10th International Conference on Electronics Information and Emergency Communication, Beijing, Jul 17-19, 2020. Piscataway: IEEE, 2020: 323-326.
[121] CHI Z W, DONG L, WEI F R, et al. Cross-lingual natural language generation via pre-training[C]//Proceedings of the 34th AAAI Conference on Artificial Intelligence, the 32nd Innovative Applications of Artificial Intelligence Conference, the 10th AAAI Symposium on Educational Advances in Artificial Intelligence, New York, Feb 7-12, 2020. Menlo Park: AAAI, 2020: 7570-7577.
[122] VINCENT P, LAROCHELLE H, BENGIO Y, et al. Extracting and composing robust features with denoising autoencoders[C]//Proceedings of the 25th International Conference on Machine Learning, Helsinki, Jul 5-9, 2008. New York: ACM, 2008: 1096-1103.
[123] HUANG W C, WU C H, LUO S B, et al. Speech recognition by simply fine-tuning BERT[J]. arXiv:2102.00291, 2021.
[124] SU W, ZHU X, CAO Y, et al. VL-BERT: pre-training of generic visual-linguistic representations[J]. arXiv:1908.08530, 2019.
[125] YANG S, ZHANG Y H, FENG D L, et al. LRW-1000: a naturally-distributed large-scale benchmark for lip reading in the wild[C]//Proceedings of the 14th IEEE International Conference on Automatic Face & Gesture Recognition, Lille, May 14-18, 2019. Piscataway: IEEE, 2019: 1-8.
[126] RIBEIRO M T, WU T, GUESTRIN C, et al. Beyond accuracy: behavioral testing of NLP models with CheckList[J]. arXiv:2005.04118, 2020.