[1] YU D, DENG L, YU K, et al. Analyzing deep learning: speech recognition practice[M]. Beijing: Electronic Industry Press, 2016.
[2] GALES M J F, WOODLAND P C. Mean and variance adaptation within the MLLR framework[J]. Computer Speech & Language, 1996, 10(4): 249-264.
[3] GALES M J F, PYE D, WOODLAND P C. Variance compensation within the MLLR framework for robust speech recognition and speaker adaptation[C]//Proceedings of the 4th International Conference on Spoken Language Processing, Philadelphia, Oct 3-6, 1996: 1832-1835.
[4] NGUYEN P, WELLEKENS C, JUNQUA J C. Maximum likelihood eigenspace and MLLR for speech recognition in noisy environments[C]//Proceedings of the 6th European Conference on Speech Communication and Technology, Budapest, Sep 5-9, 1999: 2519-2522.
[5] VARADARAJAN B, POVEY D, CHU S M. Quick fMLLR for speaker adaptation in speech recognition[C]//Proceedings of the 2008 IEEE International Conference on Acoustics, Speech, and Signal Processing, Las Vegas, Mar 30-Apr 4, 2008. Piscataway: IEEE, 2008: 4297-4300.
[6] JAFARI K, ALMASGANJ F, SHEKOFTEH Y. Combination of fMLLR with clustering and fMLLR with MLLR clustering for rapid speaker adaptation[C]//Proceedings of the 2010 2nd International Conference on Electronic Computer Technology, Kuala Lumpur, May 7-10, 2010. Piscataway: IEEE, 2010: 133-136.
[7] WANG D, NARAYANAN S S. A confidence-score based unsupervised MAP adaptation for speech recognition[C]//Proceedings of the 2002 Conference Record of the 36th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, Nov 3-6, 2002. Piscataway: IEEE, 2002: 222-226.
[8] CHESTA C, SIOHAN O, LEE C H. Maximum a posteriori linear regression for hidden Markov model adaptation[C]//Proceedings of the 6th European Conference on Speech Communication and Technology, Budapest, Sep 5-9, 1999: 211-214.
[9] LEE L, ROSE R C. A frequency warping approach to speaker normalization[J]. IEEE Transactions on Speech and Audio Processing, 1998, 6(1): 49-60.
[10] SANAND D R, KUMAR D D, UMESH S. Linear transformation approach to VTLN using dynamic frequency warping[C]//Proceedings of the 8th Annual Conference of the International Speech Communication Association, Antwerp, Aug 27-31, 2007: 1138-1141.
[11] CUI X D, ALWAN A. MLLR-like speaker adaptation based on linearization of VTLN with MFCC features[C]//Proceedings of the 9th European Conference on Speech Communication and Technology, Lisbon, Sep 4-8, 2005: 273-276.
[12] NETO J P, ALMEIDA L B, HOCHBERG M, et al. Speaker-adaptation for hybrid HMM-ANN continuous speech recognition system[C]//Proceedings of the 4th European Conference on Speech Communication and Technology, Madrid, Sep 18-21, 1995: 2171-2174.
[13] PAN J. Research on adaptive methods in deep learning speech recognition system[D]. Hefei: University of Science and Technology of China, 2020.
[14] GEMELLO R, MANA F, SCANZIO S, et al. Adaptation of hybrid ANN/HMM models using linear hidden transformations and conservative training[C]//Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, Toulouse, May 14-19, 2006. Piscataway: IEEE, 2006: 1189-1192.
[15] LI B, SIM K C. Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems[C]//Proceedings of the 11th Annual Conference of the International Speech Communication Association, Makuhari, Sep 26-30, 2010: 526-529.
[16] TRMAL J, ZELINKA J, MÜLLER L. Adaptation of a feed-forward artificial neural network using a linear transform[C]//LNCS 6231: Proceedings of the 13th International Conference on Text, Speech and Dialogue, Brno, Sep 6-10, 2010. Berlin, Heidelberg: Springer, 2010: 423-430.
[17] XIAO Y M, ZHANG Z, CAI S, et al. An initial attempt on task-specific adaptation for deep neural network-based large vocabulary continuous speech recognition[C]//Proceedings of the 13th Annual Conference of the International Speech Communication Association, Portland, Sep 9-13, 2012: 2574-2577.
[18] YAO K S, YU D, SEIDE F, et al. Adaptation of context-dependent deep neural networks for automatic speech recognition[C]//Proceedings of the 2012 IEEE Spoken Language Technology Workshop, Miami, Dec 2-5, 2012. Piscataway: IEEE, 2012: 366-369.
[19] KARTHICK B M, KOLHAR P, UMESH S. Speaker adaptation of convolutional neural network using speaker specific subspace vectors of SGMM[C]//Proceedings of the 16th Annual Conference of the International Speech Communication Association, Dresden, Sep 6-10, 2015: 1096-1100.
[20] YI J Y, TAO J H. Batch normalization based unsupervised speaker adaptation for acoustic models[C]//Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, Lanzhou, Nov 18-21, 2019. Piscataway: IEEE, 2019: 176-180.
[21] TOMASHENKO N A, ESTÈVE Y. Evaluation of feature-space speaker adaptation for end-to-end acoustic models[C]//Proceedings of the 11th International Conference on Language Resources and Evaluation, Miyazaki, May 7-12, 2018.
[22] SAON G, SOLTAU H, NAHAMOO D, et al. Speaker adaptation of neural network acoustic models using i-vectors[C]//Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, Olomouc, Dec 8-12, 2013. Piscataway: IEEE, 2013: 55-59.
[23] KARANASOU P, WANG Y Q, GALES M J F, et al. Adaptation of deep neural network acoustic models using factorised i-vectors[C]//Proceedings of the 15th Annual Conference of the International Speech Communication Association, Singapore, Sep 14-18, 2014: 2180-2184.
[24] MIAO Y J, ZHANG H, METZE F. Speaker adaptive training of deep neural network acoustic models using i-vectors[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015, 23(11): 1938-1949.
[25] CARDINAL P, DEHAK N, ZHANG Y, et al. Speaker adaptation using the i-vector technique for bottleneck features[C]//Proceedings of the 16th Annual Conference of the International Speech Communication Association, Dresden, Sep 6-10, 2015: 2867-2871.
[26] CUI X D, GOEL V, SAON G, et al. Embedding-based speaker adaptive training of deep neural networks[C]//Proceedings of the 18th Annual Conference of the International Speech Communication Association, Stockholm, Aug 20-24, 2017: 122-126.
[27] JIN C, GONG C, LI H. Speaker adaptation research of neural network acoustic model in speech recognition[J]. Computer Applications and Software, 2018, 35(2): 200-205.
[28] HUANG H G, SIM K C. An investigation of augmenting speaker representations to improve speaker normalisation for DNN-based speech recognition[C]//Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, South Brisbane, Apr 19-24, 2015. Piscataway: IEEE, 2015: 4610-4613.
[29] TAN T, QIAN Y M, YU D, et al. Speaker-aware training of LSTM-RNNs for acoustic modelling[C]//Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, Shanghai, Mar 20-25, 2016. Piscataway: IEEE, 2016: 5280-5284.
[30] ROWNICKA J, BELL P, RENALS S. Embeddings for DNN speaker adaptive training[C]//Proceedings of the 2019 IEEE Automatic Speech Recognition and Understanding Workshop, Singapore, Dec 14-18, 2019. Piscataway: IEEE, 2019: 479-486.
[31] VARIANI E, LEI X, MCDERMOTT E, et al. Deep neural networks for small footprint text-dependent speaker verification[C]//Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing, Florence, May 4-9, 2014. Piscataway: IEEE, 2014: 4052-4056.
[32] SNYDER D, GARCIA-ROMERO D, SELL G, et al. X-Vectors: robust DNN embeddings for speaker recognition[C]//Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, Calgary, Apr 15-20, 2018. Piscataway: IEEE, 2018: 5329-5333.
[33] KHOKHLOV Y Y, ZATVORNITSKIY A, MEDENNIKOV I, et al. R-Vectors: new technique for adaptation to room acoustics[C]//Proceedings of the 20th Annual Conference of the International Speech Communication Association, Graz, Sep 15-19, 2019: 1243-1247.
[34] SHI Y P, HUANG Q, HAIN T. H-Vectors: utterance-level speaker embedding using a hierarchical attention model[C]//Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, May 4-8, 2020. Piscataway: IEEE, 2020: 7579-7583.
[35] VESELY K, WATANABE S, ZMOLÍKOVÁ K, et al. Sequence summarizing neural network for speaker adaptation[C]//Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, Shanghai, Mar 20-25, 2016. Piscataway: IEEE, 2016: 5315-5319.
[36] SARI L, THOMAS S, HASEGAWA-JOHNSON M A. Learning speaker aware offsets for speaker adaptation of neural networks[C]//Proceedings of the 20th Annual Conference of the International Speech Communication Association, Graz, Sep 15-19, 2019: 769-773.
[37] XIE X R, LIU X Y, TAN L, et al. Fast DNN acoustic model speaker adaptation by learning hidden unit contribution features[C]//Proceedings of the 20th Annual Conference of the International Speech Communication Association, Graz, Sep 15-19, 2019: 759-763.
[38] PAN J, WAN G S, DU J, et al. Online speaker adaptation using memory-aware networks for speech recognition[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28: 1025-1037.
[39] SARI L, MORITZ N, HORI T, et al. Unsupervised speaker adaptation using attention-based speaker memory for end-to-end ASR[C]//Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, May 4-8, 2020. Piscataway: IEEE, 2020: 7384-7388.
[40] ABDEL-HAMID O, JIANG H. Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code[C]//Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, May 26-31, 2013. Piscataway: IEEE, 2013: 7942-7946.
[41] GU D, LI H. Research on speaker adaptation method based on deep neural network[J]. Information Technology and Network Security, 2018, 37(4): 60-64.
[42] XUE S F, ABDEL-HAMID O, JIANG H, et al. Direct adaptation of hybrid DNN/HMM model for fast speaker adaptation in LVCSR based on speaker code[C]//Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing, Florence, May 4-9, 2014. Piscataway: IEEE, 2014: 6339-6343.
[43] MÜLLER M, STÜKER S, WAIBEL A. Language adaptive multilingual CTC speech recognition[C]//LNCS 10458: Proceedings of the 19th International Conference on Speech and Computer, Hatfield, Sep 12-16, 2017. Cham: Springer, 2017: 473-482.
[44] STADERMANN J, RIGOLL G. Two-stage speaker adaptation of hybrid tied-posterior acoustic models[C]//Proceedings of the 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, Philadelphia, Mar 18-23, 2005. Piscataway: IEEE, 2005: 977-980.
[45] SINISCALCHI S M, LI J Y, LEE C H. Hermitian-based hidden activation functions for adaptation of hybrid HMM/ANN models[C]//Proceedings of the 13th Annual Conference of the International Speech Communication Association, Portland, Sep 9-13, 2012: 2590-2593.
[46] WANG Z Q, WANG D L. Unsupervised speaker adaptation of batch normalized acoustic models for robust ASR[C]//Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, New Orleans, Mar 5-9, 2017. Piscataway: IEEE, 2017: 4890-4894.
[47] MANA F, WENINGER F, GEMELLO R, et al. Online batch normalization adaptation for automatic speech recognition[C]//Proceedings of the 2019 IEEE Automatic Speech Recognition and Understanding Workshop, Singapore, Dec 14-18, 2019. Piscataway: IEEE, 2019: 875-880.
[48] LIU C J, WANG Y Q, KUMAR K, et al. Investigations on speaker adaptation of LSTM RNN models for speech recognition[C]//Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, Shanghai, Mar 20-25, 2016. Piscataway: IEEE, 2016: 5020-5024.
[49] SAMARAKOON L, SIM K C. Factorized hidden layer adaptation for deep neural network based acoustic modeling[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016, 24(12): 2241-2250.
[50] XUE J, LI J Y, GONG Y F. Restructuring of deep neural network acoustic models with singular value decomposition[C]//Proceedings of the 14th Annual Conference of the International Speech Communication Association, Lyon, Aug 25-29, 2013: 2365-2369.
[51] XUE J, LI J Y, YU D, et al. Singular value decomposition based low-footprint speaker adaptation and personalization for deep neural network[C]//Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing, Florence, May 4-9, 2014. Piscataway: IEEE, 2014: 6359-6363.
[52] SWIETOJANSKI P, RENALS S. Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models[C]//Proceedings of the 2014 IEEE Spoken Language Technology Workshop, South Lake Tahoe, Dec 7-10, 2014. Piscataway: IEEE, 2014: 171-176.
[53] XIE X R, LIU X Y, TAN L, et al. BLHUC: Bayesian learning of hidden unit contributions for deep neural network speaker adaptation[C]//Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing, Brighton, May 12-17, 2019. Piscataway: IEEE, 2019: 5711-5715.
[54] YU D, YAO K S, SU H, et al. KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition[C]//Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, May 26-31, 2013. Piscataway: IEEE, 2013: 7893-7897.
[55] LIAO H. Speaker adaptation of context dependent deep neural networks[C]//Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, May 26-31, 2013. Piscataway: IEEE, 2013: 7947-7951.
[56] LI X, BILMES J A. Regularized adaptation of discriminative classifiers[C]//Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, Toulouse, May 14-19, 2006. Piscataway: IEEE, 2006: 237-240.
[57] PRICE R, ISO K I, SHINODA K. Speaker adaptation of deep neural networks using a hierarchy of output layers[C]//Proceedings of the 2014 IEEE Spoken Language Technology Workshop, South Lake Tahoe, Dec 7-10, 2014. Piscataway: IEEE, 2014: 153-158.
[58] SWIETOJANSKI P, BELL P, RENALS S. Structured output layer with auxiliary targets for context-dependent acoustic modelling[C]//Proceedings of the 16th Annual Conference of the International Speech Communication Association, Dresden, Sep 6-10, 2015: 3605-3609.
[59] MENG Z, LI J Y, GONG Y F. Adversarial speaker adaptation[C]//Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing, Brighton, May 12-17, 2019. Piscataway: IEEE, 2019: 5721-5725.
[60] HUANG Z, LI J Y, SINISCALCHI S M, et al. Rapid adaptation for deep neural networks through multi-task learning[C]//Proceedings of the 16th Annual Conference of the International Speech Communication Association, Dresden, Sep 6-10, 2015: 3625-3629.
[61] TÓTH L, GOSZTOLYA G. Adaptation of DNN acoustic models using KL-divergence regularization and multi-task training[C]//LNCS 9811: Proceedings of the 18th International Conference on Speech and Computer, Budapest, Aug 23-27, 2016. Cham: Springer, 2016: 108-115.
[62] WENINGER F, ANDRÉS-FERRER J, LI X W, et al. Listen, attend, spell and adapt: speaker adapted sequence-to-sequence ASR[C]//Proceedings of the 20th Annual Conference of the International Speech Communication Association, Graz, Sep 15-19, 2019: 3805-3809.
[63] LI K, LI J Y, ZHAO Y, et al. Speaker adaptation for end-to-end CTC models[C]//Proceedings of the 2018 IEEE Spoken Language Technology Workshop, Athens, Dec 18-21, 2018. Piscataway: IEEE, 2018: 542-549.
[64] MENG Z, GAUR Y, LI J Y, et al. Speaker adaptation for attention-based end-to-end speech recognition[C]//Proceedings of the 20th Annual Conference of the International Speech Communication Association, Graz, Sep 15-19, 2019: 241-245.