[1] KINNUNEN T, LI H. An overview of text-independent speaker recognition: from features to supervectors[J]. Speech Communication, 2010, 52(1): 12-40.
[2] REYNOLDS D A, QUATIERI T F, DUNN R B. Speaker verification using adapted Gaussian mixture models[J]. Digital Signal Processing, 2000, 10(1): 19-41.
[3] KENNY P, BOULIANNE G, DUMOUCHEL P. Eigenvoice modeling with sparse training data[J]. IEEE Transactions on Speech and Audio Processing, 2005, 13(3): 345-354.
[4] DEHAK N, KENNY P J, DEHAK R, et al. Front-end factor analysis for speaker verification[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2011, 19(4): 788-798.
[5] SUN N, ZHANG Y, LIN H B, et al. Short speech speaker recognition algorithm based on multi-feature i-vector[J]. Journal of Computer Applications, 2018, 38(10): 93-97.
[6] FAROOQ M U, ADEEBA F, HUSSAIN S. X-vectors based Urdu speaker identification for short utterances[C]//Proceedings of the 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, Cebu, Oct 25-27, 2019. Piscataway: IEEE, 2019: 1-5.
[7] WANG Z, FU S. Short speech speaker verification based on improved identity vector extraction[J]. Journal of Chinese Computer Systems, 2019, 40(11): 2264-2268.
[8] SNYDER D, GARCIA-ROMERO D, POVEY D, et al. Deep neural network embeddings for text-independent speaker verification[C]//Proceedings of the 18th Annual Conference of the International Speech Communication Association, Stockholm, Aug 20-24, 2017: 999-1003.
[9] CAI G D. Research on speaker recognition based on x-vector[D]. Beijing: Beijing Jiaotong University, 2019.
[10] LIN L, CHEN H, CHEN J, et al. Short speech speaker recognition based on multi-core SVM-GMM[J]. Journal of Jilin University (Engineering and Technology Edition), 2013, 43(2): 504-509.
[11] BHATTACHARYA G, ALAM M J, KENNY P. Deep speaker embeddings for short duration speaker verification[C]//Proceedings of the 18th Annual Conference of the International Speech Communication Association, Stockholm, Aug 20-24, 2017: 1517-1521.
[12] ZHANG J C, INOUE N, SHINODA K. I-vector transformation using conditional generative adversarial networks for short utterance speaker verification[C]//Proceedings of the 19th Annual Conference of the International Speech Communication Association, Hyderabad, Sep 2-6, 2018: 3613-3617.
[13] JUNG Y, KYE S M, CHOI Y, et al. Improving multi-scale aggregation using feature pyramid module for robust speaker verification of variable-duration utterances[J]. arXiv:2004.03194, 2020.
[14] SNYDER D, GHAHREMANI P, POVEY D, et al. Deep neural network-based speaker embeddings for end-to-end speaker verification[C]//Proceedings of the 2016 IEEE Spoken Language Technology Workshop, San Diego, Dec 13-16, 2016. Piscataway: IEEE, 2016: 165-170.
[15] MATROUF D, SCHEFFER N, FAUVE B G B, et al. A straightforward and efficient implementation of the factor analysis model for speaker verification[C]//Proceedings of the 8th Annual Conference of the International Speech Communication Association, Antwerp, Aug 27-31, 2007: 1242-1245.
[16] PEROZZI B, AL-RFOU R, SKIENA S. DeepWalk: online learning of social representations[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, Aug 24-27, 2014. New York: ACM, 2014: 701-710.
[17] MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[J]. arXiv:1301.3781, 2013.
[18] WAIBEL A, HANAZAWA T, HINTON G, et al. Phoneme recognition using time-delay neural networks[J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1989, 37(3): 328-339.
[19] PRINCE S J D, ELDER J H. Probabilistic linear discriminant analysis for inferences about identity[C]//Proceedings of the 11th International Conference on Computer Vision, Rio de Janeiro, Oct 14-20, 2007. Washington: IEEE Computer Society, 2007: 1-8.
[20] HARDOON D R, SZEDMAK S, SHAWE-TAYLOR J. Canonical correlation analysis: an overview with application to learning methods[J]. Neural Computation, 2004, 16(12): 2639-2664.
[21] SNYDER D, GARCIA-ROMERO D, SELL G, et al. X-vectors: robust DNN embeddings for speaker recognition[C]//Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, Calgary, Apr 15-20, 2018. Piscataway: IEEE, 2018: 5329-5333.