Journal of Frontiers of Computer Science and Technology ›› 2021, Vol. 15 ›› Issue (12): 2241-2255. DOI: 10.3778/j.issn.1673-9418.2104068

• Surveys and Frontiers •

Survey of Speaker Adaptation Methods in Speech Recognition

ZHU Fangyuan, MA Zhiqiang, CHEN Yan, ZHANG Xiaoxu, WANG Hongbin, BAO Caijilahu   

  1. College of Data Science and Application, Inner Mongolia University of Technology, Hohhot 010080, China
    2. Inner Mongolia Autonomous Region Engineering & Technology Research Centre of Big Data Based Software Service, Inner Mongolia University of Technology, Hohhot 010080, China
  • Online:2021-12-01 Published:2021-12-09

语音识别中说话人自适应方法研究综述

朱方圆,马志强,陈艳,张晓旭,王洪彬,宝财吉拉呼   

  1. 内蒙古工业大学 数据科学与应用学院,呼和浩特 010080
    2. 内蒙古工业大学 内蒙古自治区基于大数据的软件服务工程技术研究中心,呼和浩特 010080

Abstract:

Speech is one of the ways humans interact with computers, and speech recognition is an important component of artificial intelligence. In recent years, neural networks have been applied rapidly in speech recognition and have become the mainstream acoustic modeling technique in the field. Under test conditions, however, the target speaker's voice differs from the training data, which causes a mismatch between the model and the speaker. Speaker adaptation (SA) methods address the mismatch caused by speaker differences, and research on them has become a popular direction in speech recognition. Compared with speaker adaptation in traditional speech recognition systems, adaptation in neural network-based systems involves a very large number of model parameters and a relatively small amount of adaptation data, which makes speaker adaptation in these systems a challenging research problem. This paper first reviews the development history of speaker adaptation methods and the problems encountered in research on neural network-based speaker adaptation. It then divides speaker adaptation methods into feature-domain and model-domain methods and introduces their principles and improvements. Finally, it points out the problems that remain in speaker adaptation for speech recognition and the directions for future development.

Key words: speech recognition, speaker adaptation (SA), neural network

摘要:

语音是人机交互方式之一,语音识别技术是人工智能的重要组成部分。近年来神经网络技术在语音识别领域的应用快速发展,已经成为语音识别领域中主流的声学建模技术。然而测试条件中目标说话人语音与训练数据存在差异,导致模型不适配的问题。因此说话人自适应(SA)方法是为了解决说话人差异导致的不匹配问题,研究说话人自适应方法成为语音识别领域的一个热门方向。相比传统语音识别模型中的说话人自适应方法,使用神经网络的语音识别系统中的自适应存在着模型参数庞大,而自适应数据量相对较少等特点,这使得基于神经网络的语音识别系统中的说话人自适应方法成为一个研究难题。首先回顾说话人自适应方法的发展历程和基于神经网络的说话人自适应方法研究遇到的各种问题,其次将说话人自适应方法分为基于特征域和基于模型域的说话人自适应方法并介绍对应原理和改进方法,最后指出说话人自适应方法在语音识别中仍然存在的问题及未来的发展方向。

关键词: 语音识别, 说话人自适应(SA), 神经网络