Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2019, Vol. 13, Issue 8: 1341-1350. DOI: 10.3778/j.issn.1673-9418.1807003

• Artificial Intelligence •

Acoustic Features Extraction of Speech Enhancement Based on Auto-Encoder Feature

ZHANG Tao, REN Xiangying, LIU Yang, GENG Yanzhang   

  1. School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
  • Online: 2019-08-01  Published: 2019-08-07

Abstract: In speech enhancement with supervised learning, feature extraction is a key step. Auditory features such as the existing combined features and the multi-resolution cochleagram (MRCG) are commonly used as acoustic features. Although speech enhanced with these features is considerably more intelligible, a large amount of residual noise remains and the speech quality, measured by signal-to-noise ratio (SNR), is low. To improve the quality of the enhanced speech without degrading intelligibility, an integrated feature (IF) based on the auto-encoder feature (AEF) is proposed. First, the AEF is extracted with an auto-encoder. Then, the Group Lasso algorithm is used to examine the complementarity and redundancy between the AEF and the auditory features, and the features are recombined to form the IF. Finally, the IF is used as the input feature of the speech enhancement system. Experiments are carried out on the TIMIT corpus and the Noisex-92 noise database. The results show that the proposed algorithm yields considerably higher speech quality than traditional speech enhancement methods and than deep learning methods that use the existing combined features or MRCG as input features.
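
As an illustration of the first and third steps described in the abstract, the following minimal sketch trains a small auto-encoder, takes its hidden-layer activations as an AEF, and concatenates them with auditory features to form an integrated feature. It is not the authors' implementation: the network size, the tanh activation, the learning rate, the synthetic input frames and the random stand-in for auditory features are all assumptions made for the example, and the Group Lasso analysis step is omitted.

```python
import numpy as np

def train_autoencoder(X, n_hidden=8, lr=0.05, epochs=500, seed=0):
    """Train a one-hidden-layer auto-encoder with plain gradient descent.

    Hypothetical helper for illustration; all hyperparameters are assumptions.
    Returns the encoder parameters and the reconstruction loss per epoch.
    """
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_in))
    b2 = np.zeros(n_in)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)        # encoder output (the code layer)
        X_hat = H @ W2 + b2             # linear decoder reconstructs the input
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared reconstruction error.
        g_out = 2.0 * err / err.size
        gW2 = H.T @ g_out
        gb2 = g_out.sum(axis=0)
        g_hid = (g_out @ W2.T) * (1.0 - H ** 2)   # derivative of tanh
        gW1 = X.T @ g_hid
        gb1 = g_hid.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1), losses

def encode(X, params):
    """Map input frames to auto-encoder features (hidden activations)."""
    W1, b1 = params
    return np.tanh(X @ W1 + b1)

# Synthetic stand-in for per-frame spectral input (200 frames, 16 dimensions).
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 4))
mixing = rng.normal(size=(4, 16))
frames = latent @ mixing + 0.05 * rng.normal(size=(200, 16))
frames = (frames - frames.mean(axis=0)) / frames.std(axis=0)

# Random placeholder for auditory features such as MRCG (assumption).
auditory = rng.normal(size=(200, 6))

params, losses = train_autoencoder(frames, n_hidden=8)
aef = encode(frames, params)                 # auto-encoder feature
integrated = np.hstack([auditory, aef])      # integrated feature, one row per frame
```

In a real system the frames would come from noisy speech, and the feature groups kept in the integrated feature would depend on the outcome of the Group Lasso analysis rather than on plain concatenation.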

Key words: auto-encoder feature, deep neural network, feature extraction, signal-to-noise ratio (SNR)
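
Since the abstract expresses speech quality as SNR, a short sketch of one common way to compute it may help. The exact definition used in the paper is not stated here, so the formula below (ratio of clean-signal energy to the energy of the residual error, in decibels) and the function name `snr_db` are assumptions.

```python
import numpy as np

def snr_db(clean, enhanced):
    """SNR in dB of an enhanced signal relative to the clean reference.

    Assumed definition: 10 * log10(sum(clean^2) / sum((enhanced - clean)^2)).
    """
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    residual = enhanced - clean          # whatever noise or distortion remains
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(residual ** 2))

# Example: a constant signal with a constant offset of 0.1 as residual error.
reference = np.ones(100)
estimate = reference + 0.1
value = snr_db(reference, estimate)
```

Under this definition a higher value means less residual noise in the enhanced speech; both signals must be time-aligned and of equal length.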