Journal of Frontiers of Computer Science and Technology, 2019, Vol. 13, Issue (8): 1351-1359. DOI: 10.3778/j.issn.1673-9418.1812012

• Artificial Intelligence •

Robust Speech Feature Extraction Based on Nonlinear Power-Function Gammachirp Filter

LI Cong, GE Hongwei

  1. Ministry of Education Key Laboratory of Advanced Process Control for Light Industry, Jiangnan University, Wuxi, Jiangsu 214122, China
  2. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online: 2019-08-01 Published: 2019-08-07

Abstract: Speaker recognition with power-normalized cepstral coefficients (PNCC) features is not robust in low-SNR noisy environments. To address this, the paper proposes a noise-robust speech feature extraction algorithm, nonlinear power-function Gammachirp frequency cepstral coefficients (NPGFCC). NPGFCC differs from PNCC in two ways: it replaces the Gammatone filter bank with a normalized compressive Gammachirp filter bank, which conforms to human auditory characteristics, and it incorporates a piecewise nonlinear power-function transformation into the feature parameters. The algorithm also applies mean and variance normalization and time-series filtering to further improve robustness in noisy environments. The features are evaluated with an improved i-vector + PLDA model. Experimental results show that, compared with several commonly used speaker feature extraction algorithms, NPGFCC achieves the best noise robustness across the tested noise types and SNRs, and its advantage over the other features is largest at low SNRs.

Key words: feature extraction, speaker recognition, Gammachirp filter, Gaussian mixture model-universal background model (GMM-UBM), identity-vector (i-vector), probabilistic linear discriminant analysis (PLDA)
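
As a rough illustration of two building blocks named in the abstract, the sketch below samples a basic gammachirp impulse response and applies per-utterance mean and variance normalization to a feature matrix. It is not the authors' implementation: the abstract gives no formulas, so the filter uses the textbook gammachirp form g(t) = t^(n-1) exp(-2πb·ERB(fc)·t) cos(2πfc·t + c·ln t + φ), and the parameter values (n = 4, b = 1.019, c = -2.0), the 25 ms duration, and all function names are assumptions chosen for illustration. The paper's normalized compressive variant and its piecewise nonlinear power-function transformation are not reproduced here.

```python
import math


def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg-Moore approximation)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def gammachirp_ir(fc_hz, fs_hz, duration_s=0.025, n=4, b=1.019, c=-2.0, phase=0.0):
    """Sampled gammachirp impulse response, scaled so its peak magnitude is 1.

    Parameter defaults are illustrative assumptions, not values from the paper.
    With c = 0 the chirp term vanishes and this reduces to a gammatone filter.
    """
    num_samples = int(round(duration_s * fs_hz))
    samples = []
    for k in range(1, num_samples + 1):  # start at k = 1 so ln(t) is defined
        t = k / fs_hz
        envelope = t ** (n - 1) * math.exp(-2.0 * math.pi * b * erb(fc_hz) * t)
        carrier = math.cos(2.0 * math.pi * fc_hz * t + c * math.log(t) + phase)
        samples.append(envelope * carrier)
    peak = max(abs(v) for v in samples)
    return [v / peak for v in samples]


def mean_variance_normalize(frames):
    """Normalize each feature dimension to zero mean and unit variance.

    `frames` is a list of equal-length feature vectors from one utterance.
    A dimension with zero variance is only mean-centered.
    """
    num_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    stds = []
    for d in range(dim):
        var = sum((f[d] - means[d]) ** 2 for f in frames) / num_frames
        stds.append(math.sqrt(var) if var > 0 else 1.0)
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)] for f in frames]


if __name__ == "__main__":
    ir = gammachirp_ir(fc_hz=1000.0, fs_hz=16000.0)
    print(len(ir), "filter taps")
    print(mean_variance_normalize([[1.0, 10.0], [3.0, 10.0]]))
```

In a full front end, one such filter would be built per channel centre frequency, and the normalization would be applied to the final cepstral coefficients of each utterance.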