Journal of Frontiers of Computer Science and Technology ›› 2019, Vol. 13 ›› Issue (9): 1534-1542. DOI: 10.3778/j.issn.1673-9418.1809029

• Artificial Intelligence and Pattern Recognition •

Voice Activity Detection Based on Long-Term Power Spectrum Variability

ZHANG Tao, LIU Yang, REN Xiangying   

  1. School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
  • Online: 2019-09-01 Published: 2019-09-06

Abstract: Voice activity detection (VAD) is a fundamental step in speech signal processing. To improve VAD accuracy at low signal-to-noise ratio (SNR) and in nonstationary noise, a speech feature based on long-term power spectrum variability (LPSV) is proposed, and its potential for VAD is evaluated with a threshold decision method. First, the variation of the signal's power spectrum over a long-term window is computed. Then, a threshold decision is made; after initialization, the threshold is updated adaptively according to each decision result. Finally, a voting mechanism decides whether the current frame is a speech frame. Simulation results show that, compared with two classical VAD methods based on long-term features (long-term signal variability, LTSV, and long-term spectral flatness measure, LSFM), the proposed method achieves higher detection accuracy across the tested noise environments and SNRs. The improvement is most pronounced in nonstationary noise: in machine-gun noise, for example, the average detection accuracy increases by more than 10%.

Key words: voice activity detection, long-term power spectrum variability (LPSV), low signal-to-noise ratio, nonstationary noise
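
The three stages named in the abstract (long-term feature, adaptive threshold decision, voting) can be illustrated with a minimal Python sketch. The abstract does not give the exact LPSV formula or the threshold update rule, so everything specific here is an assumption made for illustration: the feature is taken to be the variance over time of the log power in each frequency bin, averaged across bins; the threshold is a fixed multiple of a running noise-level estimate; and a frame is marked as speech when at least half of the long-term windows covering it voted speech. The function names and all parameter values (`R`, `n_init`, `ratio`, `alpha`, `vote_ratio`) are hypothetical and are not taken from the paper.

```python
import numpy as np

def power_spectra(x, frame_len=256, hop=128):
    """Hann-windowed power spectrum of each frame, shape (n_frames, n_bins)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    win = np.hanning(frame_len)
    frames = np.stack([x[i * hop:i * hop + frame_len] * win
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def long_term_variability(spec, R=20, eps=1e-12):
    """Assumed feature: per-bin variance of log power over R frames, averaged over bins.

    Entry m describes the long-term window covering frames m .. m+R-1.
    """
    log_spec = np.log(spec + eps)
    n_win = spec.shape[0] - R + 1
    return np.array([log_spec[m:m + R].var(axis=0).mean() for m in range(n_win)])

def lpsv_vad_sketch(x, frame_len=256, hop=128, R=20, n_init=10,
                    ratio=1.5, alpha=0.05, vote_ratio=0.5):
    """Return one boolean speech/non-speech decision per frame."""
    spec = power_spectra(x, frame_len, hop)
    feat = long_term_variability(spec, R)

    # Initialization: the first n_init long-term windows are assumed to be noise.
    noise_level = feat[:n_init].mean()

    # Threshold decision per long-term window, with an adaptive update:
    # the noise estimate is refreshed only after a non-speech decision.
    win_is_speech = np.zeros(len(feat), dtype=bool)
    for m, value in enumerate(feat):
        win_is_speech[m] = value > ratio * noise_level
        if not win_is_speech[m]:
            noise_level = (1 - alpha) * noise_level + alpha * value

    # Voting: each frame is covered by up to R long-term windows.
    n_frames = spec.shape[0]
    frame_is_speech = np.zeros(n_frames, dtype=bool)
    for j in range(n_frames):
        lo = max(0, j - R + 1)
        hi = min(j, len(feat) - 1)
        frame_is_speech[j] = win_is_speech[lo:hi + 1].mean() >= vote_ratio
    return frame_is_speech
```

Calling `lpsv_vad_sketch(x)` on a mono signal `x` sampled at 8 kHz gives one decision per 16 ms hop with these default settings. Because every frame is judged by several overlapping long-term windows, the decisions near speech onsets and offsets are smeared by up to roughly `R` frames; a stricter `vote_ratio` shortens that margin at the cost of more missed frames.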