Journal of Frontiers of Computer Science and Technology, 2014, Vol. 8, Issue (9): 1120-1128. DOI: 10.3778/j.issn.1673-9418.1407004

Chinese Accent Detection Method Research Based on Short-Time Spectrum Features

ZHAO Yunxue1, ZHANG Long1,2+, ZHENG Shijie1   

  1. College of Computer Science and Information Engineering, Harbin Normal University, Harbin 150025, China
  2. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
  • Online: 2014-09-01 Published: 2014-09-03


赵云雪1,张  珑1,2+,郑世杰1   

  1. 哈尔滨师范大学 计算机科学与信息工程学院,哈尔滨 150025
  2. 哈尔滨工业大学 计算机科学与技术学院,哈尔滨 150001

Abstract: Accent is an essential component of spoken communication. To verify the effectiveness of auditory-model-based short-time spectrum feature sets for Chinese accent detection, this paper uses the MFCC (Mel frequency cepstrum coefficient) and RASTA-PLP (relative spectra perceptual linear prediction) algorithms to extract short-time spectrum information from each speech segment, and builds one short-time spectrum feature set per algorithm. A NaiveBayes classifier then models each of the two feature sets and assigns every segment to the class with the maximum a posteriori probability. This classification method makes full use of the phonetic features of the current speech segment. On ASCCD (annotated speech corpus of Chinese discourse), the MFCC-based and RASTA-PLP-based feature sets achieve accent detection accuracies of 82.1% and 80.8%, respectively. The experimental results indicate that short-time spectrum features based on either MFCC or RASTA-PLP can be used for Chinese accent detection research.
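
The maximum a posteriori decision described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which likelihood model the NaiveBayes classifier uses, so a Gaussian likelihood per feature dimension is assumed here, and the two-dimensional feature vectors, the labels `accented` and `unaccented`, and the function names are hypothetical stand-ins for per-segment MFCC or RASTA-PLP features.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(features, labels, var_floor=1e-6):
    """Estimate a log prior and per-dimension Gaussian parameters for each class."""
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)

    model = {}
    total = len(features)
    for y, rows in by_class.items():
        n = len(rows)
        dims = len(rows[0])
        means = [sum(r[d] for r in rows) / n for d in range(dims)]
        # The variance floor keeps the log-likelihood finite for constant dimensions.
        variances = [
            sum((r[d] - means[d]) ** 2 for r in rows) / n + var_floor
            for d in range(dims)
        ]
        model[y] = (math.log(n / total), means, variances)
    return model


def log_posterior(model, x):
    """Unnormalised log posterior: log P(c) + sum over d of log N(x_d | mean, var)."""
    scores = {}
    for y, (log_prior, means, variances) in model.items():
        log_lik = sum(
            -0.5 * (math.log(2 * math.pi * v) + (xd - m) ** 2 / v)
            for xd, m, v in zip(x, means, variances)
        )
        scores[y] = log_prior + log_lik
    return scores


def predict(model, x):
    """Return the class with the maximum a posteriori score."""
    scores = log_posterior(model, x)
    return max(scores, key=scores.get)


# Hypothetical toy data: each row stands in for one segment's feature vector.
train_x = [[1.0, 0.2], [1.2, 0.1], [0.9, 0.3],
           [-1.0, 0.8], [-1.1, 1.0], [-0.8, 0.9]]
train_y = ["accented"] * 3 + ["unaccented"] * 3

model = fit_gaussian_nb(train_x, train_y)
print(predict(model, [1.1, 0.2]))
```

Working in the log domain avoids numerical underflow when many feature dimensions are multiplied together, which matters for real cepstral feature vectors that are much longer than this toy example.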

Key words: accent detection, Mel frequency cepstrum coefficient (MFCC), relative spectra perceptual linear prediction (RASTA-PLP), short-time spectrum features
