Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2014, Vol. 8, Issue (2): 186-199. DOI: 10.3778/j.issn.1673-9418.1309002

• Artificial Intelligence and Pattern Recognition •

New Method of Speech Emotion Recognition Fusing Functional Paralanguages

ZHAO Xiaolei1,2, MAO Qirong1+, ZHAN Yongzhao1

  1. School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China
  2. Xinhua College of Sun Yat-sen University, Guangzhou 510520, China
  • Published: 2014-02-01 Online: 2014-01-26

Abstract: Sudden vocal events such as laughter, crying, and sighing, referred to here as functional paralanguages, carry a great deal of emotional information, yet utterances that contain them tend to have a low overall emotion recognition rate because the bursts interfere with the features of the surrounding speech. To address this problem, this paper proposes a speech emotion recognition method that fuses functional paralanguages. The method first automatically detects functional paralanguages in the utterance to be recognized, then uses the detection results to separate them from the utterance, yielding two relatively clean signals: the functional paralanguage signal and the traditional speech signal. Finally, the emotional information from the two signals is fused with an adaptive weight fusion method, with the aim of improving both the recognition rate and the robustness of the system. Experiments on an emotional corpus containing six functional paralanguages and six typical emotions show that the method achieves an average recognition rate of 67.41% in the speaker-independent setting. This is 4.2%, 2.8%, and 2.4% higher than linear weighted fusion, Dempster-Shafer (DS) evidence theory, and Bayesian fusion, respectively, and 8.08% higher than the average recognition rate before fusion. The method therefore offers good robustness and recognition accuracy for speaker-independent speech emotion recognition.

Key words: speech emotion recognition, functional paralanguage, automatic detection, adaptive weight, fusion recognition
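
The abstract does not state how the adaptive weights are computed, so the following Python sketch only illustrates the general idea of decision-level fusion with per-utterance weights, next to the fixed-weight (linear weighted) baseline it is compared against. The entropy-based confidence rule, the function names, and the example posteriors are all assumptions made for illustration; they are not taken from the paper.

```python
import math
from typing import List, Sequence


def normalized_entropy(p: Sequence[float]) -> float:
    """Shannon entropy of a probability vector, scaled to the range [0, 1]."""
    h = -sum(x * math.log(x) for x in p if x > 0.0)
    return h / math.log(len(p))


def linear_fusion(p_speech: Sequence[float], p_para: Sequence[float],
                  w_speech: float = 0.5) -> List[float]:
    """Fixed-weight baseline: the same weight is used for every utterance."""
    return [w_speech * a + (1.0 - w_speech) * b
            for a, b in zip(p_speech, p_para)]


def adaptive_fusion(p_speech: Sequence[float],
                    p_para: Sequence[float]) -> List[float]:
    """Per-utterance weights from classifier confidence.

    ASSUMPTION: confidence is 1 minus normalized entropy, so the channel
    with the more peaked posterior gets the larger weight. This rule is
    hypothetical and stands in for the paper's unspecified method.
    """
    c_speech = max(0.0, 1.0 - normalized_entropy(p_speech))
    c_para = max(0.0, 1.0 - normalized_entropy(p_para))
    total = c_speech + c_para
    # Fall back to equal weights when both channels are (near) uniform.
    w_speech = 0.5 if total <= 1e-12 else c_speech / total
    return linear_fusion(p_speech, p_para, w_speech)


def predict(p: Sequence[float]) -> int:
    """Index of the most probable emotion class."""
    return max(range(len(p)), key=lambda i: p[i])


# Made-up posteriors over six emotion classes (indices 0..5).
p_speech = [0.30, 0.25, 0.15, 0.10, 0.10, 0.10]  # speech channel: uncertain
p_para = [0.05, 0.80, 0.05, 0.04, 0.03, 0.03]    # paralanguage channel: confident

fused = adaptive_fusion(p_speech, p_para)
baseline = linear_fusion(p_speech, p_para, w_speech=0.5)
print(predict(fused), predict(baseline))
```

In this toy case the paralanguage channel is far more confident than the speech channel, so the adaptive rule shifts most of the weight onto it, whereas the fixed-weight baseline treats both channels equally regardless of the utterance.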