[1] ZHANG H Y, HUANG H M, LI W, et al. An overview of speech emotion recognition[J]. Computer Simulation, 2021, 38(8): 7-17. (in Chinese)
[2] SUN X H, LI H J. Overview of speech emotion recognition[J]. Computer Engineering and Applications, 2020, 56(11): 1-9. (in Chinese)
[3] LIU J J, WU X F. Prototype of educational affective arousal evaluation system based on facial and speech emotion recognition[J]. International Journal of Information and Education Technology, 2019, 9(9): 645-651.
[4] LI H C, PAN T, LEE M H, et al. Make patient consultation warmer: a clinical application for speech emotion recognition[J]. Applied Sciences, 2021, 11(11): 4782.
[5] BADSHAH A M, RAHIM N, ULLAH N, et al. Deep features-based speech emotion recognition for smart affective services[J]. Multimedia Tools and Applications, 2019, 78(5): 5571-5589.
[6] TAN L, YU K P, LIN L, et al. Speech emotion recognition enhanced traffic efficiency solution for autonomous vehicles in a 5G-enabled space-air-ground integrated intelligent transportation system[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(3): 2830-2842.
[7] NASRI H, OUARDA W, ALIMI A M. ReLiDSS: novel lie detection system from speech signal[C]//Proceedings of the 2016 IEEE/ACS 13th International Conference of Computer Systems and Applications. Piscataway: IEEE, 2016: 1-8.
[8] CUI C L, CUI L. Lightweight speech emotion recognition for data enhancement[J]. Computer and Modernization, 2023(4): 83-89. (in Chinese)
[9] RAYHAN AHMED M, ISLAM S, MUZAHIDUL ISLAM A K M, et al. An ensemble 1D-CNN-LSTM-GRU model with data augmentation for speech emotion recognition[J]. Expert Systems with Applications, 2023, 218: 119633.
[10] LI Q Q, SHEN X Y, REN F J, et al. Investigation of multiple speech emotion classification algorithms based on data enhancement[J]. CAAI Transactions on Intelligent Systems, 2021, 16(1): 170-177. (in Chinese)
[11] TU Z W, LIU B, ZHAO W, et al. A feature fusion model with data augmentation for speech emotion recognition[J]. Applied Sciences, 2023, 13(7): 4124.
[12] YI L, MAK M W. Adversarial data augmentation network for speech emotion recognition[C]//Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference. Piscataway: IEEE, 2019: 529-534.
[13] PADI S, MANOCHA D, SRIRAM R D. Multi-window data augmentation approach for speech emotion recognition[EB/OL]. [2024-12-15]. https://arxiv.org/abs/2010.09895.
[14] SINGH P, SRIVASTAVA R, RANA K P S, et al. A multimodal hierarchical approach to speech emotion recognition from audio and text[J]. Knowledge-Based Systems, 2021, 229: 107316.
[15] CHEN Z Z, LIN M T, WANG Z F, et al. Spatio-temporal representation learning enhanced speech emotion recognition with multi-head attention mechanisms[J]. Knowledge-Based Systems, 2023, 281: 111077.
[16] BHANGALE K B, KOTHANDARAMAN M. Speech emotion recognition using the novel PEmoNet (parallel emotion network)[J]. Applied Acoustics, 2023, 212: 109613.
[17] ATILA O, ŞENGÜR A. Attention guided 3D CNN-LSTM model for accurate speech based emotion recognition[J]. Applied Acoustics, 2021, 182: 108260.
[18] MORAIS E, HOORY R, ZHU W Z, et al. Speech emotion recognition using self-supervised features[C]//Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2022: 6922-6926.
[19] CAI X Y, YUAN J H, ZHENG R J, et al. Speech emotion recognition with multi-task learning[C]//Proceedings of the Interspeech 2021, 2021: 4508-4512.
[20] YANG S R, YANG H C, SHEN F R, et al. Image data augmentation for deep learning: a survey[J]. Journal of Software, 2025, 36(3): 1390-1412. (in Chinese)
[21] BAUTISTA J L, LEE Y K, SHIN H S. Speech emotion recognition based on parallel CNN-attention networks with multi-fold data augmentation[J]. Electronics, 2022, 11(23): 3935.
[22] JESTEADT W, NEFF D L. A signal-detection-theory measure of pitch shifts in sinusoids as a function of intensity[J]. The Journal of the Acoustical Society of America, 1982, 72(6): 1812-1820.
[23] ZHANG J, JIA H. Design of speech corpus for Mandarin text to speech[C]//Proceedings of the Blizzard Challenge 2008, 2008.
[24] LIVINGSTONE S R, RUSSO F A. The Ryerson audio-visual database of emotional speech and song (RAVDESS): a dynamic, multimodal set of facial and vocal expressions in North American English[J]. PLoS One, 2018, 13(5): e0196391.
[25] ZHU R F, SUN C X, WEI X P, et al. Speech emotion recognition using channel attention mechanism[C]//Proceedings of the 2023 4th International Conference on Computer Engineering and Application. Piscataway: IEEE, 2023: 680-684.
[26] WANG Z Y, GUO X. Research on Mandarin Chinese in speech emotion recognition[C]//Proceedings of the 2022 5th International Conference on Machine Learning and Natural Language Processing. New York: ACM, 2022: 99-103.
[27] ZHANG S H, FENG Y, YU R J, et al. Speech emotion recognition based on SE attention mechanism and deep convolution[J]. Modern Electronics Technique, 2024, 47(22): 64-70. (in Chinese)
[28] DU C Y, ZHANG X Y, HUANG L X, et al. Multi-feature speech emotion recognition based on improved efficient channel attention mechanism[J]. Computer Engineering, 2025, 51(4): 97-106. (in Chinese)
[29] PATEL N, PATEL S, MANKAD S H. Impact of autoencoder based compact representation on emotion detection from audio[J]. Journal of Ambient Intelligence and Humanized Computing, 2022, 13(2): 867-885.
[30] DUTT A, GADER P. Wavelet multiresolution analysis based speech emotion recognition system using 1D CNN LSTM networks[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023, 31: 2043-2054.
[31] ONG K L, LEE C P, LIM H S, et al. Mel-MViTv2: enhanced speech emotion recognition with Mel spectrogram and improved multiscale vision transformers[J]. IEEE Access, 2023, 11: 108571-108579.
[32] ZHANG Y M, ZHANG X, GAO M, et al. Incorporating dynamic convolution and attention mechanism in multilayer perceptron for speech emotion recognition[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(4): 1065-1075. (in Chinese)