Journal of Frontiers of Computer Science and Technology ›› 2024, Vol. 18 ›› Issue (12): 3300-3310.DOI: 10.3778/j.issn.1673-9418.2402013
• Artificial Intelligence·Pattern Recognition •
ZHANG Shiqing, CHEN Chen, ZHAO Xiaoming
Online: 2024-12-01
Published: 2024-11-29
张石清,陈晨,赵小明
ZHANG Shiqing, CHEN Chen, ZHAO Xiaoming. Speech Emotion Recognition Using Two-Stage Multiple Instance Learning Networks[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(12): 3300-3310.
张石清, 陈晨, 赵小明. 采用双阶段多示例学习网络的语音情感识别[J]. 计算机科学与探索, 2024, 18(12): 3300-3310.
URL: http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.2402013
[1] TAN Lijun, HU Yanli, CAO Jianwei, TAN Zhen. Document-Level Event Detection Method Based on Information Aggregation and Data Augmentation [J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(11): 3015-3026.
[2] ZHAO Xiaoyan, SONG Wei. Attention Learning Particle Swarm Optimization Algorithm Guided by Aggregation Indicator [J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(8): 1852-1866.
[3] LYU Jia, MA Chao, CHENG Chao. Improved U-Net Network for Retinal Vascular Segmentation [J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(3): 657-666.
[4] TU Xiaomei, BAO Xiao'an, WU Biao, JIN Yuting, ZHANG Qingqi. Object Detection Algorithm for 3D Coordinate Attention Path Aggregation Network [J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(12): 2984-2998.
[5] WANG Tiedan, ZHANG Yuqing, PENG Dinghong. Hierarchical Multi-attribute Decision-Making Method with Twofold Integral Operator of Cloud Model [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(8): 1898-1909.
[6] GUO Xiaowang, XIA Hongbin, LIU Yuan. Hybrid Recommendation Model of Knowledge Graph and Graph Convolutional Network [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(6): 1343-1353.
[7] WANG Baoliang, PAN Wencai. Two-Terminal Neighbor Information Fusion Recommendation Algorithm Based on Knowledge Graph [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(6): 1354-1361.
[8] GU Jia, FANG Zhijun, TIAN Fangzheng. Global Feature and Multi-level Feature Aggregation Segmentation Algorithm for Coronary [J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(5): 958-970.
[9] JI Zhong, LI Huihui, HE Yuqing. Zero-Shot Multi-Label Image Classification Based on Deep Instance Differentiation [J]. Journal of Frontiers of Computer Science and Technology, 2019, 13(1): 97-105.
[10] SHAN Xiaohuan, WANG Guangxiang, SONG Baoyan, DING Linlin, XU Yan. Frequent Subgraph Top-K Query with Label Constraint on Large-Scale Dynamic Graph [J]. Journal of Frontiers of Computer Science and Technology, 2018, 12(11): 1740-1747.
[11] SHEN Jinxin, WU Ye, CHEN Luo, JING Ning. Parallel Approximate Aggregation Query for Spatial Online Analysis [J]. Journal of Frontiers of Computer Science and Technology, 2018, 12(10): 1559-1570.
[12] SU Hui, GE Hongwei, ZHANG Tao. Density Adaptive Data Competition Clustering Algorithm [J]. Journal of Frontiers of Computer Science and Technology, 2016, 10(10): 1439-1450.
[13] HAN Erdong, GUO Peng, ZHAO Jing. Method for Multi-Attribute Group Decision Making Based on Interval Grey Uncertain Linguistic Information [J]. Journal of Frontiers of Computer Science and Technology, 2016, 10(1): 93-102.
[14] ZHAN Hang, SU Yong, LIU Huawen. Migrativity of Discrete Conjunctive Aggregation Operations [J]. Journal of Frontiers of Computer Science and Technology, 2015, 9(6): 756-760.
[15] GU Yanhui, ZHAO Bin, ZHOU Junsheng, QU Weiguang. Efficient Top-k Similar Short Texts Extraction Algorithm [J]. Journal of Frontiers of Computer Science and Technology, 2014, 8(8): 919-932.