[1] CHEN Z, LIU B. Lifelong machine learning[M]. San Rafael: Morgan & Claypool Publishers, 2018.
[2] PICARD R W. Affective computing for HCI[C]//Human-Computer Interaction: Ergonomics and User Interfaces, Proceedings of the 8th International Conference on Human-Computer Interaction, 1999: 829-833.
[3] SCHULLER B W. Speech emotion recognition[J]. Communications of the ACM, 2018, 61(5): 90-99.
[4] 赵小明, 杨轶娇, 张石清. 面向深度学习的多模态情感识别研究进展[J]. 计算机科学与探索, 2022, 16(7): 1479-1503.
ZHAO X M, YANG Y J, ZHANG S Q. Survey of deep learning based multimodal emotion recognition[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(7): 1479-1503.
[5] RISH I. An empirical study of the naive Bayes classifier[C]//Proceedings of the 2001 Workshop on Empirical Methods in Artificial Intelligence, 2001: 41-46.
[6] WINDEATT T. Accuracy/diversity and ensemble MLP classifier design[J]. IEEE Transactions on Neural Networks, 2006, 17(5): 1194-1211.
[7] 陈闯, CHELLALI R, 邢尹. 改进GWO优化SVM的语音情感识别研究[J]. 计算机工程与应用, 2018, 54(16): 113-118.
CHEN C, CHELLALI R, XING Y. Research on speech emotion recognition based on improved GWO optimized SVM[J]. Computer Engineering and Applications, 2018, 54(16): 113-118.
[8] MIRSAMADI S, BARSOUM E, ZHANG C. Automatic speech emotion recognition using recurrent neural networks with local attention[C]//Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2017: 2227-2231.
[9] GUO L L, WANG L B, DANG J W, et al. A feature fusion method based on extreme learning machine for speech emotion recognition[C]//Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2018: 2666-2670.
[10] TZIRAKIS P, ZHANG J H, SCHULLER B W. End-to-end speech emotion recognition using deep neural networks[C]//Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2018: 5089-5093.
[11] 张石清, 陈晨, 赵小明. 采用双阶段多示例学习网络的语音情感识别[J]. 计算机科学与探索, 2024, 18(12): 3300-3310.
ZHANG S Q, CHEN C, ZHAO X M. Speech emotion recognition using two-stage multiple instance learning networks[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(12): 3300-3310.
[12] 李锦, 夏鸿斌, 刘渊. 基于BERT的双特征融合注意力的方面情感分析模型[J]. 计算机科学与探索, 2024, 18(1): 205-216.
LI J, XIA H B, LIU Y. Dual features local-global attention model with BERT for aspect sentiment analysis[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(1): 205-216.
[13] SCHULLER B, VLASENKO B, EYBEN F, et al. Cross-corpus acoustic emotion recognition: variances and strategies[J]. IEEE Transactions on Affective Computing, 2010, 1(2): 119-131.
[14] WANG Y Z, BOUMADANE A, HEBA A. A fine-tuned Wav2vec 2.0/HuBERT benchmark for speech emotion recognition, speaker verification and spoken language understanding[EB/OL]. [2024-06-19]. https://arxiv.org/abs/2111.02735.
[15] CHEN L W, RUDNICKY A. Exploring Wav2vec 2.0 fine-tuning for improved speech emotion recognition[C]//Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2023: 1-5.
[16] GONG Y, LAI C I, CHUNG Y A, et al. SSAST: self-supervised audio spectrogram transformer[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2022, 36(10): 10699-10709.
[17] LIU S, MALLOL-RAGOLTA A, PARADA-CABALEIRO E, et al. Audio self-supervised learning: a survey[J]. Patterns, 2022, 3(12): 100616.
[18] PEPINO L, RIERA P, FERRER L. Emotion recognition from speech using wav2vec 2.0 embeddings[C]//Proceedings of the Interspeech 2021, 2021: 3400-3404.
[19] KAKOUROS S, STAFYLAKIS T, MOŠNER L, et al. Speech-based emotion recognition with self-supervised models using attentive channel-wise correlations and label smoothing[C]//Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2023: 1-5.
[20] SADOK S, LEGLAIVE S, SÉGUIER R. A vector quantized masked autoencoder for speech emotion recognition[C]//Proceedings of the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops. Piscataway: IEEE, 2023: 1-5.
[21] XIAO Y F, BO Y Q, ZHENG Z L. Speech emotion recognition based on semi-supervised adversarial variational autoencoder[C]//Proceedings of the 2023 IEEE 10th International Conference on Cyber Security and Cloud Computing/2023 IEEE 9th International Conference on Edge Computing and Scalable Cloud. Piscataway: IEEE, 2023: 275-280.
[22] LU C, ZONG Y, ZHENG W M, et al. Domain invariant feature learning for speaker-independent speech emotion recognition[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022, 30: 2217-2230.
[23] YU Y, SI X S, HU C H, et al. A review of recurrent neural networks: LSTM cells and network architectures[J]. Neural Computation, 2019, 31(7): 1235-1270.
[24] TARANTINO L, GARNER P N, LAZARIDIS A. Self-attention for speech emotion recognition[C]//Proceedings of the Interspeech 2019, 2019: 2578-2582.
[25] LU C, LIAN H L, ZHENG W M, et al. Learning local to global feature aggregation for speech emotion recognition[C]//Proceedings of the Interspeech 2023, 2023: 1908-1912.
[26] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
[27] VAN DE VEN G M, TOLIAS A S. Three scenarios for continual learning[EB/OL]. [2024-06-19]. https://arxiv.org/abs/1904.07734.
[28] ZENKE F, POOLE B, GANGULI S. Continual learning through synaptic intelligence[J]. Proceedings of Machine Learning Research, 2017, 70: 3987-3995.
[29] WU Y, CHEN Y P, WANG L J, et al. Large scale incremental learning[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 374-382.
[30] KONISHI T, KUROKAWA M, ONO C, et al. Parameter-level soft-masking for continual learning[C]//Proceedings of the 40th International Conference on Machine Learning, 2023: 17492-17505.
[31] XIANG Y, FU Y, JI P, et al. Incremental learning using conditional adversarial networks[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 6618-6627.
[32] QIN Q, HU W, PENG H, et al. BNS: building network structures dynamically for continual learning[C]//Advances in Neural Information Processing Systems 34, 2021: 20608-20620.
[33] KINGMA D P, WELLING M. Auto-encoding variational Bayes[EB/OL]. [2024-06-23]. https://arxiv.org/abs/1312.6114.
[34] DING X H, CHEN H H, ZHANG X Y, et al. Re-parameterizing your optimizers rather than architectures[EB/OL]. [2024-06-23]. https://arxiv.org/abs/2205.15242.
[35] VAN DEN OORD A, VINYALS O, KAVUKCUOGLU K. Neural discrete representation learning[C]//Advances in Neural Information Processing Systems 30, 2017: 6306-6315.
[36] BUSSO C, BULUT M, LEE C C, et al. IEMOCAP: interactive emotional dyadic motion capture database[J]. Language Resources and Evaluation, 2008, 42(4): 335-359.
[37] FAYEK H M, LECH M, CAVEDON L. Evaluating deep learning architectures for speech emotion recognition[J]. Neural Networks, 2017, 92: 60-68.
[38] MCFEE B, RAFFEL C, LIANG D W, et al. Librosa: audio and music signal analysis in Python[C]//Proceedings of the 14th Python in Science Conference, 2015: 18-24.
[39] GLOROT X, BENGIO Y. Understanding the difficulty of training deep feedforward neural networks[C]//Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, 2010: 249-256.
[40] YENIGALLA P, KUMAR A, TRIPATHI S, et al. Speech emotion recognition using spectrogram & phoneme embedding[C]//Proceedings of the Interspeech 2018, 2018: 3688-3692.
[41] BHOSALE S, CHAKRABORTY R, KOPPARAPU S K. Deep encoded linguistic and acoustic cues for attention based end to end speech emotion recognition[C]//Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2020: 7189-7193.
[42] ZHANG H R, MIMURA M, KAWAHARA T, et al. Selective multi-task learning for speech emotion recognition using corpora of different styles[C]//Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2022: 7707-7711.
[43] SANTOSO J, YAMADA T, MAKINO S, et al. Speech emotion recognition based on attention weight correction using word-level confidence measure[C]//Proceedings of the Interspeech 2021, 2021: 1947-1951.
[44] CHEN M, ZHAO X. A multi-scale fusion framework for bimodal speech emotion recognition[C]//Proceedings of the Interspeech 2020, 2020: 374-378.