Journal of Frontiers of Computer Science and Technology ›› 2023, Vol. 17 ›› Issue (6): 1225-1248. DOI: 10.3778/j.issn.1673-9418.2210114
ZHANG Rulin, WANG Hailong, LIU Lin, PEI Dongmei
Online: 2023-06-01
Published: 2023-06-01
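Abstract: Music is one of the most popular forms of art and entertainment, an artistic language through which people express or convey emotion. As the volume of digital music grows rapidly, however, managing and filtering music with shallow metadata alone has become very difficult. Automatic music annotation is an effective means of organizing massive music collections and enriching music information: it helps bridge the semantic gap in music information retrieval, completes music metadata, gives music a more intuitive description, and advances research on music information retrieval tasks such as music classification, music recommendation, and instrument recognition. Current work on automatic music annotation focuses mainly on two problems, feature extraction and model selection. In light of these research priorities, this survey first presents the background knowledge of automatic music annotation. It then systematically reviews the audio feature representations and feature extraction methods used in the field, with quantitative and qualitative analysis of each class of extraction method. It summarizes related research results, analyzing the differences among models from the two perspectives of machine learning and deep learning. It introduces commonly used datasets and performance evaluation metrics, summarizes the characteristics of the datasets, and categorizes the metrics. Finally, it identifies the difficulties and challenges facing research on automatic music annotation and outlines future research directions.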
ZHANG Rulin, WANG Hailong, LIU Lin, PEI Dongmei. Survey of Research on Automatic Music Annotation and Classification Methods[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(6): 1225-1248.