Journal of Frontiers of Computer Science and Technology, 2021, Vol. 15, Issue (7): 1207-1219. DOI: 10.3778/j.issn.1673-9418.2012062

• Surveys and Frontiers •

Survey on Sequence Data Augmentation

GE Yizhou, XU Xiang, YANG Suorong, ZHOU Qing, SHEN Furao   

    1. Science and Technology on Communication Information Security Control Laboratory, Jiaxing, Zhejiang 314033, China
    2. No.36 Research Institute, China Electronics Technology Group Corporation, Jiaxing, Zhejiang 314033, China
    3. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
  • Online: 2021-07-01 Published: 2021-07-09





In pursuit of higher accuracy, deep learning models are becoming increasingly complex, with ever deeper networks. The growing number of parameters means that more data are needed to train these models. However, manually labeling data is costly, and in some fields data are hard to collect because of objective constraints. As a result, insufficient data is a very common problem. Data augmentation alleviates this problem by artificially generating new data. Its success in computer vision has led researchers to consider similar methods for sequence data. This paper describes not only time-domain methods such as flipping and cropping but also augmentation methods in the frequency domain. In addition to experience-based and knowledge-based methods, it gives detailed descriptions of machine learning models used for automatic data generation, such as generative adversarial networks (GANs). It covers methods that have been widely applied to sequence data such as text, audio, and time series, along with their satisfactory performance in tasks such as medical diagnosis and emotion classification. Although the data types differ, these methods are designed around similar ideas. Using these ideas as a connecting thread, the paper introduces data augmentation methods applied to different types of sequence data and offers some discussion and prospects.

Key words: sequence data, data augmentation, deep learning


