Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2022, Vol. 16, Issue (9): 2011-2029. DOI: 10.3778/j.issn.1673-9418.2110073

• Survey · Exploration •

Overview of Deep Learning-Based Code Representation and Its Applications

ZHANG Xiangping1,2, LIU Jianxun1,2,+

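  1. Hunan Key Laboratory for Services Computing and Novel Software Technology, Hunan University of Science and Technology, Xiangtan, Hunan 411201, China
    2. School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, Hunan 411201, China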
  • Received: 2021-10-28 Revised: 2022-04-21 Online: 2022-09-01 Published: 2022-09-15
  • Corresponding author: + E-mail: ljx529@gmail.com
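  • About the authors: ZHANG Xiangping (1993–), male, a native of Sanming, Fujian, Ph.D. candidate. His main research interests include code representation and code clone detection.
    LIU Jianxun (1970–), male, a native of Hengyang, Hunan, Ph.D., professor. His main research interests include service computing and cloud computing.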
  • Supported by:
    National Natural Science Foundation of China (61872139)

Overview of Deep Learning-Based Code Representation and Its Applications

ZHANG Xiangping1,2, LIU Jianxun1,2,+

  1. Hunan Key Lab for Services Computing and Novel Software Technology, Hunan University of Science and Technology, Xiangtan, Hunan 411201, China
    2. School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, Hunan 411201, China
  • Received:2021-10-28 Revised:2022-04-21 Online:2022-09-01 Published:2022-09-15
  • About the authors: ZHANG Xiangping, born in 1993, Ph.D. candidate. His research interests include code representation and code clone detection.
    LIU Jianxun, born in 1970, Ph.D., professor. His research interests include service computing and cloud computing.
  • Supported by:
    National Natural Science Foundation of China(61872139)

Abstract:

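Analyzing and reasoning about programs plays an important role in software development, maintenance and migration. How to efficiently obtain high-quality information from program code has become a hot research topic. In recent years, many researchers have introduced deep learning-based representation techniques into program code analysis tasks. Deep learning models can automatically extract the implicit features contained in code, reducing reliance on manually crafted features. This paper first introduces the background knowledge and basic concepts of code representation and, from the perspective of static code information analysis, summarizes research on deep learning-based code representation. It then describes the concrete applications of code representation in three tasks: code clone detection, code search and code completion. Finally, it analyzes the problems that remain in existing deep learning-based code representation work and outlines possible future research directions.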

Key words: code representation, representation learning, software engineering, code analysis, deep learning

Abstract:

The analysis of and reasoning about programs play an important role in software development, maintenance and migration. How to efficiently obtain high-quality information from program code has become a hot research topic. In recent years, many researchers have introduced deep learning-based representation techniques into code analysis tasks. Deep learning models can automatically extract useful features implicit in source code, reducing dependence on manually constructed features. This paper first introduces the background and basic concepts of code representation, and summarizes recent research on deep learning-based code representation learning from the perspective of static code information analysis. It then introduces the application of code representation to three tasks: code clone detection, code search and code completion. Finally, it discusses the challenges facing deep learning-based code representation and possible research directions in this field.

Key words: code representation, representation learning, software engineering, code analysis, deep learning

CLC number: