Journal of Frontiers of Computer Science and Technology (计算机科学与探索) ›› 2020, Vol. 14 ›› Issue (5): 731-739. DOI: 10.3778/j.issn.1673-9418.1907057

• System Software and Software Engineering •

Code Snippets Recommendation Based on Sequence to Sequence Model (基于序列到序列模型的代码片段推荐)

YAN Xin (闫鑫), ZHOU Yu (周宇), HUANG Zhiqiu (黄志球)

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
  2. Key Laboratory for Safety-Critical Software Development and Verification, Ministry of Industry and Information Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
  • Online: 2020-05-01 Published: 2020-05-08

Abstract:

During software development, developers often reuse code to improve their efficiency. Existing research usually relies on traditional information retrieval techniques to recommend code. These approaches suffer from a mismatch between the high-level intent expressed in a natural language query and the low-level implementation details of the code. This paper proposes DeepCR, a code snippet recommendation approach based on a sequence-to-sequence model. DeepCR combines program static analysis with a sequence-to-sequence model to train a query generation model that generates natural language queries for code snippets. Code snippets are then recommended by computing similarity scores between the generated queries and the natural language query entered by the developer. The code repository is built from data collected from the Stack Overflow question-and-answer website, which ensures the authenticity of the data. The effectiveness of DeepCR is evaluated using the mean reciprocal rank (MRR) and Hit@K of the recommendation results. The experimental results show that DeepCR outperforms existing approaches and effectively improves code snippet recommendation.

Key words: program static analysis, sequence to sequence model, code snippets recommendation
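
The two evaluation metrics named in the abstract, MRR and Hit@K, can be illustrated with a minimal sketch. It assumes that each query has exactly one correct code snippet and that the recommender returns a ranked list of snippet identifiers; the abstract does not state the paper's exact evaluation protocol, so the function names, the snippet IDs, and the single-answer assumption are all hypothetical.

```python
from typing import List, Sequence


def reciprocal_rank(ranked_ids: Sequence[str], relevant_id: str) -> float:
    """Return 1/rank (1-based) of the relevant snippet, or 0.0 if it is not in the list."""
    for position, candidate in enumerate(ranked_ids, start=1):
        if candidate == relevant_id:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings: List[Sequence[str]], truths: Sequence[str]) -> float:
    """Average the reciprocal rank over all queries."""
    if len(rankings) != len(truths) or not rankings:
        raise ValueError("rankings and truths must be non-empty and of equal length")
    total = sum(reciprocal_rank(r, t) for r, t in zip(rankings, truths))
    return total / len(rankings)


def hit_at_k(rankings: List[Sequence[str]], truths: Sequence[str], k: int) -> float:
    """Fraction of queries whose relevant snippet appears in the top k results."""
    if len(rankings) != len(truths) or not rankings:
        raise ValueError("rankings and truths must be non-empty and of equal length")
    if k < 1:
        raise ValueError("k must be at least 1")
    hits = sum(1 for r, t in zip(rankings, truths) if t in list(r)[:k])
    return hits / len(rankings)


# Toy data (hypothetical snippet IDs): three queries, three ranked results each.
rankings = [
    ["s3", "s1", "s7"],  # correct snippet s1 is at rank 2
    ["s4", "s2", "s9"],  # correct snippet s4 is at rank 1
    ["s5", "s6", "s8"],  # correct snippet s0 is not retrieved
]
truths = ["s1", "s4", "s0"]

mrr = mean_reciprocal_rank(rankings, truths)  # (1/2 + 1 + 0) / 3
hit1 = hit_at_k(rankings, truths, k=1)        # 1 of 3 queries
hit2 = hit_at_k(rankings, truths, k=2)        # 2 of 3 queries
print(f"MRR={mrr:.3f} Hit@1={hit1:.3f} Hit@2={hit2:.3f}")
```

MRR rewards placing the correct snippet near the top of the list, while Hit@K only records whether it appears within the first K results, so the two are usually reported together.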