Journal of Frontiers of Computer Science and Technology, 2017, Vol. 11, Issue (10): 1591-1598. DOI: 10.3778/j.issn.1673-9418.1609028

Refine Software Q&A Document Search Results Based on Code Pattern

HUA Chenyan1,2,3, ZOU Yanzhen1,2,3+, ZHU Zixiao1,2,3, XIE Bing1,2,3   

  1. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
    2. Key Laboratory of High Confidence Software Technologies, Ministry of Education, Beijing 100871, China
    3. Peking University Information Technology Institute (Tianjin Binhai), Tianjin 300450, China
  • Online: 2017-10-01 Published: 2017-10-20


Abstract: Developers often search Q&A websites for software Q&A documents relevant to their problems. Among the search results, documents that contain good code snippets (usage examples) are preferred. However, measuring the quality of the code snippets in a document remains a major challenge. To address this issue, this paper proposes an approach for refining software Q&A document search results based on code patterns. First, code snippets are extracted from each document in the search results. Then, the common code patterns among these snippets are mined and used to measure snippet quality. Finally, documents with high-quality code snippets are recommended and ranked at the top of the search results. The approach is evaluated on 10 real problems that software developers encountered in practice. Compared with the search results of StackOverflow, the proposed approach improves NDCG@5 by 40%.
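
The three steps named in the abstract (extract snippets, mine common patterns, re-rank) can be illustrated with a minimal Python sketch. The abstract does not say how code patterns are represented or mined, so everything below is an assumption made for illustration only: a "pattern" is reduced to the set of API call names that occur in at least `min_support` of the result documents, and the names `extract_calls`, `mine_common_calls`, `rerank`, and `min_support` are hypothetical rather than the authors' implementation.

```python
from __future__ import annotations

import re
from collections import Counter

# Assumption: a call is any (possibly dotted) identifier directly followed by "(".
CALL_RE = re.compile(r"\b([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)\s*\(")
# Control-flow keywords that look like calls but are not.
KEYWORDS = {"if", "for", "while", "switch", "catch", "return"}


def extract_calls(snippet: str) -> set[str]:
    """Return the call names in one snippet, e.g. 'r.readLine(' -> 'readLine'."""
    names = {m.split(".")[-1] for m in CALL_RE.findall(snippet)}
    return names - KEYWORDS


def calls_in_doc(snippets: list[str]) -> set[str]:
    """Union of the call names over all snippets of one document."""
    calls: set[str] = set()
    for snippet in snippets:
        calls |= extract_calls(snippet)
    return calls


def mine_common_calls(docs: dict[str, list[str]], min_support: float = 0.5) -> set[str]:
    """Call names that appear in at least min_support of the documents."""
    counts: Counter[str] = Counter()
    for snippets in docs.values():
        counts.update(calls_in_doc(snippets))
    threshold = min_support * len(docs)
    return {name for name, n in counts.items() if n >= threshold}


def rerank(docs: dict[str, list[str]], min_support: float = 0.5) -> list[str]:
    """Order document ids by the share of common calls their snippets cover."""
    common = mine_common_calls(docs, min_support)

    def score(doc_id: str) -> float:
        if not common:
            return 0.0
        return len(calls_in_doc(docs[doc_id]) & common) / len(common)

    # sorted() is stable, so tied documents keep their original search order.
    return sorted(docs, key=score, reverse=True)
```

Here `docs` maps each document id to its list of code snippets, in the order returned by the original search. Because the sort is stable, the sketch only moves a document up when its snippets cover more of the shared calls, which mirrors the idea of refining, not replacing, the original ranking.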

Key words: code pattern, software Q&A document, document search
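
The evaluation metric mentioned in the abstract, NDCG@5, can be computed as in the short sketch below. It assumes the common linear-gain form of DCG with a log2 position discount; the abstract does not state which DCG variant or relevance scale the authors used, so the function names and the graded relevance values are illustrative assumptions.

```python
import math


def dcg_at_k(relevances: list[float], k: int) -> float:
    """Discounted cumulative gain over the first k results (linear gain)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: list[float], k: int = 5) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

`relevances` lists the graded relevance of each returned document in ranked order. A ranking that already places the most relevant documents first scores 1.0, and moving a relevant document from a lower position toward the top raises the score.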
