Journal of Frontiers of Computer Science and Technology ›› 2017, Vol. 11 ›› Issue (6): 851-862. DOI: 10.3778/j.issn.1673-9418.1609026

• Academic Research •

面向开源软件项目的软件知识图谱构建方法

李文鹏1,2,3,王建彬1,2,3,林泽琦1,2,3,赵俊峰1,2,3+,邹艳珍1,2,3,谢  冰1,2,3   

    1. 北京大学 信息科学技术学院,北京 100871
    2. 高可信软件技术教育部重点实验室,北京 100871
    3. 北京大学(天津滨海)新一代信息技术研究院,天津 300450
  • 出版日期:2017-06-01 发布日期:2017-06-07

Software Knowledge Graph Building Method for Open Source Project

LI Wenpeng1,2,3, WANG Jianbin1,2,3, LIN Zeqi1,2,3, ZHAO Junfeng1,2,3+, ZOU Yanzhen1,2,3, XIE Bing1,2,3   

    1. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
    2. Key Laboratory of High Confidence Software Technologies, Ministry of Education, Beijing 100871, China
    3. Peking University Information Technology Institute (Tianjin Binhai), Tianjin 300450, China
  • Online:2017-06-01 Published:2017-06-07

摘要:

软件复用是软件开发中避免重复劳动的解决方案。开源软件的源代码、邮件列表、缺陷报告和问答文档等软件资源中蕴含了规模庞大、结构复杂、语义关联丰富的软件知识。如何获取知识、组织知识,以及如何在软件复用过程中方便地检索软件知识是亟待解决的问题。为了解决这些问题,面向开源软件项目,构建了软件知识图谱,并提供了基于软件知识图谱的软件知识检索。主要工作包括:针对4种不同类型的软件资源,提出了软件知识实体的提取原则与方法;提出了软件知识实体之间关联关系构建的方法;实现了两种软件知识检索机制,并以文字列表和图形可视化相结合的方式展现检索结果;设计了软件知识图谱构建框架。基于上述工作,设计并实现了一个面向开源软件项目的软件知识图谱构建工具。实例证明,所构建的软件知识图谱可以更好地帮助软件开发人员进行软件知识的检索与应用。

关键词: 软件复用, 开源软件, 软件知识图谱, 图数据库

Abstract: Software reuse reduces duplicated effort during software development and improves the efficiency and quality of the process. The source code, mailing lists, issue reports, Q&A documents and other software resources of open source projects contain large-scale software knowledge with complex structure and rich semantic associations. How to obtain and organize this knowledge, and how to retrieve it conveniently during software reuse, are problems that urgently need to be solved. To address them, this paper constructs a software knowledge graph for open source projects, whose goal is to organize and manage the structural knowledge of a software project, and provides knowledge retrieval based on that graph. The contributions are as follows: (1) extraction principles and methods for software knowledge entities, covering four different kinds of software resources; (2) methods for building the relationships between software knowledge entities; (3) two software knowledge retrieval mechanisms, whose results are displayed as a combination of text lists and graph visualization; (4) a framework for building software knowledge graphs. Based on this work, the paper designs and implements a software knowledge graph building tool for open source projects. Case studies show that the resulting software knowledge graph helps developers retrieve and apply software knowledge more effectively.

Key words: software reuse, open source software, software knowledge graph, graph database
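
The pipeline outlined in the abstract (extract entities from several resource types, link them, then retrieve) can be illustrated with a small in-memory sketch. This is not the authors' tool or their extraction method: the entity kinds, the `mentions` and `has_method` relation names, the name-matching heuristic, and all sample data such as `OrderParser` are hypothetical. The abstract does not say what its two retrieval mechanisms are, so the keyword lookup and one-hop neighbour expansion below are placeholders, and a plain dictionary stands in for the graph database.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    id: str    # unique key, e.g. "issue:12" (hypothetical naming scheme)
    kind: str  # "Class", "Method", "Issue", "Mail" or "Question"
    text: str  # identifier for code entities, free text for documents


CODE_KINDS = {"Class", "Method"}


class KnowledgeGraph:
    """Toy in-memory graph; a real system would sit on a graph database."""

    def __init__(self):
        self.entities = {}
        self.edges = defaultdict(set)  # entity id -> {(relation, other id)}

    def add_entity(self, entity):
        self.entities[entity.id] = entity

    def add_relation(self, source, relation, target):
        # Store the inverse edge too, so traversal works from either end.
        self.edges[source].add((relation, target))
        self.edges[target].add(("inverse:" + relation, source))

    def link_mentions(self):
        # Assumed heuristic: a document entity "mentions" a code entity
        # when the code identifier appears in its text as a whole token.
        code = [e for e in self.entities.values() if e.kind in CODE_KINDS]
        docs = [e for e in self.entities.values() if e.kind not in CODE_KINDS]
        for doc in docs:
            for c in code:
                if re.search(r"\b" + re.escape(c.text) + r"\b", doc.text):
                    self.add_relation(doc.id, "mentions", c.id)

    def search(self, keyword):
        # Placeholder retrieval 1: case-insensitive keyword lookup.
        k = keyword.lower()
        return sorted(e.id for e in self.entities.values() if k in e.text.lower())

    def neighbours(self, entity_id):
        # Placeholder retrieval 2: one-hop expansion around an entity.
        return sorted(self.edges.get(entity_id, ()))


def build_demo_graph():
    g = KnowledgeGraph()
    # One hypothetical entity per resource type named in the abstract.
    g.add_entity(Entity("class:OrderParser", "Class", "OrderParser"))
    g.add_entity(Entity("method:OrderParser.parse", "Method", "OrderParser.parse"))
    g.add_entity(Entity("issue:12", "Issue", "OrderParser throws on empty input"))
    g.add_entity(Entity("mail:3", "Mail", "Proposal: cache results in OrderParser.parse"))
    g.add_entity(Entity("qa:7", "Question", "How do I configure logging?"))
    g.add_relation("class:OrderParser", "has_method", "method:OrderParser.parse")
    g.link_mentions()
    return g


if __name__ == "__main__":
    graph = build_demo_graph()
    print(graph.search("orderparser"))
    print(graph.neighbours("class:OrderParser"))
```

Keeping every resource type in one graph means a single query on a class can surface the issue reports and mailing-list threads that discuss it, which is the kind of cross-resource lookup the abstract motivates.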