Journal of Frontiers of Computer Science and Technology ›› 2020, Vol. 14 ›› Issue (5): 740-748. DOI: 10.3778/j.issn.1673-9418.1906031

• System Software and Software Engineering •

Research on Named Entity Recognition Technology in Military Software Testing

HAN Xinxin, BEN Kerong, ZHANG Xian

  1. College of Electronic Engineering, Navy University of Engineering, Wuhan 430033, China
  • Online: 2020-05-01 Published: 2020-05-08

Abstract:

Named entity recognition is an important stage in constructing a knowledge graph. Based on national military standards and software testing documents, this work defines the entity types and constructs and annotates a dataset. In the software testing domain, joint character-word entity recognition methods achieve low precision. To address this, the character-level feature extraction method is improved and the CWA-BiLSTM-CRF (character and word attention-bidirectional long short-term memory-conditional random field) recognition framework is proposed. The framework consists of two parts. The first part builds a pre-trained fused character-word dictionary, feeds characters and words together into a bidirectional long short-term memory (BiLSTM) network for training, and adds an attention mechanism that measures the semantic contribution of each character within a word to the features, yielding character-level features. The second part concatenates the character-level features with word vectors and other features, feeds them into a BiLSTM network for training, and then applies a conditional random field (CRF) to correct invalid label sequences, thereby identifying the entities in the text. The proposed method is compared experimentally with three commonly used deep learning character-level feature extraction methods. Both precision and recall improve, and the best F1 score is 88.93%. The experiments show that the improved method is suitable for named entity recognition in the military software testing domain and lays a foundation for the subsequent construction of a knowledge graph.

Key words: software testing, knowledge graph, named entity recognition, bidirectional long short-term memory (BiLSTM), conditional random field (CRF)
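
The two ideas the abstract relies on, attention over the characters inside a word and CRF-style decoding that rules out invalid label sequences, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The toy vectors stand in for BiLSTM outputs, and the names `attention_pool`, `viterbi`, the `query` vector, the O/B/I label set, and all scores are assumptions made for illustration only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(char_states, query):
    """Weight each character state by softmax(dot(state, query)) and sum.

    char_states: one vector per character in a word (stand-ins for BiLSTM states).
    query: hypothetical learned attention vector of the same dimension.
    """
    scores = [sum(h * q for h, q in zip(state, query)) for state in char_states]
    weights = softmax(scores)
    dim = len(char_states[0])
    pooled = [sum(w * state[i] for w, state in zip(weights, char_states))
              for i in range(dim)]
    return pooled, weights

def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label path.

    emissions[t][j]: score of label j at position t.
    transitions[i][j]: score of moving from label i to label j.
    """
    n = len(labels)
    score = list(emissions[0])
    back = []
    for emit in emissions[1:]:
        prev, pointers, score = score, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + transitions[i][j])
            pointers.append(best)
            score.append(prev[best] + transitions[best][j] + emit[j])
        back.append(pointers)
    path = [max(range(n), key=lambda j: score[j])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return [labels[i] for i in path]

# Part 1 (toy): pool two character states into one character-level feature,
# then concatenate it with a made-up word vector.
char_feature, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], query=[1.0, 0.0])
word_vector = [0.5, -0.5]
token_feature = char_feature + word_vector  # list concatenation

# Part 2 (toy): the transition table forbids O -> I, so decoding cannot
# start an entity with an "inside" label.
labels = ["O", "B", "I"]
NEG = -1e4
transitions = [[0.0, 0.0, NEG],   # from O
               [0.0, 0.0, 0.0],   # from B
               [0.0, 0.0, 0.0]]   # from I
emissions = [[2.0, 0.5, 0.1],
             [0.2, 1.0, 1.2],
             [0.1, 0.3, 2.0]]
decoded = viterbi(emissions, transitions, labels)
greedy = [labels[max(range(3), key=lambda j: e[j])] for e in emissions]
print(decoded, greedy)
```

In this toy setup, picking the best label independently at each position gives the invalid sequence O, I, I, while decoding with the transition constraint gives O, B, I. A real CRF layer would learn the transition scores from data rather than fixing them by hand.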