Journal of Frontiers of Computer Science and Technology ›› 2019, Vol. 13 ›› Issue (12): 2085-2093. DOI: 10.3778/j.issn.1673-9418.1902010

Research on Named Entity Recognition Method in Plant Attribute Text

LI Dongmei, TAN Wen   

  1. School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China
  • Online: 2019-12-01 Published: 2019-12-10

植物属性文本的命名实体识别方法研究

李冬梅,檀稳

  1. 北京林业大学 信息学院,北京 100083

Abstract: Named entity recognition in plant attribute texts plays a significant role in information extraction and knowledge graph construction in the forestry domain. This paper proposes BCC-P (BiLSTM-CNN-CRF model in plant), a named entity recognition method based on a bi-directional long short-term memory (BiLSTM) model, a convolutional neural network (CNN) model, and a conditional random fields (CRF) model. The characteristics of plant attribute texts are analyzed, and the texts are pre-processed and labeled to construct a dataset. BCC-P models the input text with the BiLSTM model to capture contextual semantic features. The resulting features are then passed to the CNN model to extract deeper, implicit features. Finally, the CRF model labels the text and outputs the optimal label sequence for each sentence. Experiments on a plant attribute text corpus show that BCC-P achieves an accuracy of 91.8%, indicating that the method can be applied effectively to named entity recognition in plant attribute texts.

Key words: named entity recognition, bi-directional long short-term memory (BiLSTM), convolutional neural network (CNN), conditional random fields (CRF)

摘要: 植物属性文本的命名实体识别对林业领域的信息抽取和知识图谱的构建起着重要的作用,针对该问题,提出了一种基于双向长短时记忆网络(BiLSTM)、卷积神经网络(CNN)和条件随机场(CRF)模型的植物属性文本命名实体识别方法BCC-P。分析了植物属性文本的特点,并进行预处理和标注,完成数据集的构建。BCC-P方法通过BiLSTM模型对植物属性文本进行建模,有效捕捉植物属性文本中的上下文语义特征。将获得的特征传递到CNN模型,进一步提取深度特征。最后使用了CRF模型进行植物属性文本的标注,输出在句子序列上最优的标注结果。在植物属性文本语料上的实验表明,该方法的准确率达到了91.8%,因此能够有效应用于植物属性文本的命名实体识别任务。

关键词: 命名实体识别, 双向长短时记忆网络(BiLSTM), 卷积神经网络(CNN), 条件随机场(CRF)
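
The abstract states that the CRF layer outputs the optimal label sequence for each sentence. As a rough illustration of that final step only, the sketch below runs Viterbi decoding, the standard way to find the highest-scoring sequence in a linear-chain CRF, over per-token scores. Everything specific here is an assumption for illustration and not taken from the paper: the BIO label set with a `PLANT` entity type, the emission scores (which in BCC-P would come from the BiLSTM and CNN layers), the transition scores, and the function name `viterbi_decode`.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: List[str],
) -> Tuple[List[str], float]:
    """Return the highest-scoring label sequence and its total score."""
    if not emissions:
        return [], 0.0
    # best[t][y]: score of the best path that ends in label y at position t
    best = [{y: emissions[0][y] for y in labels}]
    # back[t-1][y]: label at position t-1 on the best path ending in y at t
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            # Missing transition entries default to a neutral score of 0.0
            prev = max(
                labels,
                key=lambda p: best[-1][p] + transitions.get((p, y), 0.0),
            )
            scores[y] = (
                best[-1][prev] + transitions.get((prev, y), 0.0) + emissions[t][y]
            )
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label
    last = max(labels, key=lambda y: best[-1][y])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical label set and scores, chosen only to show the mechanism.
LABELS = ["O", "B-PLANT", "I-PLANT"]
TRANSITIONS = {
    ("O", "I-PLANT"): -10.0,       # an entity cannot continue from outside
    ("B-PLANT", "I-PLANT"): 0.5,   # continuing a started entity is favored
}
EMISSIONS = [
    {"O": 1.0, "B-PLANT": 0.8, "I-PLANT": 0.2},
    {"O": 0.3, "B-PLANT": 0.1, "I-PLANT": 1.5},
    {"O": 1.2, "B-PLANT": 0.0, "I-PLANT": 0.1},
]

path, score = viterbi_decode(EMISSIONS, TRANSITIONS, LABELS)
print(path, round(score, 2))
```

Picking the best label for each token independently would give `O, I-PLANT, O`, which is not a valid BIO sequence. Scoring whole sequences lets the transition penalty steer the first token to `B-PLANT`, which is the kind of sentence-level consistency a CRF output layer is meant to provide.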