Research on Multi-Feature Sentence Similarity Computing Method with Word Embedding

Journal of Frontiers of Computer Science and Technology, 2017, 11(4): 608-618. DOI: 10.3778/j.issn.1673-9418.1604029

LI Feng1,2+, HOU Jiaying3, ZENG Rongren1, LING Chen1

1. Logistics Science Research Institute of PLA, Beijing 100166, China
2. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
3. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China

Online: 2017-04-12  Published: 2017-04-12

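Abstract: After summarizing common sentence similarity computing methods, this paper trains a word vector model for semantic similarity computation on more than 34,000 texts from People's Daily and designs a multi-feature sentence similarity computing method that incorporates word embeddings. At the word level, the method considers the number of words the two sentences share and the continuity of those words, and uses the word vector model to measure the similarity between non-overlapping words. At the structural level, it considers the order of the overlapping words and how consistent the two sentences are in length. In the experiments, four sentence similarity computing methods were designed and implemented, together with a corresponding experimental system. The results show that the proposed algorithm achieves relatively good results, and that combining and optimizing the semantic features of words with the structural features of sentences improves the accuracy of sentence similarity computation.

Key words: word embedding, sentence similarity, Word2vec, algorithm design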
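The abstract names the word-level features (overlapping word count, word continuity, word-vector similarity of non-overlapping words) but not their formulas. The Python below is a minimal sketch under stated assumptions, not the authors' definitions: overlap is assumed to be shared distinct words divided by the larger vocabulary, continuity the longest run of consecutive shared words divided by the shorter sentence length, and the semantic score the average best-match cosine similarity between non-overlapping words. `TOY_VECTORS` and every function name are hypothetical; space-split English tokens stand in for segmented Chinese words, and the toy vectors stand in for the Word2vec model trained on People's Daily.

```python
import math

# Hypothetical toy vectors standing in for the trained Word2vec model;
# a real system would look words up in that model instead.
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.85, 0.15, 0.05],
    "dog": [0.1, 0.9, 0.0],
    "the": [0.3, 0.3, 0.3],
    "sat": [0.0, 0.1, 0.9],
    "mat": [0.2, 0.0, 0.8],
}


def overlap_score(s1, s2):
    """Assumed overlap feature: shared distinct words / larger distinct-vocabulary size."""
    v1, v2 = set(s1), set(s2)
    if not v1 or not v2:
        return 0.0
    return len(v1 & v2) / max(len(v1), len(v2))


def continuity_score(s1, s2):
    """Assumed continuity feature: longest run of consecutive shared words / shorter length."""
    if not s1 or not s2:
        return 0.0
    best = 0
    # prev[j] holds the length of the common run ending at the previous word of s1 and s2[j-1]
    prev = [0] * (len(s2) + 1)
    for i in range(1, len(s1) + 1):
        cur = [0] * (len(s2) + 1)
        for j in range(1, len(s2) + 1):
            if s1[i - 1] == s2[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best / min(len(s1), len(s2))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def non_overlap_semantic_score(s1, s2, vectors):
    """Assumed semantic feature: each non-overlapping word of s1 takes its best cosine match
    among the non-overlapping words of s2; the matches are averaged. Out-of-vocabulary words are skipped."""
    common = set(s1) & set(s2)
    rest1 = [w for w in s1 if w not in common and w in vectors]
    rest2 = [w for w in s2 if w not in common and w in vectors]
    if not rest1 and not rest2:
        return 1.0  # no in-vocabulary non-overlapping words on either side
    if not rest1 or not rest2:
        return 0.0
    best = [max(cosine(vectors[a], vectors[b]) for b in rest2) for a in rest1]
    return sum(best) / len(best)


if __name__ == "__main__":
    a = "the cat sat on the mat".split()
    b = "the kitten sat on the mat".split()
    print(round(overlap_score(a, b), 3))     # 0.8
    print(round(continuity_score(a, b), 3))  # 0.667
    print(round(non_overlap_semantic_score(a, b, TOY_VECTORS), 3))
```

In this toy example the only non-overlapping pair is "cat"/"kitten", whose vectors are close, so the semantic score comes out near 1; replacing "kitten" with "dog" drives it down, which is the effect a word-vector feature is meant to capture beyond plain word overlap.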

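The structural features (order of the overlapping words and length consistency) and the step that combines all features are likewise described without formulas. As a hypothetical sketch only: order similarity is taken here as a Kendall-tau-style measure, 1 minus the fraction of overlapping-word pairs whose relative order differs between the sentences; length consistency as 1 - |len1 - len2| / max(len1, len2); and the combination as a weighted average. The function names and the equal weights in `WEIGHTS` are assumptions, not values from the paper.

```python
def order_score(s1, s2):
    """Assumed order feature: 1 - (inverted pairs / all pairs) over the overlapping words,
    using each word's first occurrence in each sentence (a Kendall-tau-style measure)."""
    set2 = set(s2)
    common = [w for w in dict.fromkeys(s1) if w in set2]  # s1 order, duplicates removed
    if not common:
        return 0.0
    if len(common) == 1:
        return 1.0
    first_pos2 = {}
    for idx, w in enumerate(s2):
        first_pos2.setdefault(w, idx)
    ranks = [first_pos2[w] for w in common]
    n = len(ranks)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if ranks[i] > ranks[j])
    return 1.0 - inversions / (n * (n - 1) // 2)


def length_score(s1, s2):
    """Assumed length-consistency feature: 1 - |len1 - len2| / max(len1, len2)."""
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 1.0
    return 1.0 - abs(len(s1) - len(s2)) / longest


# Hypothetical equal weights; the abstract does not report the paper's weights.
WEIGHTS = {"order": 0.5, "length": 0.5}


def combine(features, weights):
    """Weighted average of feature scores in [0, 1]; only keys present in `weights` are used."""
    total = sum(weights.values())
    return sum(weights[name] * features[name] for name in weights) / total


if __name__ == "__main__":
    a = "he reads books in the library".split()
    b = "in the library he reads books".split()
    feats = {"order": order_score(a, b), "length": length_score(a, b)}
    print(round(feats["order"], 3))           # 0.4
    print(round(feats["length"], 3))          # 1.0
    print(round(combine(feats, WEIGHTS), 3))  # 0.7
```

The two example sentences contain exactly the same words, so a pure overlap measure would rate them identical; the order feature separates them, which illustrates why structural features are worth combining with word-level ones. Word-level scores could be added to the same `features` dictionary with their own weights; how the authors actually weighted and optimized their features is not stated in the abstract.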

LI Feng, HOU Jiaying, ZENG Rongren, LING Chen. Research on Multi-Feature Sentence Similarity Computing Method with Word Embedding[J]. Journal of Frontiers of Computer Science and Technology, 2017, 11(4): 608-618.
