Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (1): 137-143. DOI: 10.3778/j.issn.1673-9418.2008096

• Artificial Intelligence •

BERT-Assisted Construction of Personal Relationship Graphs in the Financial Domain

ZHANG Chunpeng, GU Xiwu, LI Ruixuan+, LI Yuhua, LIU Wei

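  1. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
  • Received: 2020-09-07 Revised: 2021-07-02 Publication date: 2022-01-01 Release date: 2021-07-15
  • Corresponding author: + E-mail: rxli@hust.edu.cn
  • About the authors: ZHANG Chunpeng (born in 1995), male, from Dezhou, Shandong, M.S. candidate. His research interests include natural language processing and deep learning.
    GU Xiwu (born in 1967), male, Ph.D., associate research fellow. His research interests include distributed computing, cloud computing, information retrieval and social network analysis.
    LI Ruixuan (born in 1974), male, from Yichang, Hubei, Ph.D., professor. His research interests include big data management and analysis, data mining and machine learning, cloud computing and edge computing.
    LI Yuhua (born in 1968), female, Ph.D., professor. Her research interests include machine learning and big data.
    LIU Wei (born in 1997), male, from Tianmen, Hubei, Ph.D. candidate. His research interests include natural language processing and machine learning.
  • Supported by:
    National Key Research and Development Program of China (2016QY01W0202); National Natural Science Foundation of China (U1836204); National Natural Science Foundation of China (U1936108); National Social Science Foundation of China (16ZDA092)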

Construction Method for Financial Personal Relationship Graphs Using BERT

ZHANG Chunpeng, GU Xiwu, LI Ruixuan+, LI Yuhua, LIU Wei

  1. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
  • Received: 2020-09-07 Revised: 2021-07-02 Online: 2022-01-01 Published: 2021-07-15
  • About author: ZHANG Chunpeng, born in 1995, M.S. candidate. His research interests include natural language processing and deep learning.
    GU Xiwu, born in 1967, Ph.D., associate research fellow. His research interests include distributed computing, cloud computing, information retrieval and social network analysis.
    LI Ruixuan, born in 1974, Ph.D., professor. His research interests include big data management and analysis, data mining and machine learning, cloud computing and edge computing.
    LI Yuhua, born in 1968, Ph.D., professor. Her research interests include machine learning and big data.
    LIU Wei, born in 1997, Ph.D. candidate. His research interests include natural language processing and machine learning.
  • Supported by:
    National Key Research and Development Program of China (2016QY01W0202); National Natural Science Foundation of China (U1836204); National Natural Science Foundation of China (U1936108); National Social Science Foundation of China (16ZDA092)

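Abstract:

Existing resume information extraction methods cannot extract personnel attributes and events from the unstructured resumes found in financial announcements, nor can they discover relationships between people across documents. To address these problems, unstructured resumes are converted into structured personnel information templates, and a method for constructing personal relationship graphs in the financial domain is proposed. The BERT pre-trained language model is trained to extract personnel attribute entities from unstructured resume text. The trained model is then used to obtain event instance vectors, which are classified in order to fill hierarchical personnel information templates and associate personnel attributes accurately. Personnel relationships are then extracted from the filled templates to construct the personal relationship graph. Experiments on a manually annotated dataset show that the proposed method effectively extracts information from unstructured financial resume text and effectively constructs personal relationship graphs for the financial domain.

Keywords: deep learning, information extraction, pre-trained language model, personal relationship graph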

Abstract:

Existing methods for extracting information from personnel resumes cannot extract personnel attributes and events from the unstructured resumes in financial announcements, and cannot find relationships between personnel across financial documents. In response to these problems, unstructured personnel resumes are extracted into structured personnel information templates, and a method for constructing personal relationship graphs in the financial domain is proposed. By training the BERT (bidirectional encoder representations from transformers) pre-trained language model, the personnel attribute entities in the unstructured resume text are extracted, and the trained BERT model is used to obtain event instance vectors, which are then classified. Personnel attributes are associated by filling the hierarchical personnel information templates, and personnel relationships are then extracted from the filled templates to construct personal relationship graphs. To verify the method, experiments are carried out on a manually labeled dataset. The experiments show that the method can effectively extract information from unstructured financial personnel resumes and effectively construct financial personal relationship graphs.

Key words: deep learning, information extraction, pre-trained language models, personal relationship graphs
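
The last stage described in the abstract, deriving personnel relationships from filled hierarchical templates, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Event` and `PersonTemplate` schemas, the event kinds `"work"` and `"education"`, the relation names, and the rule that two people are related when they share an organization during overlapping years. The abstract does not specify the paper's actual template fields or relation rules, and the BERT extraction and classification steps are not modeled here.

```python
from dataclasses import dataclass, field
from itertools import combinations

# Hypothetical mapping from event kind to the relation it implies.
RELATION_BY_KIND = {"work": "colleague", "education": "schoolmate"}


@dataclass(frozen=True)
class Event:
    """One resume event (assumed schema): a role held at an organization."""
    kind: str
    organization: str
    role: str
    start_year: int
    end_year: int


@dataclass
class PersonTemplate:
    """Hierarchical template: person-level attributes plus a list of events."""
    name: str
    attributes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)


def overlaps(a: Event, b: Event) -> bool:
    """True when the two events share at least one year."""
    return a.start_year <= b.end_year and b.start_year <= a.end_year


def extract_relations(templates):
    """Return a set of (person_a, relation, person_b, organization) edges."""
    edges = set()
    for p, q in combinations(templates, 2):
        for e in p.events:
            for f in q.events:
                same_context = (
                    e.kind == f.kind
                    and e.kind in RELATION_BY_KIND
                    and e.organization == f.organization
                )
                if same_context and overlaps(e, f):
                    a, b = sorted((p.name, q.name))
                    edges.add((a, RELATION_BY_KIND[e.kind], b, e.organization))
    return edges


# Made-up templates standing in for the output of the extraction stages.
templates = [
    PersonTemplate("Person A", {"gender": "male"}, [
        Event("work", "Company X", "director", 2010, 2015),
        Event("education", "University U", "student", 1990, 1994),
    ]),
    PersonTemplate("Person B", {"gender": "female"}, [
        Event("work", "Company X", "CFO", 2014, 2018),
    ]),
    PersonTemplate("Person C", {"gender": "male"}, [
        Event("work", "Company X", "manager", 2016, 2019),
        Event("education", "University U", "student", 1992, 1996),
    ]),
]

graph = extract_relations(templates)
for edge in sorted(graph):
    print(edge)
```

Because the templates come from different documents, edges of this kind link people who never appear in the same announcement, which is the cross-document aspect the abstract emphasizes. Storing edges as a set of tuples keeps the sketch free of dependencies; a graph library or database could be substituted.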

CLC number: