Construction Method for Financial Personal Relationship Graphs Using BERT

doi:10.3778/j.issn.1673-9418.2008096

Journal of Frontiers of Computer Science and Technology, 2022, Vol. 16, Issue (1): 137-143.

• Artificial Intelligence •

ZHANG Chunpeng, GU Xiwu, LI Ruixuan⁺, LI Yuhua, LIU Wei

School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China

⁺ Corresponding author: LI Ruixuan, E-mail: rxli@hust.edu.cn

Received: 2020-09-07 Revised: 2021-07-02 Online: 2022-01-01 Published: 2021-07-15
About author: ZHANG Chunpeng, born in 1995, from Dezhou, Shandong, M.S. candidate. His research interests include natural language processing and deep learning.
GU Xiwu, born in 1967, Ph.D., associate research fellow. His research interests include distributed computing, cloud computing, information retrieval and social network analysis.
LI Ruixuan, born in 1974, from Yichang, Hubei, Ph.D., professor. His research interests include big data management and analysis, data mining and machine learning, cloud computing and edge computing.
LI Yuhua, born in 1968, Ph.D., professor. Her research interests include machine learning and big data.
LIU Wei, born in 1997, from Tianmen, Hubei, Ph.D. candidate. His research interests include natural language processing and machine learning.
Supported by:
National Key Research and Development Program of China (2016QY01W0202); National Natural Science Foundation of China (U1836204); National Natural Science Foundation of China (U1936108); National Social Science Foundation of China (16ZDA092)

Abstract:

Existing resume information extraction methods cannot extract personnel attributes and events from the unstructured personnel resumes found in financial announcements, nor can they discover relationships between people across financial documents. To address these problems, unstructured personnel resumes are extracted into structured personnel information templates, and a method for constructing personal relationship graphs in the financial domain is proposed. The BERT (bidirectional encoder representations from transformers) pre-trained language model is trained to extract personnel attribute entities from unstructured resume text, and the trained model is then used to obtain event instance vectors, which are classified by event type. The classified event instances are used to fill hierarchical personnel information templates, so that each personnel attribute is associated with the correct person and event. Personnel relationships are then extracted from the filled templates to construct personal relationship graphs. The method is evaluated experimentally on a manually labeled dataset. The experiments show that the method can effectively extract information from unstructured financial personnel resumes and construct personal relationship graphs for the financial domain.

Key words: deep learning, information extraction, pre-trained language models, personal relationship graphs
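The last step described in the abstract, deriving relationships from filled personnel information templates, can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not the authors' implementation: the `PersonTemplate` class, its `employers` and `schools` fields, and the rule that a shared organization implies a colleague or alumni edge are hypothetical, and the sketch ignores time overlap and assumes that a name identifies the same person across documents.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class PersonTemplate:
    """Hypothetical filled template for one person (not the paper's schema)."""
    name: str
    employers: set = field(default_factory=set)  # organizations from employment events
    schools: set = field(default_factory=set)    # institutions from education events


def build_relation_graph(templates):
    """Return a set of (person_a, person_b, relation) edges.

    Assumption: two people are colleagues if they share an employer and
    alumni if they share a school. Time overlap is ignored in this sketch.
    """
    edges = set()
    for attr, relation in (("employers", "colleague"), ("schools", "alumni")):
        members = defaultdict(set)  # organization -> names of linked people
        for template in templates:
            for org in getattr(template, attr):
                members[org].add(template.name)
        for people in members.values():
            # Sort so each undirected pair is stored once, in a stable order.
            for a, b in combinations(sorted(people), 2):
                edges.add((a, b, relation))
    return edges


# Toy templates, as if filled from three separate announcements.
templates = [
    PersonTemplate("A", {"Bank X"}, {"University U"}),
    PersonTemplate("B", {"Bank X", "Fund Y"}, {"University V"}),
    PersonTemplate("C", {"Fund Y"}, {"University U"}),
]
edges = build_relation_graph(templates)
for edge in sorted(edges):
    print(edge)
```

Grouping people by organization first, rather than comparing every pair of templates, keeps the work proportional to the size of each organization's member list, which matters when templates come from many documents.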

CLC Number: TP391

ZHANG Chunpeng, GU Xiwu, LI Ruixuan, LI Yuhua, LIU Wei. Construction Method for Financial Personal Relationship Graphs Using BERT[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(1): 137-143.

张纯鹏, 辜希武, 李瑞轩, 李玉华, 刘伟. BERT辅助金融领域人物关系图谱构建[J]. 计算机科学与探索, 2022, 16(1): 137-143.

Statistic	Count
Total person entities	1,694
Total characters	597,157
Total person attribute entities	86,066
Total sentences	14,666

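Statistic	Count
Total person entities	1,694
Total education experience event instances	2,407
Total employment experience event instances	26,756
Total colleague relations	52,202
Total alumni relations	2,264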

Method	Precision	Recall	F1
Heuristic rules	0.8118	0.7767	0.7939
BiLSTM-CRF	0.9015	0.9213	0.9113
BERT	0.9211	0.9340	0.9275
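The F1 column in the table above is consistent with the standard definition of F1 as the harmonic mean of precision and recall, F1 = 2PR / (P + R). The following short check recomputes it from the reported precision and recall. The function name `f1_score` and the dictionary layout are illustrative choices; the numbers are copied from the table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for each method, copied from the table.
reported = {
    "Heuristic rules": (0.8118, 0.7767, 0.7939),
    "BiLSTM-CRF": (0.9015, 0.9213, 0.9113),
    "BERT": (0.9211, 0.9340, 0.9275),
}

for method, (p, r, f1) in reported.items():
    print(f"{method}: recomputed F1 = {f1_score(p, r):.4f}, reported = {f1:.4f}")
```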

Event type	Method	Precision	Recall	F1	Accuracy
Employment experience	BERT-Template	0.82	0.92	0.87	0.86
Employment experience	BiLSTM-CRF	0.84	0.84	0.84	0.84
Education experience	BERT-Template	0.81	0.91	0.86	0.82
Education experience	BiLSTM-CRF	0.78	0.89	0.83	0.79

Relation type	Method	Precision	Recall	F1
Colleague	Heuristic rules	0.69	0.66	0.67
Colleague	BERT-Template	0.76	0.73	0.74
Colleague	BiLSTM-CRF	0.74	0.70	0.72
Alumni	Heuristic rules	0.49	0.52	0.50
Alumni	BERT-Template	0.72	0.73	0.72
Alumni	BiLSTM-CRF	0.69	0.72	0.70
