Document Classification Method Based on Context Awareness and Hierarchical Attention Network

doi:10.3778/j.issn.1673-9418.1912048

Abstract

Document classification is a fundamental problem in natural language processing (NLP). Hierarchical attention networks have made progress on this problem in recent years. However, each sentence is encoded independently, so the bidirectional encoder used in these models can only take into account the sentences adjacent to the one being encoded. The model therefore still focuses on the current sentence and does not effectively integrate knowledge of the document structure into the architecture. To solve this problem, a document classification method based on context awareness and a hierarchical attention network (CAHAN) is proposed. The method uses a hierarchical architecture to represent the hierarchical structure of the document, and uses attention mechanisms to identify the important sentences in the document and the important words in each sentence. At both the word level and the sentence level, it relies on bidirectional encoders to obtain context information. It also introduces a context vector into the word-level attention mechanism, so that the word-level encoder makes its attention decisions based on context information. This lets the model capture the context of the text fully and extract deep document features. In addition, a gating mechanism is used to determine accurately how much context information should be taken into account. Experimental results on two standard datasets show that the proposed CAHAN model achieves better classification performance than long short-term memory (LSTM) networks, convolutional neural networks (CNN), and the hierarchical attention network (HAN), improving the accuracy of document classification.

Key words: natural language processing (NLP), document classification, context-aware, hierarchical attention, gating mechanism
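The abstract does not give CAHAN's equations. The following is therefore only a minimal sketch, under stated assumptions, of how a context vector and a gate could enter HAN-style word-level attention. It makes four assumptions. First, it uses an additive score of the form u_s^T tanh(...). Second, a sigmoid gate forms a per-dimension convex mix of a word term and a context term. Third, the context vector for each sentence is the mean of the preceding sentence vectors. Fourth, the word states are taken as given, because the bidirectional encoders are not implemented. All parameter names (W_s, b_s, W_c, G_h, G_c, g_b, u_s) and the helper functions are hypothetical.

```python
import math
import random

# --- small vector helpers (pure Python, no NumPy) ---
def matvec(W, x):
    """Multiply a matrix W (a list of rows) by a vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(column) for column in zip(*vectors)]

def mul(a, b):
    """Element-wise (Hadamard) product."""
    return [x * y for x, y in zip(a, b)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context_aware_attention(H, c, p):
    """Pool the word states H of one sentence into a sentence vector.

    The context vector c enters every attention score; a sigmoid gate
    (assumed form) decides, per dimension, how much context is mixed in.
    """
    scores = []
    for h in H:
        word_term = add(matvec(p["W_s"], h), p["b_s"])  # HAN-style word term
        ctx_term = matvec(p["W_c"], c)                  # hypothetical context term
        gate = sigmoid(add(matvec(p["G_h"], h), matvec(p["G_c"], c), p["g_b"]))
        mixed = add(mul([1.0 - g for g in gate], word_term), mul(gate, ctx_term))
        hidden = [math.tanh(x) for x in mixed]
        scores.append(sum(u * x for u, x in zip(p["u_s"], hidden)))
    alpha = softmax(scores)  # attention weights over the words
    sentence = [sum(a * h[k] for a, h in zip(alpha, H)) for k in range(len(H[0]))]
    return sentence, alpha

def encode_document(doc, p):
    """doc: list of sentences, each a list of word-state vectors assumed to come
    from a bidirectional encoder (not implemented here). As an assumption, the
    context for sentence i is the mean of sentence vectors 0..i-1."""
    dim = len(doc[0][0])
    sentence_vecs = []
    for H in doc:
        if sentence_vecs:
            c = [sum(v[k] for v in sentence_vecs) / len(sentence_vecs) for k in range(dim)]
        else:
            c = [0.0] * dim  # the first sentence has no preceding context
        s, _ = context_aware_attention(H, c, p)
        sentence_vecs.append(s)
    return sentence_vecs

# --- usage with random, hypothetical parameters ---
def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d, a = 4, 3  # word-state size and attention size (hypothetical)
params = {
    "W_s": random_matrix(a, d, rng), "b_s": [0.0] * a,
    "W_c": random_matrix(a, d, rng),
    "G_h": random_matrix(a, d, rng), "G_c": random_matrix(a, d, rng), "g_b": [0.0] * a,
    "u_s": [rng.uniform(-0.5, 0.5) for _ in range(a)],
}
# Three toy sentences with 5, 3 and 4 word states each.
doc = [[[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)] for n in (5, 3, 4)]
sentence_vecs = encode_document(doc, params)
print(len(sentence_vecs), [len(v) for v in sentence_vecs])
```

When the gate is close to 0, the score reduces to the plain HAN word-level attention score u_s^T tanh(W_s h + b_s). When it is close to 1, the context term dominates. This is one simple way to let a model learn how much context to use. The sentence-level attention and the classifier layer are omitted.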

REN Jianhua, LI Jing, MENG Xiangfu. Document Classification Method Based on Context Awareness and Hierarchical Attention Network[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(2): 305-314.
