Journal of Frontiers of Computer Science and Technology, 2021, Vol. 15, Issue (2): 305-314. DOI: 10.3778/j.issn.1673-9418.1912048

• Artificial Intelligence •

Document Classification Method Based on Context Awareness and Hierarchical Attention Network

REN Jianhua, LI Jing, MENG Xiangfu   

  1. School of Electronics and Information Engineering, Liaoning Technical University, Huludao, Liaoning 125105, China
  • Online: 2021-02-01 Published: 2021-02-01





Document classification is a basic problem in natural language processing (NLP). Hierarchical attention networks have made progress in recent years, but because each sentence is encoded independently, the bidirectional encoder used in the model can only consider the sentences adjacent to the one being encoded, still focuses on the sentence currently being encoded, and does not effectively integrate knowledge of document structure into the architecture. To solve this problem, a document classification method based on context awareness and a hierarchical attention network (CAHAN) is proposed. The method uses a hierarchical model to represent the hierarchical structure of the document, and uses attention mechanisms to identify the important sentences in the document and the important words in each sentence. At the word and sentence levels, it not only relies on the bidirectional encoder to obtain context information, but also introduces a context vector into the word-level attention mechanism, so that the word-level encoder makes attention decisions based on context information. This allows the model to capture the context of the text more fully and thereby extract deep document features. In addition, a gating mechanism is used to accurately determine how much context information should be considered. Experimental results on two standard data sets show that the proposed CAHAN model classifies documents more accurately than long short-term memory (LSTM), convolutional neural network (CNN), and hierarchical attention network (HAN) models.
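
The abstract does not give the model's equations, so the following is only a minimal NumPy sketch of what a gated, context-aware word-level attention step might look like. It is not the authors' implementation. The function names, the parameter names (`W`, `Wc`, `b`, `u`, `Wg_h`, `Wg_c`, `bg`), the additive scoring form, and the choice of a sigmoid gate are all assumptions made for illustration; `H` stands for the word annotations produced by a bidirectional encoder and `c` for a context vector summarizing other sentences.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(d, k, seed=0):
    """Random illustrative parameters (hypothetical names, not from the paper)."""
    rng = np.random.default_rng(seed)
    return {
        "W": rng.normal(scale=0.1, size=(d, k)),     # projects word annotations
        "Wc": rng.normal(scale=0.1, size=(d, k)),    # projects the context vector
        "b": np.zeros(k),
        "u": rng.normal(scale=0.1, size=k),          # scoring vector
        "Wg_h": rng.normal(scale=0.1, size=(d, k)),  # gate weights for words
        "Wg_c": rng.normal(scale=0.1, size=(d, k)),  # gate weights for context
        "bg": np.zeros(k),
    }


def context_aware_attention(H, c, p):
    """Gated, context-aware additive attention over one sentence.

    H: (T, d) word annotations; c: (d,) context vector.
    Returns the (d,) sentence vector and the (T,) attention weights.
    """
    # Assumed gate: values in (0, 1) deciding, per word and per hidden unit,
    # how much of the projected context enters the attention score.
    gate = sigmoid(H @ p["Wg_h"] + c @ p["Wg_c"] + p["bg"])     # (T, k)
    hidden = np.tanh(H @ p["W"] + gate * (c @ p["Wc"]) + p["b"])  # (T, k)
    scores = hidden @ p["u"]                                    # (T,)
    # Numerically stable softmax over the words of the sentence.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ H, weights


# Usage: a sentence of 5 words with 8-dimensional annotations.
rng = np.random.default_rng(1)
H = rng.normal(size=(5, 8))
c = rng.normal(size=8)
params = init_params(d=8, k=6)
sentence_vec, weights = context_aware_attention(H, c, params)
```

In this sketch, a gate close to zero reduces the score to ordinary additive attention over the current sentence only, while a gate close to one lets the context vector shift which words receive weight.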

Key words: natural language processing (NLP), document classification, context-aware, hierarchical attention, gating mechanism


