Research on Neural Network for Trial Difficulty Prediction

doi:10.3778/j.issn.1673-9418.2008098

Abstract

Abstract (Chinese):

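Trial difficulty prediction (TDP) is the task of automatically predicting how difficult a case will be to try, given the text describing the facts of the case. It has broad application prospects in intelligent judicial systems. At present, TDP tools rely heavily on rules distilled from expert experience, which introduces considerable bias, and little related research exists. To address this problem, we cast TDP as a text classification problem in natural language processing. Our analysis shows that traditional classification methods ignore the structural distinctiveness of, and the logical dependencies among, the trial elements in a complaint, which makes it difficult to predict case difficulty accurately. To meet these challenges, drawing on a study of complaints and on the trial elements used to separate complex cases from simple ones, we propose a new neural network model, MAT-TAN. Specifically, the model first applies a mask-attention network (MAT) to perform a fine-grained analysis of the case description text. The masking mechanism acts as an intelligent gatekeeper that focuses on the specific positions of each trial element; combined with self-attention, it extracts comprehensive and accurate features for every trial element. Second, we propose a topological association network (TAN) that models the judicial-logic dependencies among elements and effectively fuses the features of different elements, finally producing the trial difficulty prediction. Experimental results on real court data show that, compared with baseline text classification methods, the model improves the macro-averaged F1 score by 0.036, demonstrating good effectiveness for trial difficulty prediction.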
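The abstract does not give MAT's equations, so the following is only a minimal sketch, under stated assumptions, of how a position mask can restrict self-attention to the tokens of one trial element. Assumptions: the token vectors are already computed, each trial element is given as a 0/1 mask over token positions, queries, keys and values are the raw token vectors (no learned projections), and the element feature is a mean over the element's own positions. The helper names `masked_self_attention` and `element_feature` are hypothetical, not the authors' code.

```python
import math
from typing import List

Matrix = List[List[float]]


def masked_self_attention(x: Matrix, mask: List[int]) -> Matrix:
    """Scaled dot-product self-attention in which every query attends only
    to positions with mask == 1 (the tokens of one trial element).

    Simplification (assumption): queries, keys and values are the raw token
    vectors; a trained model would apply learned projections first.
    """
    if not any(mask):
        raise ValueError("mask must select at least one position")
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in x:
        # Masked-out keys get -inf, so their softmax weight is exactly 0.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) * scale if m else -math.inf
            for k, m in zip(x, mask)
        ]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, x)) / total for j in range(d)]
        )
    return outputs


def element_feature(x: Matrix, mask: List[int]) -> List[float]:
    """Mean-pool the attended vectors at the element's own positions."""
    attended = masked_self_attention(x, mask)
    rows = [row for row, m in zip(attended, mask) if m]
    return [sum(col) / len(rows) for col in zip(*rows)]


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    # Hypothetical element occupying the first two tokens only.
    print(element_feature(tokens, [1, 1, 0]))
```

Here the result is approximately `[0.5, 0.5]`, and replacing the third (masked-out) token with any other vector leaves it unchanged, which is the "gatekeeper" behaviour the abstract describes.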

Key words: trial difficulty prediction (TDP), trial elements, mask-attention network (MAT), topological association network (TAN)

Abstract:

Trial difficulty prediction (TDP) is the task of automatically predicting how difficult a case will be to try, given the case text, and it has broad application prospects in intelligent judicial systems. In practice, TDP tools rely heavily on expert experience, which leads to inconsistent conclusions about trial difficulty, and little research has addressed the task. To address these issues, this paper formulates TDP as a text classification problem in natural language processing. Our analysis shows that traditional text classification methods do not consider the structural uniqueness of, and the logical dependencies among, the trial elements in a complaint, which makes it difficult to predict trial difficulty accurately. To meet these challenges, this paper studies complaints closely, takes into account the trial elements used to separate complex cases from simple ones, and presents an end-to-end model, MAT-TAN (mask-attention and topological association network). Specifically, this paper proposes a novel mask-attention network (MAT) to carry out a fine-grained analysis of the case description text in complaints. The masking mechanism acts as an intelligent gatekeeper, focusing on the specific positions of the trial elements in a complaint; together with the self-attention mechanism, it extracts comprehensive and accurate features of each trial element. This paper further proposes a novel topological association network (TAN), which models the judicial-logic dependencies between elements and effectively fuses the features of different elements to produce the final trial difficulty prediction. Experimental results on real-world court data demonstrate that MAT-TAN improves the macro-averaged F1 score by up to 0.036 compared with baseline methods, showing better performance on TDP.
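The abstract states only that TAN models judicial-logic dependencies between elements and fuses their features; it does not specify the fusion rule. As a hedged illustration, the sketch below assumes the dependencies form a directed acyclic graph, visits elements in topological order so every parent is fused before its children, mixes each child's own feature with the mean of its parents' fused features using a hypothetical weight `alpha`, and averages the fused vectors into one case vector. The element names `claim`, `evidence` and `dispute` and the functions `topological_fusion` and `case_vector` are illustrative assumptions, not the paper's design. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def topological_fusion(
    features: Dict[str, List[float]],
    parents: Dict[str, List[str]],
    alpha: float = 0.5,
) -> Dict[str, List[float]]:
    """Fuse element features along a dependency DAG (hypothetical rule).

    Elements are visited in topological order, so each parent is fused
    before its children; a child becomes (1 - alpha) * own feature
    + alpha * mean of its parents' fused features. Raises
    graphlib.CycleError if the dependencies contain a cycle.
    """
    graph = {node: parents.get(node, []) for node in features}
    fused: Dict[str, List[float]] = {}
    for node in TopologicalSorter(graph).static_order():
        own = features[node]  # KeyError if a parent has no feature vector
        ps = graph[node]
        if not ps:  # root element: nothing to fuse from
            fused[node] = list(own)
            continue
        parent_mean = [sum(fused[p][j] for p in ps) / len(ps) for j in range(len(own))]
        fused[node] = [(1 - alpha) * o + alpha * pm for o, pm in zip(own, parent_mean)]
    return fused


def case_vector(fused: Dict[str, List[float]]) -> List[float]:
    """Average the fused element vectors into one case representation."""
    vectors = list(fused.values())
    return [sum(col) / len(vectors) for col in zip(*vectors)]


if __name__ == "__main__":
    feats = {"claim": [1.0, 0.0], "evidence": [0.0, 1.0], "dispute": [0.0, 0.0]}
    deps = {"evidence": ["claim"], "dispute": ["claim", "evidence"]}
    fused = topological_fusion(feats, deps)
    print(fused["dispute"], case_vector(fused))
```

Ordering the computation topologically is what lets information from upstream elements reach downstream ones in a single pass; in a trained model the fixed `alpha` and the plain mean would presumably be replaced by learned transformations, and the case vector would feed a difficulty classifier.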

Key words: trial difficulty prediction (TDP), trial elements, mask-attention network (MAT), topological association network (TAN)
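The reported gain is measured in macro-averaged F1, i.e. the unweighted mean of the per-class F1 scores, so every difficulty level counts equally regardless of how many cases it contains. The sketch below computes it from scratch using F1 = 2TP/(2TP + FP + FN) per class over every label that appears in either the gold or the predicted labels; it should agree with scikit-learn's `f1_score(..., average="macro")` on such inputs. The three difficulty labels in the example are a hypothetical scheme, not necessarily the one used in the paper's dataset.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label in y_true or y_pred."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    scores = []
    for c in set(y_true) | set(y_pred):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # F1 = 2PR / (P + R) = 2TP / (2TP + FP + FN); the denominator is > 0
        # because c occurs in y_true or y_pred.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical labels: 0 = simple, 1 = moderate, 2 = complex.
    gold = [0, 0, 1, 1, 2]
    pred = [0, 1, 1, 1, 2]
    print(round(macro_f1(gold, pred), 4))
```

In this example the per-class scores are 2/3, 0.8 and 1.0, so the macro F1 is 37/45, about 0.822; an improvement of 0.036 on this scale corresponds to 3.6 percentage points.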

王悦, 王平辉, 许诺, 陈龙, 杨鹏, 吴用. 面向案件审判难度预测的神经网络模型研究[J]. 计算机科学与探索, 2021, 15(12): 2345-2352.

WANG Yue, WANG Pinghui, XU Nuo, CHEN Long, YANG Peng, WU Yong. Research on Neural Network for Trial Difficulty Prediction[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(12): 2345-2352.
