Journal of Frontiers of Computer Science and Technology

• Science Researches •

Low-Resource Named Entity Recognition Method based on Diff-Cascade

QIU Yunfei, DONG Libo, ZHANG Wenwen

  1. School of Software, Liaoning Technology University, Huludao, Liaoning 125105, China
  2. School of Business Administration, Liaoning Technology University, Huludao, Liaoning 125105, China


Abstract: In low-resource Named Entity Recognition (NER) tasks, many transfer-learning-based methods can alleviate data scarcity, but they often omit or misrecognize valid information in sentences, degrading model performance in resource-constrained environments. To address this issue, this paper proposes a multi-module collaborative NER model, Diff-Cascade-NER. First, a Variational Autoencoder (VAE) learns data representations in the latent space and generates diverse samples. Next, context information, syntactic analysis, and the VAE-reconstructed data are fed as conditional inputs to a Conditional Encoder (CE). The encoded data is then passed to a Cascade Diffusion Mechanism (CDM), which produces high-quality samples through a multi-stage denoising and generation process. Finally, Adversarial Learning (AL) optimizes the quality and diversity of the generated samples. Experimental results show that Diff-Cascade-NER outperforms existing models on eight low-resource datasets, achieving F1 scores of 85.44% and 56.38% on the BC2GM and WNUT-16 datasets, respectively, which demonstrates the effectiveness of the collaborative modules in low-resource NER tasks.

Key words: Low-resource Named Entity Recognition, Variational Autoencoder, Conditional Encoder, Cascade Diffusion Mechanism, Adversarial Learning
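The VAE → CE → CDM → AL pipeline outlined in the abstract can be sketched as a toy NumPy flow. Everything here — the function names, the tanh stand-in for a learned denoiser, and the mean-squared discriminator score — is an illustrative assumption about the pipeline's structure, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def vae_reparameterize(mu, log_var):
    """VAE: sample a latent vector z = mu + sigma * eps (reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def conditional_encode(context, syntax, vae_recon):
    """CE: fuse context, syntactic features, and VAE-reconstructed data into one condition."""
    return np.concatenate([context, syntax, vae_recon], axis=-1)

def cascade_denoise(x, stages=3):
    """CDM: multi-stage denoising; each stage refines the previous stage's output."""
    for t in range(stages, 0, -1):
        predicted_noise = x - np.tanh(x)   # toy stand-in for a learned denoiser
        x = x - predicted_noise / t        # remove part of the estimated noise per stage
    return x

def adversarial_score(sample, real_mean):
    """AL: toy discriminator signal; lower means closer to the 'real' distribution."""
    return float(np.mean((sample - real_mean) ** 2))

# Pipeline: VAE sample -> conditional encoding -> cascade diffusion -> adversarial check
mu, log_var = np.zeros(8), np.zeros(8)            # dummy VAE posterior parameters
z = vae_reparameterize(mu, log_var)               # diverse latent sample
context = rng.standard_normal(8)                  # dummy context features
syntax = rng.standard_normal(8)                   # dummy syntactic features
cond = conditional_encode(context, syntax, z)     # fused condition, shape (24,)
sample = cascade_denoise(cond + rng.standard_normal(24))
score = adversarial_score(sample, real_mean=np.zeros(24))
print(cond.shape, sample.shape, score >= 0.0)
```

In the real model each of these stand-ins would be a trained neural module; the sketch only shows how the four stages hand data to one another.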
