Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2025, Vol. 19, Issue (4): 1087-1094. DOI: 10.3778/j.issn.1673-9418.2407028

• Network and Security •

Surname Password Guessing Method Based on GPT-2

LIN Jiaxi (林嘉熹), QIAN Qiuyan (钱秋妍), ZENG Jianping (曾剑平), ZHANG Weidong (张尉东)

  1. School of Computer Science, Fudan University, Shanghai 200433, China
  2. Engineering Research Center of Cyber Security Auditing and Monitoring, Ministry of Education, Shanghai 200433, China
  3. Biren Technology, Shanghai 201100, China
  • Publication date: 2025-04-01    Release date: 2025-03-28

Abstract: As authentication mechanisms diversify, passwords, a traditional and widely adopted authentication method, face severe security challenges. Owing to linguistic and cultural differences, Chinese users' password choices differ significantly from those of English-speaking users, which offers a new perspective for guessing attacks. To address this, this paper proposes a Chinese surname password guessing method based on the GPT-2 model, aiming to improve guessing capability for Chinese passwords. The method uses unsupervised fine-tuning so that the pre-trained language model generates passwords closely related to surnames. To compensate for GPT-2's limited support for Chinese characters, a news corpus is used as the pre-training dataset: the Chinese text is converted into Pinyin and the model is trained to recognize Pinyin, which helps it capture Chinese users' password habits more accurately. Experimental results show that the model outperforms traditional guessing methods and deep learning-based password attack techniques in password guessing tasks, achieving a higher attack success rate, particularly when resources are limited. The paper also examines the effect of the temperature parameter on the guessing success rate and points out potential directions for further improving password security.

Key words: password security, Chinese password, GPT-2 model, password guessing, pre-trained language model
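
The temperature parameter mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it only shows the standard temperature-scaled softmax commonly used when sampling from a language model. The toy vocabulary, the logits, and the function names `softmax_with_temperature` and `sample_token` are hypothetical and chosen purely for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(vocab, logits, temperature, rng=random):
    """Draw one token according to the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


# Hypothetical toy vocabulary and scores (not taken from the paper).
VOCAB = ["wang", "123", "li", "520"]
LOGITS = [2.0, 1.0, 0.5, 0.1]

sharp = softmax_with_temperature(LOGITS, 0.5)  # low temperature
flat = softmax_with_temperature(LOGITS, 2.0)   # high temperature
```

In this sketch a lower temperature concentrates probability on the highest-scoring token, while a higher temperature spreads it across the vocabulary and yields more varied samples. How that trade-off affects the guessing success rate is the question the paper examines; the sketch makes no claim about its findings.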