Journal of Frontiers of Computer Science and Technology (计算机科学与探索) ›› 2020, Vol. 14 ›› Issue (8): 1338-1347. DOI: 10.3778/j.issn.1673-9418.1909035

• Network and Information Security •

Application Research of BiLSTM in Cross-Site Scripting Detection

CHENG Qiqin (程琪芩), WAN Liang (万良)

  1. College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
  2. Institute of Computer Software and Theory, Guizhou University, Guiyang 550025, China
  • Online: 2020-08-01 Published: 2020-08-07

Abstract:

Most traditional cross-site scripting (XSS) detection techniques rely on machine learning methods and suffer from several shortcomings: maliciously obfuscated code is hard to read, feature extraction is insufficient, and efficiency is low, all of which degrade detection performance. To address these problems, a method that uses a bidirectional long short-term memory (BiLSTM) network to detect XSS attacks is proposed. First, the data are preprocessed: decoding techniques restore the XSS code to its unencoded state to improve readability, and the deep learning tool word2vec converts the decoded code into vectors that serve as the input to the neural network. Next, the BiLSTM network learns the abstract features of XSS attacks in both the forward and backward directions. Finally, a softmax classifier classifies the learned abstract features, and the dropout algorithm is applied to prevent overfitting. Experiments on the collected dataset show that the proposed method achieves better detection performance than several traditional machine learning methods and deep learning methods.

Key words: cross-site scripting (XSS), decoding techniques, word2vec, bidirectional long short-term memory network (BiLSTM)
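
The decoding step of the preprocessing stage can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which encodings are handled, so the sketch assumes URL percent-encoding and HTML entity encoding only, and the names `decode_payload`, `tokenize`, `TOKEN_RE`, and `max_rounds` are hypothetical. Decoding is repeated until the string stops changing, so that nested encodings are also unwrapped. The tokenizer is likewise an assumption, showing one way to turn decoded code into the token sequences a word2vec model would be trained on.

```python
import html
import re
from typing import List
from urllib.parse import unquote

# Hypothetical token pattern (not from the paper): tag openers, words,
# quoted strings, and any other single non-space symbol.
TOKEN_RE = re.compile(r"</?\w+|\w+|\"[^\"]*\"|'[^']*'|[^\w\s]")


def decode_payload(raw: str, max_rounds: int = 5) -> str:
    """Undo URL percent-encoding and HTML entity encoding, repeatedly."""
    current = raw
    for _ in range(max_rounds):
        decoded = html.unescape(unquote(current))
        if decoded == current:  # fixed point: nothing left to decode
            break
        current = decoded
    return current


def tokenize(payload: str) -> List[str]:
    """Lowercase a decoded payload and split it into tokens."""
    return TOKEN_RE.findall(payload.lower())


if __name__ == "__main__":
    double_encoded = "%253Cscript%253Ealert(1)%253C%252Fscript%253E"
    plain = decode_payload(double_encoded)
    print(plain)            # <script>alert(1)</script>
    print(tokenize(plain))  # ['<script', '>', 'alert', '(', '1', ')', '</script', '>']
```

The `max_rounds` cap is a design choice in this sketch: it bounds the work done on inputs that keep changing under repeated decoding. The resulting token lists could then be passed to a word embedding tool to obtain the vectors that the abstract describes as the input to the BiLSTM network.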