Application Research of Autoencoder Network in Malicious JavaScript Code Detection

Journal of Frontiers of Computer Science and Technology, 2019, Vol. 13, Issue (12): 2073-2084. DOI: 10.3778/j.issn.1673-9418.1901009

LONG Tingyan, WAN Liang, DING Hongwei

1. School of Computer Science and Technology, Guizhou University, Guiyang 550025, China
2. Institute of Computer Software and Theory, Guizhou University, Guiyang 550025, China

Online: 2019-12-01 Published: 2019-12-10

Abstract

Abstract: Traditional machine learning feature extraction methods struggle to uncover the deep, essential features of malicious JavaScript code. To address this problem, a detection method for malicious JavaScript code based on a stacked sparse denoising autoencoder network (sSDAN) is proposed. First, the JavaScript code is converted into a numerical representation. A sparsity constraint is then added to the autoencoder network, and noise drawn from a given probability distribution is added to the input during training, so that the autoencoder model learns feature representations of the data at different levels. Next, unsupervised greedy layer-wise pre-training followed by supervised fine-tuning yields deeper, effectively denoised features. Finally, a Softmax function classifies the features. Experimental results show that the sparse denoising autoencoder classification algorithm classifies JavaScript well and is more accurate than traditional machine learning models: its accuracy is 0.717% higher than that of the random forest method and 2.237% higher than that of the support vector machine (SVM) method.

Key words: stacked sparse denoising autoencoder network (sSDAN), JavaScript malicious code, machine learning
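The pipeline described in the abstract (corrupt the input, encode under a sparsity constraint, reconstruct the clean input, then classify with Softmax) can be illustrated with a minimal NumPy sketch of a single layer. This is not the authors' implementation: the masking noise, sigmoid activations, squared-error reconstruction, KL-divergence sparsity penalty, and the values of `rho`, `beta`, and `drop_prob` are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def corrupt(x, drop_prob, rng):
    # Assumed noise model: zero each entry independently with probability drop_prob.
    mask = rng.random(x.shape) >= drop_prob
    return x * mask

def kl_sparsity(rho, rho_hat, eps=1e-8):
    # KL divergence between the target activation rho and the mean activations rho_hat.
    rho_hat = np.clip(rho_hat, eps, 1.0 - eps)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))

def sparse_denoising_loss(x, W1, b1, W2, b2, rng,
                          rho=0.05, beta=3.0, drop_prob=0.3):
    # Encode the corrupted input, but measure reconstruction against the clean input.
    x_noisy = corrupt(x, drop_prob, rng)
    h = sigmoid(x_noisy @ W1 + b1)
    x_rec = sigmoid(h @ W2 + b2)
    recon = 0.5 * np.mean(np.sum((x_rec - x) ** 2, axis=1))
    loss = recon + beta * kl_sparsity(rho, h.mean(axis=0))
    return loss, h

def softmax(z):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy data standing in for numerically encoded JavaScript samples (hypothetical sizes).
rng = np.random.default_rng(0)
x = rng.random((8, 20))
W1, b1 = rng.normal(0.0, 0.1, (20, 10)), np.zeros(10)
W2, b2 = rng.normal(0.0, 0.1, (10, 20)), np.zeros(20)
Wc, bc = rng.normal(0.0, 0.1, (10, 2)), np.zeros(2)

loss, h = sparse_denoising_loss(x, W1, b1, W2, b2, rng)
probs = softmax(h @ Wc + bc)  # class probabilities over two assumed classes
```

In a stacked network, the hidden output `h` of one trained layer would serve as the input to the next layer during greedy layer-wise pre-training, and the whole stack plus the Softmax layer would then be fine-tuned with labels. The sketch only computes the loss for one layer and leaves out the gradient updates.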

LONG Tingyan, WAN Liang, DING Hongwei. Application Research of Autoencoder Network in Malicious JavaScript Code Detection[J]. Journal of Frontiers of Computer Science and Technology, 2019, 13(12): 2073-2084.
