Journal of Frontiers of Computer Science and Technology, 2021, Vol. 15, Issue (7): 1265-1278. DOI: 10.3778/j.issn.1673-9418.2005028

• Artificial Intelligence •


Novel Four-Layer Neural Network and Its Incremental Learning Based on Randomly Mapped Features

YANG Yue, WANG Shitong   

  1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online:2021-07-01 Published:2021-07-09


Abstract:

This paper proposes a four-layer neural network based on randomly mapped features (FRMFNN) and its fast incremental learning algorithms. FRMFNN first transforms the original input features into randomly mapped features via a specific random mapping algorithm and stores them in the nodes of its first hidden layer. It then generates the nodes of its second hidden layer by applying a nonlinear activation function to all the randomly mapped features. Finally, the second hidden layer is connected to the output layer through the output weights. Since the weights of the first and second hidden layers are randomly generated according to an arbitrary continuous sampling probability distribution and require no training updates, and the output weights can be solved quickly by ridge regression, the time-consuming training process of traditional back-propagation neural networks is avoided. When FRMFNN does not reach the prescribed accuracy, its performance can be continuously improved by its fast incremental algorithms, thereby avoiding retraining of the whole network. This paper gives a detailed introduction to the structure of FRMFNN and its incremental algorithms, and proves the universal approximation property of FRMFNN. Compared with the incremental learning algorithms of the broad learning system (BLS) and the extreme learning machine (ELM), experimental results on several popular classification and regression datasets demonstrate the effectiveness of FRMFNN and its incremental learning algorithms.
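The training pipeline described above (random first layer, nonlinear second layer, ridge-regression output weights) can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the layer sizes, activation (tanh), weight distribution, and regularization constant are all assumptions, and the function names `frmfnn_train` / `frmfnn_predict` are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def frmfnn_train(X, Y, n_map=50, n_hidden=100, lam=1e-3):
    """Minimal FRMFNN-style sketch: two random hidden layers,
    output weights solved in closed form by ridge regression."""
    # First hidden layer: random linear mapping of the input features.
    W1 = rng.standard_normal((X.shape[1], n_map))
    b1 = rng.standard_normal(n_map)
    Z = X @ W1 + b1                      # randomly mapped features
    # Second hidden layer: nonlinear activation of random combinations.
    W2 = rng.standard_normal((n_map, n_hidden))
    b2 = rng.standard_normal(n_hidden)
    H = np.tanh(Z @ W2 + b2)
    # Output weights: ridge regression, no backpropagation needed.
    W_out = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ Y)
    return (W1, b1, W2, b2, W_out)

def frmfnn_predict(model, X):
    W1, b1, W2, b2, W_out = model
    return np.tanh((X @ W1 + b1) @ W2 + b2) @ W_out
```

Because only `W_out` is learned, training cost is dominated by one linear solve in the number of second-layer nodes, which is what makes the closed-form approach fast compared with iterative backpropagation.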
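The incremental idea (improving the network by adding hidden nodes without retraining) can be illustrated with a BLS-style block update of the pseudoinverse. This sketch is an assumption about the general technique, not the paper's exact algorithm: `add_hidden_nodes` is a hypothetical helper, and regularization is omitted for clarity (the plain pseudoinverse replaces the ridge solution).

```python
import numpy as np

def add_hidden_nodes(H, H_pinv, H_new):
    """Append new hidden-node columns H_new to the hidden matrix H and
    update its pseudoinverse with a Greville-type block formula,
    avoiding a full recomputation."""
    # D: representation of the new columns in the span of the old ones.
    D = H_pinv @ H_new
    # C: the part of the new columns outside the current column space.
    C = H_new - H @ D
    if np.linalg.norm(C) > 1e-10:
        B = np.linalg.pinv(C)
    else:
        B = np.linalg.solve(np.eye(D.shape[1]) + D.T @ D, D.T @ H_pinv)
    H_aug = np.hstack([H, H_new])
    pinv_aug = np.vstack([H_pinv - D @ B, B])
    return H_aug, pinv_aug
```

After the update, new output weights follow from one matrix product, `pinv_aug @ Y`, so accuracy can be improved step by step while the already-computed part of the network is reused.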

Key words: neural network, machine learning, randomly mapped features, broad learning, universal approximation, incremental learning, ridge regression, regularization