Journal of Frontiers of Computer Science and Technology ›› 2018, Vol. 12 ›› Issue (7): 1136-1144. DOI: 10.3778/j.issn.1673-9418.1705029

• Artificial Intelligence and Pattern Recognition •

Tensor-Based Regularized Multilinear Regression and Its Application

LU Zixiang (路子祥), HUANG Jiashuang (黄嘉爽), TU Liyang (屠黎阳), XU Xijia (徐西嘉), ZHANG Daoqiang (张道强)

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  2. Department of Psychiatry, Affiliated Nanjing Brain Hospital of Nanjing Medical University, Nanjing 210029, China
  • Online: 2018-07-01 Published: 2018-07-06

Abstract:

Conventional regression algorithms such as LASSO (least absolute shrinkage and selection operator) operate on vectorized data. Vectorization, however, destroys the original structure and intrinsic correlations of the data and ignores its higher-order dependencies. It also inflates the data dimensionality, which raises computational and storage costs. This paper proposes a tensor-based regularized multilinear regression algorithm, multilinear LASSO (mLASSO), which extends LASSO to tensor space. The algorithm first takes mode products of the tensor with weight vectors, mapping the tensor space to a vector space. It then applies LASSO in that vector space to regress the target values, which yields the weight vector for the corresponding mode; an alternating iterative procedure optimizes the weight vectors of all modes in turn. Finally, the mode products of the optimal weight vectors with the tensor data give the predicted values. The algorithm has two main advantages: (1) it makes full use of the structural information in the data; (2) the feature selection embedded in LASSO improves the generalization ability of the model. Experimental results show that the method performs well on multilinear data.

Key words: multilinear regression, regularization, tensor, feature selection, LASSO
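
The alternating scheme described in the abstract can be illustrated with a minimal Python sketch for second-order tensors (matrices), where the prediction for a sample X is u^T X v. Everything specific here is an assumption rather than the authors' implementation: the function names (`lasso_cd`, `mlasso_fit`, `demo`), the squared loss without an intercept, a single penalty `lam` shared by both modes, the all-ones initialization, and the simple coordinate-descent LASSO solver.

```python
import random


def soft_threshold(z, t):
    """Proximal operator of t * |.|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(A, y, lam, w0, n_sweeps=50):
    """Coordinate descent for min_w 1/(2n) * ||y - A w||^2 + lam * ||w||_1."""
    n, d = len(A), len(A[0])
    w = list(w0)
    # Residual y - A w for the warm-start weights.
    r = [y[i] - sum(A[i][j] * w[j] for j in range(d)) for i in range(n)]
    col_sq = [sum(A[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(n_sweeps):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue  # all-zero feature: leave its weight unchanged
            # Correlation of column j with the residual that excludes w[j].
            rho = sum(A[i][j] * r[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= A[i][j] * delta
                w[j] = new_wj
    return w


def contract_mode2(X, v):
    """Mode product of matrix X with v along its second mode: X v."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in X]


def contract_mode1(X, u):
    """Mode product of matrix X with u along its first mode: X^T u."""
    return [sum(X[j][k] * u[j] for j in range(len(u))) for k in range(len(X[0]))]


def predict(X, u, v):
    """Predicted value u^T X v."""
    return sum(a * b for a, b in zip(u, contract_mode2(X, v)))


def mlasso_fit(Xs, y, lam=0.01, n_outer=15):
    """Alternate LASSO fits over the two modes (assumed all-ones start)."""
    u = [1.0] * len(Xs[0])
    v = [1.0] * len(Xs[0][0])
    for _ in range(n_outer):
        # Fix v: each sample becomes the vector X v; LASSO updates u.
        u = lasso_cd([contract_mode2(X, v) for X in Xs], y, lam, u)
        # Fix u: each sample becomes the vector X^T u; LASSO updates v.
        v = lasso_cd([contract_mode1(X, u) for X in Xs], y, lam, v)
    return u, v


def demo(seed=0, n=60):
    """Fit synthetic 3x4 matrix data generated from sparse weight vectors."""
    rng = random.Random(seed)
    u_true, v_true = [1.0, 0.0, -2.0], [0.5, 0.0, 0.0, 1.0]
    Xs = [[[rng.gauss(0, 1) for _ in range(4)] for _ in range(3)] for _ in range(n)]
    y = [predict(X, u_true, v_true) + rng.gauss(0, 0.1) for X in Xs]
    u, v = mlasso_fit(Xs, y)
    mse = sum((predict(X, u, v) - t) ** 2 for X, t in zip(Xs, y)) / n
    return u, v, mse
```

Calling `demo()` fits the sketch to synthetic data and returns the estimated `u`, `v`, and the training mean squared error. Because only the product of the two weight vectors enters the prediction, `u` and `v` can be recovered only up to a reciprocal scale factor.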