Journal of Frontiers of Computer Science and Technology ›› 2019, Vol. 13 ›› Issue (2): 322-329. DOI: 10.3778/j.issn.1673-9418.1804037

Multi-Marginalized Denoising Autoencoders for Domain Adaptation

YANG Shuai+, HU Xuegang, ZHANG Yuhong   

  1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China
  • Online: 2019-02-01 Published: 2019-01-25


杨  帅+,胡学钢,张玉红

  1. 合肥工业大学 计算机与信息学院,合肥 230009

Abstract: Neural network models are widely used for domain adaptation. One such model, marginalized stacked denoising autoencoders (mSDA), learns a common, robust feature representation by marginalizing out noise corruption applied to the source- and target-domain data. However, mSDA corrupts all features in the same way, even though features contribute differently to classification. This paper corrupts different features with different noise levels and proposes multi-marginalized denoising autoencoders (M-MDA) for domain adaptation. First, a polarity index, WLLRU (weighted log-likelihood ratio update), an improvement on the weighted log-likelihood ratio, is proposed to distinguish shared features from domain-specific features. Second, the shared and specific features are corrupted with different noise levels, computed from the distance of each feature between the source and target domains. Third, a marginalized denoising autoencoder (MDA) learns a more robust feature space from the corrupted data. Finally, the new feature space is corrupted again to increase the proportion of shared features. Experimental results show that the proposed method outperforms state-of-the-art methods in cross-domain sentiment classification.

Key words: sentiment classification, cross-domain, noise, marginalized stacked denoising autoencoders (mSDA)

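Abstract (translated from Chinese): Neural network models are widely used for cross-domain classification. Marginalized stacked denoising autoencoders (mSDA), one such model, learn a common, robust feature representation space by corrupting the source- and target-domain data with marginalized noise, thereby addressing domain adaptation. However, mSDA applies the same marginalized noise to every feature and ignores that different features affect the classification result differently. This paper therefore perturbs features with feature-dependent noise coefficients and proposes multi-marginalized denoising autoencoders (M-MDA). First, an improved weighted log-likelihood ratio (weighted log-likelihood ratio update, WLLRU) separates the features shared across domains from the domain-specific ones. Next, the distance of each feature between the two domains is computed, shared and specific features undergo different marginalized denoising treatments, and a single-layer marginalized denoising autoencoder (MDA) learns more robust features. Finally, the new feature space is corrupted a second time to strengthen the proportion of shared features. Experiments show that the method outperforms the baseline algorithms on cross-domain sentiment classification.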

Key words (translated from Chinese): sentiment classification, cross-domain, noise, marginalized stacked denoising autoencoders (mSDA)
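
The closed-form step that the abstract builds on can be sketched for a single MDA layer with a separate corruption probability per feature. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses the standard mSDA closed-form solution W = E[P] E[Q]^{-1}, generalized from one shared probability to a vector `p`. The function name `mda_per_feature`, the ridge term `reg`, the shared/specific split, and the example values of `p` are all hypothetical. The sketch does not compute WLLRU or the distance-based noise levels, since the abstract does not give their formulas.

```python
import numpy as np

def mda_per_feature(X, p, reg=1e-5):
    """One marginalized denoising autoencoder layer with per-feature noise.

    X   : (d, n) array, features in rows and examples in columns.
    p   : (d,) array, probability that each feature is zeroed out.
    reg : small ridge term for numerical stability (an assumption).
    Returns the mapping W of shape (d, d + 1) and hidden features tanh(W @ Xb).
    """
    d, n = X.shape
    Xb = np.vstack([X, np.ones((1, n))])         # append a constant bias row
    # Survival probability per feature; the bias row is never corrupted.
    q = np.append(1.0 - np.asarray(p, dtype=float), 1.0)
    S = Xb @ Xb.T                                # scatter matrix, (d + 1, d + 1)
    # E[Q]: expected scatter of the corrupted input.
    # Off-diagonal entries need both features to survive (q_i * q_j);
    # diagonal entries need only one survival event (q_i).
    Q = S * np.outer(q, q)
    np.fill_diagonal(Q, q * np.diag(S))
    # E[P]: expected cross term between the clean output and corrupted input.
    P = S[:d, :] * q[np.newaxis, :]
    Q += reg * np.eye(d + 1)
    # W = P Q^{-1}; Q is symmetric, so solve Q W^T = P^T instead of inverting.
    W = np.linalg.solve(Q, P.T).T
    return W, np.tanh(W @ Xb)

# Usage with made-up data and a hypothetical shared/specific split.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 100))
is_shared = np.array([True, True, True, False, False, False])
p = np.where(is_shared, 0.3, 0.7)  # arbitrary values, not taken from the paper
W, H = mda_per_feature(X, p)
print(W.shape, H.shape)
```

Setting every entry of `p` to the same value recovers the uniform-noise case that the abstract attributes to mSDA, which makes the per-feature vector the only change in this sketch.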