Multi-Marginalized Denoising Autoencoders for Domain Adaptation

doi:10.3778/j.issn.1673-9418.1804037

Abstract: Neural network models are widely used for domain adaptation. One such model, marginalized stacked denoising autoencoders (mSDA), learns a common and robust feature representation by marginalizing out the noise that corrupts the source and target domain data. However, mSDA corrupts every feature with the same noise, even though features affect classification differently. This paper corrupts different features with different noise levels and proposes multi-marginalized denoising autoencoders (M-MDA) for domain adaptation. First, a polarity index, WLLRU (weighted log-likelihood ratio update), which improves on the weighted log-likelihood ratio, is proposed to distinguish shared features from domain-specific features. Second, the shared and specific features are corrupted with different noise levels, computed from the distance of each feature between the source and target domains. Third, a marginalized denoising autoencoder (MDA) learns a more robust feature space from the corrupted data. Finally, the new feature space is corrupted again to increase the proportion of shared features. Experimental results show that the proposed method outperforms state-of-the-art methods in cross-domain sentiment classification.
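
The idea of corrupting features with different noise levels can be illustrated with a minimal sketch of a single marginalized denoising layer. It uses the closed-form solution of the original mSDA formulation, generalized from one corruption probability to a per-feature vector. This is not the paper's M-MDA implementation: the function name `mda_layer`, the `keep_prob` vector, the regularizer `reg`, and the example noise values are all assumptions made for illustration, and the abstract does not specify how WLLRU or the distance-based noise levels are computed.

```python
import numpy as np

def mda_layer(X, keep_prob, reg=1e-5):
    """One marginalized denoising layer with per-feature noise (illustrative).

    X         : (d, n) data matrix, one column per example (source + target).
    keep_prob : (d,) probability that each feature survives corruption,
                i.e. 1 - its noise level. Hypothetical input for this sketch.
    reg       : small ridge term that keeps the linear solve well conditioned.
    """
    d, n = X.shape
    Xb = np.vstack([X, np.ones((1, n))])   # append a constant bias row
    q = np.append(keep_prob, 1.0)          # the bias row is never corrupted
    S = Xb @ Xb.T                          # scatter matrix, shape (d+1, d+1)

    # Expected outer product of the corrupted data with itself:
    # off-diagonal entries need both features to survive (q_i * q_j),
    # diagonal entries need only one survival (q_i).
    Q = S * np.outer(q, q)
    np.fill_diagonal(Q, q * np.diag(S))

    # Expected product of clean features with corrupted inputs: S_ij * q_j.
    P = S[:d, :] * q

    # Closed-form reconstruction weights W = P (Q + reg*I)^{-1}.
    W = np.linalg.solve(Q + reg * np.eye(d + 1), P.T).T
    return W, np.tanh(W @ Xb)

# Usage with made-up noise levels: the first two features are treated as
# "shared" and the rest as "specific", purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 200))
keep_prob = np.array([0.9, 0.9, 0.5, 0.5, 0.5])
W, hidden = mda_layer(X, keep_prob)
```

Setting every entry of `keep_prob` to the same value recovers the uniform corruption that the abstract attributes to mSDA, so the per-feature vector is the only change this sketch makes.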

Key words: sentiment classification, cross-domain, noise, marginalized stacked denoising autoencoders (mSDA)

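Abstract (translated from the Chinese): Neural network models are widely used for cross-domain classification learning. As one such model, marginalized stacked denoising autoencoders (mSDA) learn a common, robust feature representation space by applying marginalized noise corruption to the source and target domain data, thereby addressing domain adaptation. However, mSDA applies the same marginalized corruption to all features and does not account for the different effects that features have on the classification result. To address this, features are perturbed with differentiated noise coefficients, and multi-marginalized denoising autoencoders (M-MDA) are proposed. First, an improved weighted likelihood ratio (weighted log-likelihood ratio update, WLLRU) is used to separate the features shared across domains from the domain-specific ones. Then, based on the distance of each feature between the two domains, shared and specific features receive different marginalized denoising treatments, and a single-layer marginalized denoising autoencoder (MDA) is used to learn more robust features. Finally, the new feature space is corrupted a second time to strengthen the proportion of shared features. Experimental results show that the method outperforms the baseline algorithms in cross-domain sentiment classification.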

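Key words (translated from the Chinese): sentiment classification, cross-domain, noise, marginalized stacked denoising autoencoders (mSDA)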

YANG Shuai, HU Xuegang, ZHANG Yuhong. Multi-Marginalized Denoising Autoencoders for Domain Adaptation[J]. Journal of Frontiers of Computer Science and Technology, 2019, 13(2): 322-329.

杨帅，胡学钢，张玉红. 用于域适应的多边缘降噪自动编码器[J]. 计算机科学与探索, 2019, 13(2): 322-329.

