Journal of Frontiers of Computer Science and Technology ›› 2018, Vol. 12 ›› Issue (7): 1154-1161. DOI: 10.3778/j.issn.1673-9418.1705043

• Artificial Intelligence and Pattern Recognition •

Research on Transcriptional Regulatory Network Based on Combined Model

LIU Xiaoyan (刘晓燕), ZHANG Chengcheng (张诚诚), GUO Maozu (郭茂祖), XING Linlin (邢林林)

  1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
  2. School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
  • Online: 2018-07-01 Published: 2018-07-06

Abstract:

The transcriptional regulatory network has long been a research focus in systems biology and bioinformatics. Constructing transcriptional regulatory networks provides an important means of revealing the mechanisms of biochemical reactions within the cell. Current research in this field suffers from insufficient use of biological data and low accuracy in constructing gene transcriptional regulatory networks, especially on larger data sets. To address these problems, this paper makes full use of gene expression data, gene sequence data, and gene annotation data, and proposes DAXL, a combined model of XGBoost and logistic regression based on a deep autoencoder. Experiments on an Arabidopsis data set show that DAXL improves the prediction accuracy of the transcriptional regulatory network and clearly outperforms the compared methods.

Key words: transcriptional regulatory network, deep autoencoder, XGBoost, logistic regression