Journal of Frontiers of Computer Science and Technology (计算机科学与探索)

• Academic Research •

Single-cell Differentiation Trajectory Inference Algorithm with Iterative Feature Selection

HE Hongjian, YIN Yiting, XIE Jiang

  1. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China

Abstract: Constructing cell differentiation trajectories from single-cell transcriptomic or proteomic data with single-cell trajectory inference methods helps to understand the development of normal tissues and provides pathology-related information. However, improving the accuracy and robustness of current single-cell trajectory inference algorithms remains difficult; one reason is the noise introduced by the large number of irrelevant genes detected in single-cell sequencing. To address this problem, iterTIPD, a trajectory inference method based on iterative feature selection, is proposed. Its novelty is that a feature selection method widely used to screen differentially expressed genes is applied iteratively to single-cell RNA sequencing (scRNA-seq) data with linear or branching structure, and the gene subset that contributes most to the constructed differentiation trajectory is selected to improve the accuracy and robustness of cell pseudotime ordering. Experimental results on four scRNA-seq datasets show that iterTIPD effectively improves the accuracy and robustness of single-cell trajectory inference. iterTIPD also improves the performance of other trajectory inference algorithms, which demonstrates that it generalizes. iterTIPD successfully reconstructed the differentiation trajectory of neural stem cells, and comparison showed that this trajectory is highly consistent with the known neural stem cell differentiation trajectory. In addition, Top2a and Gja1 were found to be potential new markers defining the activated neural stem cell subpopulation.

Key words: Single-cell RNA Sequencing, Differential Gene Expression, Single-cell Differentiation Trajectory Inference, Iterative Feature Selection, Bioinformatics
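
The iterative feature selection idea summarized in the abstract can be illustrated with a minimal sketch. This is not the iterTIPD implementation: the abstract does not specify the trajectory inference step or the gene scoring rule, so the sketch assumes a first-principal-component ordering as a stand-in for pseudotime and the absolute Spearman correlation with that pseudotime as a stand-in for a gene's contribution. All function names and the parameters `keep_frac`, `n_iter`, and `min_genes` are hypothetical.

```python
import numpy as np

def pseudotime_pc1(X):
    """Stand-in for trajectory inference (assumption): scale PC1 scores to [0, 1]."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    pc1 = U[:, 0] * S[0]
    return (pc1 - pc1.min()) / (pc1.max() - pc1.min())

def _rank(v):
    """Ordinal ranks of a 1-D array (ties are broken by position)."""
    return np.argsort(np.argsort(v))

def gene_scores(X, t):
    """Stand-in contribution score (assumption): |Spearman correlation| with pseudotime."""
    rt = _rank(t).astype(float)
    rt -= rt.mean()
    R = np.apply_along_axis(_rank, 0, X).astype(float)
    R -= R.mean(axis=0)
    denom = np.sqrt((R ** 2).sum(axis=0) * (rt ** 2).sum())
    return np.abs(R.T @ rt) / denom

def iterative_selection(X, keep_frac=0.5, n_iter=3, min_genes=10):
    """Alternate between ordering cells and keeping the top-scoring genes.

    X is a cells-by-genes expression matrix. Returns the sorted indices of
    the retained genes and the pseudotime computed on that gene subset.
    """
    genes = np.arange(X.shape[1])
    for _ in range(n_iter):
        t = pseudotime_pc1(X[:, genes])
        scores = gene_scores(X[:, genes], t)
        n_keep = max(min_genes, int(len(genes) * keep_frac))
        if n_keep >= len(genes):
            break  # nothing left to prune
        top = np.argsort(scores)[::-1][:n_keep]
        genes = genes[top]
    return np.sort(genes), pseudotime_pc1(X[:, genes])

# Synthetic demo: 10 genes follow a hidden linear trajectory, 90 are pure noise.
rng = np.random.default_rng(0)
true_time = np.linspace(0.0, 1.0, 100)
informative = 5.0 * true_time[:, None] + rng.normal(size=(100, 10))
noise = rng.normal(size=(100, 90))
X_demo = np.hstack([informative, noise])
selected, pseudotime = iterative_selection(X_demo)
```

In this toy setting the gene set shrinks from 100 to 50 to 25 to 12 genes. Because the sign of a principal component is arbitrary, the recovered pseudotime may run in the reverse direction of `true_time`; a real analysis would fix the direction with a known root cell.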