Journal of Frontiers of Computer Science and Technology, 2023, Vol. 17, Issue (7): 1609-1621. DOI: 10.3778/j.issn.1673-9418.2203047

• Theory · Algorithm •

Single-Cell Differentiation Trajectory Inference Algorithm with Iterative Feature Selection

HE Hongjian, YIN Yiting, XIE Jiang   

  1. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
  • Online: 2023-07-01  Published: 2023-07-01

Abstract: Constructing cell differentiation trajectories from single-cell transcriptomic or proteomic data with single-cell trajectory inference methods can help reveal the developmental processes of normal tissues or provide pathologically relevant information. However, improving the accuracy and robustness of current single-cell trajectory inference algorithms remains difficult; one reason is the noise introduced by the large number of irrelevant genes detected in single-cell sequencing. To address this problem, this paper proposes iterTIPD (iterative trajectory inference based on probability distribution), a trajectory inference method based on iterative feature selection. Its novelty lies in iteratively applying feature selection methods, widely used to screen differentially expressed genes, to single-cell RNA sequencing (scRNA-seq) data with linear or branching structure, and in improving the accuracy and robustness of cell pseudotime ordering by selecting the gene subset that contributes most to the constructed differentiation trajectory. Experimental results on four scRNA-seq datasets show that iterTIPD effectively improves the accuracy and robustness of single-cell trajectory inference. iterTIPD also improves the performance of other trajectory inference algorithms, which demonstrates that it generalizes. Applied to neural stem cells, iterTIPD successfully reconstructs a differentiation trajectory that is highly consistent with the known neural stem cell differentiation trajectory. The results also suggest that Top2a and Gja1 may be novel markers defining activated neural stem cell subpopulations.

Key words: single-cell RNA sequencing, differential gene expression, single-cell differentiation trajectory inference, iterative feature selection, bioinformatics
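The abstract describes iterTIPD only at a high level, and the probability-distribution-based gene scoring it uses is not specified here. The following is a minimal sketch of the general idea of iterative feature selection for pseudotime ordering, not the authors' implementation. It assumes two stand-ins: projection onto the first principal component (PC1) replaces the trajectory inference step, and the absolute Pearson correlation between each gene and the pseudotime replaces the differential-expression test. The functions `infer_pseudotime`, `correlation_scores`, and `iterative_feature_selection` and the parameter `min_score` are hypothetical names chosen for illustration.

```python
import numpy as np


def infer_pseudotime(expr: np.ndarray) -> np.ndarray:
    """Stand-in trajectory inference (an assumption, not iterTIPD's method):
    project cells (rows) onto PC1 and rescale to [0, 1].
    The sign of PC1 is arbitrary, so the ordering may come out reversed."""
    centered = expr - expr.mean(axis=0)
    # The first right singular vector is the PC1 loading vector over genes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[0]
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros(expr.shape[0])
    return (scores - scores.min()) / span


def correlation_scores(expr: np.ndarray, pseudotime: np.ndarray) -> np.ndarray:
    """Absolute Pearson correlation of each gene (column) with the pseudotime.
    Constant genes get a score of 0 instead of NaN."""
    g = expr - expr.mean(axis=0)
    t = pseudotime - pseudotime.mean()
    denom = np.sqrt((g ** 2).sum(axis=0) * (t ** 2).sum())
    scores = np.zeros(expr.shape[1])
    ok = denom > 0
    scores[ok] = np.abs(t @ g[:, ok]) / denom[ok]
    return scores


def iterative_feature_selection(expr: np.ndarray, min_score: float = 0.5,
                                max_iter: int = 20):
    """Alternate between inferring a pseudotime from the current gene subset
    and dropping genes weakly correlated with it, until no gene is dropped.
    Returns (indices of selected genes, pseudotime from those genes)."""
    selected = np.arange(expr.shape[1])
    for _ in range(max_iter):
        pseudotime = infer_pseudotime(expr[:, selected])
        scores = correlation_scores(expr[:, selected], pseudotime)
        keep = selected[scores >= min_score]
        if len(keep) == len(selected) or len(keep) < 2:
            break  # converged, or too few genes left to re-infer from
        selected = keep
    return selected, infer_pseudotime(expr[:, selected])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_cells, n_signal, n_noise = 300, 20, 180
    true_time = rng.uniform(0.0, 1.0, n_cells)
    # Hypothetical toy data: 20 genes that rise linearly along the trajectory,
    # followed by 180 pure-noise genes.
    signal = 5.0 * true_time[:, None] + rng.normal(0.0, 0.5, (n_cells, n_signal))
    noise = rng.normal(0.0, 1.0, (n_cells, n_noise))
    expr = np.hstack([signal, noise])

    genes, pt = iterative_feature_selection(expr)
    print("selected genes:", genes.tolist())
    print("|corr(pseudotime, true time)|:",
          round(abs(np.corrcoef(pt, true_time)[0, 1]), 3))
```

On this toy data the first pass drops the noise genes and the second pass keeps the same subset, so the loop stops. A realistic pipeline would replace both stand-ins with a proper trajectory inference method and a differential-expression test, and would need a branch-aware ordering, since a single principal component can only represent a linear trajectory.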