Cell Classification and Differential Gene Screening Based on t-SNE and UMAP Dimensionality Reduction
Abstract:
Single-cell RNA sequencing has been widely applied to key biological questions such as cell heterogeneity, but the development of this technology also poses substantial challenges for gene data analysis. Using two nonlinear dimensionality reduction methods, t-SNE and UMAP, we reduce and cluster single-cell RNA data and compare the results with those obtained from linear principal component dimensionality reduction and clustering. We conclude that UMAP gives better dimensionality reduction and clustering results for single-cell RNA data. Finally, taking the UMAP-based clustering as an example, we screen for significantly differentially expressed genes across the cell categories.
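The final screening step can be illustrated with a minimal sketch. The abstract does not state which statistical test is used, so the sketch assumes a per-gene Welch's t-statistic and a log2 fold change comparing one cluster against all other cells. The function names (`welch_t`, `rank_marker_genes`), the pseudocount, and the toy expression values are hypothetical and serve only as illustration; cluster labels are assumed to come from a prior clustering step such as the UMAP-based one.

```python
from math import log2, sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples with possibly unequal variances.

    Each sample needs at least two values so that the variance is defined.
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_marker_genes(expr, labels, cluster, pseudocount=1.0):
    """Rank genes by |t| for `cluster` versus all remaining cells.

    expr:   dict mapping gene name -> list of expression values (one per cell)
    labels: list of cluster labels (one per cell), assumed to be given
    Returns a list of (gene, t_statistic, log2_fold_change) tuples.
    """
    results = []
    for gene, values in expr.items():
        inside = [v for v, lab in zip(values, labels) if lab == cluster]
        outside = [v for v, lab in zip(values, labels) if lab != cluster]
        t = welch_t(inside, outside)
        # The pseudocount (an assumption) avoids division by zero for silent genes.
        lfc = log2((mean(inside) + pseudocount) / (mean(outside) + pseudocount))
        results.append((gene, t, lfc))
    return sorted(results, key=lambda r: abs(r[1]), reverse=True)


# Hypothetical toy data: six cells in two clusters, two genes.
expr = {
    "geneA": [9.0, 10.0, 11.0, 1.0, 2.0, 0.0],  # higher in cluster 0
    "geneB": [5.0, 4.0, 6.0, 5.0, 6.0, 4.0],    # same mean in both clusters
}
labels = [0, 0, 0, 1, 1, 1]
ranked = rank_marker_genes(expr, labels, cluster=0)
```

In this toy case `geneA` ranks first for cluster 0, while `geneB` has a t-statistic of zero. A real analysis would also need multiple-testing correction across genes, which the sketch omits.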