Imputation of Single-Cell RNA Sequencing Data Based on Graph Collaborative Filtering
Abstract:
Single-cell RNA sequencing (scRNA-seq) profiles the transcriptome at single-cell resolution and has broad application prospects in biological research. However, technical limitations cause some gene expression values in scRNA-seq data to be missing, which is referred to as a zero-inflation event. These missing values seriously hinder downstream analysis, so scRNA-seq data need to be imputed. This paper proposes an imputation algorithm for scRNA-seq data based on graph collaborative filtering, providing a deep learning framework for scRNA-seq analysis. The algorithm extracts cell feature representations and gene feature representations with a graph collaborative filtering method based on structural-neighbor contrast, and feeds the inner product of the two into a zero-inflated negative binomial (ZINB) autoencoder to impute the scRNA-seq data. Simulation experiments verify the algorithm's imputation ability on simulated datasets, and downstream clustering experiments demonstrate its cell clustering performance on public real datasets.
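The two ingredients named above, an inner product of a cell representation with a gene representation and a zero-inflated negative binomial likelihood, can be illustrated with a minimal sketch. This is not the paper's implementation: the embeddings, the parameter values, and all function names are hypothetical, and the exponential link that turns the inner product directly into a mean is a stand-in assumption for the autoencoder, which in the proposed method is what produces the distribution parameters.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial
    with mean mu and dispersion theta (both must be positive)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability of count x under a zero-inflated negative binomial:
    with probability pi the count is forced to zero, otherwise it is
    drawn from NB(mu, theta)."""
    if x == 0:
        # A zero can come from the inflation component or from the NB itself.
        nb_zero = math.exp(nb_log_pmf(0, mu, theta))
        return math.log(pi + (1.0 - pi) * nb_zero)
    return math.log(1.0 - pi) + nb_log_pmf(x, mu, theta)


def inner_product(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical 3-dimensional representations for one cell and one gene.
cell_embedding = [0.5, -0.2, 0.8]
gene_embedding = [1.0, 0.3, 0.4]

# Cell-gene affinity score from the two representations.
score = inner_product(cell_embedding, gene_embedding)

# Assumption for illustration only: an exponential link maps the score
# to a positive mean; theta and pi are fixed, made-up values.
mu = math.exp(score)
theta = 2.0
pi = 0.3

# Per-entry negative log-likelihood of an observed zero and an observed count.
nll_zero = -zinb_log_pmf(0, mu, theta, pi)
nll_three = -zinb_log_pmf(3, mu, theta, pi)
print(f"score={score:.2f}, NLL(x=0)={nll_zero:.3f}, NLL(x=3)={nll_three:.3f}")
```

Under this distribution an observed zero is explained partly by the inflation term pi, so it is penalized less than it would be under a plain negative binomial with the same mean; this is the property that lets such a model separate technical zeros from genuinely low expression.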