Biophysics 2025
Predicting Multi-Gene Transcriptional Outcomes with a Dual-Stream Mamba Neural Network
Abstract:
The integration of gene perturbation techniques with single-cell RNA sequencing and CRISPR editing has helped standardize the study of perturbation effects at the single-cell level. However, the diversity of human cell types and the combinatorial number of possible perturbation combinations make exhaustive experimental screening infeasible. Against this background, models that predict transcriptional responses to multi-gene perturbations have shifted from traditional statistical modeling to deep learning. Existing methods nevertheless still face problems such as difficulty scaling to genome-wide applications and being limited to single-gene perturbations. To address these issues, this study proposes a Mamba-based deep learning algorithm built on a dual-stream integration framework for predicting responses to multi-gene perturbations. Through its two streams, the model captures both statistical and biological features of gene expression data, avoiding the limitations of approaches driven solely by data or solely by prior knowledge. The core Mamba component combines linear computational complexity with a selective memory mechanism, allowing it to model gene-gene interactions accurately while remaining computationally efficient, and its input-dependent parameterization helps extract complex features from the data. Our analyses show that the model achieves superior performance in gene perturbation prediction while also providing biological interpretability, indicating that it is an effective method.
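The abstract attributes Mamba's efficiency to a linear-time recurrence whose parameters depend on the input ("selective" state spaces, after Gu and Dao, 2023). The minimal NumPy sketch below illustrates that general mechanism only; it is not the authors' model. The projection names (`W_delta`, `W_B`, `W_C`), the diagonal negative state matrix `A`, the simplified discretization B̄ = ΔB, and the toy dimensions are illustrative assumptions, and production Mamba implementations use a hardware-aware parallel scan rather than a Python loop.

```python
import numpy as np

def softplus(z):
    # Numerically stable softplus; keeps the step size delta strictly positive.
    return np.logaddexp(0.0, z)

def selective_scan(x, A, W_delta, b_delta, W_B, W_C):
    """Sketch of a selective state-space scan (hypothetical parameter names).

    x:        (L, D) input sequence, e.g. L tokens with D channels
    A:        (D, N) negative diagonal state matrix, one row per channel
    W_delta:  (D, D) and b_delta: (D,) project x_t to a per-channel step size
    W_B, W_C: (D, N) project x_t to input-dependent B_t and C_t
    returns   y: (L, D)
    """
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))                 # hidden state, carried across steps
    y = np.empty((L, D))
    for t in range(L):                   # single pass: O(L) in sequence length
        xt = x[t]                                    # (D,)
        delta = softplus(xt @ W_delta + b_delta)     # (D,) input-dependent step
        Bt = xt @ W_B                                # (N,) input-dependent B_t
        Ct = xt @ W_C                                # (N,) input-dependent C_t
        A_bar = np.exp(delta[:, None] * A)           # (D, N) zero-order-hold A
        B_bar = delta[:, None] * Bt[None, :]         # (D, N) simplified B = delta*B
        h = A_bar * h + B_bar * xt[:, None]          # selective state update
        y[t] = h @ Ct                                # (D,) readout
    return y

# Toy usage with random (untrained) parameters.
rng = np.random.default_rng(0)
L, D, N = 6, 4, 3
x = rng.standard_normal((L, D))
A = -np.exp(rng.standard_normal((D, N)))  # negative entries -> decaying memory
W_delta = 0.1 * rng.standard_normal((D, D))
b_delta = np.zeros(D)
W_B = rng.standard_normal((D, N))
W_C = rng.standard_normal((D, N))

y = selective_scan(x, A, W_delta, b_delta, W_B, W_C)
print(y.shape)  # (6, 4)
```

Because `delta` is computed from each input, a large step drives `exp(delta * A)` toward zero and lets the current input overwrite the state, while a small step preserves earlier information. This input dependence is what "selective memory" refers to, and the single recurrent pass is the source of the linear cost in sequence length.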