Application of a Matrix Completion Algorithm to Predicting Associations between Long Non-coding RNAs and Proteins
Abstract:
Long non-coding RNAs (lncRNAs) are RNAs longer than 200 nt that are not translated into proteins. With the continued development of bioinformatics, many experiments have confirmed that lncRNAs play crucial roles in human development. They usually carry out their biological functions by interacting with proteins, so predicting potential lncRNA-protein associations is of great importance. In this paper, we propose LPIMC, a model that predicts lncRNA-protein interactions with a matrix completion algorithm. LPIMC combines an lncRNA similarity network, a protein similarity network and the known lncRNA-protein interaction matrix into a heterogeneous network, and completes this matrix by minimizing its nuclear norm, which yields a new lncRNA-protein interaction adjacency matrix. Under 5-fold cross-validation, the model effectively predicts lncRNA-protein associations.
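A minimal, dependency-free Python sketch of the completion step described above follows. It is not the LPIMC code. It assumes the heterogeneous network is the symmetric block matrix T = [[S_l, A], [A^T, S_p]], treats the similarity blocks and the known interactions (entries of A equal to 1) as observed and the remaining lncRNA-protein pairs as missing, and approximately minimizes the nuclear norm with the singular value thresholding (SVT) iteration of Cai, Candès and Shen, which may not be the solver the authors used. Because T is symmetric, its singular values are the absolute values of its eigenvalues, so a small Jacobi eigensolver stands in for a full SVD. All function names, the toy matrices and the parameters `tau`, `delta`, `tol` and `max_iter` are hypothetical.

```python
import math


def jacobi_eigh(a, tol=1e-20, max_sweeps=100):
    """Eigendecomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, q) with a ~= q @ diag(eigenvalues) @ q.T, where
    column k of q is the eigenvector belonging to eigenvalues[k].
    """
    n = len(a)
    a = [row[:] for row in a]  # rotate a copy, not the caller's matrix
    q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    scale = sum(v * v for row in a for v in row) or 1.0
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol * scale:
            break
        for p in range(n - 1):
            for r in range(p + 1, n):
                if a[p][r] == 0.0:
                    continue
                # tan(theta), |theta| <= pi/4, of the rotation that zeroes a[p][r]
                ratio = (a[r][r] - a[p][p]) / (2.0 * a[p][r])
                t = math.copysign(1.0, ratio) / (abs(ratio) + math.sqrt(1.0 + ratio * ratio))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = t * c
                for k in range(n):  # a <- a J and q <- q J (columns p, r)
                    akp, akr = a[k][p], a[k][r]
                    a[k][p], a[k][r] = c * akp - s * akr, s * akp + c * akr
                    qkp, qkr = q[k][p], q[k][r]
                    q[k][p], q[k][r] = c * qkp - s * qkr, s * qkp + c * qkr
                for k in range(n):  # a <- J^T a (rows p, r)
                    apk, ark = a[p][k], a[r][k]
                    a[p][k], a[r][k] = c * apk - s * ark, s * apk + c * ark
    return [a[i][i] for i in range(n)], q


def shrink_symmetric(y, tau):
    """SVT operator D_tau for symmetric y = Q diag(lam) Q^T (singular values |lam|)."""
    n = len(y)
    lam, q = jacobi_eigh(y)
    d = [math.copysign(max(abs(v) - tau, 0.0), v) for v in lam]
    x = [[sum(q[i][k] * d[k] * q[j][k] for k in range(n)) for j in range(n)]
         for i in range(n)]
    # Average with the transpose so round-off cannot break symmetry.
    return [[0.5 * (x[i][j] + x[j][i]) for j in range(n)] for i in range(n)]


def build_heterogeneous(s_l, s_p, a):
    """Assumed block form T = [[S_l, A], [A^T, S_p]] plus an 'observed' mask.

    Similarity blocks are fully observed; in A only known interactions (1s)
    are observed, and unknown pairs (0s) are the entries to be completed.
    """
    m, n = len(s_l), len(s_p)
    size = m + n
    t = [[0.0] * size for _ in range(size)]
    mask = [[False] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < m and j < m:
                t[i][j], mask[i][j] = float(s_l[i][j]), True
            elif i >= m and j >= m:
                t[i][j], mask[i][j] = float(s_p[i - m][j - m]), True
            elif i < m and a[i][j - m]:  # lncRNA row, protein column
                t[i][j] = t[j][i] = 1.0
                mask[i][j] = mask[j][i] = True
    return t, mask


def svt_complete(t, mask, tau=10.0, delta=1.2, tol=1e-4, max_iter=10000):
    """SVT iteration: X = D_tau(Y), then Y += delta * P_Omega(T - X).

    Approximately solves min tau*||X||_* + 0.5*||X||_F^2 subject to X
    matching T on the observed entries; 0 < delta < 2 for convergence.
    """
    n = len(t)
    y = [[0.0] * n for _ in range(n)]
    x = [[0.0] * n for _ in range(n)]
    obs_norm = math.sqrt(sum(t[i][j] ** 2 for i in range(n)
                             for j in range(n) if mask[i][j])) or 1.0
    for _ in range(max_iter):
        x = shrink_symmetric(y, tau)
        res = [[t[i][j] - x[i][j] if mask[i][j] else 0.0 for j in range(n)]
               for i in range(n)]
        if math.sqrt(sum(v * v for row in res for v in row)) <= tol * obs_norm:
            break
        for i in range(n):
            for j in range(n):
                y[i][j] += delta * res[i][j]
    return x


if __name__ == "__main__":
    s_l = [[1.0, 0.9], [0.9, 1.0]]  # toy lncRNA similarity (hypothetical)
    s_p = [[1.0]]                    # toy protein similarity (hypothetical)
    a = [[1], [0]]                   # l0-p0 known, l1-p0 unknown
    t, mask = build_heterogeneous(s_l, s_p, a)
    x = svt_complete(t, mask)
    m = len(s_l)
    print(f"predicted score for (l1, p0): {x[1][m]:.3f}")
```

In the toy example, the missing l1-p0 value that minimizes the nuclear norm is exactly 0.9, the only value that keeps T positive semidefinite (its determinant is -(x - 0.9)^2). SVT solves the slightly regularized problem min tau·||X||_* + 0.5·||X||_F^2 under the same constraints, so it returns a value a little below 0.9 that approaches 0.9 as `tau` grows. The pure-Python loops only suit toy sizes; a real network would call an optimized SVD routine instead.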