|
- 2017
A gene regulatory network construction algorithm based on partial mutual information and a Bayesian scoring function
|
Abstract:
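Reconstructing gene regulatory networks from gene expression data can effectively uncover regulatory relationships among genes and give a deeper understanding of biological regulatory processes. Traditional correlation coefficient and partial correlation coefficient models can detect only linear relationships between genes, whereas mutual information and conditional mutual information can detect nonlinear relationships and can handle high-dimensional, low-sample gene expression data. However, mutual information overestimates the association between genes and conditional mutual information underestimates it, so the inferred gene networks have high false-positive and false-negative rates, and the direction of regulation cannot be inferred. Therefore, a new gene regulatory network construction algorithm, named PMIBSF, is proposed based on partial mutual information and a Bayesian scoring function. Using partial mutual information, PMIBSF first deletes redundant edges from the initial gene association network, and then learns the Bayesian network structure with a Bayesian network mutual information test scoring function, which allows the gene regulatory network to be constructed quickly. Simulation results on computer-simulated networks and real biological molecular networks show that PMIBSF outperforms the currently popular LP, PC-alg, NARROMI, and ARACNE algorithms and can construct gene regulatory networks with high accuracy.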
Inferring gene regulatory networks (GRNs) from expression data can reveal direct regulatory relationships among genes and give deep insight into biological processes at the network level. The most widely used criteria, the Pearson correlation coefficient and partial correlation, measure only linear direct associations and miss nonlinear ones. Mutual information (MI) and conditional mutual information (CMI) overcome this limitation and can also handle gene expression data that are high-dimensional with few samples. MI and CMI are therefore widely used to quantify both linear and nonlinear associations, but MI tends to overestimate and CMI to underestimate the strength of an association. As a result, GRNs inferred with MI and CMI have high false-positive and false-negative rates, and the direction of a regulatory interaction cannot be identified. In this work, we present a novel algorithm, PMIBSF, based on partial mutual information (PMI) and a Bayesian scoring function (BSF). Tested on synthetic networks as well as real biological molecular networks of different sizes and topologies, PMIBSF infers GRNs with higher accuracy and outperforms other state-of-the-art methods such as LP, PC-alg, NARROMI, and ARACNE.
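As a rough illustration of why MI alone leaves redundant edges in an association network, the following minimal sketch computes plug-in estimates of MI and CMI on discrete toy data. It is not the PMIBSF algorithm and does not implement PMI or the Bayesian scoring function. The function names, the three-variable chain X -> Z -> Y, and the sample counts are all assumptions made up for this example.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples."""
    n = len(xs)
    n_x, n_y, n_xy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log(c * n / (n_x[x] * n_y[y]))
        for (x, y), c in n_xy.items()
    )


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X;Y|Z) in nats from discrete samples."""
    n = len(xs)
    n_z = Counter(zs)
    n_xz, n_yz = Counter(zip(xs, zs)), Counter(zip(ys, zs))
    n_xyz = Counter(zip(xs, ys, zs))
    return sum(
        (c / n) * log(c * n_z[z] / (n_xz[(x, z)] * n_yz[(y, z)]))
        for (x, y, z), c in n_xyz.items()
    )


def chain_samples():
    """Hypothetical toy data for a chain X -> Z -> Y (32 samples).

    Z copies X in 3 of every 4 samples and Y copies Z in 3 of every 4,
    so X and Y are associated only through Z.
    """
    xs, ys, zs = [], [], []
    for x in (0, 1):
        for z, n_z_given_x in ((x, 3), (1 - x, 1)):
            for y, n_y_given_z in ((z, 3), (1 - z, 1)):
                count = n_z_given_x * n_y_given_z
                xs += [x] * count
                zs += [z] * count
                ys += [y] * count
    return xs, ys, zs


xs, ys, zs = chain_samples()
mi_xy = mutual_information(xs, ys)                        # indirect pair X, Y
cmi_xy_given_z = conditional_mutual_information(xs, ys, zs)
print("I(X;Y)   =", mi_xy)
print("I(X;Y|Z) =", cmi_xy_given_z)
```

In this toy chain, MI between X and Y is positive even though there is no direct X-Y link, while conditioning on Z drives the estimate to zero, which is the kind of indirect edge a redundancy-pruning step aims to delete. The example does not show the opposite failure mode described in the abstract, where CMI underestimates a genuine direct association.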