Selection of Statistical Thresholds in Graphical Models
DOI: 10.1155/2009/878013

Abstract: The reconstruction of gene regulatory networks from gene expression data has become an important computational tool in systems biology. A relationship among a set of genes can be established either by measuring the effect of experimentally perturbing one or more selected genes on the remaining genes, or by using measures of coexpression from observational data. The data are then incorporated into a suitable mathematical model of gene regulation. Such models vary in level of detail, but most are based on a gene graph, in which nodes represent individual genes and edges between nodes indicate a regulatory relationship.

One important issue is the variability of the data arising from biological and technological sources. This variability leads to imperfect resolution of gene relationships and creates the need for a principled statistical methodology for assigning statistical significance to any inferred feature.

In many models, the presence or absence of an edge in the gene graph is resolved by a statistical hypothesis test. A natural first step is to rank potential edges by the strength of the statistical evidence for the implied regulatory relationship. The intuitive approach is to construct a graph consisting of the highest-ranking edges, defined by a p-value threshold. The choice of threshold may be ad hoc, typically a conservative significance level such as 0.01. A more rigorous approach is to select the threshold using principles of multiple hypothesis testing (see, e.g., [1]), which may yield an estimate of the error rates of edge classification.

There is a fundamental drawback to this approach: the lack of statistical evidence for a regulatory relationship may be as much a consequence of small sample size as of biological fact.
Under this scenario, we note that a given p-value threshold generates a graph with some number of edges, and that number increases with the threshold. Under a null hypothesis of no regulatory structure, p-values are randomly ran
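A minimal sketch of the edge-thresholding step discussed in this abstract, assuming each candidate edge already carries a p-value from some test of association (the test itself is not specified here). All function names are hypothetical. The fixed cutoff mirrors the ad hoc 0.01 rule; the Benjamini-Hochberg false discovery rate procedure is swapped in as one common multiple-testing rule and is not necessarily the method of [1] or of this paper. The null simulation draws p-values from Uniform(0, 1), the distribution of a valid p-value under the null for a continuous test statistic, to show that the number of selected edges grows with the threshold even when no edge is real.

```python
import random
from typing import Dict, List, Tuple

# A candidate edge is an unordered pair of gene names (hypothetical representation).
Edge = Tuple[str, str]


def rank_edges(pvalues: Dict[Edge, float]) -> List[Tuple[Edge, float]]:
    """Rank candidate edges from strongest to weakest evidence (smallest p-value first)."""
    return sorted(pvalues.items(), key=lambda item: item[1])


def edges_at_threshold(pvalues: Dict[Edge, float], alpha: float) -> List[Edge]:
    """Ad hoc rule: keep every edge whose p-value is at or below a fixed threshold alpha."""
    return [edge for edge, p in rank_edges(pvalues) if p <= alpha]


def benjamini_hochberg_edges(pvalues: Dict[Edge, float], q: float) -> List[Edge]:
    """Benjamini-Hochberg step-up rule, controlling the false discovery rate at level q.

    With m candidate edges ranked by p-value, find the largest rank k such that
    p_(k) <= k * q / m, then keep the k top-ranked edges.
    """
    ranked = rank_edges(pvalues)
    m = len(ranked)
    k = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= rank * q / m:
            k = rank  # step-up: a larger qualifying rank overrides smaller ones
    return [edge for edge, _ in ranked[:k]]


def null_edge_count(num_genes: int, alpha: float, seed: int = 0) -> int:
    """Edges selected at threshold alpha when there is no regulatory structure.

    Each of the m = num_genes * (num_genes - 1) / 2 candidate edges gets a
    Uniform(0, 1) p-value, so about alpha * m edges pass purely by chance.
    """
    rng = random.Random(seed)
    genes = [f"g{i}" for i in range(num_genes)]
    pvalues = {
        (a, b): rng.random()
        for i, a in enumerate(genes)
        for b in genes[i + 1:]
    }
    return len(edges_at_threshold(pvalues, alpha))


if __name__ == "__main__":
    # Hypothetical p-values for four candidate edges.
    pvals = {("g1", "g2"): 0.001, ("g1", "g3"): 0.008,
             ("g2", "g3"): 0.03, ("g3", "g4"): 0.2}
    print(edges_at_threshold(pvals, 0.01))        # [('g1', 'g2'), ('g1', 'g3')]
    print(benjamini_hochberg_edges(pvals, 0.05))  # [('g1', 'g2'), ('g1', 'g3'), ('g2', 'g3')]
    # Same seed, same null p-values: graph size can only grow with alpha.
    for alpha in (0.001, 0.01, 0.05):
        print(alpha, null_edge_count(50, alpha))
```

With these toy p-values, the FDR-based rule admits one more edge than the fixed 0.01 cutoff. In the null simulation, reusing the seed keeps the p-values fixed across thresholds, so the edge count is non-decreasing in alpha even though every selected edge is spurious.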