|
BMC Bioinformatics 2009
Robust extraction of functional signals from gene set analysis using a generalized threshold free scoring function

Abstract: We introduce a scoring function, Gene Set Z-score (GSZ), for the analysis of functional class over-representation that combines two previous analysis methods. GSZ encompasses popular functions such as correlation, the hypergeometric test, Max-Mean and Random Sets as limiting cases. In tests with randomized data, GSZ is stable against changes in class size as well as across different positions of the analysed gene list. GSZ shows the best overall performance in a detailed comparison to popular functions using artificial data. Likewise, GSZ stands out in a cross-validation of methods using split real data. A comparison of empirical p-values further shows a strong difference in favour of GSZ, which clearly reports better p-values for top classes than the other methods. Furthermore, GSZ detects relevant biological themes that are missed by the other methods. These observations also hold when comparing GSZ with popular program packages. GSZ and improved versions of earlier methods are a useful contribution to the analysis of differential gene expression. The methods and supplementary material are available from the website http://ekhidna.biocenter.helsinki.fi/users/petri/public/GSZ/GSZscore.html.

The analysis of differential gene expression between two sample types, such as pathological and healthy tissues, is one of the cornerstones of modern biomedical science. Typically, the up- or down-regulation of each gene in the pathological samples is measured. The resulting expression data can be treated as an Ordered Gene List (OGL) by sorting the genes according to their regulation. The upper end of the OGL represents the strongest up-regulation and the lower end the strongest down-regulation in the pathological sample. The middle of the list contains genes with insignificant regulation.
Similar gene lists can also be generated from various other data sources, such as sequence similarity searches, high-throughput screening of gene knock-outs, or protein expression arrays.
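
As a minimal illustration of how an OGL is built, the sketch below sorts genes by a per-gene regulation score so that the strongest up-regulation comes first and the strongest down-regulation last. The function name `ordered_gene_list`, the gene names, and the log fold-change values are hypothetical and chosen only for illustration; they do not come from the paper or its software.

```python
from typing import Dict, List, Tuple


def ordered_gene_list(regulation: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (gene, score) pairs sorted from strongest up-regulation
    to strongest down-regulation.

    `regulation` maps a gene name to a signed score, for example a log
    fold change between pathological and healthy samples (an assumption
    made for this sketch).
    """
    return sorted(regulation.items(), key=lambda item: item[1], reverse=True)


# Made-up scores: positive means up-regulated in the pathological sample.
example_scores = {
    "geneA": 2.4,
    "geneB": -0.1,
    "geneC": -3.0,
    "geneD": 0.2,
    "geneE": 1.1,
}

ogl = ordered_gene_list(example_scores)
gene_order = [gene for gene, _ in ogl]
top_up = gene_order[0]      # upper end of the OGL
top_down = gene_order[-1]   # lower end of the OGL
```

Any other per-gene score, such as a similarity-search score or a knock-out screening readout, could be passed in the same way to obtain an ordered list from a different data source.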
|