BMC Bioinformatics 2007
A novel approach to detect hot-spots in large-scale multivariate data

Abstract: Our results show that a first-order phase transition is observable, and its critical point separates the hot-spot set from the remaining variables. The method is also more successful than existing approaches at identifying statistically significant hot-spots. This holds both for simulated data sets and for real large-scale multivariate data sets from gene arrays, electrophysiological recordings and functional magnetic resonance imaging experiments. In summary, this new statistical algorithm should provide a powerful analytical tool for extracting the maximum information from complex biological multivariate data.

Increasingly, experiments in many areas of biological research simultaneously record activity changes in hundreds or even thousands of variables (e.g. channels, cells, genes or proteins) over a time window T [1-5]. Because of both internal and external noise, the recorded activity is stochastic. Traditionally, all collected variables are subjected to statistical analysis, although in most cases not all of them change in response to the applied stimuli [6]. Including large numbers of non-responsive variables in the analysis can bury the true information carried by a small number of responsive ones. This can lead to the erroneous conclusion that no changes have occurred. In other words, because of current technological developments in biology, we often face too much rather than too little data. Paradoxically, this surplus can impose constraints that prevent us from detecting important information patterns contained within the data [7], and that information is then lost. It is therefore crucial to first filter out non-responsive variables before performing any further statistical analysis [8].

In [3] we have already proposed a way to handle this problem.
MANOVA [9] is applied to analyze data collected from multi-electrode array electrophysiological recordings of activity changes in ensembles of 100 or more different ne