Abstract:
Previous analytical studies of on-line Independent Component Analysis (ICA) learning rules have focussed on asymptotic stability and efficiency. In practice the transient stages of learning will often be more significant in determining the success of an algorithm. This is demonstrated here with an analysis of a Hebbian ICA algorithm which can find a small number of non-Gaussian components given data composed of a linear mixture of independent source signals. An idealised data model is considered in which the sources comprise a number of non-Gaussian and Gaussian sources and a solution to the dynamics is obtained in the limit where the number of Gaussian sources is infinite. Previous stability results are confirmed by expanding around optimal fixed points, where a closed-form solution to the learning dynamics is obtained. However, stochastic effects are shown to stabilise otherwise unstable sub-optimal fixed points. Conditions required to destabilise one such fixed point are obtained for the case of a single non-Gaussian component, indicating that the initial learning rate $\eta$ required to successfully escape is very low ($\eta = O(N^{-2})$, where $N$ is the data dimension), resulting in very slow learning typically requiring $O(N^3)$ iterations. Simulations confirm that this picture holds for a finite system.
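As an illustration of the kind of on-line rule analysed above, here is a minimal sketch of a Hebbian-type ICA update with a cubic nonlinearity. This is an illustrative stand-in, not the paper's exact algorithm; the data model (one heavy-tailed source among Gaussian components), the learning rate $\eta = O(N^{-2})$ and the $O(N^3)$ iteration budget follow the scalings stated in the abstract.

```python
import numpy as np

# Sketch only: a nonlinear Hebbian update that reinforces directions
# whose projection y = w.x looks non-Gaussian (via y**3, related to the
# kurtosis of y), with renormalisation onto the unit sphere.

def hebbian_step(w, x, eta):
    y = w @ x
    w = w + eta * (y ** 3) * x        # nonlinear Hebbian term
    return w / np.linalg.norm(w)      # project back to the unit sphere

rng = np.random.default_rng(0)
N = 20                                # data dimension
w = rng.standard_normal(N)
w /= np.linalg.norm(w)

eta = 1.0 / N ** 2                    # low learning rate, eta = O(N^-2)
for _ in range(100_000):              # slow transient, order N^3 iterations
    x = rng.standard_normal(N)        # Gaussian background sources
    x[0] = rng.laplace()              # one heavy-tailed (non-Gaussian) source
    w = hebbian_step(w, x, eta)

print(abs(w[0]))                      # overlap with the non-Gaussian direction
```

The overlap `abs(w[0])` plays the role of the order parameter whose escape from the initial $O(N^{-1/2})$ value is slow at small $\eta$.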

Abstract:
We quantify the effect of tag-position bias in Classic and Signature MPSS technology using published data from Arabidopsis, rice and human. We investigate the relationship between measured concentration and tag-position using nonlinear regression methods. The observed relationship is shown to be broadly consistent across different data sets. We find that there exist different and significant biases in both Classic and Signature MPSS data. For Classic MPSS data, genes with tag-position in the middle-range have the highest measured abundance on average, while genes with tag-position in the high-range, far from the 3' end, show a significant decrease. For Signature MPSS data, high-range tag-position genes tend to have a flatter relationship between tag-position and measured abundance. Thus, our results confirm that the Signature MPSS method fixes a substantial problem with the Classic MPSS method. For both Classic and Signature MPSS data there is a positive correlation between measured abundance and tag-position for low-range tag-position genes. Compared with the effects of mRNA length and number of exons, tag-position bias seems to be more significant in Arabidopsis. The tag-position bias is reflected both in the measured abundance of genes with a significant tag count and in the proportion of unexpressed genes identified. Tag-position bias should be taken into consideration when measuring mRNA transcript abundance using MPSS technology, both in Classic and Signature MPSS methods.

A number of high-throughput technologies have been developed that are able to measure the abundance of many mRNA transcripts within a sample. These include microarray technology [1,2], SAGE (Serial Analysis of Gene Expression) technology [3,4] and most recently MPSS (Massively Parallel Signature Sequencing) technology [5,6]. Compared with microarray technology, SAGE and MPSS technologies have some clear advantages. In these tag-based technologies, transcript abundance is measured by counting signature tags.
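The abundance-versus-tag-position trend described above can be illustrated with a toy nonlinear regression. Everything below is synthetic and invented for illustration: a quadratic fit stands in for the paper's regression method, and the generated curve merely mimics the qualitative Classic-MPSS pattern (abundance rising at low tag positions, peaking in the mid-range, falling off far from the 3' end).

```python
import numpy as np

# Illustrative only: synthetic log-abundance vs normalized tag position,
# generated with a known mid-range peak, then recovered by regression.

rng = np.random.default_rng(2)
pos = rng.uniform(0.0, 1.0, 500)                 # normalized tag position
log_abund = 1.0 + 2.0 * pos - 2.5 * pos ** 2 \
            + rng.normal(0.0, 0.2, 500)          # noisy observations

coeffs = np.polyfit(pos, log_abund, deg=2)       # quadratic trend fit
peak = -coeffs[1] / (2 * coeffs[0])              # position of the maximum
print(f"fitted peak at tag position {peak:.2f}") # mid-range, near 0.4 here
```

In the real analysis the regression is over measured MPSS abundance, and the shape of the fitted curve (rather than a single peak) is what distinguishes the Classic and Signature protocols.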

Abstract:
Natural gradient descent is a principled method for adapting the parameters of a statistical model on-line using an underlying Riemannian parameter space to redefine the direction of steepest descent. The algorithm is examined via methods of statistical physics which accurately characterize both transient and asymptotic behavior. A solution of the learning dynamics is obtained for the case of multilayer neural network training in the limit of large input dimension. We find that natural gradient learning leads to optimal asymptotic performance and outperforms gradient descent in the transient, significantly shortening or even removing plateaus in the transient generalization performance which typically hamper gradient descent training.
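The core idea above, preconditioning the on-line gradient by the inverse Fisher information, can be sketched on a toy model. The example below is an illustrative assumption, not the multilayer network analysed in the paper: for a one-dimensional Gaussian parametrised by $(\mu, \log\sigma)$ the Fisher matrix is diagonal, $\mathrm{diag}(1/\sigma^2,\, 2)$, so the natural gradient simply rescales each partial derivative.

```python
import numpy as np

# Toy natural gradient descent: fit (mu, log_sigma) of a 1-D Gaussian
# on-line, preconditioning the per-sample gradient by the inverse Fisher.

rng = np.random.default_rng(1)
data = rng.normal(3.0, 2.0, size=5000)   # samples from N(3, 2^2)

mu, log_sigma = 0.0, 0.0
eta = 0.01
for x in data:
    sigma2 = np.exp(2.0 * log_sigma)
    # gradients of the negative log-likelihood for one sample
    g_mu = (mu - x) / sigma2
    g_ls = 1.0 - (x - mu) ** 2 / sigma2
    # natural gradient: multiply by inverse Fisher, diag(sigma2, 1/2)
    mu -= eta * sigma2 * g_mu
    log_sigma -= eta * 0.5 * g_ls

print(mu, np.exp(log_sigma))   # approaches roughly 3 and 2
```

Note that the natural update for `mu` reduces to `mu += eta * (x - mu)`, independent of the current noise estimate, which is the kind of reparametrisation invariance that motivates the method.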

Abstract:
A formalism for describing the dynamics of Genetic Algorithms (GAs) using methods from statistical mechanics is applied to the problem of generalization in a perceptron with binary weights. The dynamics are solved for the case where a new batch of training patterns is presented to each population member each generation, which considerably simplifies the calculation. The theory is shown to agree closely with simulations of a real GA averaged over many runs, accurately predicting the mean best solution found. For weak selection and large problem size the difference equations describing the dynamics can be expressed analytically and we find that the effects of noise due to the finite size of each training batch can be removed by increasing the population size appropriately. If this population resizing is used, one can deduce the most computationally efficient size of training batch each generation. For independent patterns this choice also gives the minimum total number of training patterns used. Although using independent patterns is a very inefficient use of training patterns in general, this work may also prove useful for determining the optimum batch size in the case where patterns are recycled.

Abstract:
The learning dynamics of on-line independent component analysis is analysed in the limit of large data dimension. We study a simple Hebbian learning algorithm that can be used to separate out a small number of non-Gaussian components from a high-dimensional data set. The de-mixing matrix parameters are confined to a Stiefel manifold of tall, orthogonal matrices and we introduce a natural gradient variant of the algorithm which is appropriate to learning on this manifold. For large input dimension the parameter trajectory of both algorithms passes through a sequence of unstable fixed points, each described by a diffusion process in a polynomial potential. Choosing the learning rate too large increases the escape time from each of these fixed points, effectively trapping the learning in a sub-optimal state. In order to avoid these trapping states a very low learning rate must be chosen during the learning transient, resulting in learning time-scales of $O(N^2)$ or $O(N^3)$ iterations where $N$ is the data dimension. Escape from each sub-optimal state results in a sequence of symmetry breaking events as the algorithm learns each source in turn. This is in marked contrast to the learning dynamics displayed by related on-line learning algorithms for multilayer neural networks and principal component analysis. Although the natural gradient variant of the algorithm has nice asymptotic convergence properties, it has an equivalent transient dynamics to the standard Hebbian algorithm.

Abstract:
Recent advances in molecular biology allow the quantification of the transcriptome and scoring transcripts as differentially or equally expressed between two biological conditions. Although these two tasks are closely linked, the available inference methods treat them separately: a primary model is used to estimate expression and its output is post-processed using a differential expression model. In this paper, both issues are simultaneously addressed by proposing the joint estimation of expression levels and differential expression: the unknown relative abundance of each transcript can either be equal or not between two conditions. A hierarchical Bayesian model builds upon the BitSeq framework and the posterior distribution of transcript expression and differential expression is inferred using Markov Chain Monte Carlo (MCMC). It is shown that the proposed model enjoys conjugacy for fixed dimension variables, thus the full conditional distributions are analytically derived. Two samplers are constructed, a reversible jump MCMC sampler and a collapsed Gibbs sampler, and the latter is found to perform best. A cluster representation of the reads aligned to the transcriptome is introduced, allowing parallel estimation of the marginal posterior distribution of subsets of transcripts under reasonable computing time. The proposed algorithm is benchmarked against alternative methods using synthetic datasets and applied to real RNA-sequencing data. Source code is available online (https://github.com/mqbssppe/cjBitSeq).

Abstract:
We revisit the classical population genetics model of a population evolving under multiplicative selection, mutation and drift. The number of beneficial alleles in a multi-locus system can be considered a trait under exponential selection. Equations of motion are derived for the cumulants of the trait distribution in the diffusion limit and under the assumption of linkage equilibrium. Because of the additive nature of cumulants, this reduces to the problem of determining equations of motion for the expected allele distribution cumulants at each locus. The cumulant equations form an infinite-dimensional linear system and in an appendix Adam Prugel-Bennett provides a closed-form expression for these equations. We derive approximate solutions which are shown to describe the dynamics well for a broad range of parameters. In particular, we introduce two approximate analytical solutions: (1) Perturbation theory is used to solve the dynamics for weak selection and arbitrary mutation rate. The resulting expansion for the system's eigenvalues reduces to the known diffusion theory results for the limiting cases with either mutation or selection absent. (2) For low mutation rates we observe a separation of time-scales between the slowest mode and the rest which allows us to develop an approximate analytical solution for the dominant slow mode. The solution is consistent with the perturbation theory result and provides a good approximation for much stronger selection intensities.

Abstract:
We present a general method for deriving collapsed variational inference algorithms for probabilistic models in the conjugate exponential family. Our method unifies many existing approaches to collapsed variational inference. Our collapsed variational inference leads to a new lower bound on the marginal likelihood. We exploit the information geometry of the bound to derive much faster optimization methods based on conjugate gradients for these models. Our approach is very general and is easily applied to any model where the mean field update equations have been derived. Empirically we show significant speed-ups for probabilistic models optimized using our bound.

Abstract:
In this publication, we combine two Bayesian non-parametric models: the Gaussian Process (GP) and the Dirichlet Process (DP). Our innovation in the GP model is to introduce a variation on the GP prior which enables us to model structured time-series data, i.e. data containing groups where we wish to model inter- and intra-group variability. Our innovation in the DP model is an implementation of a new fast collapsed variational inference procedure which enables us to optimize our variational approximation significantly faster than standard VB approaches. In a biological time series application we show how our model better captures salient features of the data, leading to better consistency with existing biological classifications, while the associated inference algorithm provides a twofold speed-up over EM-based variational inference.

Abstract:
Motivation: High-throughput sequencing enables expression analysis at the level of individual transcripts. The analysis of transcriptome expression levels and differential expression estimation requires a probabilistic approach to properly account for ambiguity caused by shared exons and finite read sampling as well as the intrinsic biological variance of transcript expression. Results: We present BitSeq (Bayesian Inference of Transcripts from Sequencing data), a Bayesian approach for estimation of transcript expression level from RNA-seq experiments. Inferred relative expression is represented by Markov chain Monte Carlo (MCMC) samples from the posterior probability distribution of a generative model of the read data. We propose a novel method for differential expression analysis across replicates which propagates uncertainty from the sample-level model while modelling biological variance using an expression-level-dependent prior. We demonstrate the advantages of our method using simulated data as well as an RNA-seq dataset with technical and biological replication for both studied conditions. Availability: The implementation of the transcriptome expression estimation and differential expression analysis, BitSeq, has been written in C++.