Abstract:
We consider covariance estimation in the multivariate generalized Gaussian distribution (MGGD) and elliptically symmetric (ES) distributions. The maximum likelihood optimization associated with this problem is non-convex, yet it has been proved that its global solution can often be computed via simple fixed point iterations. Our first contribution is a new analysis of this likelihood based on geodesic convexity that requires weaker assumptions. Our second contribution is a generalized framework for structured covariance estimation under sparsity constraints. We show that the optimizations can be formulated as convex minimization as long as the MGGD shape parameter is larger than one half and the sparsity pattern is chordal. These include, for example, maximum likelihood estimation of banded inverse covariances in multivariate Laplace distributions, which are associated with time-varying autoregressive processes.
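
The fixed point iterations mentioned above can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes zero-mean data, a known shape parameter beta, and a density proportional to |Sigma|^(-1/2) exp(-(1/2) (x^T Sigma^(-1) x)^beta), for which the likelihood equation reads Sigma = (beta/N) sum_i u_i^(beta-1) x_i x_i^T with u_i = x_i^T Sigma^(-1) x_i. The function name and its arguments are hypothetical.

```python
import numpy as np

def mggd_scatter_fixed_point(X, beta, n_iter=200, tol=1e-10):
    """Iterate Sigma <- (beta/N) * sum_i u_i^(beta-1) x_i x_i^T.

    X    : (N, p) array of zero-mean samples (assumption)
    beta : known MGGD shape parameter (assumption)
    Samples that are exactly zero would give u_i = 0 and are not handled.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    sigma = np.eye(p)  # start from the identity
    for _ in range(n_iter):
        # Quadratic forms u_i = x_i^T Sigma^{-1} x_i for all samples
        u = np.einsum("ij,ij->i", X @ np.linalg.inv(sigma), X)
        # Per-sample weights from the likelihood equation
        w = beta * u ** (beta - 1.0)
        new_sigma = (X * w[:, None]).T @ X / n
        # Stop when the relative change is small
        if np.linalg.norm(new_sigma - sigma) <= tol * np.linalg.norm(sigma):
            sigma = new_sigma
            break
        sigma = new_sigma
    return sigma

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
sigma_gauss = mggd_scatter_fixed_point(X, beta=1.0)
sigma_heavy = mggd_scatter_fixed_point(X, beta=0.5)
```

Under this parametrization, beta = 1 is the Gaussian case: all weights equal one and the iteration returns the zero-mean sample covariance after a single step.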

Abstract:
A generalized Gaussian process model (GGPM) is a unifying framework that encompasses many existing Gaussian process (GP) models, such as GP regression, classification, and counting. In the GGPM framework, the observation likelihood of the GP model is itself parameterized using the exponential family distribution (EFD). In this paper, we consider efficient algorithms for approximate inference on GGPMs using the general form of the EFD. A particular GP model and its associated inference algorithms can then be formed by changing the parameters of the EFD, thus greatly simplifying its creation for task-specific output domains. We demonstrate the efficacy of this framework by creating several new GP models for regressing to non-negative reals and to real intervals. We also consider a closed-form Taylor approximation for efficient inference on GGPMs, and elaborate on its connections with other model-specific heuristic closed-form approximations. Finally, we present a comprehensive set of experiments to compare approximate inference algorithms on a wide variety of GGPMs.

Abstract:
We expand a framework for Bayesian variable selection for Gaussian process (GP) models by employing spiked Dirichlet process (DP) prior constructions over set partitions containing covariates. Our approach results in a nonparametric treatment of the distribution of the covariance parameters of the GP covariance matrix that in turn induces a clustering of the covariates. We evaluate two prior constructions: the first one employs a mixture of a point-mass and a continuous distribution as the centering distribution for the DP prior, thereby clustering all covariates. The second one employs a mixture of a spike and a DP prior with a continuous distribution as the centering distribution, which induces clustering of the selected covariates only. DP models borrow information across covariates through model-based clustering. Our simulation results, in particular, show a reduction in posterior sampling variability and, in turn, enhanced prediction performance. In our model formulations, we accomplish posterior inference by employing novel combinations and extensions of existing algorithms for inference with DP prior models and compare performance under the two prior constructions.
1. Introduction
In this paper, we expand a framework for Bayesian variable selection for Gaussian process (GP) models by employing spiked Dirichlet process (DP) prior constructions over set partitions containing covariates. Savitsky et al. [1] incorporate Gaussian processes in the generalized linear model framework of McCullagh and Nelder [2] by expanding the flexibility for the response surface to lie in the space of continuous functions. Their modeling approach results in a class of nonparametric regression models where the covariance matrix depends on the predictors. GP models, in particular, accommodate high-dimensional heterogeneous covariate spaces where covariates possess different degrees of linear and non-linear association with the response (Rasmussen and Williams [3]).
In this paper, we investigate mixture prior models that induce a nonparametric treatment of the distribution of the covariance parameters of the GP covariance matrix that, in turn, induces a clustering of the covariates. Mixture priors that employ a spike at zero are now routinely used for variable selection—see for example, George and McCulloch [4] and Brown et al. [5] for univariate and multivariate regression settings, respectively, and Sha et al. [6] for probit models—and have been particularly successful in applications to high-dimensional settings. These approaches employ mixture prior formulations for

Abstract:
We introduce Gaussian Process Topic Models (GPTMs), a new family of topic models which can leverage a kernel among documents while extracting correlated topics. GPTMs can be considered a systematic generalization of the Correlated Topic Models (CTMs) using ideas from Gaussian Process (GP) based embedding. Since GPTMs work with both a topic covariance matrix and a document kernel matrix, learning GPTMs involves a novel component: solving a suitable Sylvester equation capturing both topic and document dependencies. The efficacy of GPTMs is demonstrated with experiments evaluating the quality of both topic modeling and embedding.
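
For readers unfamiliar with Sylvester equations, the sketch below solves a generic one, A X + X B = Q, where A stands in for a topic covariance matrix and B for a document kernel matrix. The abstract does not state the exact equation used by GPTMs, so the matrices, shapes, and the function name here are purely illustrative assumptions. The solver uses the Kronecker-product identity vec(A X + X B) = (I ⊗ A + B^T ⊗ I) vec(X), which is only practical for small problems.

```python
import numpy as np

def solve_sylvester_kron(A, B, Q):
    """Solve A X + X B = Q for X via a Kronecker-product linear system.

    A : (k, k), B : (n, n), Q : (k, n). A unique solution exists when
    A and -B share no eigenvalue (true when both are positive definite).
    """
    k, n = Q.shape
    # Column-major vec: vec(A X) = (I_n kron A) vec(X),
    #                   vec(X B) = (B^T kron I_k) vec(X)
    M = np.kron(np.eye(n), A) + np.kron(B.T, np.eye(k))
    x = np.linalg.solve(M, Q.reshape(-1, order="F"))
    return x.reshape((k, n), order="F")

rng = np.random.default_rng(1)
k, n = 3, 4                          # hypothetical: 3 topics, 4 documents
Ra = rng.standard_normal((k, k))
Rb = rng.standard_normal((n, n))
A = Ra @ Ra.T + k * np.eye(k)        # stand-in topic covariance (SPD)
B = Rb @ Rb.T + n * np.eye(n)        # stand-in document kernel (SPD)
Q = rng.standard_normal((k, n))
X_sol = solve_sylvester_kron(A, B, Q)
```

The linear system has k·n unknowns, so this dense approach scales poorly; dedicated routines such as `scipy.linalg.solve_sylvester` are the usual choice for larger matrices.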

Abstract:
We propose a general framework for non-normal multivariate data analysis called multivariate covariance generalized linear models (McGLMs), designed to handle multivariate response variables, along with a wide range of temporal and spatial correlation structures defined in terms of a covariance link function combined with a matrix linear predictor involving known matrices. The method is motivated by three data examples that are not easily handled by existing methods. The first example concerns multivariate count data, the second involves response variables of mixed types, combined with repeated measures and longitudinal structures, and the third involves a spatio-temporal analysis of rainfall data. The models take non-normality into account in the conventional way by means of a variance function, and the mean structure is modelled by means of a link function and a linear predictor. The models are fitted using an efficient Newton scoring algorithm based on quasi-likelihood and Pearson estimating functions, using only second-moment assumptions. This provides a unified approach to a wide variety of different types of response variables and covariance structures, including multivariate extensions of repeated measures, time series, longitudinal, spatial and spatio-temporal structures.
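
To make the idea of a matrix linear predictor concrete, here is a minimal sketch for a single response. It assumes an identity covariance link, so that the dispersion matrix is Omega = tau_0 Z_0 + tau_1 Z_1 for known matrices Z_d, and it assumes the covariance is assembled as V^(1/2) Omega V^(1/2), where V is the diagonal matrix of variance-function values. These modelling choices, the Poisson-type variance function, and all names are illustrative assumptions rather than details taken from the abstract.

```python
import numpy as np

def matrix_linear_predictor(taus, Zs):
    """Weighted sum of known matrices: tau_0 * Z_0 + tau_1 * Z_1 + ..."""
    return sum(t * Z for t, Z in zip(taus, Zs))

def covariance_from_mean(mu, taus, Zs, variance_fn=lambda m: m):
    """Assemble V^(1/2) Omega V^(1/2) under an identity covariance link.

    variance_fn defaults to the Poisson-type choice V(mu) = mu (assumption).
    """
    omega = matrix_linear_predictor(taus, Zs)
    v_sqrt = np.sqrt(variance_fn(np.asarray(mu, dtype=float)))
    # Scale rows and columns of Omega by the square-root variances
    return v_sqrt[:, None] * omega * v_sqrt[None, :]

n = 4
Z0 = np.eye(n)            # independent component
Z1 = np.ones((n, n))      # shared component, e.g. repeated measures on one unit
mu = np.array([1.0, 2.0, 3.0, 4.0])
C = covariance_from_mean(mu, taus=(1.0, 0.5), Zs=(Z0, Z1))
```

Different correlation structures correspond to different choices of the known matrices Z_d, while the mean enters only through the variance function.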

Abstract:
Due to its heavy-tailed and fully parametric form, the multivariate generalized Gaussian distribution (MGGD) has been receiving much attention for modeling extreme events in signal and image processing applications. Considering the estimation issue of the MGGD parameters, the main contribution of this paper is to prove that the maximum likelihood estimator (MLE) of the scatter matrix exists and is unique up to a scalar factor, for a given shape parameter $\beta \in (0,1)$. Moreover, an estimation algorithm based on a Newton-Raphson recursion is proposed for computing the MLE of MGGD parameters. Various experiments conducted on synthetic and real data are presented to illustrate the theoretical derivations in terms of number of iterations and number of samples for different values of the shape parameter. The main conclusion of this work is that the parameters of MGGDs can be estimated using the maximum likelihood principle with good performance.

Abstract:
In this paper we propose a generalized Gaussian process concurrent regression model for functional data where the functional response variable has a binomial, Poisson or other non-Gaussian distribution from an exponential family while the covariates are mixed functional and scalar variables. The proposed model offers a nonparametric generalized concurrent regression method for functional data with multi-dimensional covariates, and provides a natural framework for modeling the common mean structure and covariance structure simultaneously for repeatedly observed functional data. The mean structure provides overall information about the observations, while the covariance structure can be used to capture the characteristics of each individual batch. The prior specification of the covariance kernel enables us to accommodate a wide class of nonlinear models. The definition of the model, the inference and the implementation as well as its asymptotic properties are discussed. Several numerical examples with different non-Gaussian response variables are presented. Some technical details and more numerical examples as well as an extension of the model are provided as supplementary materials.

Abstract:
We consider a Gaussian process formulation of the multiple kernel learning problem. The goal is to select the convex combination of kernel matrices that best explains the data and by doing so improve the generalisation on unseen data. Sparsity in the kernel weights is obtained by adopting a hierarchical Bayesian approach: Gaussian process priors are imposed over the latent functions and generalised inverse Gaussians on their associated weights. This construction is equivalent to imposing a product of heavy-tailed process priors over function space. A variational inference algorithm is derived for regression and binary classification.
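
The core object in this formulation, a convex combination of kernel matrices scored by how well it explains the data, can be sketched as follows. This is not the paper's variational algorithm: the weights are fixed by hand rather than inferred under a hierarchical prior, the score is the standard GP regression log marginal likelihood with a fixed noise variance, and the two base kernels and all function names are illustrative assumptions.

```python
import numpy as np

def combine_kernels(kernels, weights):
    """Convex combination sum_m w_m K_m with w_m >= 0 and sum_m w_m = 1."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to one")
    return sum(wi * K for wi, K in zip(w, kernels))

def gp_log_marginal_likelihood(K, y, noise_var=0.1):
    """log N(y | 0, K + noise_var * I), computed via a Cholesky factor."""
    n = y.shape[0]
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # -0.5 * log|K + noise I| equals -sum(log(diag(L)))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * n * np.log(2.0 * np.pi))

rng = np.random.default_rng(2)
x = np.linspace(-2.0, 2.0, 20)
y = np.sin(2.0 * x) + 0.1 * rng.standard_normal(20)

K_rbf = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)  # squared-exponential kernel
K_lin = np.outer(x, x)                                 # linear kernel
K_mix = combine_kernels([K_rbf, K_lin], [0.3, 0.7])
lml = gp_log_marginal_likelihood(K_mix, y)
```

Sparsity in the kernel weights would correspond to some entries of the weight vector being driven to zero, so that the matching kernels drop out of the combination.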

Abstract:
Copulas make it possible to learn marginal distributions separately from the multivariate dependence structure (copula) that links them together into a density function. Vine factorizations ease the learning of high-dimensional copulas by constructing a hierarchy of conditional bivariate copulas. However, to simplify inference, it is common to assume that each of these conditional bivariate copulas is independent of its conditioning variables. In this paper, we relax this assumption by discovering the latent functions that specify the shape of a conditional copula given its conditioning variables. We learn these functions by following a Bayesian approach based on sparse Gaussian processes with expectation propagation for scalable, approximate inference. Experiments on real-world datasets show that, when modeling all conditional dependencies, we obtain better estimates of the underlying copula of the data.

Abstract:
Learning curves for Gaussian process regression are well understood when the 'student' model happens to match the 'teacher' (true data generation process). I derive approximations to the learning curves for the more generic case of mismatched models, and find very rich behaviour: for large input space dimensionality, where the results become exact, there are universal (student-independent) plateaux in the learning curve, with transitions in between that can exhibit arbitrarily many over-fitting maxima. In lower dimensions, plateaux also appear, and the asymptotic decay of the learning curve becomes strongly student-dependent. All predictions are confirmed by simulations.