Abstract:
Multiple kernel learning algorithms are proposed to combine kernels in order to obtain a better similarity measure or to integrate feature representations coming from different data sources. Most of the previous research on such methods is focused on the computational efficiency issue. However, it is still not feasible to combine many kernels using existing Bayesian approaches due to their high time complexity. We propose a fully conjugate Bayesian formulation and derive a deterministic variational approximation, which allows us to combine hundreds or thousands of kernels very efficiently. We briefly explain how the proposed method can be extended for multiclass learning and semi-supervised learning. Experiments with large numbers of kernels on benchmark data sets show that our inference method is quite fast, requiring less than a minute. On one bioinformatics and three image recognition data sets, our method outperforms previously reported results with better generalization performance.

Abstract:
We develop a Bayesian nonparametric model for reconstructing magnetic resonance images (MRI) from highly undersampled k-space data. We perform dictionary learning as part of the image reconstruction process. To this end, we use the beta process as a nonparametric dictionary learning prior for representing an image patch as a sparse combination of dictionary elements. The size of the dictionary and the patch-specific sparsity pattern are inferred from the data, in addition to other dictionary learning variables. Dictionary learning is performed directly on the compressed image, and so is tailored to the MRI being considered. In addition, we investigate a total variation penalty term in combination with the dictionary learning model, and show how the denoising property of dictionary learning removes dependence on regularization parameters in the noisy setting. We derive a stochastic optimization algorithm based on Markov Chain Monte Carlo (MCMC) for the Bayesian model, and use the alternating direction method of multipliers (ADMM) for efficiently performing total variation minimization. We present empirical results on several MRI, which show that the proposed regularization framework can improve reconstruction accuracy over other methods.

Abstract:
Signal processing tasks as fundamental as sampling, reconstruction, minimum mean-square error interpolation and prediction can be viewed under the prism of reproducing kernel Hilbert spaces. Endowing this vantage point with contemporary advances in sparsity-aware modeling and processing, promotes the nonparametric basis pursuit advocated in this paper as the overarching framework for the confluence of kernel-based learning (KBL) approaches leveraging sparse linear regression, nuclear-norm regularization, and dictionary learning. The novel sparse KBL toolbox goes beyond translating sparse parametric approaches to their nonparametric counterparts, to incorporate new possibilities such as multi-kernel selection and matrix smoothing. The impact of sparse KBL to signal processing applications is illustrated through test cases from cognitive radio sensing, microarray data imputation, and network traffic prediction.

Abstract:
Nonlinear kernels are used extensively in regression models in statistics and machine learning since they often improve predictive accuracy. Variable selection is a challenge in the context of kernel based regression models. In linear regression the concept of an effect size for the regression coefficients is very useful for variable selection. In this paper we provide an analog for the effect size of each explanatory variable for Bayesian kernel regression models when the kernel is shift-invariant---for example the Gaussian kernel. The key idea that allows for the extraction of effect sizes is a random Fourier expansion for shift-invariant kernel functions. These random Fourier bases serve as a linear vector space in which a linear model can be defined and regression coefficients in this vector space can be projected onto the original explanatory variables. This projection serves as the analog for effect sizes. We apply this idea to specify a class of scalable Bayesian kernel regression models (SBKMs) for both nonparametric regression and binary classification. We also demonstrate how this framework encompasses both fixed and mixed effects modeling characteristics. We illustrate the utility of our approach on simulated and real data.

Abstract:
Semi-supervised clustering is the task of clustering data points into clusters where only a fraction of the points are labelled. The true number of clusters in the data is often unknown and most models require this parameter as an input. Dirichlet process mixture models are appealing as they can infer the number of clusters from the data. However, these models do not deal with high dimensional data well and can encounter difficulties in inference. We present a novel nonparameteric Bayesian kernel based method to cluster data points without the need to prespecify the number of clusters or to model complicated densities from which data points are assumed to be generated from. The key insight is to use determinants of submatrices of a kernel matrix as a measure of how close together a set of points are. We explore some theoretical properties of the model and derive a natural Gibbs based algorithm with MCMC hyperparameter learning. The model is implemented on a variety of synthetic and real world data sets.

Abstract:
In this study, we enrich the framework of nonparametric kernel Bayesian inference via the flexible incorporation of certain probabilistic models, such as additive Gaussian noise models. Nonparametric inference expressed in terms of kernel means, which is called kernel Bayesian inference, has been studied using basic rules such as the kernel sum rule (KSR), kernel chain rule, kernel product rule, and kernel Bayes' rule (KBR). However, the current framework used for kernel Bayesian inference deals only with nonparametric inference and it cannot allow inference when combined with probabilistic models. In this study, we introduce a novel KSR, called model-based KSR (Mb-KSR), which exploits the knowledge obtained from some probabilistic models of conditional distributions. The incorporation of Mb-KSR into nonparametric kernel Bayesian inference facilitates more flexible kernel Bayesian inference than nonparametric inference. We focus on combinations of Mb-KSR, Non-KSR, and KBR, and we propose a filtering algorithm for state space models, which combines nonparametric learning of the observation process using kernel means and additive Gaussian noise models of the transition dynamics. The idea of the Mb-KSR for additive Gaussian noise models can be extended to more general noise model cases, including a conjugate pair with a positive-definite kernel and a probabilistic model.

Abstract:
In this paper, using kernel convolution of order based dependent Dirichlet process (Griffin & Steel (2006)) we construct a nonstationary, nonseparable, nonparametric space-time process, which, as we show, satisfies desirable properties, and includes the stationary, separable, parametric processes as special cases. We also investigate the smoothness properties of our proposed model. Since our model entails an infinite random series, for Bayesian model fitting purpose we must either truncate the series or more appropriately consider a random number of summands, which renders the model dimension a random variable. We attack the vari- able dimensionality problem using the novel Transdimensional Transformation based Markov Chain Monte Carlo (TTMCMC) methodology introduced by Das & Bhat- tacharya (2014b), which can update all the variables and also change dimensions in a single block using a single random variable drawn from some arbitrary density defined on a relevant support. For the sake of completeness we also address the problem of truncating the infinite series by providing a uniform bound on the error incurred by truncating the infinite series. We illustrate our model and the methodologies on a simulated data set and also fit a real, ozone data set. The results that we obtain from both the studies are quite encouraging.

Abstract:
It is now practically the norm for data to be very high dimensional in areas such as genetics, machine vision, image analysis and many others. When analyzing such data, parametric models are often too inflexible while nonparametric procedures tend to be non-robust because of insufficient data on these high dimensional spaces. It is often the case with high-dimensional data that most of the variability tends to be along a few directions, or more generally along a much smaller dimensional submanifold of the data space. In this article, we propose a class of models that flexibly learn about this submanifold and its dimension which simultaneously performs dimension reduction. As a result, density estimation is carried out efficiently. When performing classification with a large predictor space, our approach allows the category probabilities to vary nonparametrically with a few features expressed as linear combinations of the predictors. As opposed to many black-box methods for dimensionality reduction, the proposed model is appealing in having clearly interpretable and identifiable parameters. Gibbs sampling methods are developed for posterior computation, and the methods are illustrated in simulated and real data applications.

Abstract:
When modeling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous work has either solved the problem by discretizing, or assumed that the data are generated by a single Gaussian. In this paper we abandon the normality assumption and instead use statistical methods for nonparametric density estimation. For a naive Bayesian classifier, we present experimental results on a variety of natural and artificial domains, comparing two methods of density estimation: assuming normality and modeling each conditional distribution with a single Gaussian; and using nonparametric kernel density estimation. We observe large reductions in error on several natural and artificial data sets, which suggests that kernel estimation is a useful tool for learning Bayesian models.

Abstract:
Inference in popular nonparametric Bayesian models typically relies on sampling or other approximations. This paper presents a general methodology for constructing novel tractable nonparametric Bayesian methods by applying the kernel trick to inference in a parametric Bayesian model. For example, Gaussian process regression can be derived this way from Bayesian linear regression. Despite the success of the Gaussian process framework, the kernel trick is rarely explicitly considered in the Bayesian literature. In this paper, we aim to fill this gap and demonstrate the potential of applying the kernel trick to tractable Bayesian parametric models in a wider context than just regression. As an example, we present an intuitive Bayesian kernel machine for density estimation that is obtained by applying the kernel trick to a Gaussian generative model in feature space.