Abstract:
Kernel methods are ubiquitous tools in machine learning. They have proven to be effective in many domains and tasks. Yet, kernel methods often require the user to select a predefined kernel to build an estimator with. However, there is often little reason for the a priori selection of a kernel. Even if a universal approximating kernel is selected, the quality of the finite sample estimator may be greatly effected by the choice of kernel. Furthermore, when directly applying kernel methods, one typically needs to compute a $N \times N$ Gram matrix of pairwise kernel evaluations to work with a dataset of $N$ instances. The computation of this Gram matrix precludes the direct application of kernel methods on large datasets. In this paper we introduce Bayesian nonparmetric kernel (BaNK) learning, a generic, data-driven framework for scalable learning of kernels. We show that this framework can be used for performing both regression and classification tasks and scale to large datasets. Furthermore, we show that BaNK outperforms several other scalable approaches for kernel learning on a variety of real world datasets.

Abstract:
Inference in popular nonparametric Bayesian models typically relies on sampling or other approximations. This paper presents a general methodology for constructing novel tractable nonparametric Bayesian methods by applying the kernel trick to inference in a parametric Bayesian model. For example, Gaussian process regression can be derived this way from Bayesian linear regression. Despite the success of the Gaussian process framework, the kernel trick is rarely explicitly considered in the Bayesian literature. In this paper, we aim to fill this gap and demonstrate the potential of applying the kernel trick to tractable Bayesian parametric models in a wider context than just regression. As an example, we present an intuitive Bayesian kernel machine for density estimation that is obtained by applying the kernel trick to a Gaussian generative model in feature space.

Abstract:
Multiple kernel learning algorithms are proposed to combine kernels in order to obtain a better similarity measure or to integrate feature representations coming from different data sources. Most of the previous research on such methods is focused on the computational efficiency issue. However, it is still not feasible to combine many kernels using existing Bayesian approaches due to their high time complexity. We propose a fully conjugate Bayesian formulation and derive a deterministic variational approximation, which allows us to combine hundreds or thousands of kernels very efficiently. We briefly explain how the proposed method can be extended for multiclass learning and semi-supervised learning. Experiments with large numbers of kernels on benchmark data sets show that our inference method is quite fast, requiring less than a minute. On one bioinformatics and three image recognition data sets, our method outperforms previously reported results with better generalization performance.

Abstract:
Nonlinear kernels are used extensively in regression models in statistics and machine learning since they often improve predictive accuracy. Variable selection is a challenge in the context of kernel based regression models. In linear regression the concept of an effect size for the regression coefficients is very useful for variable selection. In this paper we provide an analog for the effect size of each explanatory variable for Bayesian kernel regression models when the kernel is shift-invariant---for example the Gaussian kernel. The key idea that allows for the extraction of effect sizes is a random Fourier expansion for shift-invariant kernel functions. These random Fourier bases serve as a linear vector space in which a linear model can be defined and regression coefficients in this vector space can be projected onto the original explanatory variables. This projection serves as the analog for effect sizes. We apply this idea to specify a class of scalable Bayesian kernel regression models (SBKMs) for both nonparametric regression and binary classification. We also demonstrate how this framework encompasses both fixed and mixed effects modeling characteristics. We illustrate the utility of our approach on simulated and real data.

Abstract:
Approximate Bayesian computation (ABC) is a likelihood-free approach for Bayesian inferences based on a rejection algorithm method that applies a tolerance of dissimilarity between summary statistics from observed and simulated data. Although several improvements to the algorithm have been proposed, none of these improvements avoid the following two sources of approximation: 1) lack of sufficient statistics: sampling is not from the true posterior density given data but from an approximate posterior density given summary statistics; and 2) non-zero tolerance: sampling from the posterior density given summary statistics is achieved only in the limit of zero tolerance. The first source of approximation can be improved by adding a summary statistic, but an increase in the number of summary statistics could introduce additional variance caused by the low acceptance rate. Consequently, many researchers have attempted to develop techniques to choose informative summary statistics. The present study evaluated the utility of a kernel-based ABC method (Fukumizu et al. 2010, arXiv:1009.5736 and 2011, NIPS 24: 1549-1557) for complex problems that demand many summary statistics. Specifically, kernel ABC was applied to population genetic inference. We demonstrate that, in contrast to conventional ABCs, kernel ABC can incorporate a large number of summary statistics while maintaining high performance of the inference.

Abstract:
The kernel least mean squares (KLMS) algorithm is a computationally efficient nonlinear adaptive filtering method that "kernelizes" the celebrated (linear) least mean squares algorithm. We demonstrate that the least mean squares algorithm is closely related to the Kalman filtering, and thus, the KLMS can be interpreted as an approximate Bayesian filtering method. This allows us to systematically develop extensions of the KLMS by modifying the underlying state-space and observation models. The resulting extensions introduce many desirable properties such as "forgetting", and the ability to learn from discrete data, while retaining the computational simplicity and time complexity of the original algorithm.

Abstract:
Approximate Bayesian Computation (ABC) is a popular sampling method in
applications involving intractable likelihood functions. Instead of evaluating
the likelihood function, ABC approximates the posterior distribution by a set
of accepted samples which are simulated from a generating model. Simulated
samples are accepted if the distances between the samples and the observation are smaller than some
threshold. The distance is calculated in terms of summary statistics. This
paper proposes Local Gradient Kernel Dimension Reduction (LGKDR) to construct
low dimensional summary statistics for ABC. The proposed method identifies a
sufficient subspace of the original summary statistics by implicitly
considering all non-linear transforms therein, and a weighting kernel is used
for the concentration of the projections. No strong assumptions are made on the
marginal distributions, nor
the regression models, permitting usage in a wide range of applications.
Experiments are done with simple rejection ABC and sequential Monte Carlo ABC
methods. Results are reported as competitive in the former and substantially
better in the latter cases in which Monte Carlo errors are compressed as much
as possible.

Abstract:
We introduce a family of positive definite kernels specifically optimized for the manipulation of 3D structures of molecules with kernel methods. The kernels are based on the comparison of the three-points pharmacophores present in the 3D structures of molecul es, a set of molecular features known to be particularly relevant for virtual screening applications. We present a computationally demanding exact implementation of these kernels, as well as fast approximations related to the classical fingerprint-based approa ches. Experimental results suggest that this new approach outperforms state-of-the-art algorithms based on the 2D structure of mol ecules for the detection of inhibitors of several drug targets.

Abstract:
In this paper, we propose an outlier-robust regularized kernel-based method for linear system identification. The unknown impulse response is modeled as a zero-mean Gaussian process whose covariance (kernel) is given by the recently proposed stable spline kernel, which encodes information on regularity and exponential stability. To build robustness to outliers, we model the measurement noise as realizations of independent Laplacian random variables. The identification problem is cast in a Bayesian framework, and solved by a new Markov Chain Monte Carlo (MCMC) scheme. In particular, exploiting the representation of the Laplacian random variables as scale mixtures of Gaussians, we design a Gibbs sampler which quickly converges to the target distribution. Numerical simulations show a substantial improvement in the accuracy of the estimates over state-of-the-art kernel-based methods.

Abstract:
Asymptotic theory of tail index estimation has been studied extensively in the frequentist literature on extreme values, but rarely in the Bayesian context. We investigate whether popular Bayesian kernel mixture models are able to support heavy tailed distributions and consistently estimate the tail index. We show that posterior inconsistency in tail index is surprisingly common for both parametric and nonparametric mixture models. We then present a set of sufficient conditions under which posterior consistency in tail index can be achieved, and verify these conditions for Pareto mixture models under general mixing priors.