Abstract:
Most of the consistency analyses of Bayesian procedures for variable selection in regression refer to pairwise consistency, that is, consistency of Bayes factors. However, variable selection in regression is carried out in a given class of regression models where a natural variable selector is the posterior probability of the models. In this paper we analyze the consistency of the posterior model probabilities when the number of potential regressors grows as the sample size grows. The novelty in the posterior model consistency is that it depends not only on the priors for the model parameters through the Bayes factor, but also on the model priors, so that it is a useful tool for choosing priors for both models and model parameters. We have found that some classes of priors typically used in variable selection yield posterior model inconsistency, while mixtures of these priors improve this undesirable behavior. For moderate sample sizes, we evaluate Bayesian pairwise variable selection procedures by comparing their frequentist Type I and II error probabilities. This provides valuable information to discriminate between the priors for the model parameters commonly used for variable selection.
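To illustrate how posterior model probabilities combine Bayes factors with model priors, here is a minimal Python sketch. It assumes each candidate model has a Bayes factor against a common reference model, and normalizes Bayes factor times model prior over the candidate class. The function names, the two model priors (uniform over models, and a hierarchical prior that is uniform over model sizes) and the log Bayes factor values are illustrative assumptions, not taken from the paper.

```python
import math

def posterior_model_probs(log_bayes_factors, model_priors):
    """Posterior probabilities from log Bayes factors (each against the same
    reference model) and model prior weights. Weights need not sum to one
    over the listed models; the normalization below takes care of that."""
    log_w = [lb + math.log(pr) for lb, pr in zip(log_bayes_factors, model_priors)]
    m = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return [v / total for v in w]

def uniform_prior(sizes, p):
    """Assumed prior: every one of the 2^p models gets the same weight."""
    return [1.0 / 2 ** p for _ in sizes]

def hierarchical_uniform_prior(sizes, p):
    """Assumed prior: uniform over model size k = 0..p, then uniform over
    the C(p, k) models of that size."""
    return [1.0 / ((p + 1) * math.comb(p, k)) for k in sizes]

# Hypothetical inputs: p = 10 potential regressors, three candidate models
# with 0, 1 and 5 regressors, and made-up log Bayes factors against the null.
p = 10
sizes = [0, 1, 5]
log_bfs = [0.0, 2.0, 3.0]
print(posterior_model_probs(log_bfs, uniform_prior(sizes, p)))
print(posterior_model_probs(log_bfs, hierarchical_uniform_prior(sizes, p)))
```

With identical Bayes factors, the two model priors can rank the candidates differently, which is the sense in which posterior model consistency depends on the model prior as well as on the parameter priors.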

Abstract:
While Jeffreys priors usually are well-defined for the parameters of mixtures of distributions, they are not available in closed form. Furthermore, they often are improper priors. Hence, they have never been used to draw inference on the mixture parameters. We study in this paper the implementation and the properties of Jeffreys priors in several mixture settings, show that the associated posterior distributions most often are improper, and then propose a noninformative alternative for the analysis of mixtures.

Abstract:
This paper deals with Bayesian inference of a mixture of Gaussian distributions. A novel formulation of the mixture model is introduced, which includes the prior constraint that each Gaussian component is always assigned a minimal number of data points. This enables noninformative improper priors such as the Jeffreys prior to be placed on the component parameters. We demonstrate difficulties involved in specifying a prior for the standard Gaussian mixture model, and show how the new model can be used to overcome these. MCMC methods are given for efficient sampling from the posterior of this model.

Abstract:
In the class of normal regression models with a finite number of regressors, and for a wide class of prior distributions, a Bayesian model selection procedure based on the Bayes factor is consistent [Casella and Moreno J. Amer. Statist. Assoc. 104 (2009) 1261--1271]. However, in models where the number of parameters increases as the sample size increases, properties of the Bayes factor are not fully understood. Here we study consistency of the Bayes factors for nested normal linear models when the number of regressors increases with the sample size. We pay attention to two successful tools for model selection: the Schwarz approximation to the Bayes factor [Schwarz Ann. Statist. 6 (1978) 461--464], and the Bayes factor for intrinsic priors [Berger and Pericchi J. Amer. Statist. Assoc. 91 (1996) 109--122, Moreno, Bertolino and Racugno J. Amer. Statist. Assoc. 93 (1998) 1451--1460]. We find that the Schwarz approximation and the Bayes factor for intrinsic priors are consistent when the rate of growth of the dimension of the bigger model is $O(n^b)$ for $b<1$. When $b=1$ the Schwarz approximation is always inconsistent under the alternative, while the Bayes factor for intrinsic priors is consistent except for a small set of alternative models, which is characterized.
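As a rough illustration of the Schwarz approximation for two nested normal linear models, the sketch below uses the standard BIC form, $\log B_{10} \approx \frac{n}{2}\log(\mathrm{RSS}_0/\mathrm{RSS}_1) - \frac{p_1-p_0}{2}\log n$. The function names and the simulated data are assumptions made for the example, and the paper's own notation and parametrization may differ.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of the least squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def schwarz_log_bf10(X0, X1, y):
    """Schwarz (BIC) approximation to the log Bayes factor of the larger
    model (design X1) against the nested smaller model (design X0)."""
    n = y.shape[0]
    p0, p1 = X0.shape[1], X1.shape[1]
    return 0.5 * n * np.log(rss(X0, y) / rss(X1, y)) - 0.5 * (p1 - p0) * np.log(n)

# Hypothetical data: intercept-only model against intercept plus one regressor.
rng = np.random.default_rng(0)
n = 200
x = rng.standard_normal(n)
y = 1.0 + 0.3 * x + rng.standard_normal(n)
X0 = np.ones((n, 1))
X1 = np.column_stack([np.ones(n), x])
print(schwarz_log_bf10(X0, X1, y))
```

A positive value favors the larger model. The penalty term grows with $p_1 - p_0$, which is why the growth rate of the larger model's dimension matters for consistency.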

Abstract:
When using mixture models it may be the case that the modeller has a priori beliefs or desires about what the components of the mixture should represent. For example, if a mixture of normal densities is to be fitted to some data, it may be desirable for components to focus on capturing differences in location rather than scale. We introduce a framework called proximity penalty priors (PPPs) that allows this preference to be made explicit in the prior information. The approach is scale-free and imposes minimal restrictions on the posterior; in particular, no arbitrary thresholds need to be set. We show the theoretical validity of the approach, and demonstrate the effects of using PPPs on posterior distributions with simulated and real data.

Abstract:
In this paper the asymptotic distribution of estimators is derived in a general regression setting where rank restrictions on a submatrix of the coefficient matrix are imposed and the regressors can include stationary or I(1) processes. Such a setting occurs e.g. in factor models. Rates of convergence are derived and the asymptotic distribution is given for least squares estimators as well as fully-modified estimators. The gains in imposing the rank restrictions are investigated. A number of special cases are discussed including the Johansen results in the case of cointegrated VAR(p) processes.

Abstract:
We propose an efficient way to sample from a class of structured multivariate Gaussian distributions which routinely arise as conditional posteriors of model parameters that are assigned a conditionally Gaussian prior. The proposed algorithm only requires matrix operations in the form of matrix multiplications and linear system solutions. We show that the computational complexity of the proposed algorithm grows linearly with the dimension, unlike existing algorithms relying on Cholesky factorizations with cubic orders of complexity. The algorithm should be broadly applicable in settings where Gaussian scale mixture priors are used on high dimensional model parameters. We provide an illustration through posterior sampling in a high dimensional regression setting with a horseshoe prior on the vector of regression coefficients.
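A sampler of this kind can be sketched as follows, under the assumption that the target is $N(\Sigma\Phi^{T}\alpha, \Sigma)$ with $\Sigma = (\Phi^{T}\Phi + D^{-1})^{-1}$, where $\Phi$ is $n \times p$ and $D$ is diagonal and positive. The names `Phi`, `d`, `alpha` and both function names are assumptions for the example and do not appear in the abstract. The sketch uses only matrix products and one $n \times n$ linear solve, so for fixed $n$ its cost grows linearly in $p$.

```python
import numpy as np

def draw_given_noise(Phi, d, alpha, u, delta):
    """Map noise draws u ~ N(0, D) and delta ~ N(0, I_n) to one draw from
    N(Sigma Phi^T alpha, Sigma), Sigma = (Phi^T Phi + D^{-1})^{-1},
    where d holds the diagonal of D."""
    v = Phi @ u + delta
    M = (Phi * d) @ Phi.T + np.eye(Phi.shape[0])  # Phi D Phi^T + I_n (n x n)
    w = np.linalg.solve(M, alpha - v)             # the only linear solve
    return u + d * (Phi.T @ w)                    # u + D Phi^T w

def sample_structured_gaussian(Phi, d, alpha, rng):
    """Draw the noise terms, then apply the deterministic map above."""
    n, p = Phi.shape
    u = np.sqrt(d) * rng.standard_normal(p)       # u ~ N(0, D)
    delta = rng.standard_normal(n)                # delta ~ N(0, I_n)
    return draw_given_noise(Phi, d, alpha, u, delta)

# Hypothetical regression-like setting with p much larger than n.
rng = np.random.default_rng(1)
n, p = 20, 500
Phi = rng.standard_normal((n, p))
d = rng.uniform(0.1, 2.0, size=p)  # stand-in for prior scale terms
alpha = rng.standard_normal(n)
theta = sample_structured_gaussian(Phi, d, alpha, rng)
print(theta.shape)
```

Splitting out `draw_given_noise` is a design choice for checking: with both noise terms set to zero it returns $D\Phi^{T}(\Phi D\Phi^{T} + I_n)^{-1}\alpha$, which equals the target mean $\Sigma\Phi^{T}\alpha$.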

Abstract:
Zellner's g-prior is a popular prior choice for model selection problems in the context of normal regression models. Wang and Sun (2014) recently adopted this prior with a special hyper-prior for g, which results in a closed-form expression of the Bayes factor for nested linear model comparisons. They showed that, under very general conditions, the Bayes factor is consistent when the two competing models are of order $O(n^{\tau})$ for $\tau<1$, and for $\tau=1$ is almost consistent except for a small inconsistency region around the null hypothesis. In this paper, we study Bayes factor consistency for nonnested linear models with a growing number of parameters. Some of the proposed results generalize those for the Bayes factor in the case of nested linear models. Specifically, we compare the asymptotic behavior of the proposed Bayes factor with that of the intrinsic Bayes factor in the literature.

Abstract:
We study location-scale mixture priors for nonparametric statistical problems, including multivariate regression, density estimation and classification. We show that a rate-adaptive procedure can be obtained if the prior is properly constructed. In particular, we show that adaptation is achieved if a kernel mixture prior on a regression function is constructed using a Gaussian kernel, an inverse gamma bandwidth, and Gaussian mixing weights.

Abstract:
Variable selection has received widespread attention over the last decade as we routinely encounter high-throughput datasets in complex biological and environmental research. Most Bayesian variable selection methods are restricted to mixture priors having separate components for characterizing the signal and the noise. However, such priors encounter computational issues in high dimensions. This has motivated continuous shrinkage priors, which resemble the two-component priors while facilitating computation and interpretability. While such priors are widely used for estimating high-dimensional sparse vectors, selecting a subset of variables remains a daunting task. In this article, we propose a general approach for variable selection with shrinkage priors. The presence of very few tuning parameters makes our method attractive in comparison to ad hoc thresholding approaches. The approach is not limited to continuous shrinkage priors; it can be used along with any shrinkage prior. Theoretical properties for near-collinear design matrices are investigated and the method is shown to have good performance in a wide range of synthetic data examples.