Abstract:
The data augmentation (DA) algorithm is a widely used Markov chain Monte Carlo (MCMC) algorithm that is based on a Markov transition density of the form $p(x|x')=\int_{\mathsf{Y}}f_{X|Y}(x|y)f_{Y|X}(y|x') dy$, where $f_{X|Y}$ and $f_{Y|X}$ are conditional densities. The PX-DA and marginal augmentation algorithms of Liu and Wu [J. Amer. Statist. Assoc. 94 (1999) 1264--1274] and Meng and van Dyk [Biometrika 86 (1999) 301--320] are alternatives to DA that often converge much faster and are only slightly more computationally demanding. The transition densities of these alternative algorithms can be written in the form $p_R(x|x')=\int_{\mathsf{Y}}\int _{\mathsf{Y}}f_{X|Y}(x|y')R(y,dy')f_{Y|X}(y|x') dy$, where $R$ is a Markov transition function on $\mathsf{Y}$. We prove that when $R$ satisfies certain conditions, the MCMC algorithm driven by $p_R$ is at least as good as that driven by $p$ in terms of performance in the central limit theorem and in the operator norm sense. These results are brought to bear on a theoretical comparison of the DA, PX-DA and marginal augmentation algorithms. Our focus is on situations where the group structure exploited by Liu and Wu is available. We show that the PX-DA algorithm based on Haar measure is at least as good as any PX-DA algorithm constructed using a proper prior on the group.
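As an illustration of the transition density $p(x|x')$ above, one iteration of a DA chain draws $y \sim f_{Y|X}(\cdot|x')$ and then $x \sim f_{X|Y}(\cdot|y)$. The following is a minimal sketch only. It assumes a toy target that does not appear in the paper: a standard bivariate normal pair $(X,Y)$ with correlation `rho`, so that both conditionals are $N(\rho\,\cdot, 1-\rho^2)$ and the $X$-marginal is $N(0,1)$. The names `da_step` and `run_da` are hypothetical.

```python
import random


def da_step(x_prev, rho, rng):
    """One DA iteration for the toy bivariate normal model (an assumption)."""
    sd = (1.0 - rho ** 2) ** 0.5
    y = rng.gauss(rho * x_prev, sd)  # draw y ~ f_{Y|X}(. | x')
    x = rng.gauss(rho * y, sd)       # draw x ~ f_{X|Y}(. | y)
    return x


def run_da(n_iter, rho=0.5, x0=0.0, seed=0):
    """Run the DA chain for n_iter iterations and return the x draws."""
    rng = random.Random(seed)
    x = x0
    draws = []
    for _ in range(n_iter):
        x = da_step(x, rho, rng)
        draws.append(x)
    return draws


if __name__ == "__main__":
    draws = run_da(20000, rho=0.5, seed=1)
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / len(draws)
    print("sample mean:", mean, "sample variance:", var)
```

In this toy model the sample mean and variance of the draws should settle near 0 and 1, the moments of the $N(0,1)$ marginal of $X$.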

Abstract:
Let $\pi$ denote the intractable posterior density that results when the likelihood from a multivariate linear regression model with errors from a scale mixture of normals is combined with the standard non-informative prior. There is a simple data augmentation algorithm (based on latent data from the mixing density) that can be used to explore $\pi$. Hobert et al. (2015) [arXiv:1506.03113v1] recently performed a convergence rate analysis of the Markov chain underlying this MCMC algorithm in the special case where the regression model is univariate. These authors provide simple sufficient conditions (on the mixing density) for geometric ergodicity of the Markov chain. In this note, we extend Hobert et al.'s (2015) result to the multivariate case.

Abstract:
The data augmentation (DA) algorithm is a widely used Markov chain Monte Carlo algorithm that is easy to implement but often suffers from slow convergence. The sandwich algorithm is an alternative that can converge much faster while requiring roughly the same computational effort per iteration. Theoretically, the sandwich algorithm always converges at least as fast as the corresponding DA algorithm in the sense that $\Vert {K^*}\Vert \le \Vert {K}\Vert$, where $K$ and $K^*$ are the Markov operators associated with the DA and sandwich algorithms, respectively, and $\Vert\cdot\Vert$ denotes operator norm. In this paper, a substantial refinement of this operator norm inequality is developed. In particular, under regularity conditions implying that $K$ is a trace-class operator, it is shown that $K^*$ is also a positive, trace-class operator, and that the spectrum of $K^*$ dominates that of $K$ in the sense that the ordered elements of the former are all less than or equal to the corresponding elements of the latter. Furthermore, if the sandwich algorithm is constructed using a group action, as described by Liu and Wu [J. Amer. Statist. Assoc. 94 (1999) 1264--1274] and Hobert and Marchev [Ann. Statist. 36 (2008) 532--554], then there is strict inequality between at least one pair of eigenvalues. These results are applied to a new DA algorithm for Bayesian quantile regression introduced by Kozumi and Kobayashi [J. Stat. Comput. Simul. 81 (2011) 1565--1578].
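To make the comparison between DA and sandwich concrete, the sketch below inserts an extra move on the augmented variable between the two conditional draws. It is a toy illustration under assumptions that are not taken from the paper: $(X,Y)$ is standard bivariate normal with correlation `rho`, and the extra move is a sign flip of $y$ applied with probability 1/2 (the two-element group acting on $y$, sampled uniformly), which is reversible with respect to the $N(0,1)$ marginal of $Y$. The names `chain_step`, `run_chain` and `lag1_autocorr` are hypothetical.

```python
import random


def chain_step(x_prev, rho, rng, sandwich):
    """One iteration of the toy DA chain, or of its sandwich variant."""
    sd = (1.0 - rho ** 2) ** 0.5
    y = rng.gauss(rho * x_prev, sd)      # y ~ f_{Y|X}(. | x')
    if sandwich and rng.random() < 0.5:  # extra step: flip the sign of y w.p. 1/2
        y = -y
    return rng.gauss(rho * y, sd)        # x ~ f_{X|Y}(. | y')


def run_chain(n_iter, rho, sandwich, seed=0):
    """Run the chain from x = 0 and return the x draws."""
    rng = random.Random(seed)
    x = 0.0
    draws = []
    for _ in range(n_iter):
        x = chain_step(x, rho, rng, sandwich)
        draws.append(x)
    return draws


def lag1_autocorr(draws):
    """Empirical lag-1 autocorrelation of a sequence of draws."""
    n = len(draws)
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / n
    cov = sum((draws[i] - mean) * (draws[i + 1] - mean) for i in range(n - 1)) / (n - 1)
    return cov / var


if __name__ == "__main__":
    for flag in (False, True):
        draws = run_chain(20000, rho=0.9, sandwich=flag, seed=1)
        print("sandwich" if flag else "DA", lag1_autocorr(draws))
```

In this toy model the stationary lag-1 autocorrelation of $X$ is $\rho^2$ for the DA chain and 0 for the sandwich chain, while the cost per iteration differs only by one coin flip.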

Abstract:
Exploration of the intractable posterior distributions associated with Bayesian versions of the general linear mixed model is often performed using Markov chain Monte Carlo. In particular, if a conditionally conjugate prior is used, then there is a simple two-block Gibbs sampler available. Roman & Hobert (2015) showed that, when the priors are proper and the $X$ matrix has full column rank, the Markov chains underlying these Gibbs samplers are nearly always geometrically ergodic. In this paper, Roman & Hobert's (2015) result is extended by allowing improper priors on the variance components, and, more importantly, by removing all assumptions on the $X$ matrix. So, not only is $X$ allowed to be (column) rank deficient, which provides additional flexibility in parameterizing the fixed effects, it is also allowed to have more columns than rows, which is necessary in the increasingly important situation where $p > N$. The full rank assumption on $X$ is at the heart of Roman & Hobert's (2015) proof. Consequently, the extension to unrestricted $X$ requires a substantially different analysis.

Abstract:
Let $X=\{X_n:n=0,1,2,\ldots\}$ be an irreducible, positive recurrent Markov chain with invariant probability measure $\pi$. We show that if $X$ satisfies a one-step minorization condition, then $\pi$ can be represented as an infinite mixture. The distributions in the mixture are associated with the hitting times on an accessible atom introduced via the splitting construction of Athreya and Ney [Trans. Amer. Math. Soc. 245 (1978) 493-501] and Nummelin [Z. Wahrsch. Verw. Gebiete 43 (1978) 309-318]. When the small set in the minorization condition is the entire state space, our mixture representation of $\pi$ reduces to a simple formula, first derived by Breyer and Roberts [Methodol. Comput. Appl. Probab. 3 (2001) 161-177], from which samples can be easily drawn. Despite the fact that the derivation of this formula involves no coupling or backward simulation arguments, the formula can be used to reconstruct perfect sampling algorithms based on coupling from the past.
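For intuition about the whole-state-space case, suppose $P(x,\cdot) \ge \varepsilon\,\nu(\cdot)$ for all $x$ with $0<\varepsilon<1$, and define the residual kernel $R(x,\cdot) = (P(x,\cdot)-\varepsilon\,\nu(\cdot))/(1-\varepsilon)$. Invariance then gives $\pi = \varepsilon\nu + (1-\varepsilon)\pi R$, and iterating yields $\pi = \sum_{n \ge 0} \varepsilon(1-\varepsilon)^n\, \nu R^n$. The notation $\varepsilon$, $\nu$, $R$ is ours and may differ from the paper's. The sketch below checks this identity numerically on a hypothetical three-state chain; the matrix `P` and the function names are illustrative assumptions.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def mixture_stationary(P, n_terms=200):
    """Approximate pi by the truncated mixture sum_n eps (1 - eps)^n nu R^n.

    Assumes every column of P has a positive minimum and that 0 < eps < 1.
    """
    k = len(P)
    # Minorization on the whole state space: P[i][j] >= eps * nu[j] for all i.
    col_min = [min(P[i][j] for i in range(k)) for j in range(k)]
    eps = sum(col_min)
    nu = [m / eps for m in col_min]
    # Residual kernel R = (P - eps * nu) / (1 - eps).
    R = [[(P[i][j] - eps * nu[j]) / (1.0 - eps) for j in range(k)] for i in range(k)]
    pi = [0.0] * k
    term = nu[:]   # holds nu R^n
    weight = eps   # holds eps (1 - eps)^n
    for _ in range(n_terms):
        pi = [p + weight * t for p, t in zip(pi, term)]
        term = vec_mat(term, R)
        weight *= 1.0 - eps
    return pi


if __name__ == "__main__":
    P = [[0.5, 0.3, 0.2],
         [0.2, 0.6, 0.2],
         [0.3, 0.3, 0.4]]
    pi = mixture_stationary(P)
    print("pi   :", pi)
    print("pi P :", vec_mat(pi, P))  # should agree with pi
```

The same representation suggests a sampling recipe: draw $N$ with $P(N=n)=\varepsilon(1-\varepsilon)^n$, draw a starting point from $\nu$, and run $N$ steps of the residual kernel $R$.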

Abstract:
We consider Gibbs and block Gibbs samplers for a Bayesian hierarchical version of the one-way random effects model. Drift and minorization conditions are established for the underlying Markov chains. The drift and minorization are used in conjunction with results from J. S. Rosenthal [J. Amer. Statist. Assoc. 90 (1995) 558-566] and G. O. Roberts and R. L. Tweedie [Stochastic Process. Appl. 80 (1999) 211-229] to construct analytical upper bounds on the distance to stationarity. These lead to upper bounds on the amount of burn-in that is required to get the chain within a prespecified (total variation) distance of the stationary distribution. The results are illustrated with a numerical example.

Abstract:
Consider a parametric statistical model $P(\mathrm{d}x|\theta)$ and an improper prior distribution $\nu(\mathrm{d}\theta)$ that together yield a (proper) formal posterior distribution $Q(\mathrm{d}\theta|x)$. The prior is called strongly admissible if the generalized Bayes estimator of every bounded function of $\theta$ is admissible under squared error loss. Eaton [Ann. Statist. 20 (1992) 1147--1179] has shown that a sufficient condition for strong admissibility of $\nu$ is the local recurrence of the Markov chain whose transition function is $R(\theta,\mathrm{d}\eta)=\int Q(\mathrm{d}\eta|x)P(\mathrm {d}x|\theta)$. Applications of this result and its extensions are often greatly simplified when the Markov chain associated with $R$ is irreducible. However, establishing irreducibility can be difficult. In this paper, we provide a characterization of irreducibility for general state space Markov chains and use this characterization to develop an easily checked, necessary and sufficient condition for irreducibility of Eaton's Markov chain. All that is required to check this condition is a simple examination of $P$ and $\nu$. Application of the main result is illustrated using two examples.

Abstract:
Bayesian analysis of data from the general linear mixed model is challenging because any nontrivial prior leads to an intractable posterior density. However, if a conditionally conjugate prior density is adopted, then there is a simple Gibbs sampler that can be employed to explore the posterior density. A popular default among the conditionally conjugate priors is an improper prior that takes a product form with a flat prior on the regression parameter, and so-called power priors on each of the variance components. In this paper, a convergence rate analysis of the corresponding Gibbs sampler is undertaken. The main result is a simple, easily checked sufficient condition for geometric ergodicity of the Gibbs-Markov chain. This result is close to the best possible result in the sense that the sufficient condition is only slightly stronger than what is required to ensure posterior propriety. The theory developed in this paper is extremely important from a practical standpoint because it guarantees the existence of central limit theorems that allow for the computation of valid asymptotic standard errors for the estimates computed using the Gibbs sampler.

Abstract:
The reversible Markov chains that drive the data augmentation (DA) and sandwich algorithms define self-adjoint operators whose spectra encode the convergence properties of the algorithms. When the target distribution has uncountable support, as is nearly always the case in practice, it is generally quite difficult to get a handle on these spectra. We show that, if the augmentation space is finite, then (under regularity conditions) the operators defined by the DA and sandwich chains are compact, and the spectra are finite subsets of $[0,1)$. Moreover, we prove that the spectrum of the sandwich operator dominates the spectrum of the DA operator in the sense that the ordered elements of the former are all less than or equal to the corresponding elements of the latter. As a concrete example, we study a widely used DA algorithm for the exploration of posterior densities associated with Bayesian mixture models [J. Roy. Statist. Soc. Ser. B 56 (1994) 363--375]. In particular, we compare this mixture DA algorithm with an alternative algorithm proposed by Fr\"{u}hwirth-Schnatter [J. Amer. Statist. Assoc. 96 (2001) 194--209] that is based on random label switching.

Abstract:
The errors in a standard linear regression model are iid with common density $\frac{1}{\sigma} \phi \big( \frac{\varepsilon}{\sigma} \big)$, where $\sigma$ is an unknown scale parameter, and $\phi(\cdot)$ is the standard normal density. In situations where Gaussian errors are inappropriate, e.g., when the data contain outliers, $\phi(z)$ is often replaced by a scale mixture of Gaussians, i.e., by $f(z) = \int_0^\infty \sqrt{u} \, \phi(\sqrt{u} z) \, h(u) \, du$, where $h(\cdot)$ is a density with support in $(0,\infty)$. Combining this alternative regression model with a default prior on the unknown parameters results in a highly intractable posterior density. Fortunately, there is a simple data augmentation (DA) algorithm that can be used to explore this posterior. This paper provides conditions (on $h$) for geometric ergodicity of the Markov chain underlying this MCMC algorithm. These results are extremely important from a practical standpoint because geometric ergodicity guarantees the existence of the central limit theorems that form the basis of all the standard methods of calculating valid asymptotic standard errors for MCMC-based estimators. The main result is that, if $h$ converges to 0 at the origin at an appropriate rate, and $\int_0^\infty \sqrt{u} \, h(u) \, du < \infty$, then the corresponding DA Markov chain is geometrically ergodic. This result is quite far-reaching. For example, it implies the geometric ergodicity of the DA Markov chain whenever $h$ is generalized inverse Gaussian, log-normal, Fr\'{e}chet, or inverted gamma (with shape parameter larger than 1/2). The result also applies to certain subsets of the gamma, $F$, and Weibull families.
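The scale-mixture formula for $f(z)$ can be checked numerically in a familiar special case. A standard fact, not stated in the abstract, is that taking $h$ to be a gamma density with shape $\nu/2$ and rate $\nu/2$ makes $f$ the Student $t$ density with $\nu$ degrees of freedom. The sketch below evaluates the integral with a simple midpoint rule; the truncation point `upper`, the grid size, and all function names are illustrative assumptions.

```python
import math


def normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def gamma_pdf(u, shape, rate):
    """Gamma density with the given shape and rate, for u > 0."""
    return rate ** shape * u ** (shape - 1.0) * math.exp(-rate * u) / math.gamma(shape)


def mixture_density(z, h, upper=60.0, n_grid=60000):
    """Midpoint-rule approximation of f(z) = int sqrt(u) phi(sqrt(u) z) h(u) du.

    The integral is truncated at `upper`, which is assumed to be large enough
    for the tail of the integrand to be negligible.
    """
    step = upper / n_grid
    total = 0.0
    for i in range(n_grid):
        u = (i + 0.5) * step
        total += math.sqrt(u) * normal_pdf(math.sqrt(u) * z) * h(u)
    return total * step


def student_t_pdf(z, nu):
    """Student t density with nu degrees of freedom."""
    c = math.gamma((nu + 1.0) / 2.0) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2.0))
    return c * (1.0 + z * z / nu) ** (-(nu + 1.0) / 2.0)


if __name__ == "__main__":
    nu = 4.0
    h = lambda u: gamma_pdf(u, nu / 2.0, nu / 2.0)
    for z in (0.0, 1.0, 2.5):
        print(z, mixture_density(z, h), student_t_pdf(z, nu))
```

Swapping in another mixing density for `h` gives the corresponding heavy-tailed error density, provided the truncation point is adjusted to its tail.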