Abstract:
This paper describes a framework for flexible multiple hypothesis testing of autoregressive time series. The modeling approach is Bayesian, though a blend of frequentist and Bayesian reasoning is used to evaluate procedures. Nonparametric characterizations of both the null and alternative hypotheses will be shown to be the key robustification step necessary to ensure reasonable Type-I error performance. The methodology is applied to part of a large database containing up to 50 years of corporate performance statistics on 24,157 publicly traded American companies, where the primary goal of the analysis is to flag companies whose historical performance is significantly different from that expected due to chance.

Abstract:
This paper uses Bayesian tree models for statistical benchmarking in data sets with awkward marginals and complicated dependence structures. The method is applied to a very large database on corporate performance over the last four decades. The results of this study provide a formal basis for making cross-peer-group comparisons among companies in very different industries and operating environments. This is done by using models for Bayesian multiple hypothesis testing to determine which firms, if any, have systematically outperformed their peer groups over time. We conclude that systematic outperformance, while it seems to exist, is quite rare worldwide.

Abstract:
This paper considers the problem of using MCMC to fit sparse Bayesian models based on normal scale-mixture priors. Examples of this framework include the Bayesian LASSO and the horseshoe prior. We study the usefulness of parameter expansion (PX) for improving convergence in such models, which is notoriously slow when the global variance component is near zero. Our conclusion is that parameter expansion does improve matters in LASSO-type models, but only modestly. In most cases this improvement, while noticeable, is less than what might be expected, especially compared to the improvements that PX makes possible for models very similar to those considered here. We give some examples, and we attempt to provide some intuition as to why this is so. We also describe how slice sampling may be used to update the global variance component. In practice, this approach seems to perform almost as well as parameter expansion. As a practical matter, however, it is perhaps best viewed not as a replacement for PX, but as a tool for expanding the class of models to which PX is applicable.
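As a brief illustration of the normal scale-mixture framework mentioned above, the sketch below draws from a Laplace (double-exponential) prior, the prior underlying the Bayesian LASSO, by first drawing a variance from an exponential distribution and then drawing a normal variate with that variance. The function name `sample_laplace_via_scale_mixture` and the rate value `lam = 2.0` are assumptions made for illustration; this is not the paper's sampler, and it does not involve parameter expansion or slice sampling.

```python
import math
import random


def sample_laplace_via_scale_mixture(lam, rng):
    """Draw beta ~ Laplace(rate=lam) using its normal scale-mixture form."""
    # Mixing step: tau^2 ~ Exponential(rate = lam^2 / 2).
    tau2 = rng.expovariate(lam ** 2 / 2.0)
    # Conditional step: beta | tau^2 ~ Normal(0, tau^2).
    return rng.gauss(0.0, math.sqrt(tau2))


rng = random.Random(0)
lam = 2.0  # hypothetical rate parameter, chosen only for the demo
draws = [sample_laplace_via_scale_mixture(lam, rng) for _ in range(50000)]

# For a Laplace(rate=lam) variable, E|beta| = 1 / lam.
mean_abs = sum(abs(b) for b in draws) / len(draws)
print(mean_abs, 1.0 / lam)
```

The Monte Carlo average of |beta| should land close to 1/lam, which is a quick check that the two-stage draw reproduces the Laplace marginal.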

Abstract:
We propose a new algorithm for solving the graph-fused lasso (GFL), a method for parameter estimation that operates under the assumption that the signal tends to be locally constant over a predefined graph structure. Our key insight is to decompose the graph into a set of trails which can then each be solved efficiently using techniques for the ordinary (1D) fused lasso. We leverage these trails in a proximal algorithm that alternates between closed-form primal updates and fast dual trail updates. The resulting technique is both faster than previous GFL methods and more flexible in the choice of loss function and graph structure. Furthermore, we present two algorithms for constructing trail sets and show empirically that they offer a tradeoff between preprocessing time and convergence rate.
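For orientation, the GFL problem and the trail decomposition described above can be written as follows. The notation (loss \ell, edge set E, trail set \mathcal{T}, penalty weight \lambda) is assumed here for illustration and may differ from the paper's own.

```latex
% Graph-fused lasso over a graph with edge set E (notation assumed):
\[
  \min_{\beta \in \mathbb{R}^n} \;
  \ell(y, \beta) + \lambda \sum_{(r,s) \in E} \lvert \beta_r - \beta_s \rvert .
\]
% If the edges are partitioned into a set of trails \mathcal{T},
% the penalty splits into one term per trail:
\[
  \sum_{(r,s) \in E} \lvert \beta_r - \beta_s \rvert
  \;=\;
  \sum_{t \in \mathcal{T}} \; \sum_{(r,s) \in t} \lvert \beta_r - \beta_s \rvert ,
\]
% and each inner sum is a 1D fused-lasso penalty along the trail t.
```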

Abstract:
We present a family of expectation-maximization (EM) algorithms for binary and negative-binomial logistic regression, drawing a sharp connection with the variational-Bayes algorithm of Jaakkola and Jordan (2000). Indeed, our results allow a version of this variational-Bayes approach to be re-interpreted as a true EM algorithm. We study several interesting features of the algorithm, and of this previously unrecognized connection with variational Bayes. We also generalize the approach to sparsity-promoting priors, and to an online method whose convergence properties are easily established. This latter method compares favorably with stochastic-gradient descent in situations with marked collinearity.
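The following is a minimal sketch of an EM-style iteration for binary logistic regression with a single coefficient and no intercept. It assumes the weight form implied by the Jaakkola and Jordan bound, w = tanh(psi/2) / (2 psi), and a weighted least-squares update. The function names and toy data are hypothetical, and the sketch is not claimed to match the paper's algorithm in detail.

```python
import math


def em_logistic_1d(x, y, n_iter=200):
    """EM-style updates for a one-coefficient logistic regression."""
    beta = 0.0
    for _ in range(n_iter):
        # E-step (assumed form): weights from the current linear predictor.
        weights = []
        for xi in x:
            psi = xi * beta
            if abs(psi) < 1e-8:
                weights.append(0.25)  # limit of tanh(psi/2)/(2*psi) as psi -> 0
            else:
                weights.append(math.tanh(psi / 2.0) / (2.0 * psi))
        # M-step: weighted least-squares update with kappa_i = y_i - 1/2.
        num = sum(xi * (yi - 0.5) for xi, yi in zip(x, y))
        den = sum(w * xi * xi for w, xi in zip(weights, x))
        beta = num / den
    return beta


def score(beta, x, y):
    """Gradient of the logistic log-likelihood with respect to beta."""
    return sum(xi * (yi - 1.0 / (1.0 + math.exp(-xi * beta)))
               for xi, yi in zip(x, y))


# Toy, non-separable data (hypothetical).
x = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
y = [0, 0, 0, 1, 0, 1, 1, 1]
beta_hat = em_logistic_1d(x, y)
print(beta_hat, score(beta_hat, x, y))
```

At a fixed point of these updates the score is zero, so the printed gradient gives a simple check that the iteration has reached the maximum-likelihood estimate on this toy data.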

Abstract:
This paper studies the multiplicity-correction effect of standard Bayesian variable-selection priors in linear regression. Our first goal is to clarify when, and how, multiplicity correction happens automatically in Bayesian analysis, and to distinguish this correction from the Bayesian Ockham's-razor effect. Our second goal is to contrast empirical-Bayes and fully Bayesian approaches to variable selection through examples, theoretical results and simulations. Considerable differences between the two approaches are found. In particular, we prove a theorem that characterizes a surprising asymptotic discrepancy between fully Bayes and empirical Bayes. This discrepancy arises from a different source than the failure to account for hyperparameter uncertainty in the empirical-Bayes estimate. Indeed, even at the extreme, when the empirical-Bayes estimate converges asymptotically to the true variable-inclusion probability, the potential for a serious difference remains.
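One standard way to see automatic multiplicity correction is to compare two priors over models: a fixed inclusion probability versus an inclusion probability that is itself given a uniform prior and integrated out. The sketch below assumes the uniform case, which the abstract does not specify, and the function names are hypothetical.

```python
from math import comb


def model_prior_fixed(k, m, p=0.5):
    """Prior probability of one specific model with k of m variables, fixed p."""
    return p ** k * (1 - p) ** (m - k)


def model_prior_uniform_p(k, m):
    """Same, but with p ~ Uniform(0, 1) integrated out.

    The integral of p^k (1 - p)^(m - k) over [0, 1] equals
    1 / ((m + 1) * C(m, k)).
    """
    return 1.0 / ((m + 1) * comb(m, k))


def prior_odds_add_one(k, m):
    """Prior odds of a specific (k+1)-variable model versus a k-variable one."""
    return model_prior_uniform_p(k + 1, m) / model_prior_uniform_p(k, m)


for m in (10, 100, 1000):
    fixed_odds = model_prior_fixed(2, m) / model_prior_fixed(1, m)
    print(m, fixed_odds, prior_odds_add_one(1, m))
```

With a fixed p = 0.5 the prior odds of adding a variable stay at 1 regardless of how many candidates m are considered, whereas under the uniform prior they equal (k + 1) / (m - k) and shrink as m grows. That growing penalty is the multiplicity correction.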

Abstract:
This paper proposes an empirical test of financial contagion in European equity markets during the tumultuous period of 2008-2011. Our analysis shows that traditional GARCH and Gaussian stochastic-volatility models are unable to explain two key stylized features of global markets during presumptive contagion periods: shocks to aggregate market volatility can be sudden and explosive, and they are associated with specific directional biases in the cross-section of country-level returns. Our model repairs this deficit by assuming that the random shocks to volatility are heavy-tailed and correlated cross-sectionally, both with each other and with returns. The fundamental conclusion of our analysis is that great care is needed in modeling volatility if one wishes to characterize the relationship between volatility and contagion that is predicted by economic theory. In analyzing daily data, we find evidence for significant contagion effects during the major EU crisis periods of May 2010 and August 2011, where contagion is defined as excess correlation in the residuals from a factor model incorporating global and regional market risk factors. Some of this excess correlation can be explained by quantifying the impact of shocks to aggregate volatility in the cross-section of expected returns, but only, it turns out, if one is extremely careful in accounting for the explosive nature of these shocks. We show that global markets have time-varying cross-sectional sensitivities to these shocks, and that high sensitivities strongly predict periods of financial crisis. Moreover, the pattern of temporal changes in correlation structure between volatility and returns is readily interpretable in terms of the major events of the periods in question.

Abstract:
We use the theory of normal variance-mean mixtures to derive a data-augmentation scheme for a class of common regularization problems. This generalizes existing theory on normal variance mixtures for priors in regression and classification. It also allows variants of the expectation-maximization algorithm to be brought to bear on a wider range of models than previously appreciated. We demonstrate the method on several examples, including sparse quantile regression and binary logistic regression. We also show that quasi-Newton acceleration can substantially improve the speed of the algorithm without compromising its robustness.
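For reference, a normal variance-mean mixture can be written in the following generic form. The symbols \mu, \kappa, \sigma and the mixing density p(v) are assumed here for illustration, and the paper's own parameterization may differ.

```latex
% Generic normal variance-mean mixture (notation assumed):
\[
  x \mid v \sim \mathrm{N}\!\left(\mu + \kappa v,\; \sigma^2 v\right),
  \qquad v \sim p(v), \quad v > 0 ,
\]
% so that the marginal density of x is
\[
  p(x) = \int_0^\infty \mathrm{N}\!\left(x \mid \mu + \kappa v,\; \sigma^2 v\right) p(v)\, dv .
\]
% Setting \kappa = 0 recovers an ordinary normal variance (scale) mixture.
```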

Abstract:
This paper argues that the half-Cauchy distribution should replace the inverse-Gamma distribution as a default prior for a top-level scale parameter in Bayesian hierarchical models, at least for cases where a proper prior is necessary. Our arguments involve a blend of Bayesian and frequentist reasoning, and are intended to complement the original case made by Gelman (2006) in support of the folded-t family of priors. First, we generalize the half-Cauchy prior to the wider class of hypergeometric inverted-beta priors. We derive expressions for posterior moments and marginal densities when these priors are used for a top-level normal variance in a Bayesian hierarchical model. We go on to prove a proposition that, together with the results for moments and marginals, allows us to characterize the frequentist risk of the Bayes estimators under all global-shrinkage priors in the class. These theoretical results, in turn, allow us to study the frequentist properties of the half-Cauchy prior versus a wide class of alternatives. The half-Cauchy occupies a sensible 'middle ground' within this class: it performs very well near the origin, but does not lead to drastic compromises in other parts of the parameter space. This provides an alternative, classical justification for the repeated, routine use of this prior. We also consider situations where the underlying mean vector is sparse, where we argue that the usual conjugate choice of an inverse-gamma prior is particularly inappropriate, and can lead to highly distorted posterior inferences. Finally, we briefly summarize some open issues in the specification of default priors for scale terms in hierarchical models.
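To make the half-Cauchy prior concrete, the sketch below evaluates its density and draws samples by inverting its distribution function, F(x) = (2 / pi) * arctan(x / scale). The function names and the unit scale are assumptions for illustration; nothing here reproduces the paper's risk calculations.

```python
import math
import random


def half_cauchy_pdf(x, scale=1.0):
    """Density of the half-Cauchy distribution on [0, infinity)."""
    if x < 0:
        return 0.0
    return 2.0 / (math.pi * scale * (1.0 + (x / scale) ** 2))


def sample_half_cauchy(rng, scale=1.0):
    """Inverse-CDF draw: x = scale * tan(pi * u / 2) with u ~ Uniform[0, 1)."""
    u = rng.random()
    return scale * math.tan(math.pi * u / 2.0)


rng = random.Random(1)
draws = sorted(sample_half_cauchy(rng) for _ in range(20001))
sample_median = draws[len(draws) // 2]

# The median of a half-Cauchy equals its scale parameter.
print(sample_median)
# The density is finite and nonzero at the origin: 2 / (pi * scale).
print(half_cauchy_pdf(0.0))
```

The density at zero is positive and finite, whereas an inverse-gamma density vanishes at the origin. That contrast is one simple way to see why the two priors can behave differently when the scale parameter is near zero.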

Abstract:
We develop a connection between mixture and envelope representations of objective functions that arise frequently in statistics. We refer to this connection using the term "hierarchical duality." Our results suggest an interesting and previously under-exploited relationship between marginalization and profiling, or equivalently between the Fenchel–Moreau theorem for convex functions and the Bernstein–Widder theorem for Laplace transforms. We give several different sets of conditions under which such a duality result obtains. We then extend existing work on envelope representations in several ways, including novel generalizations to variance-mean models and to multivariate Gaussian location models. This turns out to provide an elegant missing-data interpretation of the proximal gradient method, a widely used algorithm in machine learning. We show several statistical applications in which the proposed framework leads to easily implemented algorithms, including a robust version of the fused lasso, nonlinear quantile regression via trend filtering, and the binomial fused double Pareto model. Code for the examples is available on GitHub at https://github.com/jgscott/hierduals.